Last time we demonstrated how to use the ODBC Foreign Data wrapper, this time we'll continue our journey into Foreign Data Wrapper land by demonstrating what I'll call the File FDW family of Foreign Data Wrappers. There is one that usually comes packaged with PostgreSQL 9.1 which is called fdw_file but there are two other experimental ones I find very useful which are
developed by Andrew Dunstan both of which Andrew demoed in PostgreSQL Foreign Data Wrappers and talked
about a little bit Text files from a remote source. As people who have to deal with text data files day in and out, especially ones from mainframes, these satisfy a certain itch.
- file_fdw - for querying delimited text files.
- file_fixed_length_fdw - this one deals with fixed length data. We discussed methods of importing fixed length data in Import Fixed width data. This is yet another approach but has the benefit that you can also use it to import just a subset of a file.
- file_text_array_fdw - this one queries a delimited file as if each delimiete row was a text array. It is ideal for those less than perfect moments when someone gives you a file with a 1000 columns and you don't have patience to look at what the hell those columns mean just yet.
In this article, we'll just cover the file_fdw
one, but will follow up in subsequent articles, demonstrating the array and fixed length record ones.
Before we begin, We'll create a staging schema to throw all our foreign data tables.
CREATE SCHEMA staging;
We'll also create a folder on on the root of our postgres server called fdw_data where we will house all the flat files. Make sure the postgres service account has access to it.
Using file_fdw
We'll start with the fdw_file foreign data wrapper extension which is useful for querying CSV, tab-delimited and other delimited files.
Setting up Foreign Data Table using the file_fdw driver
- Install the extension with the SQL statement
CREATE EXTENSION file_fdw;
- Create a Foreign server This is pretty much a formality for file_fdw since hte file always is on the local postgresql server.
CREATE SERVER file_fdw_server FOREIGN DATA WRAPPER file_fdw;
- Create a Foreign Table
For this exercise, we'll pull a tab delimited data file from FAA Aircraft Reference called aircraft.txt and save to /fdw_data folder. The file layout is found at AircraftFileLayout.txt
Note that the aircraft.txt has a header row. We'll save this to our fdw_data folder
CREATE FOREIGN TABLE staging.aircraft (
Model Char (12),
Last_Change_Date VarChar(10),
Region VarChar(2),
Make VarChar(6),
Aircraft_Group VarChar(6),
Regis_Code VarChar(7),
Design_Character VarChar(3),
No_Engines VarChar(11),
Type_Engine VarChar(2),
Type_Landing_Gear VarChar(2),
TC_Data_Sheet_Number VarChar(8),
TC_Model VarChar(20)
) SERVER file_fdw_server
OPTIONS (format 'csv',header 'true'
, filename '/fdw_data/aircraft.txt', delimiter E'\t', null '');
You can change the location of a file as well. For example, the query above will not work if you are on windows. You'll need to specify the drive letter as well.
You can do this:
ALTER FOREIGN TABLE staging.aircraft OPTIONS ( SET filename 'C:/fdw_data/aircraft.txt' );
Now to query our table, we simply do this:
SELECT model, region, make, last_change_date
FROM staging.aircraft
WHERE make LIKE 'BEECH%'
ORDER BY last_change_date DESC LIMIT 5;
model | region | make | last_change_date
--------------+--------+--------+------------------
B200CT | CE | BEECH | 12/18/2001
3000 | CE | BEECH | 12/17/2001
B300C | CE | BEECH | 10/11/2001
C12C | CE | BEECH | 10/11/2001
65A901 | CE | BEECH | 10/11/2001
Notice in the above definition that the Last_change_date field is set to varchar(10), but when we look at the data, it's in American date format
and sorting is not using date sorting so records get sorted by month first. To force date sorting, we can make this change
We could do this:
ALTER FOREIGN TABLE staging.aircraft ALTER COLUMN last_change_date type date;
And that works great if you are in the US since the default DateStyle is MDY and rerunning the query gives:
model | region | make | last_change_date
-------------+--------+--------+------------------
U21A | CE | BEECH | 2004-01-28
B200CT | CE | BEECH | 2001-12-18
3000 | CE | BEECH | 2001-12-17
65A901 | CE | BEECH | 2001-10-11
C12C | CE | BEECH | 2001-10-11
However if your DateStyle is 'DMY' as it is in Europe and other places,
you'll get an error when you go to query the data. We couldn't find a way to force the DateStyle property of a foreign table column aside from doing this:
set DateStyle='MDY'
before running a query on the table
Tracked: May 06, 10:41
Tracked: May 09, 23:33
Tracked: Jul 15, 16:01