Last time we demonstrated how to query delimited text files using the fdw_file that comes packaged with PostgreSQL 9.1+, this time we'll continue our journey into Flat file querying Foreign Data Wrapper using an experimental foreign data wrapper designed for also querying delimited data, but outputting it as a single column text array table.
This one is called file_textarray_fdw and developed by Andrew Dunstan. It's useful if you are dealing with for example jagged files, where not all columns are not properly filled in for each record or there are just a ton of columns you don't want to bother itemizing before you bring in. The benefit is you can still query and decide how you want to break it apart. You can grab the source code from file_text_array_fdw source code. If you are on windows, we have compiled binaries in our Bag o' FDWs for both PostgreSQL 9.1 32-bit FDW for Windows bag and PostgreSQL 9.1 64-bit FDW for Windows bag that should work fine with the EDB installed windows binaries.
For other systems, the compile is fairly easy if you have the postgresql development libraries installed.
We'll be using the same database schema and system file folder to house things as we did in our prior article.
Setting up Foreign Data Table using the file_textarray_fdw driver
- Install the extension binaries - for windows users, just copy the files in the same named folders of your PostgreSQL install. On other systems, you'll need to compile and it should automatically install the files for you using
make install
or you can do just a make
and copy the .sql and .control files into the PostgreSQL share/extensions folder, and the .so files into your PostgreSQL lib folder. Keep in mind, you need PostgreSQL 9.1 or above.
- Install the extension in your database of choice with the SQL statement
CREATE EXTENSION file_textarray_fdw;
- Create a Foreign server This is pretty much a formality for file_textarray_fdw as well since file always is on the local postgresql server.
CREATE SERVER file_tafdw_server FOREIGN DATA WRAPPER file_textarray_fdw;
- Create user mapping -
This is often necessary to do for most foreign data wrappers, though for some reason we were able to skip this step with the file_fdw one.
We need to create a foreign data mapping user to a database user. We don't want to itemize what users can access our data source, so'll create just a group mapping for all users using the group role public
which all our users of our database are members of.
CREATE USER MAPPING FOR public SERVER file_tafdw_server;
- Download the data
For this exercise, we'll use the 2010 Gazetteer natial place which is tab-delimited and can be downloaded from
census 2010 places data and more specifically the Gaz_places_national.txt
which we will save in our local server fdw_data folder.
- Create a Foreign Table - Now we are ready for the fun part, creating our table.
CREATE FOREIGN TABLE staging.places2010( x text[] ) SERVER file_tafdw_server
OPTIONS (filename '/fdw_data/Gaz_places_national.txt', encoding 'latin1', delimiter E'\t');
You can change the location of a file as well. For example, the query above will not work if you are on windows. You'll need to specify the drive letter as well.
You can do this:
ALTER FOREIGN TABLE staging.places2010 OPTIONS ( SET filename 'C:/fdw_data/Gaz_places_national.txt' );
Now to query our table, we do this:
SELECT x[1] As state, x[4] As place
FROM staging.places2010
WHERE x[1] = 'STUSPS'
OR (x[1] IN('RI', 'DC') AND x[4] ILIKE 'wa%');
This will give both the header column and all data for Rhode Island and DC fitting our criterion.
state | place
--------+-------------------------
STUSPS | NAME
DC | Washington city
RI | Wakefield-Peacedale CDP
RI | Warwick city
RI | Watch Hill CDP
Note I kept the header, this allows for easily flipping to a more friendly hstore structure. For this next example, you'll need hstore installed and
if you don't have it install it with:
CREATE EXTENSION hstore;
We'll use a common table expression, but you can just as easily dump the data into a table of your choosing.
WITH
cte As (SELECT hstore(headers.x, p.x) As kval
FROM (SELECT x
FROM staging.places2010
WHERE x[1] = 'STUSPS') As headers
CROSS JOIN staging.places2010 As p
WHERE (p.x[1] IN('RI', 'DC') AND p.x[4] ILIKE 'wa%') )
SELECT kval->'STUSPS' As state, kval->'NAME' As name, kval->'GEOID' As geoid
FROM cte
ORDER BY kval->'NAME';
state | name | geoid
-------+-------------------------+---------
RI | Wakefield-Peacedale CDP | 4473130
RI | Warwick city | 4474300
DC | Washington city | 1150000
RI | Watch Hill CDP | 4475200
Possible Options for the file_textarray_fdw
Every FDW has its own set of options, which makes sense since FDWs are so varied in purpose. I only used a couple of options in this tutorial for the foreign table, but here is a more exhaustive list of what's available for file_textarray_fdw
- filename - the file path of the file in the table
- encoding - the character encoding of the file. In this example, the file will not work with my default db encoding of UTF-8. If you get a byte sequence invalid error when querying your table, you need to specify the encoding since it means the encoding of the file is not the same as your db
- format
- delimiter - this is the character delimiter for the columns, the default is the tab characer E'\t', so we didn't really need to specify it, but if you are using something like '|' or comma or some other separator, you'll need it.
- quote - same as copy mode - what is the character used in quoting text. Our data wasn't quoted so we didn't need this.
- format -
- header - this should be set to 'true' or 'false'. Works the same as psql copy. It will strip the first row if set to true. In this case we wanted to keep the header so we could easily flip to hstore.
- null -- what to consider as the null character.
- escape -- character used for escaping text, e.g. if you had a quote or delimeter in text, what would escape it so you know it should be included. We didn't need to worry since no special characters were used such as tab.
Tracked: Jul 15, 02:34
Tracked: Jan 19, 14:08