File FDW Family: Part 2 file_textarray_fdw Foreign Data Wrapper

Last time we demonstrated how to query delimited text files using the fdw_file that comes packaged with PostgreSQL 9.1+, this time we'll continue our journey into Flat file querying Foreign Data Wrapper using an experimental foreign data wrapper designed for also querying delimited data, but outputting it as a single column text array table. This one is called file_textarray_fdw and developed by Andrew Dunstan. It's useful if you are dealing with for example jagged files, where not all columns are not properly filled in for each record or there are just a ton of columns you don't want to bother itemizing before you bring in. The benefit is you can still query and decide how you want to break it apart. You can grab the source code from file_text_array_fdw source code. If you are on windows, we have compiled binaries in our Bag o' FDWs for both PostgreSQL 9.1 32-bit FDW for Windows bag and PostgreSQL 9.1 64-bit FDW for Windows bag that should work fine with the EDB installed windows binaries. For other systems, the compile is fairly easy if you have the postgresql development libraries installed.

We'll be using the same database schema and system file folder to house things as we did in our prior article.

Setting up Foreign Data Table using the file_textarray_fdw driver

  1. Install the extension binaries - for windows users, just copy the files in the same named folders of your PostgreSQL install. On other systems, you'll need to compile and it should automatically install the files for you using make install or you can do just a make and copy the .sql and .control files into the PostgreSQL share/extensions folder, and the .so files into your PostgreSQL lib folder. Keep in mind, you need PostgreSQL 9.1 or above.
  2. Install the extension in your database of choice with the SQL statement CREATE EXTENSION file_textarray_fdw;
  3. Create a Foreign server This is pretty much a formality for file_textarray_fdw as well since file always is on the local postgresql server.
    CREATE SERVER file_tafdw_server FOREIGN DATA WRAPPER file_textarray_fdw;
  4. Create user mapping -

    This is often necessary to do for most foreign data wrappers, though for some reason we were able to skip this step with the file_fdw one. We need to create a foreign data mapping user to a database user. We don't want to itemize what users can access our data source, so'll create just a group mapping for all users using the group role public which all our users of our database are members of.

    CREATE USER MAPPING FOR public SERVER file_tafdw_server;
  5. Download the data

    For this exercise, we'll use the 2010 Gazetteer natial place which is tab-delimited and can be downloaded from census 2010 places data and more specifically the Gaz_places_national.txt which we will save in our local server fdw_data folder.

  6. Create a Foreign Table - Now we are ready for the fun part, creating our table.
    CREATE FOREIGN TABLE staging.places2010( x text[] ) SERVER file_tafdw_server
    OPTIONS (filename '/fdw_data/Gaz_places_national.txt', encoding 'latin1', delimiter E'\t');

    You can change the location of a file as well. For example, the query above will not work if you are on windows. You'll need to specify the drive letter as well. You can do this:

    ALTER FOREIGN TABLE staging.places2010 OPTIONS ( SET filename 'C:/fdw_data/Gaz_places_national.txt' );

Now to query our table, we do this:

SELECT  x[1] As state, x[4] As place
FROM staging.places2010
WHERE x[1] = 'STUSPS' 
    OR (x[1] IN('RI', 'DC') AND x[4] ILIKE 'wa%');

This will give both the header column and all data for Rhode Island and DC fitting our criterion.

 state  |          place
--------+-------------------------
 STUSPS | NAME
 DC     | Washington city
 RI     | Wakefield-Peacedale CDP
 RI     | Warwick city
 RI     | Watch Hill CDP

Note I kept the header, this allows for easily flipping to a more friendly hstore structure. For this next example, you'll need hstore installed and if you don't have it install it with:

CREATE EXTENSION hstore;

We'll use a common table expression, but you can just as easily dump the data into a table of your choosing.

WITH 
 cte As (SELECT hstore(headers.x, p.x) As kval
  FROM (SELECT x 
     FROM staging.places2010 
     WHERE x[1] = 'STUSPS') As headers
    CROSS JOIN staging.places2010 As p
  WHERE  (p.x[1] IN('RI', 'DC') AND p.x[4] ILIKE 'wa%') )
  SELECT kval->'STUSPS' As state, kval->'NAME' As name, kval->'GEOID' As geoid
    FROM cte
    ORDER BY kval->'NAME';
 state |          name           |  geoid
-------+-------------------------+---------
 RI    | Wakefield-Peacedale CDP | 4473130
 RI    | Warwick city            | 4474300
 DC    | Washington city         | 1150000
 RI    | Watch Hill CDP          | 4475200

Possible Options for the file_textarray_fdw

Every FDW has its own set of options, which makes sense since FDWs are so varied in purpose. I only used a couple of options in this tutorial for the foreign table, but here is a more exhaustive list of what's available for file_textarray_fdw