Foreign Data Wrap (FDW) Text Array, hstore, and Jagged Arrays

As we discussed in file_textarray_fdw Foreign Data Wrapper, Andrew Dunstan's text array foreign data wrapper works great for bringing in a delimited file and not having to worry about the column names until they are in. We had demonstrated one way to tag the field names to avoid having to keep track of index locations, by using hstore and the header column in conjunction. The problem with that is it doesn't work for jagged arrays. Jagged arrays are when not all rows have the same number of columns. I've jury rigged a small example to demonstrate the issue. Luckily with the power of PostgreSQL arrays you can usually get around this issue and still have nice names for your columns. We'll demonstrate that too.

Set up the foreign table

This particular file is noteworthy in 2 ways - it has line breaks in one of the columns which is a quoted column, thus the need to specify quote attribute, and not all rows have the same number of columns. You can download the sample file planet_aggregators.csv if you want to play along.

We set up the table with the following code:

CREATE SERVER file_tafdw_server FOREIGN DATA WRAPPER file_textarray_fdw;

CREATE USER MAPPING FOR public SERVER file_tafdw_server;

CREATE FOREIGN TABLE planet_aggregators( x text[] ) SERVER file_tafdw_server
 OPTIONS (filename 'C:/fdw_data/planet_aggregators.csv', format 'csv', encoding 'latin1', delimiter E',', quote E'"');

Querying our table

Querying the table is easy. The standard way works and gives you a all rows.

SELECT * FROM planet_aggregators;

Output is

                                                          x
----------------------------------------------------------------------------------------------------------------------
 {Site,Description,"Home Page","Tag 1","Tag 2","Tag 3"}
 {"Planet PostGIS","PostGIS - the best spatial database ever.                                                        +
 Find out all that's happening in PostGIS land. ",http://planet.postgis.net,postgis,postgresql,gis}
 {"Planet PostgreSQL","PostgreSQL - The most advanced open source databases",http://planet.postgresql.org,postgresql}
 {"Planet OSGeo","Open Source Geospatial technologies",http://planet.osgeo.org,gis}

This doesn't work because in order to use the hstore function that matches headers with values, the header and value arrays must be the same bounds. Makes sense.

-- get error: array must have same bounds
 WITH 
 cte As (SELECT hstore(headers.x, p.x) As kval
  FROM (SELECT x 
     FROM planet_aggregators
     LIMIT 1) As headers
    CROSS JOIN planet_aggregators As p
 )
  SELECT kval->'Site' As site_name, kval->'Home Page' As web_site
    FROM cte offset 1;

Fear not, the power of arrays will save us

WITH 
 cte As (SELECT hstore(headers.x[1:3], p.x[1:3]) As kval, p.x[4:6] As features
  FROM (SELECT x 
     FROM staging.planet_aggregators
     LIMIT 1) As headers
    CROSS JOIN staging.planet_aggregators As p
 )
  SELECT kval->'Site' As site_name, kval->'Home Page' As web_site, features
    FROM cte offset 1;
     site_name     |           web_site           |         features
-------------------+------------------------------+--------------------------
 Planet PostGIS    | http://planet.postgis.net    | {postgis,postgresql,gis}
 Planet PostgreSQL | http://planet.postgresql.org | {postgresql}
 Planet OSGeo      | http://planet.osgeo.org      | {gis}