PostgreSQL 9.4 and below doesn't support importing whole set of tables from a FOREIGN server, but PostgreSQL 9.5 does with the upcoming Import Foreign Schema. To use will require FDW wrapper designers to be aware of this feature and use the plumbing in their wrappers. IMPORT FOREIGN SCHEMA for ogr_fdw come PostgreSQL 9.5 release is on the features ticket list.
The ogr_fdw comes with this to die for commandline utility called ogr_fdw_info
that does generate the table structures for you and will also list all the tables in the Foreign data source if you don't give it a specific table name. So with this utility I wrote a little hack involving using PostgreSQL COPY PROGRAM
feature to call out to the ogr_fdw_info
commandline tool to figure out the table names and some DO magic to create the tables.
Though ogr_fdw is designed to be a spatial foreign data wrapper, it's turning out to be a pretty nice non-spatial FDW as well especially for reading spreadsheets which we seem to get a lot of. This hack I am about to demonstrate I am demonstrating with LibreOffice/OpenOffice workbook, but works equally well with Excel workbooks and most any data source that OGR supports.
The wrapper function just spits out the SQL to do the work. The reason I have it not do the work is because depending on my needs, I may say want to prefix the table names with something or change the data types it emits. For example in SQL Server, unless the timestamp field is the last field in the table, ogr_fdw doesn't handle it right. I still have to investigate this one. So where it says timestamp, for SQL Server I swap it out for varchar.
Note that I also hard-coded the executable path in there which is very specific to where it will end up if you are running PostgreSQL 9.4 on windows.
Some things to be cognizant of
CREATE OR REPLACE FUNCTION ogr_fdw_sql_table(
data_source text,
layer text DEFAULT ''::text)
RETURNS text AS
$$
BEGIN
DROP TABLE IF EXISTS ogr_fdw_out;
CREATE TEMP TABLE ogr_fdw_out(out text);
IF layer > '' THEN
EXECUTE 'COPY ogr_fdw_out FROM PROGRAM ''"C:/Program Files/PostgreSQL/9.4/bin/ogr_fdw_info" -s "'
|| data_source || '" -l "'
|| layer || '"'' WITH (format ''csv'', delimiter ''|'')';
ELSE
EXECUTE 'COPY ogr_fdw_out FROM PROGRAM ''"C:/Program Files/PostgreSQL/9.4/bin/ogr_fdw_info" -s "'
|| data_source
|| '"'' WITH (format ''csv'', delimiter ''|'')';
END IF;
RETURN (SELECT string_agg(out,E'\n') from ogr_fdw_out);
END;
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
For demonstrating this little puppy in action, I need a workbook with several spreadsheets to do it justice. So I downloaded a LibreOffice spreadsheet template from Monthly Home Budget Workbook and created a new workbook called Budget2015.ods from that. Note that LibreOffice, OpenOffice, and the Microsoft Excel 2007 and above (the .xlsx) format all are some variant of compressed XML. I will reiterate that what data sources you can read with ogr_fdw is dictated by your libgdal library and what support it is compiled with. In the case of reading LibreOffice/OpenOffice workbooks and the newer Microsoft XLSX files, your GDAL library needs to be compiled with Expat. In order to read older Excel files (xls format), you need to have your libgdal built with FreeXL support. The windows builds we created and detailed in PostgreSQL 9.4 bag-o-fdws are built with Expat and FreeXL (and as you can see, offer xlsx and ods support) and xls.
So to test out, I throw my new fangled budget workbook in my FDW folder and ran this little statement.
SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods');
And the output is this beautiful thing. Now as a non-spatial person, you have to get over the hump that Layer means Table as far as you are concerned
Layers: Dashboard Form 2 - Equity Form 3 - Income Form 4 - Lump Sum Planning Form 5 - Monthly Cash Flow Form 7 - Allocation Form 8 - Spending Details Form 9 - Work Expenses Debt Analysis Categories and Items
So we've got a couple of spreadsheets (tables, layers) in this workbook. I'm going to demonstrate how to see the structure with 2 different widely different spreadsheets just to demonstrate a particular behavior of the OGR ODS driver (the XLSX and XLS drivers seem to behave the same as far as I can tell) when faced with something that is not quite tabular looking (or lacking headers) vs. is tabular.
SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods', 'Dashboard');
The output of the Dashboard layer structure is:
CREATE SERVER myserver FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource 'C:/fdw_data/Budget2015.ods', format 'ODS' ); CREATE FOREIGN TABLE dashboard ( fid integer, field1 varchar, field2 varchar, field3 real, field4 varchar, field5 varchar, field6 varchar, field7 varchar ) SERVER myserver OPTIONS ( layer 'Dashboard' );
Now if you open up the workbook in LibreOffice Calc and look at the dashboard sheet, you'll see it's got charts and rows of data with no header. It has no header for the table. The only thing that is sure is that
column 3 has got numbers. So since OGR needs to ascibe a column name and can't infer one from the top row, it just calls them field1.. fieldn. OGR always adds an additional column, called fid
at the beginning which is really a row number identifier. You'll also sometimes see a geom
column as the second column if it thinks it's a spatial datasource.
If you were to do the same exercise using:
SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods','Form 4 - Lump Sum Planning');
You'd get the same CREATE SERVER command since they come from the same workbook, but your CREATE FOREIGN TABLE would look like this:
CREATE FOREIGN TABLE form_4___lump_sum_planning (
fid integer,
item varchar,
annual_amount varchar,
monthly_amount integer )
SERVER myserver
OPTIONS ( layer 'Form 4 - Lump Sum Planning' );
Note how the table name and the field names have been changed so they are legal field and column names for PostgreSQL (not requiring you to quote the columns or table name).
For example the spaces in the header of spreadsheet cells got converted to underscore and lower case: e.g. Monthly Amount became monthly_amount. At first glance,
it seems a little strange it considered annual_amount varchar and monthly_amount integer. OGR ODS driver by default infers the datatype from the data it sees (there are override environment variables for this, but can't pass them into the OGR_FDW yet I don't think). The Monthly Amount column
had just 0s in it and the Annual Amount column had no values. If you go back and fill in the Annual Amount column with numbers (so that monthly_amount recomputes to numbers with decimals), and then you rerun this command, you'll find that the annual_amount and monthly_amount both then show as real
data type.
To prepare the data for querying directly from PostgreSQL, we're going to:
CREATE SCHEMA budget2015;
Create a foreign server pointing to this workbook. For this take the CREATE SERVER statement and replace myserver
with: svr_budget2015_workbook
.
CREATE SERVER svr_budget2015_workbook
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource 'C:/fdw_data/Budget2015.ods',
format 'ODS' );
Apply some DO magic to create the foreign tables: This do magic drops the foreign tables if they exist and recreates them. And also replaces references to myserver with the new server name. Finally it prefixes the tables with buget2015. so they get dumped in the budget2015 schema. You could do the same with set search_path before you run the DO
.
DO language plpgsql $$
DECLARE var_sql text;
BEGIN
var_sql := (WITH conn As (SELECT 'C:/fdw_data/Budget2015.ods'::text As conn),
cte_tb_names AS
(SELECT unnest(
string_to_array(ogr_fdw_sql_table(conn), E'\n') ) As table_name
FROM conn)
, tb AS (SELECT trim(table_name) As table_name
FROM cte_tb_names WHERE table_name NOT LIKE 'Layers:%' and trim(table_name) > '')
SELECT
string_agg( replace(regexp_replace(ogr_fdw_sql_table(conn, tb.table_name),
'CREATE SERVER (.*);(.*)CREATE FOREIGN TABLE ([a-z0-9\_]+)',
E'DROP FOREIGN TABLE IF EXISTS budget2015.\\3;CREATE FOREIGN TABLE budget2015.\\3'),
'myserver','svr_budget2015_workbook'), E'\n') As sql FROM tb, conn);
EXECUTE var_sql;
END ;$$;
Finally to verify I got the worksheets as linked foreign tables, I run this query:
SELECT table_name
FROM information_schema.tables
WHERE table_schema='budget2015'
ORDER BY table_name;
Which outputs this
table_name ---------------------------- categories_and_items dashboard debt_analysis form_2___equity form_3___income form_4___lump_sum_planning form_5___monthly_cash_flow form_7___allocation form_8___spending_details form_9___work_expenses
Now I can query some tables:
SELECT creditor, amount, interest
FROM budget2015.debt_analysis
WHERE interest < 0.08
ORDER BY creditor;
creditor | amount | interest -----------+--------+---------- Credit 1 | 20000 | 0.0775 Credit 10 | | 0.03 Credit 11 | | 0.04 Credit 7 | | 0.04875 Credit 8 | | 0.01 Credit 9 | | 0.02