Postgres OnLine Journal: January / February 2015
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

PL Programming
Using PostgreSQL Contribs

PL Programming

 

PLV8 binaries for PostgreSQL 9.4 windows both 32-bit and 64-bit



I'm still in the middle of building packages for our clients for the packages that aren't normally distributed for windows to make upgrading to PostgreSQL 9.4 smooth. One of those is PL/V8 which we use for some custom functions. I had mentioned how I build PL/V8 for PostgreSQL windows, and the instructions are a bit out of date., but I put more up to date instructions on my gist page. I tend to use gist a lot as a public scrap book with hopes someone else can learn from my toils and save them some trouble. At some point I should probably get more organized with my scrapbooks.

I've listed below PL/V8 binaries I built for PostgreSQL 9.4 for windows 64-bit and 32-bit. These I tested with EDB PostgreSQL 9.4 windows installs and seem to work fine. These were built with latest 1.4 branch (a little after 1.4.2 to grab the extra 9.4 fixes) of PL/V8.

PL/V8 PostgreSQL 9.4 binaries

We hope windows users find these useful.


Using PostgreSQL Contribs

 

Updated Foreign Data Wrappers for PostgreSQL 9.3 Windows



As stated in last article, I've packaged FDW binaries for PostgreSQL 9.3 windows 32-bit and 64-bit and added in the ogr_fdw one. These we've tested with the standard EDB Vc++ built PostgreSQL windows installs and work fine with those.

This package is an updated list from ones we've distributed before that includes ogr_fdw and recompiled with latests source from www_fdw and file_textarray

Below are the zip files that contain the binaries and dependency files for our favorite FDWs (that aren't normally available for windows):

If you are using Chrome, Chrome is sometimes flagging the zip files on this site as possible malware with comment Not commonly Downloaded Warning. If you get such a notice click the drop down option it provides and choose Keep. Alternatively try the 7z versions, which you can extract with 7-zip. These for some reason aren't being flagged.

FDWs in the 9.3 downloads

  • file_textarray_fdw for working with delimited text files, loads each line as a text array.
  • https://github.com/cyga/www_fdw - for querying web services
  • ogr fdw - A monster of an FDW that supports many spatial and non-spatial datasource. Have some weird data source/file you've never heard of before (especially spatial)? There's a good chance OGR FDW can query it. Note this GDAL build has more drivers than what gets packaged with the PostGIS version. I have also included ODBC, so full set of drivers supported is (read-only since this fdw only supports read at the moment): ogr_fdw related articles on our site with some examples.
    Supported Formats:
      -> "ESRI Shapefile" (read/write)
      -> "MapInfo File" (read/write)
      -> "UK .NTF" (readonly)
      -> "SDTS" (readonly)
      -> "TIGER" (read/write)
      -> "S57" (read/write)
      -> "DGN" (read/write)
      -> "VRT" (readonly)
      -> "REC" (readonly)
      -> "Memory" (read/write)
      -> "BNA" (read/write)
      -> "CSV" (read/write)
      -> "GML" (read/write)
      -> "GPX" (read/write)
      -> "KML" (read/write)
      -> "GeoJSON" (read/write)
      -> "GMT" (read/write)
      -> "ODBC" (read/write)
      -> "WAsP" (read/write)
      -> "PGeo" (readonly)
      -> "MSSQLSpatial" (read/write)
      -> "PCIDSK" (read/write)
      -> "OpenFileGDB" (readonly)
      -> "XPlane" (readonly)
      -> "AVCBin" (readonly)
      -> "AVCE00" (readonly)
      -> "DXF" (read/write)
      -> "Geoconcept" (read/write)
      -> "GeoRSS" (read/write)
      -> "GPSTrackMaker" (read/write)
      -> "PGDump" (read/write)
      -> "GPSBabel" (read/write)
      -> "SUA" (readonly)
      -> "OpenAir" (readonly)
      -> "PDS" (readonly)
      -> "WFS" (readonly)
      -> "HTF" (readonly)
      -> "AeronavFAA" (readonly)
      -> "Geomedia" (readonly)
      -> "EDIGEO" (readonly)
      -> "GFT" (read/write)
      -> "GME" (read/write)
      -> "SVG" (readonly)
      -> "CouchDB" (read/write)
      -> "Idrisi" (readonly)
      -> "ARCGEN" (readonly)
      -> "SEGUKOOA" (readonly)
      -> "SEGY" (readonly)
      -> "XLS" (readonly)
      -> "ODS" (read/write)
      -> "XLSX" (read/write)
      -> "ElasticSearch" (read/write)
      -> "PDF" (read/write)
      -> "Walk" (readonly)
      -> "CartoDB" (readonly)
      -> "SXF" (readonly)

Using PostgreSQL Contribs

 

Installing PostGIS packaged address_standardizer on Ubuntu



One of the changes coming to you in PostGIS 2.2 are additional extensions. Two ones close to my heart are the address_standardizer (which was a separate project before, but folded into PostGIS in upcoming 2.2) and the SFCGAL extension for doing very advanced 3D stuff (was just an sql script in older versions, but made an extension in 2.2 and new functions added). We had a need to have address standardizer running on our Ubuntu box, but since PostGIS 2.2 isn't released yet, you can't get it without some compiling. Luckily the steps are fairly trivial if you are already running PostGIS 2.1. In this article, I'll walk thru just building and installing the address_standardizer extension from the PostGIS 2.2 code base. Though I'm doing this on Ubuntu, the instructions are pretty much the same on any Linux, just replacing with your Linux package manager.

Compiling and installing address_standard 2.2 on PostGIS

If you don't have PostGIS already, you can install it via PGDG. Instructions are here: PostGIS Ubuntu 9.3 Apt. You shouldn't really need PostGIS anyway except possibly to get past the PostGIS configure step.

  1. In order to build PostgreSQL extensions, you need the PostgreSQL dev package which you install with:

    apt get-install postgresql-server-dev-9.3
  2. In order to get past the configure step of PostGIS, you need these additional dev packages

    apt get-install libxml2-dev libgeos-dev libproj-dev libpcre3-dev libxml2-dev
  3. Next download the PostGIS 2.2 dev tar ball and follow below steps to make and install the address_standardizer extensions:
    wget http://postgis.net/stuff/postgis-2.2.0dev.tar.gz
    tar xvf postgis-2.2.0dev.tar.gz
    cd postgis-2.2.0dev
    ./configure --without-raster

    You should have an output that looks something like:

    PostGIS is now configured for x86_64-unknown-linux-gnu
    
     -------------- Compiler Info -------------
      C compiler:           gcc -g -O2
      C++ compiler:         g++ -g -O2
      SQL preprocessor:     /usr/bin/cpp -traditional-cpp -w -P
    
     -------------- Dependencies --------------
      GEOS config:          /usr/bin/geos-config
      GEOS version:         3.4.2
      PostgreSQL config:    /usr/bin/pg_config
      PostgreSQL version:   PostgreSQL 9.3.5
      PROJ4 version:        48
      Libxml2 config:       /usr/bin/xml2-config
      Libxml2 version:      2.9.1
      JSON-C support:       no
      PCRE support:       yes
      PostGIS debug level:  0
      Perl:                 /usr/bin/perl
    
     --------------- Extensions ---------------
      PostGIS Raster:       disabled
      PostGIS Topology:     enabled
      SFCGAL support:       disabled
      Address Standardizer support:       enabled
    
     -------- Documentation Generation --------
      xsltproc:
      xsl style sheets:
      dblatex:
      convert:
      mathml2.dtd:          http://www.w3.org/Math/DTD/mathml2/mathml2.dtd
  4. Now we are ready to compile and install:
    cd extensions/address_standardizer
    make && make install

    If all goes well, your final output looks something like:

    /usr/bin/perl mk-sql.pl 'PostgreSQL 9.3.5' address_standardizer.sql > address_standardizer--2.2.0dev.sql
    /usr/bin/perl pagc-data-psql lex lexicon.csv > us-lex.sql
    /usr/bin/perl pagc-data-psql gaz gazeteer.csv > us-gaz.sql
    /usr/bin/perl pagc-data-psql rules rules.txt > us-rules.sql
    /bin/mkdir -p '/usr/lib/postgresql/9.3/lib'
    /bin/mkdir -p '/usr/share/postgresql/9.3/extension'
    /bin/mkdir -p '/usr/share/postgresql/9.3/extension'
    /bin/mkdir -p '/usr/share/doc/postgresql-doc-9.3/extension'
    /usr/bin/install -c -m 755  address_standardizer-2.2.so '/usr/lib/postgresql/9.3/lib/address_standardizer-2.2.so'
    /usr/bin/install -c -m 644 address_standardizer.control '/usr/share/postgresql/9.3/extension/'
    /usr/bin/install -c -m 644 address_standardizer--2.2.0dev.sql us-lex.sql us-gaz.sql us-rules.sql '/usr/share/postgresql/9.3/extension/'
    /usr/bin/install -c -m 644 README.address_standardizer '/usr/share/doc/postgresql-doc-9.3/extension/'

Using Address Standardizer

In order to enable the address standardizer in a specific database, connect to the database and run:

CREATE EXTENSION address_standardizer

If you prefer the GUI guided tour, once you install address_standardizer binaries (as we did), you should see it in pgAdmin extension drop down options

I'm still in the middle of packaging the standardization data sets into a separate data extension for easier consumption by end-users. For now you can just load the us-lex.sql, us-gaz.sql, us-rules.sql files via PSQL that get installed in the /usr/share/postgresql/9.3/extension/ folder.

If you want to experiment with using the extension, refer to the PostGIS 2.2 dev manual: Installing and using Address Standardizer, which is still a bit of a work in progress.

More books coming

Leo and I are still very busy writing PostgreSQL related books, so I haven't had quite as much time to devote to PostGIS documentation as I would have liked. If you didn't know - our PostgreSQL: Up and Running, 2nd edition recently came out in Print. PostGIS in Action 2nd Edition is due out late February / Early March. We have also started our 3rd PostgreSQL/PostGIS book which we will announce once we've gotten further into it. If you've seen me on #postgis IRC and wondered Why is Regina so engrossed in Graphy theory, you can probably guess what that book is about. Part of the joy of writing is learning new things as you go along and pushing yourself to experiment in different ways with technologies you love.


Using PostgreSQL Contribs

 

Using SSL https connections with www_fdw on windows



One of the foreign data wrappers I included in the PostgreSQL 9.3 Windows FDW bag and PostgreSQL 9.4 Windows FDW bag is the www_fdw extension used for querying web services. Someone asked that since I didn't build curl with SSL support, they are unable to use it with https connections. The main reason I didn't is that the EDB installs come with ssleay32.dll and libeay32.dll (even the 64-bit) which are dependencies of curl when built with SSL support. I wanted to minimize the issue of distributing dlls that are packaged with Windows PostgreSQL installers already.

Though this article is specific to using www_fdw on Windows systems, many of the issues are equally applicable to other platforms, so may be worth a read if you are running into similar issues with using specialty SSL certificates on Linux/Unix/Mac.

You'll know you have curl built without ssl support, because when you try to create an https Foreign server, you'll get this error from WWW_FDW

ERROR:  Can't get a response from server: Protocol "https" not supported or disabled in libcurl

Unfortunately curl is not distributed with the EDB PostgreSQL windows builds. However you can easily swap out the non-ssl enabled curl binary I included in the FDW package with an SSL enabled one and get SSL support in the www_fdw as well as the ogc_fdw driver we included. Here's what you do:

Swapping out the non-ssl enabled curl with ssl-enabled curl

  1. Download one of the windows binaries from http://curl.haxx.se/download.html. I'm running PostgreSQL 9.4 64-bit, so I chose Win64 - Mingw64 one. I chose it, not because it was Mingw64, but because it had the fewest number of dependencies.
  2. This particular package also includes ssleay32.dll and libeay32.dll, but since EDB already has those libraries, I did not copy those files. The curl library is called libcurl.dll, but the name I bound with is libcurl-4.dll. So to swap out the one I provide with this SSL enabled one, just rename the libcurl.dll to libcurl-4.dll and then copy into your PostgreSQL/9.4/bin (or 9.3 if on 9.3) folder

Now you are almost ready to go. Your www_fdw understands SSL. There is now one more small problem of certificates to contend with.

Working with SSL certificates

Even after you install the version of curl with SSL support, if you create your www FDW server and foreign table with the Google search example code here: https://github.com/cyga/www_fdw/wiki/Documentation, you might get the error:

ERROR:  Can't get a response from server: SSL certificate problem: unable to get local issuer certificate

The simplest way I've discovered to resolve this issue on windows is to explicitly specify the certificate as part of the WWW FDW Server definition. So here's what you do

  1. Download the ca-bundle.crt described in http://curl.haxx.se/docs/caextract.html
  2. Copy that into a folder accessible by the postgres service account (you might have to grant permissions). I like to put it in a folder like C:\SSLCerts
  3. Now create your WWW FDW Server with this SQL command and pass in the certificate using the www_fdw cainfo option

    CREATE SERVER www_fdw_google_search_server FOREIGN DATA WRAPPER www_fdw 
        OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0', 
            cainfo 'C:/SSLCerts/ca-bundle.crt');
  4. If you had already created the SERVER and don't want to have to drop your Foreign tables, you can update the SERVER, as you can with any FDW server, with a command like this:

    ALTER SERVER www_fdw_google_search_server OPTIONS (ADD cainfo 'C:/SSLCerts/ca-bundle.crt');

    If you need to change the path of the certificate, say you moved it to D, you can use the SET command instead of ADD to change an exisiting option:
    ALTER SERVER www_fdw_google_search_server OPTIONS (SET cainfo 'D:/SSLCerts/ca-bundle.crt');

Using PostgreSQL Contribs

 

Import Foreign Schema hack with OGR_FDW and reading LibreOffice calc workbooks



PostgreSQL 9.4 and below doesn't support importing whole set of tables from a FOREIGN server, but PostgreSQL 9.5 does with the upcoming Import Foreign Schema. To use will require FDW wrapper designers to be aware of this feature and use the plumbing in their wrappers. IMPORT FOREIGN SCHEMA for ogr_fdw come PostgreSQL 9.5 release is on the features ticket list. The ogr_fdw comes with this to die for commandline utility called ogr_fdw_info that does generate the table structures for you and will also list all the tables in the Foreign data source if you don't give it a specific table name. So with this utility I wrote a little hack involving using PostgreSQL COPY PROGRAM feature to call out to the ogr_fdw_info commandline tool to figure out the table names and some DO magic to create the tables.

Though ogr_fdw is designed to be a spatial foreign data wrapper, it's turning out to be a pretty nice non-spatial FDW as well especially for reading spreadsheets which we seem to get a lot of. This hack I am about to demonstrate I am demonstrating with LibreOffice/OpenOffice workbook, but works equally well with Excel workbooks and most any data source that OGR supports.

This hack came in really handy for processing workbooks (with several spreadsheets e.g. an Excel or LibreOffice workbook) and SQL Server tables. Sadly I tried it on my MS Access data source, and while it could read the list of tables, PostgreSQL returned blank text when trying to grab the table structure. This was really weird considering from the commandline I can see the table structure. Haven't figured out what caused that. I blame it on some funky character MS Access is spitting out that PostgreSQL is reading as a black hole character or some weird permission issue with reading MS access catalogs from within PostgreSQL. The commandline worked fine using directly even for MS Access and reading the table once the foreign table was created worked fine as well.

The ogr_fdw_info wrapper function

The wrapper function just spits out the SQL to do the work. The reason I have it not do the work is because depending on my needs, I may say want to prefix the table names with something or change the data types it emits. For example in SQL Server, unless the timestamp field is the last field in the table, ogr_fdw doesn't handle it right. I still have to investigate this one. So where it says timestamp, for SQL Server I swap it out for varchar.

Note that I also hard-coded the executable path in there which is very specific to where it will end up if you are running PostgreSQL 9.4 on windows.

Some things to be cognizant of

  • The COPY FROM is a server SQL command (not the psql variant). As such the executable must be on the server and accessible by the postgres service account
  • Your data source must also be accessible from the server postgres process, which usually means it must reside on the server somewhere unless its a webservice like CartoDB or WFS.
  • Your paths will obviously be different if on Linux/Unix/Mac and probably much shorter. I purposely used the windows path because windows users have it harder with windows love of spaces and all that.
CREATE OR REPLACE FUNCTION ogr_fdw_sql_table(
    data_source text,
    layer text DEFAULT ''::text)
  RETURNS text AS
$$
BEGIN
    DROP TABLE IF EXISTS ogr_fdw_out;
    CREATE TEMP TABLE ogr_fdw_out(out text);
    IF layer > '' THEN 
    EXECUTE 'COPY ogr_fdw_out FROM PROGRAM ''"C:/Program Files/PostgreSQL/9.4/bin/ogr_fdw_info" -s "' 
        || data_source || '" -l "' || layer || '"''';
    ELSE
            EXECUTE 'COPY ogr_fdw_out FROM PROGRAM ''"C:/Program Files/PostgreSQL/9.4/bin/ogr_fdw_info" -s "' 
                || data_source || '"'' ';
    END IF;
    /** postgresql converts each output to a row, 
     but we want as a single string back **/
    RETURN (SELECT string_agg(out,E'\n') from ogr_fdw_out);
END;

$$
  LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

For demonstrating this little puppy in action, I need a workbook with several spreadsheets to do it justice. So I downloaded a LibreOffice spreadsheet template from Monthly Home Budget Workbook and created a new workbook called Budget2015.ods from that. Note that LibreOffice, OpenOffice, and the Microsoft Excel 2007 and above (the .xlsx) format all are some variant of compressed XML. I will reiterate that what data sources you can read with ogr_fdw is dictated by your libgdal library and what support it is compiled with. In the case of reading LibreOffice/OpenOffice workbooks and the newer Microsoft XLSX files, your GDAL library needs to be compiled with Expat. In order to read older Excel files (xls format), you need to have your libgdal built with FreeXL support. The windows builds we created and detailed in PostgreSQL 9.4 bag-o-fdws are built with Expat and FreeXL (and as you can see, offer xlsx and ods support) and xls.

Reading LibreOffice Calc workbooks (ODS) direclty from PostgreSQL

So to test out, I throw my new fangled budget workbook in my FDW folder and ran this little statement.

SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods');

And the output is this beautiful thing. Now as a non-spatial person, you have to get over the hump that Layer means Table as far as you are concerned

Layers:
  Dashboard
  Form 2 - Equity
  Form 3 - Income
  Form 4 - Lump Sum Planning
  Form 5 - Monthly Cash Flow
  Form 7 - Allocation
  Form 8 - Spending Details
  Form 9 - Work Expenses
  Debt Analysis
  Categories and Items

So we've got a couple of spreadsheets (tables, layers) in this workbook. I'm going to demonstrate how to see the structure with 2 different widely different spreadsheets just to demonstrate a particular behavior of the OGR ODS driver (the XLSX and XLS drivers seem to behave the same as far as I can tell) when faced with something that is not quite tabular looking (or lacking headers) vs. is tabular.

SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods', 'Dashboard');

The output of the Dashboard layer structure is:

CREATE SERVER myserver
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'C:/fdw_data/Budget2015.ods',
    format 'ODS' );

CREATE FOREIGN TABLE dashboard (
  fid integer,
  field1 varchar,
  field2 varchar,
  field3 real,
  field4 varchar,
  field5 varchar,
  field6 varchar,
  field7 varchar )
  SERVER myserver
  OPTIONS ( layer 'Dashboard' );

Now if you open up the workbook in LibreOffice Calc and look at the dashboard sheet, you'll see it's got charts and rows of data with no header. It has no header for the table. The only thing that is sure is that column 3 has got numbers. So since OGR needs to ascibe a column name and can't infer one from the top row, it just calls them field1.. fieldn. OGR always adds an additional column, called fid at the beginning which is really a row number identifier. You'll also sometimes see a geom column as the second column if it thinks it's a spatial datasource.

If you were to do the same exercise using:

SELECT ogr_fdw_sql_table('C:/fdw_data/Budget2015.ods','Form 4 - Lump Sum Planning');

You'd get the same CREATE SERVER command since they come from the same workbook, but your CREATE FOREIGN TABLE would look like this:

CREATE FOREIGN TABLE form_4___lump_sum_planning (
  fid integer,
  item varchar,
  annual_amount varchar,
  monthly_amount integer )
  SERVER myserver
  OPTIONS ( layer 'Form 4 - Lump Sum Planning' );

Note how the table name and the field names have been changed so they are legal field and column names for PostgreSQL (not requiring you to quote the columns or table name). For example the spaces in the header of spreadsheet cells got converted to underscore and lower case: e.g. Monthly Amount became monthly_amount. At first glance, it seems a little strange it considered annual_amount varchar and monthly_amount integer. OGR ODS driver by default infers the datatype from the data it sees (there are override environment variables for this, but can't pass them into the OGR_FDW yet I don't think). The Monthly Amount column had just 0s in it and the Annual Amount column had no values. If you go back and fill in the Annual Amount column with numbers (so that monthly_amount recomputes to numbers with decimals), and then you rerun this command, you'll find that the annual_amount and monthly_amount both then show as real data type.

To prepare the data for querying directly from PostgreSQL, we're going to:

  1. Create a schema to house foreign tables: CREATE SCHEMA budget2015;
  2. Create a foreign server pointing to this workbook. For this take the CREATE SERVER statement and replace myserver with: svr_budget2015_workbook.

    CREATE SERVER svr_budget2015_workbook
      FOREIGN DATA WRAPPER ogr_fdw
      OPTIONS (
        datasource 'C:/fdw_data/Budget2015.ods',
        format 'ODS' );
  3. Apply some DO magic to create the foreign tables: This do magic drops the foreign tables if they exist and recreates them. And also replaces references to myserver with the new server name. Finally it prefixes the tables with buget2015. so they get dumped in the budget2015 schema. You could do the same with set search_path before you run the DO.

    DO language plpgsql $$
    DECLARE var_sql text;
    BEGIN 
        var_sql := (WITH conn As (SELECT 'C:/fdw_data/Budget2015.ods'::text As conn),
        cte_tb_names AS 
            (SELECT unnest(
                string_to_array(ogr_fdw_sql_table(conn), E'\n') ) As table_name
                FROM conn)
            , tb AS (SELECT trim(table_name) As table_name
                FROM cte_tb_names WHERE table_name NOT LIKE 'Layers:%' and trim(table_name) > '')
        SELECT 
               string_agg( replace(regexp_replace(ogr_fdw_sql_table(conn, tb.table_name), 
                        'CREATE SERVER (.*);(.*)CREATE FOREIGN TABLE ([a-z0-9\_]+)', 
                  E'DROP FOREIGN TABLE IF EXISTS budget2015.\\3;CREATE FOREIGN TABLE budget2015.\\3'), 
            'myserver','svr_budget2015_workbook'), E'\n') As sql FROM tb, conn);
        
        EXECUTE var_sql;
    END ;$$;
    
    
  4. Finally to verify I got the worksheets as linked foreign tables, I run this query:

    SELECT table_name
        FROM information_schema.tables
    WHERE table_schema='budget2015'
    ORDER BY table_name;
    

    Which outputs this

             table_name
    ----------------------------
     categories_and_items
     dashboard
     debt_analysis
     form_2___equity
     form_3___income
     form_4___lump_sum_planning
     form_5___monthly_cash_flow
     form_7___allocation
     form_8___spending_details
     form_9___work_expenses

Now I can query some tables:

SELECT creditor, amount, interest
    FROM budget2015.debt_analysis 
WHERE interest < 0.08
ORDER BY creditor;
 creditor  | amount | interest
-----------+--------+----------
 Credit 1  |  20000 |   0.0775
 Credit 10 |        |     0.03
 Credit 11 |        |     0.04
 Credit 7  |        |  0.04875
 Credit 8  |        |     0.01
 Credit 9  |        |     0.02