Postgres OnLine Journal : February / March 2013

From the Editors

PostGIS 2.0.3 for Windows x32 and x64 installers

What's new and upcoming in PostgreSQL

PostgreSQL 9.3 postgres_fdw: a Test Drive

PostgreSQL Q & A

How to add database name to log output Beginner
Should I install using PostGIS extension

Application Development

Saying Happy Valentine in PostGIS Advanced

A Product of Paragon Corporation
http://www.paragoncorporation.com/
http://www.postgresonline.com/

From the Editors

PostGIS 2.0.3 for Windows x32 and x64 installers

We have all the setup executables for PostgreSQL 8.4-9.2 32-bit and PostgreSQL 9.1-9.2 x64-bit for PostGIS 2.0.3. We'll have the 9.0x64-bit ones up shortly but do have the binaries.

We are planning to release these to stackbuilder sometime next week, but would like some feedback first with any issues people run into.

One person already reported a serious bug in our original packaging which we fixed. If you already installed 2.0.3 the x64-bit version or any of the 9.2 versions (both x32 and x64) that was not marked as -2.zip, -2.exe, you'll want to reinstall. The 8.4-9.1 32-bits were not affected so just have a -1.

Packages can be downloaded from:
http://download.osgeo.org/postgis/windows

General details on what is included:

http://postgis.net/windows_downloads

As always if you are installing for PostgreSQL x64 make sure to pick a zip or installer with 64 in the name. The unmarked or marked with 32 are just for PostgreSQL 32-bit.

Back to Table Of Contents

What's new and upcoming in PostgreSQL

PostgreSQL 9.3 postgres_fdw: a Test Drive

Just recently saw Michael Paguier's article on the recent commit of postgres_fdw so I was excited to try this out for myself. Questions we wanted to answer/experience were:

Can I connect against different versions of PostgreSQL? Yes to test I tried connecting from my PostgreSQL 9.3 dev instance to my 9.2 instance on another box.
Can I use a foreign type like -- you guessed it PostGIS if we have same version of PostGIS installed on both databases? Yes. I didn't try with different versions so not sure if that would work especially since the structure changed a bit between 1.5 and 2.0. I suspect 2.0 and 2.1 would work fine and might be a good way to cheat run 2.1 on 9.3 but creating derivatives of my 2.0 data with functions only available in 2.1.
Will it use my spatial index? No or at least we couldn't figure out how and I don't see how it is possible, but would be nice if it could be
Can I have a table defined with dynamic SQL? Nope or at least the docs don't mention it. This is a feature I really liked about the ODBC_FDW because often times I want the table to be filtered especially if I need my filter to use an expensive index like a spatial one or utilize functions only available on the foreign server. If that were part of the definition of the foreign table, then it could in theory process that part on the foreign server. So this makes postgres_fdw not a good replacement for dblink in many cases.

So here are the tests we did to experiment with:

The 9.3 server

Download latest 9.3 snapshot and compile with the pgsql_fdw extension - all documented in Mike's short but very useful Postgres 9.3 feature highlight: postgres_fdw
Compile PostGIS - I'm using 2.1.0 SVN on both boxes though my 9.3 (compiled with mingw-w64 64-bit) is fresh and my 9.2 (Windows EDB provided PostgreSQL 9.2 64-bit) is a couple of weeks older version of 2.1.0. 2.1 is just too Cool that we even run it in production now.

Install both extensions on a new PostgreSQL 9.3 database

CREATE EXTENSION postgres_fdw;
 CREATE EXTENSION postgis VERSION "2.1.0SVN";

Setting up the Foreign Server, Users, Tables

For this test I connected to a tiger database I use for geocoding that has PostGIS 2.1.0SVN installed on it.

I created an account on our production server called: tiger_guest and gave it these rights

GRANT USAGE ON SCHEMA tiger TO tiger_guest;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO tiger_guest;
GRANT SELECT ON ALL TABLES IN SCHEMA tiger TO tiger_guest;

On spanking new 9.3 desktop server:

CREATE SERVER postgres92prod_server
 FOREIGN DATA WRAPPER postgres_fdw 
  OPTIONS (host 'prod_server', port '5432', dbname 'tiger');

CREATE USER MAPPING FOR public SERVER 
 	postgres92prod_server 
 	OPTIONS (user 'tiger_guest', password 'whatever');
 	
--this is just a subset of columns of the prod table --
 CREATE FOREIGN TABLE edges (gid int, statefp varchar(2), countyfp varchar(3)
  , tlid bigint, fullname varchar(100), the_geom geometry)
 SERVER postgres92prod_server OPTIONS (schema_name 'tiger', table_name 'edges');

To test some queries -- this one is really slow proving the spatial index is not used, the plan I suspect you can't fully rely on since it won't show the skipping inheritance tables etc which is done on prod server as needed:


'--run directly on main server takes 20ms, on 9.3 foreign table takes 370ms
SELECT fullname FROM edges 
 WHERE fullname > '' AND ST_DWithin(the_geom, ST_GeomFromText('POINT(-71.3 41.9)', 4269), 0.05) 
  LIMIT 10;

However this query with a helper fast scan is faster. It only uses spatial index on main server so hard to tell if btree index can be used.


'--run directly on main server takes 20ms, on 9.3 foreign table takes 9 seconds (46ms)
SELECT fullname FROM edges 
 WHERE fullname LIKE 'B%' 
  AND ST_DWithin(the_geom, ST_GeomFromText('POINT(-71.3 41.9)', 4269), 0.05) 
  LIMIT 10;

All and all a promising start.

Back to Table Of Contents

PostgreSQL Q & A

How to add database name to log output Beginner

PostgreSQL allows you to customize statement logging in numerous ways. You can globally set the level of logging you want at both the postgresql.conf (that will affect all databases) or at the database level using the various log_statement* variables. Most of these are documented in runtime config logging.

For this brief article, we'll talk about the log_line_prefix variable. By default this variable is very minimalistic and just prefixes the log lines with the date time of the statement. If you have just one database (like the way Oracle runs) this is not a big deal, but if you are like us and run several databases on one PostgreSQL instance, it would be nice to as part of the prefix include the database name.

Customizing Log Line Prefix

You can do this by changing your log_line_prefix in your postgresql.conf to include the %d variable. Something like
log_line_prefix = '%t %d '

You can also edit by using pgAdmin Server Configuration option.

After you have done this, make sure to reload the config with:

SELECT pg_reload_conf();

Or clicking the reload (green arrow icon) in pgAdmin Server Config dialog

The log_line_prefix can also include various other useful variables if you choose such as the %u user, and %h remote host.

Back to Table Of Contents

PostgreSQL Q & A

Should I install using PostGIS extension

Should I install using PostGIS extension? is an FAQ that comes up quite a bit in PostGIS circles and unfortunately askers get mixed answers. In How to upgrade your database to PostGIS 2.0 we covered WHY you should use extensions. In this article I'll use my Lincoln-Douglas debate skills to argue why you shouldn't. Keep in mind that I put a great deal of effort in fitting PostGIS extensions into the existing PostGIS build structure and I eat my own dogfood, so I might be a little biased and a poor defender of the counter argument.

Reasons why you shouldn't

Lets start off with the obvious reasons, why you simply can't use PostGIS extensions:

You are STILL using PostGIS 1.5 (or god forbid some even lower version) and have not embraced the kewlness of PostGIS 2 series
You are using PostgreSQL 9.0 or below. Unfortunately extensions is only available for PostgreSQL 9.1+.
You compiled PostGIS 2+ without raster support. Unfortunately some people ran into problems compiling with raster support and some distributions don't offer PostGIS 2 with raster support. Not being able to use the extension install process without PostGIS raster,I take a lot of blame for. Initially we were going to have raster as a separate extension dependent on the core PostGIS, much like the postgis_topology, and upcoming postgis_tiger_geocoder and pgrouting. I really wanted everybody to have raster and made it required for the PostGIS extension support - figuring it would nudge folks (particularly package maintainers) in the right direction. I figured, if little old me building on a crippled Microsoft windows platform could manage to compile with raster support, then certainly it must be a trivial exercise for a superior Linux/Unix/Mac OS compiled by an experienced Linux hacker to do it. Guess I was wrong there.

Now for the grey areas:

Somewhat tricky to test in with development versions
The biggest issue for development was that you couldn't upgrade your extension if you didn't have a versioned release and we never have versioned releases until we release. I think I took care of this issue by introducing, what I call the next AKA the yo-yo option:
ALTER EXTENSION postgis UPDATE TO "2.1.0SVNnext"; -- this you do the next time you upgrade -- ALTER EXTENSION postgis UPDATE TO "2.1.0SVN"; --if you need to upgrade again -- ALTER EXTENSION postgis UPDATE TO "2.1.0SVNnext";
This is the approach I use for upgrading my production systems running 2.1.0SVN and allowed me to discover a very important bug, which I still need to backport to 2.0. It's a good check for me to make sure we didn't introduce a feature in code base that is not upgradeable using extensions.

The main issue that remains is our regression model doesn't support testing extensions, because our make check, doesn't actually install PostGIS in PostgreSQL install folders, but instead installs in a staging area and changes the scripts to point to this staging area. Since extension script files currently have to be in local PostgreSQL extension folder, this means we can't test before installing.

I'm hoping we can come up with some work-around like defining a temporary extension something like postgis_check that gets deleted and dropped after make check is done similar to what we do with the regress database we create during make check. This in thinking about shouldn't be too difficult to do except for the issue the extension would have to be copied to final install folder and we need to know that before make install. Another option, which would be more powerful is have true older versions of PostGIS ready to be upgraded using our buildbots and have them try to do true upgrades as a second battery of tests.
Not invented here.
Some people just don't like things if they don't fully understand how it works, did not invent it, or do not feel they have full control of it. I'm not naming names, but you know who you are. They consider extensions VooDoo magic because it hides the fact there is a script doing something.
It reduces your freedom, much like not being able to run around with an armed weapon reduces your freedom. How does it reduce your freedom? Well with many of the PostGIS upgrade scripts, you could run thru them even if they resulted in errors. Extensions require a clean install, meaning nothing can fail. You can't drop functions and other dependent objects willy-nilly that are part of the extension just because you feel like it.

Why should you install with PostGIS extensions?

I'm going to sum it up, before I elaborate: It is the sanest, easiest way to go, especially for production use.. Now I'm going to bore you by repeating myself. You can stop here if you are already sold on using extensions. The main reason I poured so much effort into packaging the PostGIS extension is because it would solve a couple of my big pain points for both myself and many others.

Platform independent way of enabling PostGIS in your database. No matter which OS you are on, to enable PostGIS in a database you do:
CREATE EXTENSION postgis;
Before you had to find the files to install, which was particularly painful on windows because it was buried either in C:\Program Files\PostgreSQL\9.2\share\contrib\postgis... or C:\Program Files (x86)\PostgreSQL\9.2\share\contrib\postgis... or some other god awful place. And if you were on Linux/Unix -- good luck figuring out where your distro put these files. I always had to resort to doing a pg_settings query to figure this out.

Then you had to remember which files to install -- postgis.sql, rtpostgis.sql, spatial_ref_sys.sql. Right there I estimated we lost 80% of the audience that would otherwise use PostGIS.
Then there are those smart folks who think -- hey I can install two versions of PostGIS in the SAME database or better yet I can accidentally install 2 different versions of PostGIS in the SAME database or the same version in different schemas. COOL. When you accidentally stumble into this mess, you often don't find out until everyone is screaming at you that the queries have grinded to a halt.

The postgis extension, regardless of which schema you installed it, won't let you get into this drunken stupor. It will recognize you already have a version of PostGIS installed and won't let you move on without uninstall or just upgrading.

I know from experience and after lossing a couple of hours of my life, I vowed this will never happen to ME again. Precautions need to be put in to save Regina and others from this nightmare.
You can DROP the postgis extension safely.
DROP EXTENSION postgis;
will only drop postgis if you don't have tables or other extensions that rely on it. If you want to drop it and everything associated with it, you can use
DROP EXTENSION postgis CASCADE;
with caution. The non-extension PostGIS uninstall script, just drops without asking questions, possibly dropping data with it.
You can move PostGIS into another schema later if you are not happy with it in public with single line.
ALTER EXTENSION postgis SET SCHEMA some_schema;.

Back to Table Of Contents

Application Development

Saying Happy Valentine in PostGIS Advanced

A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text. Picture this: What if someone has got a road or a parcel of land and they want to stamp the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice and we've got raster functionality in the database already. Can we cut out the middleman? Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker. strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.

Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.

Warning this article will have a hint of Rube Goldbergishness in it. If you are easily offended by seeing stupid tricks done with SQL, stop reading now. We are going to take a natural vector product and rasterize it just so we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.

So here are the steps for creating your own font set you can overlay on your geometries and rasters.

See if you can spot the use of window functions and CTEs in these examples.

Step 1: Find a font you like

I haven't tried, but in theory you can just write out letters evenly spaced on a piece of paper and scan the paper so you have a single image. This would allow you to automate your own writing in the database thus allowing your database be able to impersonate you. A very exhilarating, but slightly disturbing idea.

Too many years of using a keyboard has made my technical drawing skills less than legible, so instead:

I grabbed a free font set called Kankin
Installed it on my computer
Opened up a spreadsheet and in each cell typed in a different letter across with the font.
Took a snapshot image of masterpiece spreadsheet. Which gave this image:
Imported this image into PostGIS with raster2pgsql using command-line packaged tool:
raster2pgsql -F -Y pics/fonts/kankin.png font_pics | psql -U postgres -d testpostgis210 -h localhost -p 5432
For the rest of these steps you are going to need to be running a fairly recent PostGIS 2.1.0 development version which has the ubber-cool ST_Tile function which we demonstrated here and is just well super cool.

Step 2: Chop your image into letter bits, reclass, vectorize

For this exercise I'm assuming the letters are evenly spaced. One of the changes in 2.1.0 is that ST_Polygon function ALWAYS returns a multipolygon. Prior versions sometimes returned Polygons.

Create a table to hold new letter vectors:

CREATE TABLE font_set(letter char(1), geom geometry(Multipolygon), font_name varchar(50)
  , CONSTRAINT pk_font_set PRIMARY KEY (letter,font_name));

Do a quick histogram using the raster ST_Histogram function to get a sense of where the pixel values that repesent letters. In a simple world every thing would be as clear as black and white . You'd only have 2 values: that which represents the drawing of a letter and that which represents whitespace. But our world has all this noise caused by differences in pressure you put on your pen, the ink running out and so forth, and the imperfectness of your snapshotting. Our histogram demonstates the issue.

WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast 
        FROM font_pics LIMIT 1 )
SELECT (h).*
FROM (SELECT ST_Histogram(rast) As h
        FROM fp) As foo;

The output of the above query is shown below and was so fast I didn't even bother timing it - so as you can see our letter drawing probably falls in the 0-30 range.

  min   |   max   | count |       percent
--------+---------+-------+----------------------
      0 |  2.9375 |  1984 |     0.11372234323054
 2.9375 |   5.875 |    23 |  0.00131835377737017
  5.875 |  8.8125 |    32 |  0.00183423134242806
 8.8125 |   11.75 |    16 | 0.000917115671214032
  11.75 | 14.6875 |    18 |  0.00103175513011579
14.6875 |  17.625 |    35 |  0.00200619053078069
 17.625 | 20.5625 |    16 | 0.000917115671214032
20.5625 |    23.5 |    22 |  0.00126103404791929
   23.5 | 26.4375 |    15 | 0.000859795941763155
26.4375 |  29.375 |    16 | 0.000917115671214032
 29.375 | 32.3125 |     0 |                    0
32.3125 |   35.25 |     0 |                    0
  35.25 | 38.1875 |    36 |  0.00206351026023157
38.1875 |  41.125 |    36 |  0.00206351026023157
 41.125 | 44.0625 |   541 |   0.0310099736329245
44.0625 |      47 | 14656 |    0.840077954832053

Looking at the above and eyeballing looks like we want to keep 0 - 38 and chuck the rest. What we need is a way to reclassify our world into black and white.

Luckily PostGIS 2+ has a function ST_Reclass that allows us to chuck the pixel values we don't want. Now we reclassify and vectorize. See the Seamless Vector / Raster operations flying around like a lab experiment gone nuts. We leave it as an exercise to the reader to figure out what we are doing in this snippet -- or you can read the Second edition of PostGIS in Action book in our upcoming Chapter 7. Chapter 7 first draft will be available on MEAP next month and we hope to explain some of these operations.

INSERT INTO font_set(letter, geom, font_name)
WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast 
        FROM font_pics WHERE filename = 'kankin.png' )
    , fpi As (SELECT (ROW_NUMBER() OVER())::integer As pos, rast
FROM fp)
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', pos,1) As letter
    , ST_Polygon(
        ST_SetUpperLeft(ST_SetBandNoDataValue(
            ST_Reclass(ST_Band(rast,1)
            ,'[0-38]:1,(38-255]:0'::text, '1BB'::text)
              ,0),0,0)
       ) As geom
    , 'kankin'
FROM fpi;

Putting our fontset to work

So now we've got a table of vectorized letters. To make our new vectorized fontset easier to use, we create a word function.

CREATE OR REPLACE FUNCTION get_word_vector(param_word text, param_font_name text) 
 RETURNS geometry
AS
$$
  WITH letters As
   (SELECT geom, width, sum(width + 1) OVER (ORDER By pos) As run_length
        FROM (SELECT geom, letter,ROW_NUMBER() OVER() As pos, ST_XMax(geom) As width
                , ST_YMax(geom) As height
        FROM font_set CROSS JOIN regexp_split_to_table($1, E'') As l 
        WHERE letter = l AND font_name = $2) As foo
)

SELECT ST_Union(
     ST_Translate(geom, run_length - width,0) 
     ) As word_geom
FROM letters; 
$$
language 'sql';

And now we use it to write a somewhat complicated looking SQL statement that can probably be wrapped into a function and be significantly shorter. This utilizes the ST_Union aggregate function for raster, which is much improved in speed in upcoming 2.1.

-- takes about 300ms to do this (that includes outputting the png image)
WITH the_words AS
        (SELECT pos, ST_Translate(get_word_vector(word, 'kankin'), 0, -pos*100) As geom
                FROM  (VALUES (0,'HAPPY'),(1,'POSTGIS'),(2,'VALENTINE')  ) AS my(pos, word)
        )
 , canvas
    AS (SELECT ST_MakeEmptyRaster((ST_XMAX(ext) - ST_XMin(ext))::integer, 
        (ST_YMAx(ext) - ST_YMin(ext))::integer,0, 0, 1, -1, 0, 0, 0) As rast
        FROM (SELECT ST_Extent(geom) As ext FROM the_words) As foo
            )
SELECT ST_AsPNG(
    ST_Resize(ST_Union(ST_AsRaster(geom
     , canvas.rast
     , ARRAY['8BUI','8BUI', '8BUI']::text[]
     , ARRAY[pos*75,6,0], ARRAY[0,0,0]) 
        ), 0.3,0.3))
     FROM  the_words CROSS JOIN canvas;

Which of course outputs this:

Back to Table Of Contents