In prior articles we talked about the new PostgreSQL 9.1 extension model and upcoming PostGIS 2.0 extensions which we have experimental builds of so far. In this article and other's to follow, we shall provide a documentary of our venture into this new extensions world. We'll discuss some of the obstacles we had with building extensions, lessons learned, and foolishness exposed, with the hope that others can learn from our experience.
First off, the upcoming PostGIS 2.0 extensions will be packaged as at least two extensions -- postgis which will contain both PostGIS geometry/geography types, functions, meta views and tables as well as raster type and associated functions and tables. Topology support, while a part of upcoming PostGIS 2.0, will be packaged as a separate extension called postgis_topology. The main reason for breaking topology out as a separate extension is that it is always stored in a schema called topology and is not relocatable to another schema. The way the current extension model works, all the parts of your extension should live in the same schema. Later we plan to package tiger geocoder as an extension, but this one probably makes more sense to live on http://pgxn.org/ since it is only of interest to United States users, , is purely plpgsql with dependency on PostGIS, and we had beefed it up as part of a consulting contract for a company running PostGIS 1.5. It's the only piece documented in PostGIS 2.0 that works on 1.5 as well (aside from the tiger toplogy loader which has dependency on toplogy), although it has always lived as an extra in the PostGIS code base.
We'll probably package postgis_legacy_functions as an extension too for those people who badly need those 500 alias functions I chucked.
We mentioned in our prior article that we ran into some issues with how our extension worked -- e.g. topology referencing the postgis extension. Most of these turned out just to be ignorance on my part as to how the different pieces fit together and I'll elaborate on these.
Much of what will be described here is also documented in Packaging Related Objects into an Extension.
In the future I'm hoping we'll also see plr and pgrouting packaged as extensions which are common favorites of PostGIS users.
First off, why do you even want to use the extension model?
One line statement to install a set of functions in a database that are aware of the fact that they exist as a unit.
CREATE EXTENSION postgis_topology;
One line statement to bundle loose functions as a contiguous unit.
CREATE EXTENSION postgis_topology FROM unpackaged;
ALTER EXTENSION postgis_topology UPDATE TO '2.0.0a1';
DROP EXTENSION postgis_topology;
Or if you really really want to drop everything dependent on an extension, including YOUR DATA
DROP EXTENSION postgis_topology CASCADE;
For example -- this is the message I would get if I tried to run: CREATE EXTENSION postgis_topology;
without postgis extension first being installed
ERROR: required extension "postgis" is not installed
SELECT * FROM pg_available_extensions;
All these benefits take some effort on the extension packager to make possible.
It knows by file name convension and the control file. For example, for postgis_topology, we have these files in the share/extension folder named:
postgis_topology.control postgis_topology--2.0.0a1.sql postgis_topology--2.0.0a1--2.0.0.sql postgis_topology--unpackaged--2.0.0a1.sql
PostgreSQL looks at this and now knows there exists a version 2.0.0a1 and a file that will migrate a 2.0.0 to 2.0.0a1. This is because I mislabeled the first version of the extension 2.0.0 though 2.0.0 is not released yet. This allows the packager to make possible both downgrade and upgrade paths. the sql files (except for the unpackaged), are the same scripts you would run in prior versions of PostgreSQL, except they are not allowed to contain BEGIN/COMMIT; clauses. So in short making a PostGIS extension that can only install new versions was relatively painless once you learned the nuances of json, control and make install script conventions of the extension model.
The control file is a text file that dictates what other extensions are required, what is the default version to install if none is specified, if an extension can be relocated to a different schema and if not, what schema should it be installed in. Below is an example. NOTE: my version has a a1 at the end since its an alpha release.
# postgis topology extension comment = 'postgis topology spatial types and functions' default_version = '2.0.0a1' relocatable = false schema = topology requires = postgis
The install from unpackaged is a special CREATE EXTENSION command that doesn't install new functionality, but bundles functions, tables etc. you already have in your database as an extension. You would need this feature for example if you installed PostGIS 2.0.0 not as an extension and want it to become an extension. In this scenario you have 900+ PostGIS functions floating about unaware of the fact that they are part of the same family. This command will christen them as members of The PostGIS family. In the case of PostGIS Topology family, we are talking about 70 someodd functions, types, and meta tables. Still a number too large to count on your fingers and toes.
The CREATE EXTENSION postgis_topology FROM unpackaged
is all made possible by a script in the share/extension folder that has the name something like
postgis_topology--unpackaged--2.0.0a1.sql
However it looks very different from an install script in that the commands in it are of the form:
ALTER EXTENSION postgis_topology ADD FUNCTION ...;
repeat the recipe for every object that your package contains
When you are talking about 70 functions that's scary, and when 900+ with developers adding new ones daily, that is a nightmare
unless you have a plan.
Sed and I have known each other for a while, but we have been in the past, casual interlopers. I dismissed him as a tool of those underground UNIX folks too in love with their cryptic commands to see the light of day and he dismissed me as a misguided window's user unequipped to appreciate the marvels of stream manipulation.
Then one day I had a problem that it seemed Sed might have the best answer to and a 1 line answer at that. I love one line answers which is why I love SQL and spatial SQL.
Problem: How do I convert Sandro's topology.sql install script file which has commands like:
CREATE TABLE topology.topology (
id SERIAL NOT NULL PRIMARY KEY,
name VARCHAR NOT NULL UNIQUE,
SRID INTEGER NOT NULL,
precision FLOAT8 NOT NULL,
hasz BOOLEAN NOT NULL DEFAULT false
);
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
RETURNS SETOF topology.GetFaceEdges_ReturnType
AS
$$
stuff here
$$
$$
LANGUAGE 'plpgsql' VOLATILE;
to: something that looks like this:
ALTER EXTENSION postgis_topology ADD TABLE topology.topology;
ALTER EXTENSION postgis_topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer);
And Sed said, "I can do that with one line of code." More on that later