Lessons learned Packaging PostGIS Extensions: Part 2

One of the great lessons learned in building PostGIS extensions is my rediscovery of SED. SED turned out to be mighty useful in this regard and I'll explain a bit in this article. Unfortunately there is still a lot I need to learn about it to take full advantage of it and most of my use can be summed up as monkey see, monkey scratch head, monkey do. In addition I came across what I shall refer to as Pain points with using the PostgreSQL Extension model. Part of which has a lot to do with the non-granular management of changes in PostGIS, the day to day major flux of changes happening in PostGIS 2.0 space, and my attempt at trying to creat upgrade freeze points amidst these changes. When PostGIS 2.0 finally arrives, the freeze points will be better defined and not change from day to day. So some of these issues may not be that big of a deal.

The Joy of SED: Building unpackaged extension install scripts

In the Packaging PostGIS Extensions Part 1, I mentioned a problem I had with converting a topology install script to an Extension unpackaged script. The idea of the extension unpackaged script is to package already installed loose functions, tables, types, operators,etc into a package. This would come about if you installed a contrib module the old fashioned way and then wanted to mark it as an extension so you could experience all the joys the extension model provides.

Problem: So the topology install script looked something like this:

CREATE TABLE topology.topology (
	id SERIAL NOT NULL PRIMARY KEY,
	name VARCHAR NOT NULL UNIQUE,
	SRID INTEGER NOT NULL,
	precision FLOAT8 NOT NULL,
	hasz BOOLEAN NOT NULL DEFAULT false
);
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
  RETURNS SETOF topology.GetFaceEdges_ReturnType
AS
$$
stuff here
$$
$$
LANGUAGE 'plpgsql' VOLATILE;
and I needed to change it to something that looks like this:
ALTER EXTENSION postgis_topology ADD TABLE topology.topology;
ALTER EXTENSION postgis_topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer);

This little sed script worked well:

sql/$(EXTENSION)--unpackaged--$(EXTVERSION).sql: ../../topology/topology.sql
	sed -e '/^CREATE \(OR REPLACE\|TRIGGER\|TYPE\|TABLE\|VIEW\)/!d;' \
		-e 's/OR REPLACE//g' \
		-e 's/CREATE\(.*\)/ALTER EXTENSION $(EXTENSION) ADD\1;/' \
		-e 's/DEFAULT [a-zA-Z]\+//g' \
		-e 's/\(BEFORE\|AS\)\(.*\)/;/' \
		-e 's/(;/;/' \
		-e 's/\\(;/;/' \
		-e 's/;;/;/g' $< > $@
	

, though needs some slight modifications to work more seamlessly. To get it to work I had to make sure all the function arguments were on a single line in topology.sql. I've been struggling to get it to work with functions where the arguments are broken up in multiple lines. I've learned enough to know it's possible, but just haven't come up with the right sequence of sed to make it happen. This was a minor set back since most functions in PostGIS geometry, geography, raster, and topology have function args defined on a single line in there respective scripts so only about 10 or so need changing. It's just bothersome that you can create a valid install script that can't be converted to an unpackaged script. Any people with suggestions I'm all ears.

To summarize what this script does.

The Joy of SED: Building extension upgrade scripts

I developed a similar answer for creating an upgrade script and this one actually doesn't care about multilines etc. This worked great for topology, but for combining raster / postgis combo package I ran into some set backs. These set backs have more to do with limitations of extensions model than anything to do with SED. My script looked like this:

sql/$(EXTENSION)--2.0.0a2--$(EXTVERSION).sql: sql_bits/postgis_raster_upgrade_minor.sql
	sed  -e 's/BEGIN;//g' -e 's/COMMIT;//g' \
		 -e '/^\(CREATE\|ALTER\) \(CAST\|TYPE\|TABLE\|SCHEMA\|DOMAIN\|TRIGGER\).*;/d' \
	 	 -e '/^\(CREATE\|ALTER\|DROP\) \(CAST\|TYPE\|TABLE\|SCHEMA\|DOMAIN\|TRIGGER\)/,/\;/d' \
	 	 -e 's/^DROP \(AGGREGATE\|FUNCTION\) [\(IF EXISTS\)]*\(.*\);/ALTER EXTENSION $(EXTENSION) DROP \1 \2;DROP \1 IF EXISTS \2 ;/' \
	 	 $<< >> $@

Basically the intent was to create a micro upgrade script. The first line strips BEGIN and COMMIT clauses since the CREATE and ALTER Extension commands put in their own begin and commit.

Since PostGIS doesn't introduce any new complex types in theory for micro upgrades, I could avoid including CASTS, TABLES, SCHEMA, DOMAIN etc. creations. That is what the second line of the sed part does -- strip out all undroppable object types from the update that is all defined on a single line. The third line drops alls such things defined on multiple lines by deleting all phrases that start with CREATE ... up to the ;. It wouldn't work for functions since they could have ; in the body, but for CASTS types etc it works fine. The final sed command looks for AGGREGATE or FUNCTION drops and then adds a line before them to drop the AGGREGATE/FUNCTION from the EXTENSION. This is where my issue starts.

In the PostGIS 2.0 wind of changes, people are renaming function argument names every so often, changing AGGREGATES internals, or dropping functions replacing them with equivalents that take default args. Such changed require a DROP of the Aggregate or Function first if it exists. IF EXISTS works great since you don't have to worry about the case of your script throwing an error if it doesn't exist. However, the ALTER EXTENSION syntax does not support the DROP IF EXISTS, so it has to exist. For my older installs, the functions I am trying to drop may not exist. If a object is part of an EXTENSION, it has to be dropped from the extension before you can drop it. This forces me to use a DROP CASCADE clause, which I hate using because it could destroy user defined objects that aren't part of the extension. In those cases I would prefer the update to fail rather than risking destroying user generated data. Keep in mind that in standard release time, these deeper kind of changes wouldn't be allowed so might be a non-issue.