Materialized geometry_columns using Event Triggers

One of the big changes in PostGIS 2.0 was that the geometry_columns table became a read only view. This change was not without sacrifice. On the plus it meant a table no longer needed to be maintained which was a relief for those who had to constantly ensure the sideline table was up to date. Also on the plus it meant we could base the views on the current user so that if a user connected, they would only see tables they had access to. The sacrifice made was a bit of speed. In most use cases, the speed difference is hardly noticeable and only noticeable if you have a query constantly polling this table. However if you have a database of 15,000 geometry tables it could take as long as 450ms as noted in this ticket.

So just playing with ideas, could the new PostgreSQL 9.3 event triggers and materialized views provide a solution. Keep in mind this is just food for thought. We're not condoning people go out and hack their PostGIS install. We suspect if we implement such a thing in PostGIS core it will change from what we propose here. If you saw our prior article Caching data with materialized views and statement level triggers you know where this is going. We'll add yet another concept to this cocktail and that is what we'll call schema hiding which by that we mean counting on the order of a search_path to hide a named table/view you don't want.

Step 1: Install PostGIS in its own schema

Step one is to install PostGIS in its own schema and make sure it comes after the public schema. This insures that if you do for some reason create a geometry_columns view or table or what have you in public, it will be used before the postgis packaged one.

CREATE SCHEMA postgis;
CREATE EXTENSION postgis SCHEMA postgis;
ALTER DATABASE test_gis SET search_path = "$user",public,postgis;

Step 2: Create materialized geometry_columns

Now we'll create a geometry_columns materialized view in the public scheam. Note that because in step 1 we set the search path such that the geometry_columns view provide by postgis would be used after one in the user's schema or public schema. So now we are in position to put forth an alternative geometry_columns and one that will be used instead of the installed one.

The view we build is almost identical to the one provided with postgis except we took out the logic that limits the list of tables to just those the current user has select rights to. We don't want the materialized view to be user dependent.

CREATE MATERIALIZED VIEW public.geometry_columns AS 
 SELECT current_database()::character varying(256) AS f_table_catalog, 
    n.nspname::character varying(256) AS f_table_schema, 
    c.relname::character varying(256) AS f_table_name, 
    a.attname::character varying(256) AS f_geometry_column, 
    COALESCE(NULLIF(postgis.postgis_typmod_dims(a.atttypmod), 2)
    , postgis.postgis_constraint_dims(n.nspname::text, c.relname::text, a.attname::text), 2) AS coord_dimension, 
    COALESCE(NULLIF(postgis.postgis_typmod_srid(a.atttypmod), 0)
    , postgis.postgis_constraint_srid(n.nspname::text, c.relname::text, a.attname::text), 0) AS srid, 
    replace(replace(COALESCE(NULLIF(upper(postgis.postgis_typmod_type(a.atttypmod)), 'GEOMETRY'::text)
      , postgis.postgis_constraint_type(n.nspname::text, c.relname::text, a.attname::text)::text
      , 'GEOMETRY'::text), 'ZM'::text, ''::text), 'Z'::text, ''::text)::character varying(30) AS type
   FROM pg_class c, 
    pg_attribute a, 
    pg_type t, 
    pg_namespace n
  WHERE t.typname = 'geometry'::name 
     AND a.attisdropped = false 
     AND a.atttypid = t.oid 
     AND a.attrelid = c.oid 
     AND c.relnamespace = n.oid 
     AND (c.relkind = 'r'::"char" OR c.relkind = 'v'::"char") 
     AND NOT pg_is_other_temp_schema(c.relnamespace) 
     AND NOT (c.relname = 'raster_columns'::name) ;

Step 3: Create Event trigger on table CREATE, ALTER, DROP, CONSTRAINT ..

For this magic step, we are going to define a data definition language trigger (DDL) that refreshes our view whenever a new table is created or altered or we add constraints. We ned the constraint one to catch constraint based geometry tables.

CREATE OR REPLACE FUNCTION trig_refresh_geometry_columns() RETURNS event_trigger AS $$
BEGIN
    IF tg_tag IN('CREATE TABLE','CREATE TABLE AS'
        , 'CREATE FOREIGH TABLE', 'DROP TABLE', 'ALTER TABLE'
        , 'CREATE VIEW', 'ALTER VIEW') THEN
      REFRESH MATERIALIZED VIEW public.geometry_columns;
    END IF;
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER trig_01_refresh_geometry_columns ON ddl_command_end 
 EXECUTE PROCEDURE trig_refresh_geometry_columns();     

Step 4: Test Drive

CREATE TABLE testgeom(gid serial primary key, geom geometry(POINT,4326) );
ALTER TABLE testgeom ADD COLUMN geom2 geometry(POLYGON, 4326);

SELECT f_table_name As tname, f_geometry_column As gc, srid, type 
 FROM public.geometry_columns;
  tname   |  gc   | srid |  type
----------+-------+------+---------
 testgeom | geom  | 4326 | POINT
 testgeom | geom2 | 4326 | POLYGON

Test 5: Speed test

The script below generates 1000 geometry tables.

 
DO LANGUAGE 'plpgsql'
$$
DECLARE i int;
BEGIN
  FOR i IN 1 .. 1000 LOOP
      EXECUTE 'CREATE TABLE gtest' || i::text || '(gid serial primary key, geom geometry(POINT,4326))';          
  END LOOP;
        
END 

$$

If run with the trigger in place, it takes 63062 ms (a little over a minute to finish). Without the trigger it takes 3,831ms ~ 3seconds). This is because each call to CREATE TABLE causes a REFRESH VIEW. So moral of that story, if you plan to create 1000 tables in one go, then disable the trigger and manually refresh the view yourself.

ALTER EVENT TRIGGER trig_01_refresh_geometry_columns DISABLE;
--do your crazy thing -
ALTER EVENT TRIGGER trig_01_refresh_geometry_columns ENABLE;
REFRESH MATERIALIZED VIEW geometry_columns;

Now to test speed of querying. Since we put the public first in our search_path, when we connect to database, the public.geometry_columns one will be used if we don't schema qualify.

-- 11ms
SELECT count(*) from geometry_columns; 

-- 41ms 
SELECT count(*) from postgis.geometry_columns;

-- this uses the one in public - takes 11 ms
SELECT srid, type, count(*) as tot 
  FROM geometry_columns
  GROUP BY srid,type;


--this uses the one in postgis takes 61 ms
SELECT srid, type, count(*) as tot 
  FROM postgis.geometry_columns
  GROUP BY srid,type;