Caching data with materialized views and statement level triggers

One exciting new feature coming in PostgreSQL 9.3 is materialized views. Materialized views is really a mechanism for caching data of a query. It is especially useful if you have long running queries where the answers change infreqently. There are many things unfortunately that materialized views won't do where you are still better off with regular views.

They don't refresh themselves automatically. Thus requiring a cron job/pgagent job or a trigger on something to refresh.
They can't be user dependent or time dependent. For example if you have a view that does something like WHERE user=current_user(), then a materialized view is out of the question.

For PostGIS work we need some views that take more than 10 seconds to compute. For example for one project we have a base parcel table and tables broken out by fiscal year that inherit from the base parcel table. We also have housing and neighborhood revitalization projects whose parcels are keyed based on parcel id. Housing and revitalization projects by their very nature change the landscape by creating and destroying parcels. So the parcel set you started out with may not be what you end up with. In order to reference an older project that has changed the parcel divide landscape, we sometimes need to reference old no longer existent parcels. What we do to achieve this is to have a table, We'll call it parcelmax that contains 3 columns -- the fiscal year a parcel was last seen, the parcel id, and the geometry of the parcel. Quering our parcel hierarchy takes too long so as part of the process of adding a new table we always rebuild this table from the parcel hierarchy. We, now in PostgreSQL 9.3, have the option to use a materialized view.

Here is structures of our tables just for last 2 fiscal years.

CREATE TABLE parceltime(pid varchar(20), geom geometry(MultiPolygon,26986)
  , fy integer
  ,   CONSTRAINT pk_parceltime PRIMARY KEY (pid, fy));

CREATE TABLE parcel_2013(
  CONSTRAINT pk_parcel2013 PRIMARY KEY (pid,fy)
  ) INHERITS (parceltime);
ALTER TABLE parcel_2013 ALTER COLUMN fy SET DEFAULT 2013;
ALTER TABLE parcel_2013 ADD CONSTRAINT chk_fy CHECK (fy = 2013);

CREATE TABLE parcel_2012(
  CONSTRAINT pk_parcel2012 PRIMARY KEY (pid,fy)
  ) INHERITS (parceltime);
ALTER TABLE parcel_2012 ALTER COLUMN fy SET DEFAULT 2012;
ALTER TABLE public.parcel_2012 ADD CONSTRAINT chk_fy CHECK (fy = 2012);

Building the materialized view

The materialized view to replace our old parcelmax looks like this:

DROP MATERIALIZED VIEW IF EXISTS parcelmax;
CREATE MATERIALIZED VIEW parcelmax AS
SELECT DISTINCT ON(pid) pid, fy, geom
  FROM parceltime
  ORDER BY pid, fy DESC;
  
CREATE UNIQUE INDEX iudx_parcelmax_pid ON parcelmax (pid);
CREATE INDEX idx_parcelmax_geom ON parcelmax USING gist(geom);

So now whenever we update our parcel hierarchy or add a new table to the hierarchy, we need to run:

REFRESH MATERIALIZED VIEW parcelmax;

Triggering automagical rebuild

This is a smidge better than what we used to do, but frankly not that much better. Most of the time where we forget to refresh our table happens when we have to correct geometries in a table or delete an invalid record. It's these times we wish something automagical would happen. Luckily PostgreSQL has something called statement triggers. We've never had much need for statement level triggers except for logging. However the marriage between materialized views and statement level triggers seems like a good one especially for tables you update infrequently.

Here is the code to cause the view to rebuild in event any dependency tables change:

CREATE OR REPLACE FUNCTION trig_refresh_parcelmax() RETURNS trigger AS
$$
BEGIN
    REFRESH MATERIALIZED VIEW parcelmax;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql ;


CREATE TRIGGER trig_01_refresh_parcelmax AFTER TRUNCATE OR INSERT OR UPDATE OR DELETE
   ON parcel_2013 FOR EACH STATEMENT
   EXECUTE PROCEDURE trig_refresh_parcelmax();

CREATE TRIGGER trig_01_refresh_parcelmax AFTER TRUNCATE OR INSERT OR UPDATE OR DELETE
   ON parcel_2012 FOR EACH STATEMENT
   EXECUTE PROCEDURE trig_refresh_parcelmax();

Test out our statement makeup

To test this architecture out, we'll add some records to both tables.

-- Yields 0 records --
SELECT COUNT(*) FROM parcelmax;
-- testing (10500 records added)
INSERT INTO parcel_2012(pid, geom)
SELECT (i::text || size::text)::varchar(20)
  , ST_Multi(ST_Buffer(
        ST_Transform(
           ST_SetSRID(
             ST_Point(-71.0574 + i*0.00001,42.3581 + i*0.00001)
             ,4326)
             ,26986), size))
FROM generate_series(1,1000,2) i , generate_series(100,300,10) As size;

-- yields 10500 records --
SELECT COUNT(*) FROM parcelmax;

--testing add to parcel_2013 (12600 records added)
INSERT INTO parcel_2013(pid, geom)
SELECT (i::text || size::text)::varchar(20)
  , ST_Multi(ST_Buffer(
        ST_Transform(
           ST_SetSRID(
             ST_Point(-71.0574 + i*0.00001,42.3581 + i*0.00001)
             ,4326)
             ,26986), size))
FROM generate_series(1,1200,2) i , generate_series(100,300,10) As size;

-- yields 12600 records --
SELECT COUNT(*) FROM parcelmax;