STRICT on SQL Function Breaks In-lining Gotcha

One of the coolest features of PostgreSQL is the ability to write functions using plain old SQL. It has had this feature for a long time, since even before PostgreSQL 8.2. No other database, to our knowledge, has it. By SQL we mean sans procedural mumbo jumbo like loops and whatnot. A big part of what makes this cool is inlining: the planner can substitute the body of a simple SQL function directly into the query that calls it.

This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use. Instead of writing geom1 && geom2 AND Intersects(geom1,geom2), a user can write ST_Intersects(geom1,geom2). The shorthand is even more striking when you think of the ST_DWithin function.
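To make the ST_DWithin comparison concrete, here is a rough sketch of the longhand a user would otherwise have to write. The tables a and b, their gid and geom columns, and the distance of 100 are all hypothetical, and the exact body PostGIS uses for ST_DWithin varies by version, so treat the longhand as an approximation rather than the actual definition.

```sql
-- Longhand: hand-written bounding-box prefilter plus the exact distance test.
-- Tables a and b and the distance 100 are hypothetical.
SELECT a.gid, b.gid
FROM a INNER JOIN b
    ON a.geom && ST_Expand(b.geom, 100)
   AND b.geom && ST_Expand(a.geom, 100)
   AND ST_Distance(a.geom, b.geom) <= 100;

-- Shorthand: one call, which the planner can take apart when it is inlined.
SELECT a.gid, b.gid
FROM a INNER JOIN b
    ON ST_DWithin(a.geom, b.geom, 100);
```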

With an inlined function, the planner has visibility into the function and breaks apart the spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2) and has great flexibility in reordering the clauses in the plan.
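Inlining can be observed without PostGIS at all. The following is a minimal sketch using a made-up table (inline_demo) and a made-up function (is_small); neither exists in PostGIS or PostgreSQL.

```sql
-- Hypothetical table and function, for illustration only.
CREATE TABLE inline_demo(x integer);
INSERT INTO inline_demo(x) SELECT generate_series(1, 1000);

-- A single-SELECT SQL function, IMMUTABLE and not STRICT.
CREATE FUNCTION is_small(integer)
  RETURNS boolean AS
$$
SELECT $1 < 10
$$
  LANGUAGE sql IMMUTABLE;

-- If the function is inlined, the Filter line of the plan shows the
-- expanded expression (x < 10) instead of a call to is_small(x).
EXPLAIN SELECT * FROM inline_demo WHERE is_small(x);
```

If the function name still appears in the plan, the planner treated it as a black box.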

In PostGIS 1.5, we accidentally broke this secret sauce for the geography versions of ST_Intersects, ST_Covers, and ST_CoveredBy by putting a STRICT clause in our SQL function declarations, as documented in our bug ticket. As a result, a query that would normally take 50 ms to run was taking 10 seconds.
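To see whether an install is affected, one option is to ask the system catalog which of these functions are SQL-language and marked STRICT. This is a sketch: the function names come from the paragraph above, and which signatures actually show up depends on your PostGIS version.

```sql
-- List SQL-language functions with these names and whether each is STRICT.
SELECT p.oid::regprocedure AS signature,
       p.proisstrict       AS is_strict
FROM pg_proc As p
    INNER JOIN pg_language As l ON l.oid = p.prolang
WHERE l.lanname = 'sql'
  AND p.proname IN ('st_intersects', 'st_covers', 'st_coveredby')
ORDER BY 1;
```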

There is nothing we could find that suggests STRICT should have this effect. Is it by design in PostgreSQL or a bug? STRICT should in theory ensure that any input going into a function that is NULL should result in a NULL output, but how does this translate to loss of transparency?
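We don't have a definitive answer, but one observation may be relevant: the AND in our function body is not itself strict, so a STRICT wrapper and its bare body can give different answers on NULL input. The sketch below uses a made-up function (and_strict) to show the difference. Take it as a hint at why the planner might refuse to inline, not as a confirmed explanation.

```sql
-- AND is not strict: a NULL operand does not force a NULL result.
SELECT (NULL::boolean AND false) AS bare_and;       -- returns false

-- Hypothetical STRICT wrapper around the same expression.
CREATE FUNCTION and_strict(boolean, boolean)
  RETURNS boolean AS
$$
SELECT $1 AND $2
$$
  LANGUAGE sql IMMUTABLE STRICT;

-- STRICT short-circuits to NULL because one argument is NULL.
SELECT and_strict(NULL, false) AS strict_and;       -- returns NULL
```

Replacing the call with the body would change the result from NULL to false, so the two are not interchangeable in general.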

To demonstrate the difference in what the plans look like, and how you can tell the planner is losing visibility into the function, we will first create a dummy data set.


-- create dummy PostGIS geography data --
CREATE TABLE geogtest(gid SERIAL primary key, geog geography(POLYGON,4326));
CREATE INDEX idx_geogtest_geog
  ON geogtest
  USING gist
  (geog);
  
INSERT INTO geogtest(geog)
SELECT ST_Buffer(geog,random()*10) As geog
FROM (SELECT ST_GeogFromText('POINT(' || i*0.5 || ' ' || j*0.5 || ')') As geog
    FROM generate_series(-350,350) As i 
        CROSS JOIN generate_series(-175,175) As j
        ) As foo
LIMIT 1000;

vacuum analyze geogtest;

Then we create two versions of our function, one with STRICT and one without STRICT. The functions are otherwise exactly the same.


-- create our 2 intersects functions  -- 
CREATE OR REPLACE FUNCTION ST_IntersectsBlackBox(geography, geography)
  RETURNS boolean AS
$$
SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001
$$
  LANGUAGE 'sql' IMMUTABLE STRICT;
  
CREATE OR REPLACE FUNCTION ST_IntersectsTransparent(geography, geography)
  RETURNS boolean AS
$$
SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001
$$
  LANGUAGE 'sql' IMMUTABLE;  


Then we test them out.


-- THE BIG BAD BLACK, UGLY, AND SLOW ---
SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
    ON ST_IntersectsBlackBox(f.geog, f2.geog)
GROUP BY f.gid;

-- takes 5775 - 6000 ms on PostgreSQL 9.0 beta 1 - similar for 8.4 and 8.3
-- Explain analyze 
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..348794.42 rows=1000 width=8) (actual time=6.850..6720.039 rows=1000 loops=1)
   ->  Nested Loop  (cost=0.00..347115.25 rows=333333 width=8) (actual time=0.102..6717.928 rows=1000 loops=1)
         Join Filter: st_intersectsblackbox(f.geog, f2.geog)
         ->  Index Scan using geogtest_pkey on geogtest f  (cost=0.00..115.25 rows=1000 width=2308) (actual time=0.014..1.039 rows=1000 loops=1)
         ->  Seq Scan on geogtest f2  (cost=0.00..87.00 rows=1000 width=2308) (actual time=0.001..0.443 rows=1000 loops=1000)
 Total runtime: 6720.376 ms


[Figure: PgAdmin graphical explain plan for the black box query]

-- THE BEAUTIFUL, FAST, and NOTHING TO HIDE --
SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
    ON ST_IntersectsTransparent(f.geog, f2.geog)
GROUP BY f.gid;

-- Takes 48 ms on PostgreSQL 9.0 beta 1 - similar for 8.4 and 8.3
-- Explain analyze 
QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=970.52..983.02 rows=1000 width=8) (actual time=34.100..34.491 rows=1000 loops=1)
 ->  Nested Loop  (cost=0.00..963.86 rows=1333 width=8) (actual time=0.205..33.154 rows=1000 loops=1)
       Join Filter: (_st_distance(f.geog, f2.geog, 0::double precision, false) < 1e-005::double precision)
       ->  Seq Scan on geogtest f  (cost=0.00..87.00 rows=1000 width=2308) (actual time=0.009..0.619 rows=1000 loops=1)
       ->  Index Scan using idx_geogtest_geog on geogtest f2  (cost=0.00..0.61 rows=1 width=2308) (actual time=0.025..0.026 rows=1 loops=1000)
             Index Cond: (f.geog && f2.geog)
 Total runtime: 35.720 ms


[Figure: PgAdmin graphical explain plan for the transparent query]

Okay, there is a lot you can see here, yada yada yada. For those of us who lack patience and let our eyes gaze yawningly, only to be mesmerized by the colors and lines of the PgAdmin diagram: in the PgAdmin graphical picture, the big fat line leading to the GroupAggregate is obviously fatter than the skinny line leading to the HashAggregate in the transparent one.

The most important giveaway is that in the beautiful plan, we don't see the SQL function named anywhere. It has dissolved into two parts: the spatial index condition and the more costly _ST_Distance call. In the BIG BLACK and UGLY plan, there it is, ST_IntersectsBlackBox, like a bullet-proof impenetrable vest that the planner just takes as is, like a patient afraid of the surgeon's scalpel and hiding a big tumor, screaming: I'm fine really, don't touch me. Nothing to see here.
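One way to convince yourself that STRICT is the only thing standing in the way is to flip the flag on the black box function and look at the plan again. This is a sketch that assumes the geogtest table and the ST_IntersectsBlackBox function created earlier. We would expect the function name to disappear from the plan once STRICT is off, but check the output on your own version.

```sql
-- Turn STRICT off; CALLED ON NULL INPUT is the non-strict default.
ALTER FUNCTION ST_IntersectsBlackBox(geography, geography)
  CALLED ON NULL INPUT;

-- Re-check the plan: look for an Index Cond on && in place of
-- a Join Filter that names st_intersectsblackbox.
EXPLAIN
SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
    ON ST_IntersectsBlackBox(f.geog, f2.geog)
GROUP BY f.gid;

-- Put STRICT back so the function matches the comparison above.
ALTER FUNCTION ST_IntersectsBlackBox(geography, geography)
  STRICT;
```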

Now there are certain conditions where you want your function to be a black box, such as when there is absolutely no way the planner can optimize it with an index scan. There is no point in wasting the planner's time by letting it inspect the function. In cases like ours, however, we want the planner to say: Ah yes, I see you are using a construct that can be aided with this spatial index I have here. Let me take you apart and put you back together in a more efficient order.