One of the coolest features of PostgreSQL is the ability to write functions using plain old
SQL. This feature it has had for a long time. Even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural mumbo jumbo like
loops and what not. This is cool for two reasons:
- Plain old SQL is the simplest to write and most anyone can write one and is just what the doctor ordered in many cases. PostgreSQL even allows you to write
aggregate functions with plain old SQL. Try to write an aggregate function in SQL Server
you've got to pull out your Visual Studio this and that and do some compiling and loading and you better know C# or VB.NET. Try in MySQL and you better learn C.
Do the same in PostgreSQL (you have a large choice of languages including SQL) and the code is simple to write. Nevermind
with MySQL and SQL Server, you aren't even allowed to do those type of things on a shared server or a server where the IT department is paranoid. The closest
with this much ease would be Oracle, which is unnecessarily verbose.
- Most importantly -- since it is just SQL, for simple user-defined functions, a PostgreSQL sql function can often be in-lined into the overall query plan since
it only uses what is legal in plain old SQL.
This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use.
So instead of writing geom1 && geom2 AND Intersects(geom1,geom2) -- a user can write
ST_Intersects(geom1,geom2) . The short-hand is even more striking when you think of the ST_DWithin function.
With an inlined function, the planner has visibility into the function and breaks apart the
spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2)
and has great flexibility in reordering the clauses in the plan.
In PostGIS 1.5, we accidentally broke this secret sauce for Geography ST_Intersects and ST_Covers, ST_CoveredBy
by putting in a STRICT clause in our SQL function declaration as documented in our bug ticket.
So a query that would normally take 50 ms to run was taking 10 seconds.
There is nothing we could find that suggests STRICT should have this effect. Is it by design in PostgreSQL or a bug?
STRICT should in theory ensure that any input going into a function that is NULL should result in a NULL output, but how does this translate to loss of transparency?
To demonstrate difference in what the plans look like, how you can tell planner is loosing visibility into the function. We will create our dummy data set.
CREATE TABLE geogtest(gid SERIAL primary key, geog geography(POLYGON,4326));
CREATE INDEX idx_geogtest_geog
ON geogtest
USING gist
(geog);
INSERT INTO geogtest(geog)
SELECT ST_Buffer(geog,random()*10) As geog
FROM (SELECT ST_GeogFromText('POINT(' || i*0.5 || ' ' || j*0.5 || ')') As geog
FROM generate_series(-350,350) As i
CROSS JOIN generate_series(-175,175) As j
) As foo
LIMIT 1000;
vacuum analyze geogtest;
Then we create two versions of our function, one with STRICT and one without STRICT. The functions are otherwise
exactly the same.
CREATE OR REPLACE FUNCTION ST_IntersectsBlackBox(geography, geography)
RETURNS boolean AS
$$
SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001
$$
LANGUAGE 'sql' IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION ST_IntersectsTransparent(geography, geography)
RETURNS boolean AS
$$
SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001
$$
LANGUAGE 'sql' IMMUTABLE;
Then we test them out.
SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
ON ST_IntersectsBlackBox(f.geog, f2.geog)
GROUP BY f.gid;
QUERY PLAN
GroupAggregate (cost=0.00..348794.42 rows=1000 width=8) (actual time=6.850..6720.039 rows=1000 loops=1)
-> Nested Loop (cost=0.00..347115.25 rows=333333 width=8) (actual time=0.102..6717.928 rows=1000 loops=1)
Join Filter: st_intersectsblackbox(f.geog, f2.geog)
-> Index Scan using geogtest_pkey on geogtest f (cost=0.00..115.25 rows=1000 width=2308) (actual time=0.014..1.039 rows=1000 loops=1)
-> Seq Scan on geogtest f2 (cost=0.00..87.00 rows=1000 width=2308) (actual time=0.001..0.443 rows=1000 loops=1000)
Total runtime: 6720.376 ms
SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
ON ST_IntersectsTransparent(f.geog, f2.geog)
GROUP BY f.gid;
QUERY PLAN
HashAggregate (cost=970.52..983.02 rows=1000 width=8) (actual time=34.100..34.491 rows=1000 loops=1)
-> Nested Loop (cost=0.00..963.86 rows=1333 width=8) (actual time=0.205..33.154 rows=1000 loops=1)
Join Filter: (_st_distance(f.geog, f2.geog, 0::double precision, false) < 1e-005::double precision)
-> Seq Scan on geogtest f (cost=0.00..87.00 rows=1000 width=2308) (actual time=0.009..0.619 rows=1000 loops=1)
-> Index Scan using idx_geogtest_geog on geogtest f2 (cost=0.00..0.61rows=1 width=2308) (actual time=0.025..0.026 rows=1 loops=1000)
Index Cond: (f.geog && f2.geog)
Total runtime: 35.720 ms
Okay there is a lot you can see yada yada yada like for those of us who lack patience and let our eyes gaze yawningly only to be mesmerized by the colors and the lines of the PgAdmin diagram - in the PgAdmin graphical picture the big fat line leading to the Group Aggregate
is obviously fatter than the skinnier line leading to the Hash aggregate for the transparent one. The most important giveaway is that in the beautiful plan, we don't see the SQL function named anywhere. Its dissolved into two parts -- the spatial index condition and the more costly _ST_Distance..,
but in the BIG BLACK and UGLY there it is ST_IntersectsBlackBox like a bullet-proof impenetrable vest that the planner
just takes as is like a patient afraid of the surgeon's scalpel and hiding a big tumor screaming: I'm fine really, don't touch me. Nothing to see here.
Now there are certain conditions where you want your function to be a black box, like in cases when there is absolutely no way the planner
can optimize it with an index scan. There is no point in wasting the planner's time allowing it to inspect it. However in these cases,
we want the planner to say Ah yes I see you are using a construct that can be aided with this spatial index I have here. Let me take you apart and put you back together in a more efficient order.
I know a lot has been said about this beautiful value we affectionately call NULL, which is neither here nor there and that manages to catch many of us off guard with its casual neither here nor thereness. Database specialists who are really just back
Tracked: Jun 22, 04:14
Tracked: Jul 26, 01:09