Common table expressions are perhaps our favorite feature in PostgreSQL 8.4 even more so than windowing functions. Strangely enough I find myself using them more in SQL Server too now that PostgreSQL supports it.
CTEs are not only nice syntactic sugar, but they also produce better more efficient queries. To our knowledge only Firebird (see note below), PostgreSQL,SQL Server, and IBM DB2 support this, though I heard somewhere
that Oracle does too or is planning too UPDATE: As noted below Oracle as of version 9 supports non-recursive CTEs. For recursion you need to use the Oracle proprietary corresponding by syntax.
As far as CTEs go, the syntax between PostgreSQL, SQL Server 2005/2008, IBM DB2 and Firebird
is pretty much the same when not using recursive queries. When using recursive queries, PostgreSQL and Firebird use WITH RECURSIVE to denote a recursive CTE where as SQL Server and IBM DB2 its just WITH.
All 4 databases allow you to have multiple table expressions within one WITH clause anda RECURSIVE CTE expression can have both recursive and non-recursive CTEs. This makes writing complex queries especially where you have the same expressions used multiple times in the query,
a lot easier to debug and also more performant.
In our article on How to force PostgreSQL to use a pre-calculated value
we talked about techniques for forcing PostgreSQL to cache a highly costly function. For PostgreSQL 8.3 and below, the winning solution was using OFFSET which is not terribly cross platform and has the disadvantage of
materializing the subselect. David Fetter had suggested
for 8.4, why not try CTEs. Yes CTEs not only are syntactically nice, more portable, but they help you write more efficient queries. To demonstrate, we shall repeat the same exercise we did in that
article, but using CTEs instead.
We first start by creating our super slow function.
CREATE OR REPLACE FUNCTION fn_very_slow(IN param_sleepsecs numeric) RETURNS numeric AS
$$
BEGIN
PERFORM pg_sleep(param_sleepsecs);
RETURN param_sleepsecs;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE STRICT ;
--runs in 4524 ms
SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i;
--runs in 9032 ms - no cache, but in spatial functions (say ST_Distance)
-- we have tried this does sometimes cache and return in 4524ms
SELECT fn_very_slow(i*0.5) As firstcall,
fn_very_slow(i*0.5) As secondcallsame
FROM generate_series(1,5,2) As i;
--runs in 9032 ms - no cache
SELECT firstcall,
firstcall + 1 As secondcalldifferent
FROM (SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i
) As foo;
--the olden days hack
--runs in 4524 ms - caches
SELECT firstcall,
firstcall + 1 As secondcalldifferent
FROM (SELECT fn_very_slow(i*0.5) As firstcall
FROM generate_series(1,5,2) As i
ORDER BY 1 OFFSET 0) As foo;
--The world with 8.4 CTEs
--runs in 4524 ms
WITH
ctefn AS (
SELECT fn_very_slow(i*0.5) As slow_calc
FROM generate_series(1,5,2) As i)
SELECT slow_calc As firstcall,
slow_calc + 1 As secondcallsame
FROM ctefn;
The beauty of even non-recursive CTEs really shows itself when you start doing complex self joins or using the same complex subselect in multiple locations of a bigger query. Observe:
--doing self joins the olden days
--the olden days hack
--runs in 9012 ms - caches
SELECT foo1.i As foo1_i, foo2.i As foo2_i, foo1.slow_calc As foo1_call,
foo2.slow_calc + 1 As foo_2_call
FROM (SELECT fn_very_slow(i*0.5) As slow_calc, i
FROM generate_series(1,5,2) As i
ORDER BY 1 OFFSET 0) As foo1 LEFT JOIN
(SELECT fn_very_slow(i*0.5) As slow_calc, i
FROM generate_series(1,5,2) As i
ORDER BY 1 OFFSET 0) As foo2 ON foo1.i >= foo2.i;
--doing self join in 8.4 with CTES
-- runs in 4512 ms
WITH
ctefn AS (
SELECT fn_very_slow(i*0.5) As slow_calc, i
FROM generate_series(1,5,2) As i)
SELECT foo1.i As foo1_i, foo2.i As foo2_i, foo1.slow_calc As foo1_call,
foo2.slow_calc + 1 As foo_2_call
FROM ctefn As foo1 LEFT JOIN ctefn AS foo2 ON foo1.i >= foo2.i;
Tracked: Jul 12, 22:28