CASE with set returning functions in PostgreSQL 10

One of the changes coming in PostgreSQL 10 is the ability for the CASE .. WHEN statement to return multiple rows if the expression contains a set returning function. To demonstrate the feature, we'll use the classic generate_series function:

Since this article was written, this feature has been removed, because it caused old logic that had sets from doing something different without warning. Refer to this discussion for details.

generate_series with CASE WHEN

SELECT CASE generate_series(1,10) WHEN 1 THEN 'First' ELSE 'Not First' END;

In PostgreSQL 10, this returns:

   case
-----------
 First
 Not First
 Not First
 Not First
 Not First
 Not First
 Not First
 Not First
 Not First
 Not First
(10 rows)

In PostgreSQL 9.6 and below, you are slapped with an error message:

ERROR:  set-valued function called in context that cannot accept a set

Interesting, right, but how is this useful? Why would I ever do that instead of something like this which will work in all versions?

SELECT CASE i WHEN 1 THEN 'First' ELSE 'Not First' END
FROM generate_series(1,10) AS i;

The answer seems reading from the commit notes, that it's a side effect of other optimizations and introduction of new ProjectSet executor node.

As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later.

It should be noted though that, the new approach if you wanted to do something kinda crazy like that does seem to be faster than the standard approach.

Here is a quick test I did to compare the results and timing:

New feature way:

SELECT count(*)
FROM (SELECT CASE generate_series(1,10000000) WHEN 1 THEN 'First' ELSE 'Not First' END) AS f;

Uses this new thing called a ProjectSet executor and takes ~1 second

Aggregate  (cost=32.52..32.53 rows=1 width=8) (actual time=1036.048..1036.048 rows=1 loops=1)
  Output: count(*)
  ->  Result  (cost=0.00..20.02 rows=1000 width=32) (actual time=0.004..722.375 rows=10000000 loops=1)
        Output: CASE (generate_series(1, 10000000)) WHEN 1 THEN 'First'::text ELSE 'Not First'::text END
        ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4) (actual time=0.002..329.822 rows=10000000 loops=1)
              Output: generate_series(1, 10000000)
              ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
Planning time: 0.034 ms
Execution time: 1036.082 ms

Good old backwards compatible way:

SELECT count(*)
   FROM (SELECT CASE i WHEN 1 THEN 'First' ELSE 'Not First' END
             FROM generate_series(1,10000000) AS i) AS f;

Aggregate  (cost=12.50..12.51 rows=1 width=8) 
	(actual time=1800.258..1800.259 rows=1 loops=1)
  Output: count(*)
  ->  Function Scan on pg_catalog.generate_series i  (cost=0.00..10.00 rows=1000 width=0) 
  		(actual time=863.841..1469.087 rows=10000000 loops=1)
        Output: i.i
        Function Call: generate_series(1, 10000000)
Planning time: 0.032 ms
Execution time: 1818.415 ms

Wow that's much slower.

One thing I love about PostgreSQL that you miss with closed-source and even with other open source projects, is that when you are curious about a new feature or why it's there there is plenty of dialogue on the hacker list and commit logs to keep you entertained.