PostgreSQL 8.4 Faster array building with array_agg

Monday, July 13. 2009

PostgreSQL 8.4 Faster array building with array_agg

One of the very handy features introduced in PostgreSQL 8.4 is the new aggregate function called array_agg which is a companion function to the unnest function we discussed earlier. This takes a set of elements similar to what COUNT, SUM etc do and builds an array out of them. This approach is faster than the old used array_append , array_accum since it does not rebuild the array on each iteration.

Sadly it does not appear to be completely swappable with array_append as there does not seem to be a mechanism to use it to build your own custom aggregate functions that need to maintain the set of objects flowing thru the aggregate without venturing into C land. This we tried to do in our median example but were unsuccessful.

In PostGIS 1.4 Paul borrowed some of this array_agg logic to make the PostGIS spatial aggregates much much faster with large numbers of geometries. So collecting polygons or making a line out of say 30,000 geometries which normally would have taken 2 minutes or more (just accumulating), got reduced to under 10 seconds in many cases. That did require C code even when installed against PostgreSQL 8.4. Though in PostGIS you reap the benefits as far as geometries go even if you are running lower than 8.4.

We had originally thought array_agg was a PostgreSQL only creation, but it turns out that array_agg is a function defined in the ANSI SQL:2008 specs and for one appears to exist in IBM DB2 as well. I don't think Oracle or any other database supports it as of yet.

As we had demonstrated in the other article, we shall demonstrate the olden days and what array_agg brings to the table to make your life easier.

In the olden days prior to 8.4, when you wanted to get the list of employees that reports to a supervisor as an array and also the salaries of these people as an array so you could pass it to a funky R-statistical function to do interesting calculations with it, you would have had one of two choices.

Create an aggregate function for said R-statistical function, which would be slow if you had a lot of numbers
Or do an ugly ARRAY subselect thing

As mentioned the aggregate solution is personally nicer and elegant if its a function you call often, but was slow when accumulating large numbers of records in each grouping. Below is a very contrived example that uses our favorite system catalog to demonstrate its use and we are doing it because we are too lazy to create dummy data and this query will work in all databases. In the below for each table in our database we are getting two sets of arrays -- one that has the columns in the table and another that has the distinct types of columns in each table.


--The olden days
--Note this is not a terribly interesting example since an array_accum 
-- built with array_append would work fine here as the arrays don't get that big for tables
--but just imagine we had something that had 10,000 elements in each grouping
SELECT t.table_schema, t.table_name, 
 ARRAY(SELECT CAST(c.column_name AS text) 
        FROM information_schema.columns AS c
    WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_names,
  ARRAY(SELECT DISTINCT CAST(c.data_type AS text) 
            FROM information_schema.columns AS c
        WHERE c.table_schema = t.table_schema 
        AND c.table_name = t.table_name) As dat_types
 FROM information_schema.tables AS t
GROUP BY t.table_schema, t.table_name;


--The 8.4 way
SELECT c.table_schema, c.table_name, 
 array_agg(CAST(c.column_name AS text)) as col_names,
 array_agg(DISTINCT CAST(c.data_type AS text)) As dat_types
 FROM information_schema.columns AS c
GROUP BY c.table_schema, c.table_name;

Posted by Leo Hsu and Regina Obe in 8.4, basics, db2, intermediate, postgis at 22:55 | Comments (2) | Trackback (1)

Trackbacks

Trackback specific URI for this entry

Log Buffer #154: a Carnival of the Vanities for DBAs
Welcome to the 154th edition of Log Buffer, the weekly review of database blogs. Let’s dive right in, shall we? Oracle On Radio Free Tooting, Andrew Clarke says, “No SQL, so what?” taking as his keynote something Nuno Souto said: ...

Weblog: Pythian Group Blog
Tracked: Jul 17, 14:58

Comments

Display comments as (Linear | Threaded)

The "olden days" example gives an error:

ERROR: missing FROM-clause entry for table "c"
LINE 1: SELECT c.table_schema, c.table_name,

I think you just need to "SELECT table_schema, table_name ...".

#1 Joe on 2009-07-14 12:16 (Reply)

Thanks. Corrected. I guess I never learned the lesson to never change an example without retesting it.

#1.1 Regina on 2009-07-14 12:31 (Reply)

Add Comment

Name
Email
Homepage
In reply to
Comment	E-Mail addresses will not be displayed and will only be used for E-Mail notifications. To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly. Enter the string from the spam-prevention image above: Phone* What is eight minus seven?
	Remember Information? Subscribe to this entry

PostgreSQL 8.4 Faster array building with array_agg

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting

Monday, July 13. 2009

PostgreSQL 8.4 Faster array building with array_agg

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

PostgreSQL 8.4 Faster array building with array_agg

Postgres OnLine Journal PostGIS in Action About the Authors Consulting

Monday, July 13. 2009

PostgreSQL 8.4 Faster array building with array_agg

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting