One of the very handy features introduced in PostgreSQL 8.4 is the new aggregate function called array_agg which is a companion function to the unnest function we discussed earlier. This
takes a set of elements similar to what COUNT, SUM etc do and builds an array out of them. This approach is faster than the old used array_append , array_accum since it does not rebuild the array on each iteration.
Sadly it does not appear to be completely swappable with array_append as there does not seem to be a mechanism to use it to build your own custom aggregate functions that need to maintain the set of objects flowing thru the aggregate without venturing into C land. This we tried to do
in our median example but were unsuccessful.
In PostGIS 1.4 Paul borrowed some of this array_agg logic to make the
PostGIS spatial aggregates much much faster with large numbers of geometries. So collecting polygons or making a line out of say 30,000 geometries which normally would have taken 2 minutes or more (just accumulating), got reduced to under 10 seconds in many cases.
That did require C code even when installed against PostgreSQL 8.4. Though in PostGIS you reap the benefits as far as geometries go even
if you are running lower than 8.4.
We had originally thought array_agg was a PostgreSQL only creation, but it turns out that array_agg is a function defined in the ANSI SQL:2008 specs and for one appears to exist in IBM DB2 as well. I don't think
Oracle or any other database supports it as of yet.
As we had demonstrated in the other article, we shall demonstrate the olden days and what array_agg brings to the table to make your life easier.
In the olden days prior to 8.4, when you wanted to get the list of employees that reports to a supervisor as an array and also the salaries
of these people as an array so you could pass it to a funky R-statistical function to do interesting calculations with it, you would have had one of two choices.
- Create an aggregate function for said R-statistical function, which would be slow if you had a lot of numbers
- Or do an ugly ARRAY subselect thing
As mentioned the aggregate solution is personally nicer and elegant if its a function you call often, but was slow
when accumulating large numbers of records in each grouping. Below is a very contrived example that uses our favorite system catalog
to demonstrate its use and we are doing it because we are too lazy to create dummy data and this query will work in all databases. In the below for each table in our database we are getting two sets of arrays -- one that has the columns in the table and
another that has the distinct types of columns in each table.
--The olden days
--Note this is not a terribly interesting example since an array_accum
-- built with array_append would work fine here as the arrays don't get that big for tables
--but just imagine we had something that had 10,000 elements in each grouping
SELECT t.table_schema, t.table_name,
ARRAY(SELECT CAST(c.column_name AS text)
FROM information_schema.columns AS c
WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_names,
ARRAY(SELECT DISTINCT CAST(c.data_type AS text)
FROM information_schema.columns AS c
WHERE c.table_schema = t.table_schema
AND c.table_name = t.table_name) As dat_types
FROM information_schema.tables AS t
GROUP BY t.table_schema, t.table_name;
--The 8.4 way
SELECT c.table_schema, c.table_name,
array_agg(CAST(c.column_name AS text)) as col_names,
array_agg(DISTINCT CAST(c.data_type AS text)) As dat_types
FROM information_schema.columns AS c
GROUP BY c.table_schema, c.table_name;
Welcome to the 154th edition of Log Buffer, the weekly review of database blogs. Let’s dive right in, shall we? Oracle On Radio Free Tooting, Andrew Clarke says, “No SQL, so what?” taking as his keynote something Nuno Souto said: ...
Tracked: Jul 17, 14:58