A while ago we demonstrated how to create cross tabulation tables using tablefunc extension aka (Pivot Tables) (basically collapsing rows into columns). Recently someone asked me how to do the reverse (convert columns to rows). He found a solution to the problem here: http://stackoverflow.com/questions/1128737/unpivot-and-postgresql using a combination of array and unnest. That approach is very similar to SQL Server's built-in Pivot SQL predicate. The solution seemed nice enough except similar to the SQL Server Unpivot, it required knowing the column names beforehand so very hard to genericize. So would it be possible to accomplish this feat without knowing the columns names (except for the key) and be able to do it with one SQL statement. I realized that the PostgreSQL hstore extension fit the bill nicely. In this article I'll demonstrate both approaches by creating a view using both.
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib
and run in your database.
For this exercise, I'm going to grab data the Gazetteer zipcode tabulation areas http://www.census.gov/geo/maps-data/data/gazetteer2010.html
First create table to hold the data:
CREATE TABLE zcta5 (zip char(5) primary key, pop10 integer, hu10 integer
, aland numeric(16,2), awater numeric(16,2)
, aland_sqmi numeric(16,2), awater_sqmi numeric(16,2)
, intptlong numeric(8,5), intptlat numeric(8,5));
Load the data with psql:
--in psql
\cd /Gaz_zcta_national
\copy zcta5 FROM Gaz_zcta_national.txt DELIMITER E'\t' CSV HEADER
We create a view that renders our table unpivoted:
CREATE VIEW vw_zcta_unpivot_hs
AS
SELECT zip, (h).key, (h).value As val
FROM (SELECT zip, each(hstore(foo) - 'zip'::text) As h
FROM zcta5 as foo ) As unpiv ;
Take it for a test drive by outputting just one zip
--10 ms uses index too
-- returns 8 rows
SELECT * FROM vw_zcta_unpivot_hs WHERE zip = '02109';
zip | key | val
-------+-------------+-----------
02109 | hu10 | 2462
02109 | aland | 449654.00
02109 | pop10 | 3771
02109 | awater | 292691.00
02109 | intptlat | -71.05063
02109 | intptlong | 42.36722
02109 | aland_sqmi | 0.17
02109 | awater_sqmi | 0.11
--output the whole thing 3,360ms (264960 records) --SELECT * from vw_zcta_unpivot_hs;
-- no network effects-- -- 290msSELECT count(*) from vw_zcta_unpivot_hs;
The thing that is kind of cool about this approach is that if I add another column to my table, I don't need to redo the view for it to output that new column as a row. To demonstrate
ALTER TABLE zcta5 ADD COLUMN hu_income10 numeric(12,2);
SELECT * FROM vw_zcta_unpivot_hs WHERE zip = '02109';
zip | key | val
-------+-------------+-----------
02109 | hu10 | 2462
02109 | aland | 449654.00
02109 | pop10 | 3771
02109 | awater | 292691.00
02109 | intptlat | -71.05063
02109 | intptlong | 42.36722
02109 | aland_sqmi | 0.17
02109 | awater_sqmi | 0.11
02109 | hu_income10 |
The second approach is to use array and unnest. As mentioned, the thing I didn't like about this approach is that it's not dynamic - you have to itemize the columns. It also on the downside requires you to cast all the column values before hand if they are not all the same type. It does have a couple of benefits:
To test performance I ran this:
CREATE OR REPLACE VIEW vw_zcta_unpivot_ary
AS
SELECT zip
, unnest(
array['hu10', 'aland', 'pop10'
, 'awater', 'intptlat', 'intptlong', 'aland_sqmi', 'awater_sqmi']
) AS key
, unnest(
array[hu10::text, aland::text, pop10::text
, awater::text, intptlat::text, intptlong::text
, aland_sqmi::text, awater_sqmi::text]
) AS val
FROM zcta5;
The single record one takes same time and uses index but shorter plan:
--where filter 10 ms --
SELECT *
FROM vw_zcta_unpivot_ary
WHERE zip = '02109';
-- network effects --
-- 3120 ms (264960 records)
SELECT * from vw_zcta_unpivot_ary;
-- no network effects --
-- 200 ms
SELECT count(*) FROM vw_zcta_unpivot_ary;
So there you have it, two very different approaches for accomplishing the same task.