A while ago we demonstrated how to create cross tabulation tables using tablefunc extension aka (Pivot Tables) (basically collapsing
rows into columns).
Recently someone asked me how to do the reverse (convert columns to rows). He found a solution to the problem here: http://stackoverflow.com/questions/1128737/unpivot-and-postgresql
using a combination of array and unnest. That approach is very similar to SQL Server's built-in Pivot SQL predicate.
The solution seemed
nice enough except similar to the SQL Server Unpivot, it required knowing the column names beforehand so very hard to genericize.
So would it be possible to accomplish this feat without knowing the columns names (except for the key) and be able to do it with one SQL statement. I realized that the
PostgreSQL hstore extension fit the bill nicely. In this article I'll demonstrate both approaches by creating a view
using both.
Installing Hstore
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib
and run in your database.
Load Our data
For this exercise, I'm going to grab data the Gazetteer zipcode tabulation areas
http://www.census.gov/geo/maps-data/data/gazetteer2010.html
First create table to hold the data:
CREATE TABLE zcta5 (zip char(5) primary key, pop10 integer, hu10 integer
, aland numeric(16,2), awater numeric(16,2)
, aland_sqmi numeric(16,2), awater_sqmi numeric(16,2)
, intptlong numeric(8,5), intptlat numeric(8,5));
Load the data with psql:
\cd /Gaz_zcta_national
\copy zcta5 FROM Gaz_zcta_national.txt DELIMITER E'\t' CSV HEADER
The Hstore unpivot view
We create a view that renders our table unpivoted:
CREATE VIEW vw_zcta_unpivot_hs
AS
SELECT zip, (h).key, (h).value As val
FROM (SELECT zip, each(hstore(foo) - 'zip'::text) As h
FROM zcta5 as foo ) As unpiv ;
Take it for a test drive by outputting just one zip
SELECT * FROM vw_zcta_unpivot_hs WHERE zip = '02109';
zip | key | val
-------+-------------+-----------
02109 | hu10 | 2462
02109 | aland | 449654.00
02109 | pop10 | 3771
02109 | awater | 292691.00
02109 | intptlat | -71.05063
02109 | intptlong | 42.36722
02109 | aland_sqmi | 0.17
02109 | awater_sqmi | 0.11
SELECT * from vw_zcta_unpivot_hs;
SELECT count(*) from vw_zcta_unpivot_hs;
The thing that is kind of cool about this approach is that if I add another column to my table, I don't
need to redo the view for it to output that new column as a row. To demonstrate
ALTER TABLE zcta5 ADD COLUMN hu_income10 numeric(12,2);
SELECT * FROM vw_zcta_unpivot_hs WHERE zip = '02109';
zip | key | val
-------+-------------+-----------
02109 | hu10 | 2462
02109 | aland | 449654.00
02109 | pop10 | 3771
02109 | awater | 292691.00
02109 | intptlat | -71.05063
02109 | intptlong | 42.36722
02109 | aland_sqmi | 0.17
02109 | awater_sqmi | 0.11
02109 | hu_income10 |
The array unpivot view
The second approach is to use array and unnest. As mentioned, the thing I didn't like about this
approach is that it's not dynamic - you have to itemize the columns. It also on the downside requires you to cast all the column
values before hand if they are not all the same type. It does have a couple of benefits:
- It uses built-in types so no need to install an extension though it requires PostgreSQL 8.4+ because of unnest
- You can grab just a subset very efficiently and if you only care about numeric columns would probably be best.
- It's generally faster.
To test performance I ran this:
Create the view
CREATE OR REPLACE VIEW vw_zcta_unpivot_ary
AS
SELECT zip
, unnest(
array['hu10', 'aland', 'pop10'
, 'awater', 'intptlat', 'intptlong', 'aland_sqmi', 'awater_sqmi']
) AS key
, unnest(
array[hu10::text, aland::text, pop10::text
, awater::text, intptlat::text, intptlong::text
, aland_sqmi::text, awater_sqmi::text]
) AS val
FROM zcta5;
Test some queries
The single record one takes same time and uses index but shorter plan:
--where filter 10 ms --
SELECT *
FROM vw_zcta_unpivot_ary
WHERE zip = '02109';
Output the whole view
SELECT * from vw_zcta_unpivot_ary;
SELECT count(*) FROM vw_zcta_unpivot_ary;
So there you have it, two very different approaches for accomplishing the same task.
Tracked: Jan 06, 07:01