Simulating Row Number in PostgreSQL Pre 8.4

PostgreSQL 8.4 will have a ROW_NUMBER() windowing function so this little hack will hopefully be unnecessary when 8.4 is in production.

Getting back to this exercise, this was actually inspired by Hubert's recent article Tips N’ Tricks - setting field based on order. Why this inspired me, I guess because it stirred up memories about the often forgotten utility of arrays in PostgreSQL and I thought it would answer a question that was haunting me - How do I assign sequential numbers to a list?. The article just didn't quite read the way I expected it to and actually was answering another question I cared much less about, but it did get the juices flowing. So without much further ado.

Problem:

You have data that looks like this:

name_id		last_name		first_name
jjones		Jones			John
psmith		Smith			Paul
wgates		Gates			William		
wgrant		Grant			Wallace

And someone asks you to output it like this:

row_number	name_id			last_name		first_name
1		wgates			Gates			William	
2		wgrant			Grant			Wallace
3		jjones			Jones			John
4		psmith			Smith			Paul


--CREATE test data
CREATE TABLE people(name_id varchar(50) PRIMARY KEY, last_name varchar(50), 
    first_name varchar(50));
INSERT INTO people(name_id, last_name, first_name)
(VALUES ('jjones', 'Jones', 'John'), 
('psmith', 'Smith', 'Paul'), 
('wgates', 'Gates', 'William')
    , ('wgrant', 'Grant', 'Wallace'));

Solutions: Here we propose 3 solutions and which you choose is based on preference, use-case and experience.


--Approach 1: The all in one WTF
--  Advantage: 1 step and no temp junk created
--  so safe to nest into other subqueries and reuse
--- Disadvantage: Messy to read, 
--      could be slow for large sets, 
--      and relies on a primary key
SELECT row_number, oldtable.*
FROM (SELECT * FROM people) As oldtable
CROSS JOIN (SELECT ARRAY(SELECT name_id  
        FROM people 
        ORDER BY last_name, first_name) As id)  AS oldids
CROSS JOIN generate_series(1, (SELECT COUNT(*) 
                    FROM people)) AS row_number
WHERE oldids.id[row_number] =  oldtable.name_id
ORDER BY row_number;

--Approach 2: Closer to Hubert's Examples
--Advantage: Easy to read - intuitive, doesn't rely on a primary key
--Disadvantage: Creates temp junk in the db 
--      which means reusing in same session you must drop
--  and using in nested subquery results may be unpredictable
--I prefer the WTF 
--I don't know what it is about explicitly creating stuff in the 
--database even if it lasts for that one session
-- just irritates me - granted its much simpler and probably faster 
CREATE TEMP sequence temp_seq;
SELECT nextval('temp_seq') As row_number, oldtable.*
FROM (SELECT * FROM people ORDER BY last_name,first_name) As oldtable;

--Approach 3:  This will work I think even in MySQL :)
--Advantage: Cross Platform - yeh
--Disadvantage:  Potentially slow correlated subquery for large datasets  
--  and gets somewhat messy the more orderings you tack on
-- relies on your order by producing unique results
SELECT (SELECT COUNT(*) FROM people 
    WHERE 
    (COALESCE(people.last_name,'') || COALESCE(people.first_name,'')) <= 
    (COALESCE(oldtable.last_name,'') 
    || COALESCE(oldtable.first_name,''))) As row_number, 
    oldtable.*
FROM (SELECT * 
    FROM people 
    ORDER BY 
    last_name, first_name) As oldtable;