PostgreSQL 8.4 will have a ROW_NUMBER() windowing function so this little hack will hopefully be unnecessary when 8.4 is in production.
Getting back to this exercise, this was actually inspired by Hubert's recent article Tips N’ Tricks - setting field based on order. Why this inspired me, I guess because it stirred up memories about the often forgotten utility of arrays in PostgreSQL and I thought it would answer a question that was haunting me - How do I assign sequential numbers to a list?. The article just didn't quite read the way I expected it to and actually was answering another question I cared much less about, but it did get the juices flowing. So without much further ado.
Problem:
You have data that looks like this:
name_id last_name first_name jjones Jones John psmith Smith Paul wgates Gates William wgrant Grant Wallace
row_number name_id last_name first_name 1 wgates Gates William 2 wgrant Grant Wallace 3 jjones Jones John 4 psmith Smith Paul
--CREATE test data
CREATE TABLE people(name_id varchar(50) PRIMARY KEY, last_name varchar(50),
first_name varchar(50));
INSERT INTO people(name_id, last_name, first_name)
(VALUES ('jjones', 'Jones', 'John'),
('psmith', 'Smith', 'Paul'),
('wgates', 'Gates', 'William')
, ('wgrant', 'Grant', 'Wallace'));
Solutions: Here we propose 3 solutions and which you choose is based on preference, use-case and experience.
--Approach 1: The all in one WTF
-- Advantage: 1 step and no temp junk created
-- so safe to nest into other subqueries and reuse
--- Disadvantage: Messy to read,
-- could be slow for large sets,
-- and relies on a primary key
SELECT row_number, oldtable.*
FROM (SELECT * FROM people) As oldtable
CROSS JOIN (SELECT ARRAY(SELECT name_id
FROM people
ORDER BY last_name, first_name) As id) AS oldids
CROSS JOIN generate_series(1, (SELECT COUNT(*)
FROM people)) AS row_number
WHERE oldids.id[row_number] = oldtable.name_id
ORDER BY row_number;
--Approach 2: Closer to Hubert's Examples
--Advantage: Easy to read - intuitive, doesn't rely on a primary key
--Disadvantage: Creates temp junk in the db
-- which means reusing in same session you must drop
-- and using in nested subquery results may be unpredictable
--I prefer the WTF
--I don't know what it is about explicitly creating stuff in the
--database even if it lasts for that one session
-- just irritates me - granted its much simpler and probably faster
CREATE TEMP sequence temp_seq;
SELECT nextval('temp_seq') As row_number, oldtable.*
FROM (SELECT * FROM people ORDER BY last_name,first_name) As oldtable;
--Approach 3: This will work I think even in MySQL :)
--Advantage: Cross Platform - yeh
--Disadvantage: Potentially slow correlated subquery for large datasets
-- and gets somewhat messy the more orderings you tack on
-- relies on your order by producing unique results
SELECT (SELECT COUNT(*) FROM people
WHERE
(COALESCE(people.last_name,'') || COALESCE(people.first_name,'')) <=
(COALESCE(oldtable.last_name,'')
|| COALESCE(oldtable.first_name,''))) As row_number,
oldtable.*
FROM (SELECT *
FROM people
ORDER BY
last_name, first_name) As oldtable;