People have asked us how to calculate running totals a number of times; not a lot but enough that we feel we should document the general technique.
This approach is fairly ANSI-SQL standard and involves using SELF JOINS. In a later article we shall describe how to calculate moving averages which
follows a similar technique but with some extra twists.
Note that the below examples can also be done with a correlated sub-select in the SELECT clause and in some cases that sometimes works better. Perhaps
we shall show that approach in a later issue.
We tend to prefer the look of the SELF JOIN though and in practice it is generally more efficient since its easier for planners to optimize and doesn't always result in a nested loop strategy.
Just feels a little cleaner and if you are totaling a lot of columns (e.g number of items, products) etc,
much more efficient.
Question 1: Calculate running total for a customer by order but don't include in the total the current order amount?
Solution 1: This is one of the cases where the use of a SELF JOIN comes in handy. For this particular example we shall assume
we have a table of orders and for each order, we would like to know for that given customer the total price of goods they have purchased prior to date of order.
For sake of argument we shall assume the order_datetime has full timestamp of order so it is a fairly rare or non-existent situation that a customer will have
2 orders with the same timestamp.
SELECT n.customer_id, n.order_id, n.order_total,
SUM(o.order_total) As running_total
FROM orders n LEFT JOIN orders o
ON (o.customer_id = n.customer_id
AND n.order_datetime > o.order_datetime)
GROUP BY n.customer_id, n.order_datetime, n.order_id
ORDER BY n.customer_id, n.order_datetime, n.order_id;
Question 2: How to calculate running total for each day?
In this case we want to know the total profit of the company for each day and running total for each day including current day. In this case we can use the more efficient
INNER JOIN since we know that the prior and including current will have the current order date as well.
Its debateable if we need the ORDER BY in the subselect. That is mostly there to try to force the planner to materialize the subselect which would tend to be faster. The below
query should work fine in MySQL 5 and above as well.
NOTE: if you tried
such a thing in Microsoft SQL Server 2005 and below the below would not work for 2 reasons.
- SQL Server 2005 and lower does not like order bys in subselects so you will need to remove that OR use the (SELECT TOP 100 PERCENT * FROM orders ORDER BY order_datetime) hack. Anyrate forcing an
order would probably not change the plan in SQL Server 2005. Haven't tried on beta of SQL Server 2008
- SQL Server 2005 and below have no Date data type. Date and Datetime with Timestamp were introduced in SQL Server 2008 so the CAST part will probably work in SQL Server 2008. In prior versions
there is some messy code you have to write involving subtraction of time. This is a PostgreSQL blog so we will not go into that. But here is a good link that covers that messiness - http://weblogs.sqlteam.com/jeffs/archive/2007/01/02/56079.aspx.
To give SQL Server 2005 and above due credit - this would be done more efficiently using the windowing functions introduced in SQL Server 2005 and above and so would the above example.
Solution 2:
SELECT n.order_date, n.order_total, SUM(o.order_total) As running_total
FROM (SELECT CAST(order_datetime As date) As order_date,
SUM(order_total) As order_total
FROM orders
GROUP BY CAST(order_datetime As date)
ORDER BY CAST(order_datetime As date)) n INNER JOIN
(SELECT CAST(order_datetime As date) As order_date,
SUM(order_total) As order_total
FROM orders
GROUP BY CAST(order_datetime As date)
ORDER BY CAST(order_datetime As date)) o
ON (n.order_date >= o.order_date)
GROUP BY n.order_date
ORDER BY n.order_date;
One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the man
Tracked: May 27, 17:01
Tracked: Jul 14, 15:00