People have asked us how to calculate running totals a number of times; not a lot but enough that we feel we should document the general technique. This approach is fairly ANSI-SQL standard and involves using SELF JOINS. In a later article we shall describe how to calculate moving averages which follows a similar technique but with some extra twists.
Note that the below examples can also be done with a correlated sub-select in the SELECT clause and in some cases that sometimes works better. Perhaps we shall show that approach in a later issue. We tend to prefer the look of the SELF JOIN though and in practice it is generally more efficient since its easier for planners to optimize and doesn't always result in a nested loop strategy. Just feels a little cleaner and if you are totaling a lot of columns (e.g number of items, products) etc, much more efficient.
Question 1: Calculate running total for a customer by order but don't include in the total the current order amount?This is one of the cases where the use of a SELF JOIN comes in handy. For this particular example we shall assume we have a table of orders and for each order, we would like to know for that given customer the total price of goods they have purchased prior to date of order. For sake of argument we shall assume the order_datetime has full timestamp of order so it is a fairly rare or non-existent situation that a customer will have 2 orders with the same timestamp.
SELECT n.customer_id, n.order_id, n.order_total,
SUM(o.order_total) As running_total
FROM orders n LEFT JOIN orders o
ON (o.customer_id = n.customer_id
AND n.order_datetime > o.order_datetime)
GROUP BY n.customer_id, n.order_datetime, n.order_id
ORDER BY n.customer_id, n.order_datetime, n.order_id;
In this case we want to know the total profit of the company for each day and running total for each day including current day. In this case we can use the more efficient INNER JOIN since we know that the prior and including current will have the current order date as well. Its debateable if we need the ORDER BY in the subselect. That is mostly there to try to force the planner to materialize the subselect which would tend to be faster. The below query should work fine in MySQL 5 and above as well.
NOTE: if you tried such a thing in Microsoft SQL Server 2005 and below the below would not work for 2 reasons.
To give SQL Server 2005 and above due credit - this would be done more efficiently using the windowing functions introduced in SQL Server 2005 and above and so would the above example.
Solution 2:
SELECT n.order_date, n.order_total, SUM(o.order_total) As running_total
FROM (SELECT CAST(order_datetime As date) As order_date,
SUM(order_total) As order_total
FROM orders
GROUP BY CAST(order_datetime As date)
ORDER BY CAST(order_datetime As date)) n INNER JOIN
(SELECT CAST(order_datetime As date) As order_date,
SUM(order_total) As order_total
FROM orders
GROUP BY CAST(order_datetime As date)
ORDER BY CAST(order_datetime As date)) o
ON (n.order_date >= o.order_date)
GROUP BY n.order_date
ORDER BY n.order_date;