SQL Idiom Design Patterns

Programming Design Patterns define recommended approaches of solving common application problems. Within design patterns is a subset of design patterns called Idioms. Idioms you can think of as a strategy for expressing recurring constructs or if you will sub-problems and often take advantage of the special features of a language. They tend to be specific to a programming language and can not be reused in other languages they were not specifically designed for. To demonstrate the differences lets compare two design patterns we commonly use.

Macro Design Pattern

Design Pattern:

For core pieces of your application try to create a buffer between those components you can change and those you can't.

When developing applications, we often use industry standard abstraction layers and add our own abstraction layers on top of them. For example when building .NET applications, our pages never inherit directly from System.Web.UI.Page, but instead from an intermediary class we have control over.

The mind set behind this is that an industry standard abstraction layer is one that has taken into consideration all sorts of optimization concerns, been thoroughly tested by mass numbers of users. It in essense has a lot of good black box stuff we don't want to reinvent. Since it is one used by the industry, you have little control over its direction and it may not completely handle your particular use-case. The easiest way to customize it for your specific needs is to put your own abstraction wrapper around it and make it a rule never to call the inner abstraction. So we start off just putting a wrapper around the parts we plan to use that do nothing but wrap. We may then decide later on we need the wrapper to do more. The buffer also allows us to rip out the inner layer and replace with a more efficient one.

Case in point - Lets say you are trying to bullet-proof your application against SQL Injection attacks, if all SQL goes thru your abstraction layer - you can put a validation check around it that verifies the SQL conforms to the unique policies of your application before it gets passed to the sub abstraction layer. You can also do all sorts of interesting things like create your own dialect of SQL or translate Transact-SQL dialect to PSQL that then gets fed to your final database abstraction layer.

Micro Design Pattern: AKA the idiom

The way in which different languages choose to navigate and roll data constructs is different. Procedural languages use while loops, for loops and so forth. This is such a common thing to do for Relational Databases that SQL has constructs INNER JOIN, LEFT JOIN, CROSS JOIN, WHERE, GROUP BY etc. So something that seems so seemingly procedural in nature is completely abstracted away in SQL idioms.

A person used to summarizing data with :


orders.sort('salesperson,customerid')
for each order in orders
	if order.salesperson != cursalesperson then
		if cursalesperson > '' then
			tallystats[] = array(curtotal, custcount)
		end if
		curtotal = orders.total
		cursalesperson = order.salesperson
		curcustomer = order.customerid
		custcount = 1
	else
		curtotal += orders.total
		if curcustomer != order.customerid then
			custcount += 1
			curcustomer = order.customerid
		end if
	end if
loop
if cursalesperson > '' then
	tallystats[] = array(curtotal, custcount)
end if

	
May be puzzled to see the SQL idiom equivalent of
SELECT salesperson, SUM(total) As salestotal, 
		COUNT(DISTINCT customerid) As custcount
	FROM orders
	GROUP BY  salesperson
	ORDER BY salesperson
	

As you demand solutions for more complicated versions of these questions, you quickly realize the benefit of these SQL idioms. At the end of the day we choose our languages because of the ecosystem built around them (e.g. what platforms they can run on, tools to build with, libraries to use) and fitness of the idioms it provides to solve our problems.

More SQL Idioms - the FOR 1..n Loop Idiom Equivalent

This one is by far one of our favorites. I'll call this the Dummy Number trick.

Basic Use Case:
You need a for loop that gives you a sequence of dates, numbers etc. SQL does not support FOR loops. To achieve this feat you use a table - lets call this the Dummy Number table that has nothing but a sequence of numbers say between 1 and 200000. PostgreSQL has the handy generate_series() function which creates this dummy number table for you on the fly.

Below are a couple of concrete examples:

Creating repeating labels - say business cards
Lets say you need to create a set of business cards, but you don't want to waste paper - so you want your cards to start at position 11, where you had left off and you want to print 10 labels for the employee 'Joey'. The below will generate blanks for the first 10 slots and repeating Joey fields for the next 10.

SELECT CASE WHEN n > 10 THEN e.first_name ELSE '' END As first_name, 
	CASE WHEN n > 10 THEN e.last_name ELSE '' END As last_name, 
	CASE WHEN n > 10 THEN e.title ELSE '' END As title, 
	CASE WHEN n > 10 THEN e.phone ELSE '' END As phone
FROM employees e CROSS JOIN generate_series(1,20) n
WHERE e.first_name = 'Joey'
ORDER BY n;

Create a report that gives you sales for each day of year 2007 even for days that have no sales

SELECT pdays.doy, SUM(o.order_total) As sales_total
FROM (SELECT CAST('2007-01-01' As date) + n As doy
		FROM generate_series(0,364) n ) pdays 
			INNER JOIN orders o ON o.sale_date = pdays.doy;
GROUP BY pdays.doy
ORDER BY pdays.doy

PostGIS - this trick comes in very handy in geometry manipulation
Create a new data set with all lines in your table cut up such that no line is greater than 100 units. Assumes no line segment is longer than 10000x100 units long.

SELECT field1, field2, ST_Line_Substring(the_geom, 100.00*n/length,
  CASE
	WHEN 100.00*(n+1) < length THEN 100.00*(n+1)/length
	ELSE 1
  END) As the_geom
FROM 
  (SELECT sometable.field1, sometable.field2, sometable.the_geom,
  ST_Length(sometable.the_geom) As length 
  FROM sometable 
  ) AS t 
CROSS JOIN generate_series(0,10000) AS n 
WHERE n*100.00/length < 1;

PostgreSQL Array Idioms

PostgreSQL has idioms that are fairly unique to it. Its extensive support of arrays as first class data-types introduces a lot of interesting use-cases. Below is another example using PostGIS for nearest neighbor search.

For each building - list the 3 closest police precincts that are within 5000 units (our units are in feet so ~ 1 mile away)

	SELECT m.building_name, m.nn_precincts[1] As nn_precinct1, 
		m.nn_precincts[2] As nn_precinct2, 
		m.nn_precincts[3] As nn_precinct3
		FROM 
		(SELECT b.building_name, b.the_geom, 
			ARRAY(SELECT p.precint_name
					FROM police_precints p
		WHERE ST_DWithin(b.the_geom, p.the_geom, 5000) 
		ORDER BY ST_Distance(b.the_geom, p.the_geom) 
		LIMIT 3) As nn_precincts 
FROM buildings b  ) m
ORDER BY m.builidng_name