SQL Coding Standards To Each His Own Part II

Tuesday, January 06. 2009

SQL Coding Standards To Each His Own Part II

Both Josh Berkus and Hubert made blog entries about our last excerpt. In general I will start off by saying that we are more or less in agreement on what is good SQL coding format.

Here are the things I think we can all agree on

SQL Keywords should be upper cased
Field names should be prefixed with their tables especially when you have a multi-join statement involved
Use JOIN syntax instead of stuffing everything in the WHERE though we are highly against just saying JOIN. We prefer INNER JOIN

The major points of contention I think are

Should you use aliases over tables and if you use aliases should you keep them short or more descriptive. Josh thinks table names should be used where possible and when aliases are used they should be longer than a few characters and Hubert agrees with us that short aliases are fine and in fact desirable. I think we all agree aliases should be meaningful when used, but our idea of what it means to be meaningful is a little different.
In use of JOIN syntax -- we prefer using INNER JOIN instead of using just JOIN and in fact find it quite irritating that PostgreSQL rewrites our INNERs as JOIN. I suspect Hubert and Josh and many other PostgreSQL folk are on the other side of the fence on this. The reason we feel strongly about this is there are so many kinds of JOINS - INNER JOIN, RIGHT JOIN, LEFT JOIN, CROSS JOIN, FULL JOIN, and the horrid NATURAL JOIN (that should be shot and put out of its misery). To just say JOIN to us is just confusing.
While you can write LEFT OUTER JOIN, the OUTER is kind of pointless because no one goes around writing LEFT INNER JOINS
Use well supported standards where possible. This means CURRENT_TIMESTAMP instead of now(). now() is not in all databases, but most relational databases support CURRENT_TIMESTAMP

So let us start by looking at Josh's last SQL statement and compare to how we would write it. Josh I hope you didn't come up with this table structure. We personally hate the use of id as a field identifier for every table, but that's another story. We prefer something like forum_id, post_id etc. and whats the deal with giving a field name of date. Stay away from data types as field names.

-- Josh
SELECT persons.id, persons.first_name, persons.last_name, forums.category,
   COUNT(DISTINCT posts.id) as num_posts,
   COALESCE(MAX(comments.rating), 0) AS highest_rating,
   COALESCE(MIN(comments.rating), 0) AS lowest_rating
FROM persons JOIN posts ON persons.id = posts.author
   JOIN forums on posts.forum = forums.id
   LEFT OUTER JOIN comments ON posts.id = comments.post
WHERE persons.status > 0
   AND forums.ratings = TRUE
   AND comments.date > ( now() - INTERVAL '1 year')
GROUP BY persons.id, persons.first_name, persons.last_name, forums.category
HAVING count(DISTINCT posts.id) > 0
ORDER BY persons.last_name, persons.first_name; 


-- Us

-- Leo tends to prefer upper case alias for table lower case or proper case for fields.  
-- Leo also likes to see his SELECT fields across one line regardless how many because 
-- he has 2 wide screen ubber huge monitors and wants to take full advantage of them
-- I couldn't demonstrate this without messing up the format of the page.

SELECT P.id, P.first_name, P.last_name, F.category, COUNT(DISTINCT PO.id) as num_posts, 
  COALESCE(MAX(C.rating), 0) AS highest_rating, COALESCE(MIN(C.rating), 0) AS lowest_rating
FROM persons AS P INNER JOIN posts AS PO ON P.id = PO.author
   INNER JOIN forums As F ON PO.forum = F.id
   LEFT JOIN comments AS C ON PO.id = C.post
WHERE P.status > 0
   AND F.ratings = TRUE
   AND C.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY P.id, P.first_name, P.last_name, F.category
HAVING COUNT(DISTINCT PO.id) > 0
ORDER BY P.last_name, P.first_name; 


-- Regina flip flops and often depends on database 
-- she is working with that day or moment
-- whether she upper cases or lower cases or proper cases.
-- It must be noted that Regina also has 2 wide  ubber screens to work with
-- because Leo believes every programmer should have at least 36 inches to work with.
-- Once she finally gets adjusted to the enormity of these things, 
-- she will use the space to keep 20 applications open at the same time and train her mind 
-- for massive parallel concentration and get 20 times more work done
SELECT pe.id, pe.first_name, pe.last_name, f.category, COUNT(DISTINCT po.id) AS num_posts,
   COALESCE(MAX(c.rating), 0) AS highest_rating, COALESCE(MIN(c.rating), 0) AS lowest_rating
FROM persons AS pe INNER JOIN posts AS po ON pe.id = po.author
   INNER JOIN forums AS f ON po.forum = f.id
   LEFT JOIN comments AS c ON po.id = c.post
WHERE pe.status > 0
   AND f.ratings = TRUE
   AND c.date > ( CURRENT_TIMESTAMP - INTERVAL '1 year')
GROUP BY pe.id, pe.first_name, pe.last_name, f.category
HAVING COUNT(DISTINCT po.id) > 0
ORDER BY pe.last_name, pe.first_name;

I will start off by saying for the above example -- Josh is lucky because his table names are naturally short and we are disadvantaged in this example because two tables start with the same letter. Imagine if the table names were much more descriptive how repetitive and noisy Josh's scene would become.

The reason we prefer short aliases over long is that it allows you to fit more fields on the same line and scan them quickly and still prefix with the table names. Leo prefers upper case his table because he can quickly spot where a table name ends and field begins. Generally speaking you usually have 4 tables or fewer in a statement, the FROM clause is not that far away as a legend, and the tables usually start with different letters, so its not a large mind cramp for us to remember F is for Forum and C is for Comments.

Unlike other languages, you then need to keep on repeating this variable over and over again which is much more typing as well as just makes your statement that much longer to digest.

As a last gripe to what Josh said. Just because you have 255 characters at your disposal as opposed to the 8 you once had, doesn't mean you should feel compelled to use them all. Long live Fortran. It is older than any language I can think of including COBOL and has stood the test of time and is still undergoing innovation.

Posted by Leo Hsu and Regina Obe in basics at 07:06 | Comments (5) | Trackbacks (0)

Trackbacks

Trackback specific URI for this entry

No Trackbacks

Comments

Display comments as (Linear | Threaded)

In regards to the table.id argument, why not consider semantics? table.id is semantic. table.table_id is just plain redundant and insulting to coding intelligence.

Why do the extra work for no explicit benefit? Note how you guys state, "Field names should be prefixed with their tables." You might as well go on to say, "... and id columns should also be prefixed with their tables." Read that back to yourselves and let your mind wobble. Thanks for helping to push non beneficial "standards" to make my day-to-day tasks more of a pain in the ass.

#1 alan on 2009-01-07 04:26 (Reply)

Alan,

First of all these are just opinions and part of the point of this article is that standards are not quite as cut and dry as some people think. Josh I'm sure feels strongly about his style just as much as we feel strongly about our style.

The 2 problems I see with id is
1) You always have to alias it in the SELECT because who wants to see id in a multi join output when you can't see the tables that are being joined.

In many cases we need to output the primary key for example if the query is a view to be joined later with other things and if you are going to have to alias to a meaningful name, you might as well do it in the table.

2) Its just distracting to see all fields named the same unless they mean the same thing (yah we can argue an id is an id, but an id is not always an integer so why should it always have the same name and that's a whole other argument as to whether we should always use surrogate keys). first_name is a first_name and always means the same thing okay I get it.

When you are using short aliasing of tables its especially annoying.

PO.id is just not quite as useful to see as PO.post_id. In Josh' model its not quite as bad since he always spells out the table name. So I guess in Josh's model it works if you never need to output the primary key in the query.

#1.1 Regina on 2009-01-07 10:43 (Reply)

Hi. Just a small nitpick.

Not once did you mention the word "column", but used the word "field" 8 times.

Columns are not fields.

Here is a great explanation:

http://www.developersdex.com/gurus/articles/118.asp

Regards.

#2 RDL on 2009-01-09 04:02 (Reply)

RDL,

I'll have to think about this a bit more. You are probably right. Rereading Celko's paper I could read it a couple of ways.

The way I have always understood it, a column defines the whole set of data of a specific slot (column) in a table. A field defines a specific element of data that lives in a specific row in a specific column.

So when I think of writing SQL statements -- I think of them as fields, because I am not selecting the whole column 0 table (I am only selecting a set of fields that live in a column). But yes I am selecting from columns so perhaps I should call it columns.

So when I am defining a table structure I think columns. When I am selecting data, I think fields.

The other reason I just assume not bother talking about columns is we do a lot of financial consulting and when you start talking columns - the first crazy thought that pops into finance peoples minds is (Ah so a database is a spreadsheet - we have columns too). I just assume not get into that mindset of thinking of databases as spreadsheets even though they both have columns and I have partaken in the guilty pleasure of simulating spreadsheet behavior in databases :)

#2.1 Regina on 2009-01-09 11:45 (Reply)

Really, Very nice Information for the SQL Programmers and very important features should take care always.

Ajaykumar Sinha
SQl Expert India

#3 Ajyakumar Sinha on 2009-06-09 06:39 (Reply)

Add Comment

Name
Email
Homepage
In reply to
Comment	E-Mail addresses will not be displayed and will only be used for E-Mail notifications. To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly. Enter the string from the spam-prevention image above: Phone* What is nine minus seven?
	Remember Information? Subscribe to this entry

SQL Coding Standards To Each His Own Part II

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting

Tuesday, January 06. 2009

SQL Coding Standards To Each His Own Part II

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

SQL Coding Standards To Each His Own Part II

Postgres OnLine Journal PostGIS in Action About the Authors Consulting

Tuesday, January 06. 2009

SQL Coding Standards To Each His Own Part II

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting