Choosing the right Database Procedural Language PL

One of the great selling points of PostgreSQL is its pluggable PL language architecture. MySQL is known for its pluggable storage and PostgreSQL is known for its pluggable PL language architecture. From Monty's notes on slide 12 looks like MySQL may be working on a pluggable PL language architecture of their own. The most common of these languages are the all-purpose languages SQL and C (these are built-in and not really PLs like the others, but we'll throw them in there), PLPgSQL which is also built-in but not always enabled, PL/Perl, PL/Python, and the domain specific languages PL/R, PL/SH and gaining popularity Skype released PL/Proxy. There are others in the family such as PL/Tcl, PL/PHP, PL/Ruby, PL/Scheme (a dialect of Lisp), PL/Java, PL/Lua and PL/LOLCode (for kicks and as a reference implementation. Think of LOLCode as PostgreSQL Pluggable PL equivalent of MySQL's BLACK HOLE storage engine.) .

The other interesting thing about the PostgreSQL PL language architecture is that it is a fairly thin wrapper around these languages. This means the kind of code you write in those languages is pretty much what you would write if you were doing general programming in those languages minus some spi calls. Since the handler is a just a thin wrapper around the environment, the language environment must be installed on the database server before you can use the PL language handler. This means you can have these functions utilized in your SQL statements and you can write in a language you feel comfortable with if you can get the darn PL compiled for your environment or someone has already kindly compiled it for your environment or that it is even compilable for your environment. The pluggable PL architecture means you can write a PL Handler for your favorite language or invent your own language that you can run in the database. In the end the barrier between code,data, and semantic constructs is more of a constraint imposed by compilers. If you have any doubts about the above statement, you need only look at some javascript injection attacks to bring the statement home. One of my fantasies is developing a language that morphs itself, that utilizes the database as its morphing engine and its OS and that breaks the illusion of data being data, code being code, and lacks rigid semantics. Of the languages we have worked with, SmallTalk comes closest to a language that satisfies these ideals and Lisp to a much lesser extent. Lisp lacked the semantic elegance of SmallTalk among other things.

Most people are used to having their procedural language push their data around. PL code living in PostgreSQL allows your data to push your procedural code around in a set-based way. This is a simple but pretty powerful feature since data is in general more fluid than code. For interpretated/just-in time compiled languages it can live in the database, for compiled it has to call compiled functions.

Now I shall stop here and say there are consequences to a thin wrapper that are both good and bad.

  1. Good/Bad - you are writing code you are used to. This is good because it makes people just getting used to relational database concepts feel at home. This is bad because it gives one the false confidence that the Romans will be happy when you impose your cultural bad habits on their perfect society. It is great to bring new ideas into the database, but try not to destroy the sanctity of the database by forgetting that you are in a database. Similar things have been said by DB programmers when SQL Server 2005 introduced .NET code in the database and you had all these reckless programmers doing things in .NET code that would have been more efficient in Transact-SQL. Just because you can do it doesn't mean you should. More on that later.
  2. Good - you can leverage all the goody libraries in your language of choice that others have written or you have written by calling the libraries from your database. With some caveats e.g. markings as safe and unsafe.
  3. Bad - PostgreSQL is doing a context switch to push the code into the environment your code is comfortable in. Generally the bigger and more complex the environment the more context switching that is happening.
  4. Bad - In order to manipulate data, except for the built in languages SQL, PlPgSQL, and C, PostgreSQL functions generally need to push the data into the language's environment and pull it out. This makes most languages somewhat suboptimal for set returning functions or functions that consume a lot of data.

As we mentioned in a prior article Trojan SQL Function Hack - A PL Lemma in Disguise not all languages are created equal as far as PostgreSQL is concerned and PostgreSQL has its favorites. Just as annoying as the MySQL storage engine idiosyncracies where you can have foreign keys in one storage engine and they are ignored in another, similar can be said with PL languages in PostgreSQL - they all handle sets differently and set returning functions are easier to write in some PL's than in others. Not to mention each programming environment has certain idiosyncracies of its own which make this useful yet still Leaky abstraction apparent.

Even if PostgreSQL did not have its favorites, one must keep in mind that languages are designed for a particular reason. They are designed to satisfy a particular language designers philosophies and goals. This means that Perl is optimized for the certain kinds of problems that Larry Wall liked to solve (e.g. string manipulation) and to solve them the way that Larry thought was fitting. Similarly R, S, S-Plus were designed for scientific, statistical processing, graphing. R takes some effort to get used to its terminology of data frames and factors and its way of pushing data into arrays and defining functions, but its well-worth it for what it does. Generally speaking the PL languages are not optimized for pushing data in a SET-oriented way and if you try to use them for something they were not really designed well for, you may feel comfortable but your database will suffer for your illusion of comfort. This false comfort leads people to write otherwise simple SET code in PlPython when it could have been done more efficiently and simply in PostgreSQL SQL function language or PLPgSQL language. Some people further like to encapsulate things in functions that shouldn't be encapsulated in functions in the first place because it gives them a false sense of comfort to shove stuff that they couldn't figure out how to write in a set-based way into a loop-di-loop blackbox. It may be amusing to write needlessly complicated code and kill cockroaches with hammers, but it is not a terribly efficient way of occupying your time. In fact the most impressive programmers are just clear thinkers. The real geniuses in programming are those who can restate an unsolvable problem into a solvable one or don't get caught up in the mob thinking that causes groups of people to simultaneously come up with the wonderful idea of solving the same wrong problem.

General rule of thumb when deciding which language to program a particular functionality

Below are links to various articles that demonstrate some uses of PLs :