Language Architecture in PostgreSQL

Perhaps one of the most unique and exciting things that makes PostgreSQL stand out from other database systems, are the numerous choices of languages one can use to create database functions, triggers and define new aggregate functions with. Not only can you use various languages to write your database stored functions with, but often times the code you write lives right in the database. You have no idea how cool this is until you see it in action.

The other interesting thing about the PostgreSQL language architecture is the relative ease with which new languages can be incorporated in the system.

Native Languages of PostgreSQL

There are 3 languages that come packaged with PostgreSQL (2 non-PL ones are installed automatically and not even listed as languages (C and SQL) in the languages section of a db). The defacto PL/PgSQL procedural language is available for install in all PostgreSQL distributions, but need not be installed in a db by default .

  1. C Extern which allows for binding C libraries as functions. C Extern is similar to the way languages like MySQL bind C libraries for use in DB or the way SQL Server 2005+ binds .NET assemblies as functions in SQL Server.
  2. SQL - this is a non-procedural language. It allows one to write parameterized db stored functions with plain SQL, but lacks procedural logic constructs such as IF, FOR, WHILE and so forth. It is basically a macro substitution language. Functions written in this way are basically in-lined in with the queries they are used (except in case of STABLE, IMMUTABLE defined in which case cached results are often used) in so they are more easily optimizable than functions written in other languages. NOTE: that MySQL 5+ also has a Procedural language called SQL, but the MySQL SQL language is a procedural language more in line with PostgreSQL pl/pgsql and closer in syntax to DB2's SQL PL. I'll also note that DB2 has a concept of INLINE SQL PL which is kind of like PostgreSQL sql language, although a bit more powerful.
  3. PL/PgSQL - this is PostgreSQL defacto Procedural Language. It is not always installed by default in a database but the language handler is always available for installation. The equivalent but slightly different in syntax in other systems would be Transact SQL in SQL Server/Sybase, PL/SQL in Oracle, SQL in MySQL5+, and SQL PL in DB2.

The PL languages

Aside from PL/pgSQL there are numerous other procedural languages that one can use to create database stored functions and triggers. Some of these languages are fairly stable and even more are experimental. Some are only supported on Unix/Linux, but many are supported on Unix/Linux/MacOS/windows. In any case there are 3 key components needed before you can start using a new language:

  1. The environment for the language - e.g. PHP, Perl, Python, Ruby, Java, R etc. interpreter and libraries installed on the PostgreSQL server box
  2. The compiled call handler function - this is a C-compiled function that does the transfer between the PostgreSQL environment and the language environment.
  3. The language registered in the database you wish to use it in.

Registering a language in a Database

For pl/pgsql items 1 and 2 are already done if you have a working PostgreSQL install. In order to accomplish item 3, you may need to do the following from psql or PgAdmin III query window.


 CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
  HANDLER plpgsql_call_handler
  VALIDATOR plpgsql_validator;

Alternatively you can run createlang plpgsql somedb from commandline. Note createlang is a command line program that is located in the bin folder of your PostgreSQL install.

To see a list of procedural languages that you already have call handlers registered for in PostgreSQL. These are the languages you can register in your specific database - do a

SELECT * FROM pg_catalog.pg_pltemplate

A Flavor of the Procedural Languages (PLs)

In this section, we'll show a brief sampling of what functions look like written in various PLs. These are not to suggest they are the only ones that exist. For these examples, I'm going to use the $ quoting syntax introduced in PostgreSQL 8.0 which allows for not having to escape out single quotes.

SQL - the not PL language

For basic CRUD stuff,selects and simple functions, nothing hits the spot like just plain old SQL. Since this is such a common choice and often the best choice - here are 3 examples.

CREATE OR REPLACE FUNCTION cp_simpleupdate(thekey integer, thevalue varchar(50))
  RETURNS void AS
	$BODY$
		UPDATE testtable SET test_stuff = $2 WHERE test_id = $1
	$BODY$
  LANGUAGE 'sql' VOLATILE;


--Example use
SELECT cp_simpleupdate(1, 'set to this'); 


--Here is a simple example to simulate the MySQL 5.0 function
CREATE OR REPLACE FUNCTION from_unixtime(unixts integer)
  RETURNS timestamp without time zone AS
$BODY$SELECT CAST('epoch' As timestamp) + ($1 * INTERVAL '1 second') $BODY$
  LANGUAGE 'sql' IMMUTABLE;
  
  --Example use
  SELECT from_unixtime(1134657687); 
  SELECT from_unixtime(tbl.fromsomefield) FROM tbl;



CREATE OR REPLACE FUNCTION cp_test(subject varchar)
  RETURNS SETOF testtable AS
  $BODY$
	SELECT * FROM testtable where test_stuff LIKE $1;
	$BODY$
  LANGUAGE 'sql' VOLATILE;
  --Example use
  SELECT * FROM cp_test('%stuff%');


  CREATE OR REPLACE FUNCTION cp_testusingoutparams(subject varchar, out test_id int, out test_stuff varchar)
  RETURNS SETOF record AS
$BODY$
	SELECT test_id, test_stuff FROM testtable where test_stuff LIKE $1;
$BODY$
  LANGUAGE 'sql' VOLATILE;

--Example use - Note the subtle difference - the second syntax with out parameters is newer
-- It allows you to get around the messy issue of when you are returning a record type
--That a record type has no specific type.
SELECT * FROM cp_usingoutparams('%stuff%');

For details on using out parameters, check out Robert Treat's out parameter sql & plpgsql examples

PLPGSQL - a real PL Language

For more complex logic and massaging of results before sending back. You need something more powerful than standard SQL. Below are some examples using PLPGSQL.


CREATE OR REPLACE FUNCTION cp_harderupdate(thekey integer, thevalue varchar)
RETURNS void AS
$BODY$
BEGIN
	IF EXISTS(SELECT test_id FROM testtable WHERE test_id = thekey) THEN 
		UPDATE testtable SET test_stuff = thevalue WHERE test_id = thekey;
	ELSE
		INSERT INTO testtable(test_id, test_stuff) VALUES(thekey, thevalue);
	END IF;
     RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

--Example use
SELECT cp_harderupdate(1, 'this is more stuff');


Using PL/Perl

CREATE OR REPLACE FUNCTION get_neworders() RETURNS SETOF orders AS $$
    my $rv = spi_exec_query('select * from orders where processed IS NULL;');
    my $status = $rv->{status};
    my $nrows = $rv->{processed};
    foreach my $rn (0 .. $nrows - 1) {
        my $row = $rv->{rows}[$rn];
        return_next($row);
    }
    return undef;
$$ LANGUAGE plperl;

Using PL/R a language and environment for statistics

One of my favorite PL languages to program is PL/R. The reason for this is that the R statistical environment is such a rich environment for doing statistical processing. It now is also supported on windows as well as Mac and Linux.

To learn more about R and installing PL/R. Check out our Boston GIS article PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide

Below is the classic median aggregate function in R. It uses the native median function in the R environment to create a PostgreSQL aggregate median function


CREATE or REPLACE FUNCTION r_median(_float8) 
	returns float as $BODY$ median(arg1) $BODY$ language 'plr';

CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_median
);

--Example use
SELECT median(age) As themedian_age, period_year 
        FROM crimestats GROUP BY period_year ORDER BY period_year;

We will be covering PLR in greater detail in another article.