We've been working on converting some of our SQL Server apps to PostgreSQL. In this
article we'll describe some things to watch out for and provide a function we wrote to automate some of
the conversion.
Although both databases are fairly ANSI-SQL compliant, there are still differences
with their CREATE TABLE statements, data types, and how they handle other things that makes porting applications not so trivial.
SQL Server and PostgreSQL data type differences and equivalents
Here are a couple of key differences and similarities in data types between the two. Some are easy
to resolve and others are not.
- SQL Server bit vs. PostgreSQL boolean:
SQL Server, similar to most other relational databases I can think of
doesn't really have a true boolean type even in SQL Server 2008 and upcoming version.
Leo really loves the PostgreSQL boolean type so much so that he's willing to trade portability for having a real true/false data type.
PostgreSQL also has
a bit which can be more than 1 in length, but
it doesn't cast naturally to a boolean type in applications like SQL Server's bit or PostgreSQL boolean so we suggest you stay away from it
for boolean use. Most tools .NET etc. we work with
cast both SQL Server bit and PostgreSQL boolean to the boolean type of the application language.
When writing raw SQL, however, they are different beasts.
In SQL Server you would use a bit in SQL as if it were an integer - sorta. There are nuisances,
such as you can't add bits unless you cast them to integers etc.
but for sake of argument we can wave our hands and call it a small integer that would be
expressed LIKE:
WHERE is_active = 1
WHERE is_active = 0
In PostgreSQL, you would use a boolean as if it were a boolean and treating
it like a 0 / 1 number is not allowed unless you define type casts for such behavior.
So the way you would use PostgreSQL boolean in a where would take form:
WHERE is_active = true
WHERE is_active
WHERE is_active = false
WHERE NOT is_active
PostgreSQL does have auto casts for boolean to text which allows
you to write Oracle compatible syntax- any of these will cast ('TRUE','FALSE','true','t','f', 'false')
WHERE is_active = 'FALSE'
WHERE is_active = 'TRUE'
are all valid constructs in PostgreSQL but invalid in SQL Server
and you could use 0 and 1 too if you install some autocasts
in your database as we described in Using MS Access with PostgreSQL
which while designed for MS Access work equally well when writing raw SQL.
- SQL Server 2005/2000 datetime vs. PostgreSQL timestamp with time zone, timestamp without time zone, date, time .
This small little annoyance is a less of a concern in SQL Server 2008+ since 2008 introduced (DATE, TIME, DATETIMEOFFSET)
which are equivalent to PostgreSQL types so mapping is much cleaner for 2008.
If you are converting from SQL Server 2005/2000 you have to consciously think about whether the columns you had the old way
should really be timestamps or dates or even times and if you really want the time zone following your data around.
For most of the apps we are porting, we don't ever intend to go back nor have it work with anything but PostgreSQL,
so we've been taking liberties using syntax that will probably not be portable such as
using PostgreSQL array support, complex type, and built-in regular expression support shamelessly just because it makes so many tasks so much succinct and faster.
It's a hard decision to make consciously because of the WHAT IFS (not just about betting on success of PostgreSQL,
but the fact that you can't market your work to people who need it to work for a different database platform
or even a lower version of a database platform).
Product lock-in is a reality even when you have no vendors and when you are using open source. Standards are there to minimize product/vendor lock-in,
but they often fall short of covering your needs. Product-lockin is a step that should not be taken lightly and should be considered on an application by application basis.
It is best to make the decision consciously and play out the pros / cons of what you are gaining and losing. Most of our apps
involve lots of stats, complex analytics of which we've decided it's much more efficient to make the database the engine to perform that than
to try to write application code so that it works across multiple database platforms.
Database lock-in is the lesser of other evils.
The big PITA: Column Casing
In SQL Server the standard for casing is
Pascal case / camel case as described in our article
Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case
and in PostgreSQL people usually prefer underscores and lower case because upper or mixed case would need constant ugly quoting of columns. SQL Server can get away with this without enduring penalty
because while it respects the casing of the columns in design, it is case insensitive when it is used in an SQL statement.
So for example if you had a table structure in SQL Server with columns named:
PersonID, FirstName, LastName
you can reference them as personid, firstname, FIRSTNAME etc.
.
If you want to keep your code more or less working
without changing anything, you are best off just getting rid of the casing of SQL Server columns so you don't need
to quote them in PostgreSQL. Same issue holds for Oracle that it is sensitive to casing of tables, except in Oracle case columns
need to be uppercase to not require quoting and in PostgreSQL, columns need to be lower case to not require quoting.
However if you get really hot an bothered about not having word like breakages in your column names that
casing or underscores provide, the thought of always having to quote columns sounds like hell, and this is an application you expect to live with for a while, you may just
want to bite the bullet as we often do and change the casing when you migrate. YES, this means you probably have
to rewrite a good chunk of your code. We like to think of it as providing an opportunity to refactor our code. I LOVE refactoring.
The Quick and Dirty PostgreSQL function to convert SQL Server create table structure
To automate some of the process of converting our table structures from SQL Server to PostgreSQL
we opted to write a PostgreSQL function to do it so that we could completely control how it is done.
One of the reasons we are porting some of apps is that PostgreSQL string manipulation functions and array
support makes so many things easier. One of those things is this. SQL Server by comparison has pretty
pathetic built-in string manipulation features.
CREATE OR REPLACE FUNCTION convert_mssqlddl2pgsql(sql text,
change_camel_under boolean, default_not_nulls boolean DEFAULT true,
replace_dbo_with_schema text DEFAULT NULL)
RETURNS text AS
$$
DECLARE
var_sql text := sql;
r record;
BEGIN
IF change_camel_under THEN
var_sql := regexp_replace(var_sql, E'([a-z])([A-Z])([a-z]?)', E'\\1\_\\2\\3','g');
END IF;
var_sql := lower(var_sql);
var_sql := replace(var_sql,'[dbo].', COALESCE('[' || replace_dbo_with_schema || '].',''));
var_sql := replace(var_sql,'on [primary]', '');
FOR r IN (SELECT * FROM ( VALUES ('datetime', 'timestamp with time zone', 'CURRENT_TIMESTAMP'),
('bit', 'boolean', 'true'),
('varchar(max)', 'text', ''),
('nvarchar', 'varchar', ''),
('tinyint','smallint', '0') ,
('[int] identity(1,1)', 'serial', NULL)
) As f(ms,pg, def)) LOOP
IF default_not_nulls AND r.def IS NOT NULL THEN
var_sql := replace(var_sql, '[' || r.ms || '] not null', '[' || r.ms || '] not null DEFAULT ' || r.def);
END IF;
var_sql := replace(var_sql, '[' || r.ms || ']',r.pg) ;
var_sql := replace(var_sql, r.ms ,r.pg) ;
END LOOP;
var_sql := regexp_replace(var_sql, '[\[\]]','','g');
var_sql := regexp_replace(var_sql,'(primary key|unique) (clustered|nonclustered)', E'\\1', 'g');
var_sql := regexp_replace(var_sql, 'with \([^\)]+\)', '','g');
var_sql := regexp_replace(var_sql, '([a-z]+) asc', E'\\1','g');
var_sql := regexp_replace(var_sql, 'collate [a-z0-9\_]+', '','g');
RETURN var_sql;
END;
$$
LANGUAGE plpgsql IMMUTABLE;
SELECT convert_mssqlddl2pgsql('
CREATE TABLE [dbo].[Contacts](
[ContactID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LastName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CompanyID] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AddDate] [datetime] NOT NULL,
CONSTRAINT [PK_Contacts] PRIMARY KEY CLUSTERED
(
[ContactID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
', true, true, 'public');
create table public.contacts(
contact_id serial not null,
first_name varchar(50) null,
last_name varchar(50) null,
company_id varchar(50) null,
add_date timestamp with time zone not null DEFAULT CURRENT_TIMESTAMP,
constraint pk_contacts primary key
(
contact_id
)
)
SELECT convert_mssqlddl2pgsql('
CREATE TABLE [dbo].[Contacts](
[ContactID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LastName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CompanyID] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AddDate] [datetime] NOT NULL,
CONSTRAINT [PK_Contacts] PRIMARY KEY CLUSTERED
(
[ContactID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
', false, true, 'public');
create table public.contacts(
contactid serial not null,
firstname varchar(50) null,
lastname varchar(50) null,
companyid varchar(50) null,
adddate timestamp with time zone not null DEFAULT CURRENT_TIMESTAMP,
constraint pk_contacts primary key
(
contactid
)
)