SQL Server to PostgreSQL: Converting table structure

We've been working on converting some of our SQL Server apps to PostgreSQL. In this article we'll describe some things to watch out for and provide a function we wrote to automate some of the conversion.

Although both databases are fairly ANSI-SQL compliant, there are still differences in their CREATE TABLE statements, data types, and other behaviors that make porting applications not so trivial.

SQL Server and PostgreSQL data type differences and equivalents

Here are a few key differences and similarities in data types between the two databases. Some are easy to resolve; others are not.

For most of the apps we are porting, we never intend to go back, nor do we need them to work with anything but PostgreSQL. So we've been taking liberties with syntax that will probably never be portable, shamelessly using PostgreSQL array support, composite types, and built-in regular expression support, simply because they make so many tasks so much more succinct and faster.

It's a hard decision to make consciously because of the WHAT IFs (not just betting on the success of PostgreSQL, but the fact that you can't market your work to people who need it to run on a different database platform, or even on a lower version of the same platform). Product lock-in is a reality even when you have no vendors and are using open source. Standards exist to minimize product/vendor lock-in, but they often fall short of covering your needs.

Product lock-in is a step that should not be taken lightly and should be considered on an application-by-application basis. It is best to make the decision consciously and play out the pros and cons of what you are gaining and losing. Most of our apps involve lots of stats and complex analytics, and we've decided it's much more efficient to make the database the engine for that work than to write application code that works across multiple database platforms. Database lock-in is the lesser evil here.

The big PITA: Column Casing

In SQL Server the standard for casing is Pascal case / camel case, as described in our article Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case. In PostgreSQL, people usually prefer lower case with underscores, because upper or mixed case would require constant ugly quoting of columns. SQL Server gets away with mixed case without penalty because, while it preserves the casing of columns as designed, it is case insensitive when they are referenced in an SQL statement.

So, for example, if you had a table in SQL Server with columns named PersonID, FirstName, and LastName, you could reference them as personid, firstname, FIRSTNAME, and so on.

If you want to keep your code working more or less unchanged, you are best off just getting rid of the casing of your SQL Server columns so you don't need to quote them in PostgreSQL. Oracle has the same sensitivity to identifier casing, except that in Oracle columns need to be upper case to avoid quoting, while in PostgreSQL they need to be lower case.
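To see the quoting issue concretely, here is a small illustration (using a throwaway table name of our own invention) of how PostgreSQL treats quoted versus unquoted identifiers:

```sql
-- a column created with quotes keeps its mixed case forever
CREATE TABLE contacts_demo ("FirstName" varchar(50));
SELECT "FirstName" FROM contacts_demo;  -- works
SELECT firstname FROM contacts_demo;    -- ERROR: column "firstname" does not exist

-- created without quotes, the name folds to lower case
-- and any casing works unquoted, much like SQL Server
CREATE TABLE contacts_demo2 (FirstName varchar(50));
SELECT firstname, FirstName, FIRSTNAME FROM contacts_demo2;  -- all fine
```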

However, if you get really hot and bothered about losing the word-like breaks in your column names that casing or underscores provide, the thought of always having to quote columns sounds like hell, and this is an application you expect to live with for a while, you may just want to bite the bullet, as we often do, and change the casing when you migrate. YES, this probably means rewriting a good chunk of your code. We like to think of it as an opportunity to refactor our code. I LOVE refactoring.

The Quick and Dirty PostgreSQL function to convert SQL Server create table structure

To automate some of the process of converting our table structures from SQL Server to PostgreSQL, we opted to write a PostgreSQL function to do it, so that we could completely control how the conversion is done.

One of the reasons we are porting some of our apps is that PostgreSQL's string manipulation functions and array support make so many things easier; this function is one of those things. SQL Server's built-in string manipulation features are pretty pathetic by comparison.
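As a small taste of what we mean (a toy example, not taken from the apps themselves): splitting, filtering, and reassembling a delimited string is a one-liner in PostgreSQL, whereas in T-SQL it typically means writing a helper function or resorting to an XML trick.

```sql
-- split a comma-delimited list, drop an element, and reassemble it
SELECT array_to_string(ARRAY(
        SELECT elem
        FROM unnest(string_to_array('a,b,c,d', ',')) AS elem
        WHERE elem <> 'c'
    ), ',');  -- a,b,d

-- regular expressions are built in as well
SELECT regexp_replace('Hello   World', E'\\s+', ' ', 'g');  -- Hello World
```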

CREATE OR REPLACE FUNCTION convert_mssqlddl2pgsql(sql text, 
 change_camel_under boolean, default_not_nulls boolean DEFAULT true, 
replace_dbo_with_schema text DEFAULT NULL)
  RETURNS text AS
$$
DECLARE 
    var_sql text := sql;
    r record;
BEGIN
    IF change_camel_under THEN
        -- insert an underscore between a lower-case letter and the capital that
        -- follows it; the optional trailing [a-z] keeps the pattern from
        -- splitting runs of capitals such as the ID in ContactID
        var_sql := regexp_replace(var_sql, E'([a-z])([A-Z])([a-z]?)', E'\\1_\\2\\3','g'); 
    END IF;
    var_sql := lower(var_sql);
    var_sql := replace(var_sql,'[dbo].', COALESCE('[' || replace_dbo_with_schema || '].',''));
    var_sql := replace(var_sql,'on [primary]', '');
    -- NULL (not '') marks types with no default to inject; nvarchar(max)
    -- must come before varchar(max) so it doesn't get mangled into ntext
    FOR r IN (SELECT * FROM ( VALUES ('datetime', 'timestamp with time zone', 'CURRENT_TIMESTAMP'),     
            ('bit', 'boolean', 'true'),
            ('nvarchar(max)', 'text', NULL),
            ('varchar(max)', 'text', NULL), 
            ('nvarchar', 'varchar', NULL), 
            ('tinyint','smallint', '0') ,
            ('[int] identity(1,1)', 'serial', NULL)
            ) As f(ms,pg, def)) LOOP
        IF default_not_nulls AND r.def IS NOT NULL THEN
            var_sql := replace(var_sql, '[' || r.ms || '] not null', '[' || r.ms || '] not null DEFAULT ' || r.def);  
        END IF;
        var_sql := replace(var_sql, '[' || r.ms || ']',r.pg) ;
        var_sql := replace(var_sql, r.ms ,r.pg) ;
    END LOOP;
    var_sql := regexp_replace(var_sql, '[\[\]]','','g');
    var_sql := regexp_replace(var_sql,'(primary key|unique) (clustered|nonclustered)', E'\\1', 'g');
    --get rid of all that WITH (PAD_INDEX ...) that sql server generates for tables
    -- so basically match any phrase WITH ("everything not containing )" ) 
    var_sql := regexp_replace(var_sql, 'with \([^\)]+\)', '','g');
    -- get rid of asc in column constraints
    var_sql := regexp_replace(var_sql, '([a-z]+) asc', E'\\1','g');
    
    -- get rid of collation
    -- for PostgreSQL 9.1 might want
    -- to just change it to 9.1 syntax
    var_sql := regexp_replace(var_sql, 'collate [a-z0-9\_]+', '','g');
    RETURN var_sql;
    
END;
$$
  LANGUAGE plpgsql IMMUTABLE;
-- EXAMPLE USAGE - convert camel to under score, dbo to public, not null to default defined constants
SELECT convert_mssqlddl2pgsql('
CREATE TABLE [dbo].[Contacts](
    [ContactID] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [LastName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [CompanyID] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [AddDate] [datetime] NOT NULL,
 CONSTRAINT [PK_Contacts] PRIMARY KEY CLUSTERED 
(
    [ContactID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
', true, true, 'public');
-- output --
create table public.contacts(
	contact_id serial not null,
	first_name varchar(50)  null,
	last_name varchar(50)  null,
	company_id varchar(50)  null,
	add_date timestamp with time zone not null DEFAULT CURRENT_TIMESTAMP,
 constraint pk_contacts primary key 
(
	contact_id
) 
)
-- EXAMPLE USAGE - Don't convert to under score
SELECT convert_mssqlddl2pgsql('
CREATE TABLE [dbo].[Contacts](
    [ContactID] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [LastName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [CompanyID] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [AddDate] [datetime] NOT NULL,
 CONSTRAINT [PK_Contacts] PRIMARY KEY CLUSTERED 
(
    [ContactID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
', false, true, 'public');
-- output --
create table public.contacts(
	contactid serial not null,
	firstname varchar(50)  null,
	lastname varchar(50)  null,
	companyid varchar(50)  null,
	adddate timestamp with time zone not null DEFAULT CURRENT_TIMESTAMP,
 constraint pk_contacts primary key 
(
	contactid
) 
)
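If you want to sanity-check the camel-case-to-underscore regex outside the database, the same transformation the function applies can be mirrored in a few lines of Python (a quick sketch; the function name is ours, and only the regex itself comes from the plpgsql above):

```python
import re

def camel_to_under(identifier: str) -> str:
    """Mirror of the regexp_replace call in convert_mssqlddl2pgsql:
    insert an underscore between a lower-case letter and the capital
    that follows it, then fold everything to lower case."""
    return re.sub(r'([a-z])([A-Z])([a-z]?)', r'\1_\2\3', identifier).lower()

print(camel_to_under('ContactID'))  # contact_id
print(camel_to_under('FirstName'))  # first_name
```

Handy for eyeballing how your particular column names will come out before running the full conversion.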