Universal Unique Identifiers are 16-byte / 32-hexadecimal digit (with 4 -s for separation)
identifiers standardized by the Open Software Foundation.
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share
data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap.
There are various open standards
for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything.
Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
I'm not sure how popular they are to use in other databases, but in SQL Server (and even in Microsoft Access), they are very popular, particularly if you need to synchronize data with various offices.
If you come from Microsoft world, you probably know UUIDs by the term GUIDs, RowGUID, or data type uniqueidentifier or the thing that the NEWID() function in SQL Server generates,
or in Microsoft Access when you choose AutoNumber and FieldSize = Replication ID. All the aforementioned use the standard 32-hexadecimal digit (with -) unique identifier. You may be wondering how you could get the same functionality in PostgreSQL.
PostgreSQL offers similar functionality via the uuid datatype. In additon there is a contrib module uuid-osp which provides various
functions for auto-generating uuids. You can use these functions as default values in your table for columns that you need to have unique identifiers or just call them directly similar to what you do with SQL Server's NEWID() function.
The main difference between the two is that SQL Server's text representation of the GUID is sometimes shown with {} (though not always) and sometimes as uppercase letters for presentation. PostgreSQL's native display representation doesn't have {} and usually uses lowercase letters.
These are merely display representations (and can even be different depending on which display tool you are using). The display representation vs the actual type causes all sorts of confusion when people try to allocate space for them in there systems.
SQL Server calls the type UniqueIdentifier and PostgreSQL calls the type uuid. Both types occupy 16-bytes of storage. For compatibility reasons with other software
or databases, many use some stanardized text representation of them particularly for transport rather than using the native type. The text representation
may require anywhere from 32 to 38 bytes (characters) contingent on if you add dashes or {}. Thus the confusion in storage requirements particularly when transporting
to databases that don't have an equivalent of this type.
The GUID/UUID is not a text in either PostgreSQL or SQL Server system. They are native typs in both systems and are stored using 16 bytes.
PostgreSQL does although have auto casts built in for converting uuids to text where as SQL Server requires an explicit cast to varchar.
As a side note: SQL Server does not support casting these to text even in SQL Server 2008 but PostgreSQL will happily cast to text or varchar.
Just something to keep in mind if you need to create code that is compatible in both systems.
To demonstrate :
In SQL Server: If we try to do this:
SELECT NEWID() + '1234';
We get an error:
The data types uniqueidentifier and varchar are incompatible in the add operator.
However we can do this:
SELECT CAST(NEWID() As varchar(50)) + '1234';
which yields:
372F2ED9-37B0-4719-BC06-7ED9D730052A1234
As with most PostgreSQL contribs, the uuid-osp module can be installed in your PostgreSQL databases by running the SQL script share/contrib/uuid-ossp.sql which is located in your PostgreSQL install folder.
Similar example in PostgreSQL using the uuid-ossp contrib module and the ANSI-SQL standard || concatenate operator (NOTE: SQL Server uses + for concatenation):
SELECT uuid_generate_v1() || '1234'
Yields: c3eeeb64-cd6c-11df-a41f-1b8d709f51b91234
We try to avoid relying on automatic casts whereever possible, just because you may get caught off guard if there are two automatic casts that
are equally acceptible or there is no automatic cast at all in place. So to write the PostgreSQL in a more safe way and also to behave more like the
SQL Server implementation, we would do
SELECT CAST(uuid_generate_v1() AS varchar(50)) || '1234'
Using Unique identifiers as row ids
There are 2 general ways of assigning unique identifiers to rows in SQL Server and PostgreSQL, and a 3rd way specific to SQL Server.
- Explicit assignment in a stored proc or code: using SQL Server NEWID() function or one of PostgreSQL's uuid-osp contrib functions take your pick: uuid_generate_v1(), uuid_generate_v1mc(), uuid_generate_v4() functions or build your own.
- Setting the default value of the column to in SQL Server - SQL Server NEWID() or one of PostgreSQL's menu of uuid generator functions or build your own for either system.
- In SQL Server -- setting data type to uniqueidentifier and is RowGUID property of the column to Yes (similar in MS Access -- choosing data type AutoNumber and Size to ReplicationID): This last option only apply's to SQL Server and is really syntactic sugar for option 2.
Now I'm not sure what algorithm SQL Server uses in its NEWID(). Could be completely random in which case its most like PostgreSQL's uuid_generate_v4() function. At anyrate it doesn't really matter too much
since all approaches will more or less guarantee unique identifiers across systems following the standard Unique Identifier form. PostgreSQL's uuid_generate_v1() generator is not completely random and generates a UUID that contains part of the
Mac Address of the generating machine. This is nice in some sense because you could tie a record back to the machine that generated it, or not so nice from a security standpoint because it exposes a bit the identity of the machine that generated it.
I tend to think of it as a nice informational feature particularly if you have several office servers and want to know the office that generated the record.
Below are some examples of using the UUID/GUID to assign identifiers in both systems. We'll cover both using the native type as well as using a text
representation for ease of portability across systems that don't support the native types.
Operation | PostgreSQL | SQL Server |
Create a table that has a UUID/GUID type automatically assigned |
CREATE TABLE equipment(
equip_id uuid PRIMARY KEY
DEFAULT uuid_generate_v1() ,
equip_name varchar(255));
|
CREATE TABLE equipment(
equip_id uniqueidentifier PRIMARY KEY
DEFAULT NEWID() ,
equip_name varchar(255));
CREATE TABLE equipment(
equip_id uniqueidentifier
PRIMARY KEY ROWGUIDCOL ,
equip_name varchar(255) );
|
Create a table that has text representation of UUID/GUID as primary for easier transportation |
CREATE TABLE equipment(
equip_id char(32) PRIMARY KEY
DEFAULT LOWER(
REPLACE(
CAST(uuid_generate_v1() As varchar(50))
, '-','')
) ,
equip_name varchar(255));
|
CREATE TABLE equipment(
equip_id char(32) PRIMARY KEY
DEFAULT LOWER(
REPLACE(
CAST(NEWID() As varchar(50)), '-','')
) ,
equip_name varchar(255));
|
Encapsulate text id generator in a function |
CREATE SCHEMA myfuncs;
CREATE FUNCTION myfuncs.fn_mynewid()
RETURNS
char(32) AS
$$
SELECT LOWER(
REPLACE(
CAST(uuid_generate_v1() As varchar(50))
, '-','')
);
$$
language 'sql';
CREATE TABLE equipment(
equip_id char(32) PRIMARY KEY
DEFAULT myfuncs.fn_mynewid(),
equip_name varchar(255));
|
Invalid use of a side-effecting
operator 'newid' within a function.
CREATE SCHEMA myfuncs
GO
CREATE VIEW myfuncs.MyNewID
AS
SELECT LOWER(
REPLACE(
CAST(NEWID() As varchar(50))
, '-','')
) AS MyNewID
GO
CREATE FUNCTION myfuncs.fn_MyNewID ()
RETURNS char(32)
AS
BEGIN
RETURN (SELECT MyNewID FROM myfuncs.MyNewID)
END
GO
CREATE TABLE equipment(
equip_id char(32) PRIMARY KEY
DEFAULT myfuncs.fn_mynewid(),
equip_name varchar(255))
GO
|
In the very last example, since we created wrapper functions with the same name, we can use the same application code base for both systems
even if we want the identifier to be generated in our application for easier retrieval.
Now to take each for a test drive using our last variant:
Action | PostgreSQL Output | SQL Server Output |
SELECT myfuncs.fn_mynewid(); | 8a0be412d21511df8757a3f328265df0 | 2b80fe3543e44fe9a378215d6e10fe4a |
INSERT INTO equipment(equip_name)
VALUES('Fax');
SELECT * FROM equipment;
| equip_id | equip_name
----------------------------------+------------
51d4ca36d21611df8448a3b219d463c5 | Fax | equip_id equip_name
-------------------------------- -------------
9c2f44bfa60546b7ab1d95253c6070a2 Fax |
Tracked: Oct 07, 21:50