Universal Unique Identifiers are 16-byte / 32-hexadecimal digit (with 4 -s for separation) identifiers standardized by the Open Software Foundation.
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap. There are various open standards for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything. Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
I'm not sure how popular they are to use in other databases, but in SQL Server (and even in Microsoft Access), they are very popular, particularly if you need to synchronize data with various offices. If you come from Microsoft world, you probably know UUIDs by the term GUIDs, RowGUID, or data type uniqueidentifier or the thing that the NEWID() function in SQL Server generates, or in Microsoft Access when you choose AutoNumber and FieldSize = Replication ID. All the aforementioned use the standard 32-hexadecimal digit (with -) unique identifier. You may be wondering how you could get the same functionality in PostgreSQL.
PostgreSQL offers similar functionality via the uuid datatype. In additon there is a contrib module uuid-osp which provides various functions for auto-generating uuids. You can use these functions as default values in your table for columns that you need to have unique identifiers or just call them directly similar to what you do with SQL Server's NEWID() function. The main difference between the two is that SQL Server's text representation of the GUID is sometimes shown with {} (though not always) and sometimes as uppercase letters for presentation. PostgreSQL's native display representation doesn't have {} and usually uses lowercase letters. These are merely display representations (and can even be different depending on which display tool you are using). The display representation vs the actual type causes all sorts of confusion when people try to allocate space for them in there systems. SQL Server calls the type UniqueIdentifier and PostgreSQL calls the type uuid. Both types occupy 16-bytes of storage. For compatibility reasons with other software or databases, many use some stanardized text representation of them particularly for transport rather than using the native type. The text representation may require anywhere from 32 to 38 bytes (characters) contingent on if you add dashes or {}. Thus the confusion in storage requirements particularly when transporting to databases that don't have an equivalent of this type.
The GUID/UUID is not a text in either PostgreSQL or SQL Server system. They are native typs in both systems and are stored using 16 bytes. PostgreSQL does although have auto casts built in for converting uuids to text where as SQL Server requires an explicit cast to varchar.
As a side note: SQL Server does not support casting these to text even in SQL Server 2008 but PostgreSQL will happily cast to text or varchar. Just something to keep in mind if you need to create code that is compatible in both systems.
To demonstrate :
In SQL Server: If we try to do this:
SELECT NEWID() + '1234';
We get an error:
The data types uniqueidentifier and varchar are incompatible in the add operator.
However we can do this:
SELECT CAST(NEWID() As varchar(50)) + '1234';
which yields: 372F2ED9-37B0-4719-BC06-7ED9D730052A1234
As with most PostgreSQL contribs, the uuid-osp module can be installed in your PostgreSQL databases by running the SQL script share/contrib/uuid-ossp.sql which is located in your PostgreSQL install folder.
Similar example in PostgreSQL using the uuid-ossp contrib module and the ANSI-SQL standard || concatenate operator (NOTE: SQL Server uses + for concatenation):
SELECT uuid_generate_v1() || '1234'
Yields: c3eeeb64-cd6c-11df-a41f-1b8d709f51b91234
We try to avoid relying on automatic casts whereever possible, just because you may get caught off guard if there are two automatic casts that are equally acceptible or there is no automatic cast at all in place. So to write the PostgreSQL in a more safe way and also to behave more like the SQL Server implementation, we would do
SELECT CAST(uuid_generate_v1() AS varchar(50)) || '1234'
There are 2 general ways of assigning unique identifiers to rows in SQL Server and PostgreSQL, and a 3rd way specific to SQL Server.
Now I'm not sure what algorithm SQL Server uses in its NEWID(). Could be completely random in which case its most like PostgreSQL's uuid_generate_v4() function. At anyrate it doesn't really matter too much since all approaches will more or less guarantee unique identifiers across systems following the standard Unique Identifier form. PostgreSQL's uuid_generate_v1() generator is not completely random and generates a UUID that contains part of the Mac Address of the generating machine. This is nice in some sense because you could tie a record back to the machine that generated it, or not so nice from a security standpoint because it exposes a bit the identity of the machine that generated it. I tend to think of it as a nice informational feature particularly if you have several office servers and want to know the office that generated the record.
Below are some examples of using the UUID/GUID to assign identifiers in both systems. We'll cover both using the native type as well as using a text representation for ease of portability across systems that don't support the native types.
Operation | PostgreSQL | SQL Server |
---|---|---|
Create a table that has a UUID/GUID type automatically assigned |
|
|
Create a table that has text representation of UUID/GUID as primary for easier transportation |
|
|
Encapsulate text id generator in a function |
|
|
In the very last example, since we created wrapper functions with the same name, we can use the same application code base for both systems even if we want the identifier to be generated in our application for easier retrieval.
Now to take each for a test drive using our last variant:
Action | PostgreSQL Output | SQL Server Output |
---|---|---|
SELECT myfuncs.fn_mynewid(); | 8a0be412d21511df8757a3f328265df0 | 2b80fe3543e44fe9a378215d6e10fe4a |
| equip_id | equip_name ----------------------------------+------------ 51d4ca36d21611df8448a3b219d463c5 | Fax | equip_id equip_name -------------------------------- ------------- 9c2f44bfa60546b7ab1d95253c6070a2 Fax |