Postgres OnLine Journal: January / December 2025
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

Basics

  Unpivoting data using JSONB Beginner
  Learning PostgreSQL from AI and JSON exploration Beginner
  PG 17 new random functions Beginner
  Converting JSON documents to relational tables Beginner

Basics

 

Unpivoting data using JSONB Beginner



One of my favorite uses of JSONB functionality is to unpivot data.

Let's say you were handed a dataset with this structure, where each bed-type column holds the count of units of that kind.


CREATE TABLE housing_data (
    neighborhood VARCHAR(50),
    studio INTEGER,
    br1 INTEGER,
    br2 INTEGER,
    br3 INTEGER,
    br4 INTEGER
);

INSERT INTO housing_data (neighborhood, studio, br1, br2, br3, br4)
VALUES ('Downtown', 500, 800, 1000, 1200, 1500),
       ('Midtown', 700, 950, 1150, 1400, 1700),
       ('Uptown', 600, 800, 1000, 1200, 1500),
       ('Suburbs', 300, 450, 600, 750, 900);

For your reporting needs, you would like each bed type count to be a separate row. Here is one of my favorite tricks for accomplishing this:

SELECT neighborhood, replace(je.key,'br','') AS bed_type, je.value::integer AS num_units
FROM housing_data AS d 
	CROSS JOIN jsonb_each_text( to_jsonb(d) - 'neighborhood') AS je;
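
The reason this works: to_jsonb(d) folds the entire row into a single jsonb object, the - operator strips out the neighborhood key, and jsonb_each_text expands what remains into one key/value row per column. If you want to peek at the intermediate object for a single row, a quick sketch:

SELECT to_jsonb(d) - 'neighborhood' AS remaining
FROM housing_data AS d
WHERE neighborhood = 'Downtown';
-- {"br1": 800, "br2": 1000, "br3": 1200, "br4": 1500, "studio": 500}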

The assumption here is that every column except for neighborhood is a bedroom unit count. The output is:

 neighborhood | bed_type | num_units
--------------+----------+-----------
 Downtown     | 1        |       800
 Downtown     | 2        |      1000
 Downtown     | 3        |      1200
 Downtown     | 4        |      1500
 Downtown     | studio   |       500
 Midtown      | 1        |       950
 Midtown      | 2        |      1150
 Midtown      | 3        |      1400
 Midtown      | 4        |      1700
 Midtown      | studio   |       700
 Uptown       | 1        |       800
 Uptown       | 2        |      1000
 Uptown       | 3        |      1200
 Uptown       | 4        |      1500
 Uptown       | studio   |       600
 Suburbs      | 1        |       450
 Suburbs      | 2        |       600
 Suburbs      | 3        |       750
 Suburbs      | 4        |       900
 Suburbs      | studio   |       300
(20 rows)

If you had more than one column that wasn't a bedroom count, you could chain the subtraction to exclude it too: to_jsonb(d) - 'neighborhood' - 'other_column'. Alternatively, you could subtract a text array: to_jsonb(d) - ARRAY['neighborhood','other_column']. This trick can also be accomplished with the hstore extension, by converting a row to an hstore and using hstore's each function, as sketched below.
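
For comparison, a minimal sketch of the hstore route (this assumes the hstore extension is available and that you have the privileges to install it):

CREATE EXTENSION IF NOT EXISTS hstore;

SELECT neighborhood, replace(e.key, 'br', '') AS bed_type, e.value::integer AS num_units
FROM housing_data AS d
	CROSS JOIN each(hstore(d) - 'neighborhood'::text) AS e;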


Basics

 

Learning PostgreSQL from AI and JSON exploration Beginner



After reading Paul's teasing article on Accessing Large Language Models with AI and testing out his code, I decided I should probably stop fighting this AI smothering. I still distrust AI services, but downloading an AI model and using it on my own local desktop or server is nice, and I can break bread with that. One use I tried is having AI generate fake data, and it did a pretty decent job. I also learned a couple of things.

For these exercises I'm using my favorite AI model for PostgreSQL work so far, phi4, which you can get here: Phi 4. The 14b, 9.1GB model seemed decent enough and answered most of my questions in 5-20 seconds on my 4-year-old desktop (an aging graphics card, 20 CPUs, 64GB RAM). I learned a bunch of things from my excursion down this AI path.

Lessons Learned

  • AI can tell you what features your software is missing by the mistakes it makes.
  • AI is good for generating dummy data for your exercises.
  • AI sometimes throws in stuff you never asked for.
  • Ask the same question that has more than one answer, and expect to get a different answer each time.
  • AI makes mistakes, but by following along and feeding it the error messages, it can correct itself, and you too can learn something from its error paths.

I shall, at least for the rest of this article, call my side-kick Aesop, because calling it phi4 feels a bit alien.

Figure out the stretch of your AI's knowledge of PostgreSQL

If you are going to trust advice from anything, especially on a topic as important as PostgreSQL, you should get a sense of its breadth of knowledge. Just like I run:

SELECT version();

on a PostgreSQL server to know what neat tricks are at my disposal, I do the same with Aesop.

Me:
What is the latest version of PostgreSQL

To which I am greeted with

Aesop:
As of my last update in October 2023, the latest stable release of PostgreSQL was version 15. However, please verify this information from an official source such as the PostgreSQL website or other trusted resources, as new versions might have been released since then. PostgreSQL frequently updates its software with improvements and new features, so checking their official announcements is always a good practice to ensure you have the most current information.

Creating fake data

First question I asked Aesop:

Create a PostgreSQL table with a jsonb column that has heavily nested data that has 2 rows covering trips and make sure to use GENERATED BY instead of serial.

I threw in the requirement about GENERATED BY because on the first pass, Aesop threw me back a CREATE TABLE that used the old SERIAL syntax. With the GENERATED BY qualification it gave me this:

Aesop:
To create a PostgreSQL table where the primary key is generated using `GENERATED ALWAYS AS IDENTITY` instead of `SERIAL`, you can follow these steps. This approach, introduced in PostgreSQL 10, allows for more flexibility and control over identity columns.

CREATE TABLE trips (
    trip_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- Assuming a reference to a users table
    trip_name VARCHAR(255) NOT NULL,
    trip_data JSONB,           -- Column for storing nested JSON data
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

It was nice that Aesop warned me this will only work for PostgreSQL 10 and above, because if you are creating software that needs to work with PostgreSQL 9.6, this will not do. So there I was, excited, glancing at this ready-to-use code that, gasp, DOES NOT WORK. What is wrong with this code exactly? Try running it in PostgreSQL and you'll see the error. I mean, it's even nicely commented and formatted.

Poor Aesop was a little confused and threw some MySQL goodness in there: stuff that MySQL has and PostgreSQL doesn't, but which perhaps might be a nice feature for those folks too lazy to create an update trigger for updated_at (a sketch of such a trigger follows the corrected table below).

Which begs the question: oh why oh why does my PostgreSQL not have an ON UPDATE clause for CREATE TABLE?

So, commenting out that piece, we have a working CREATE TABLE now:


CREATE TABLE trips (
    trip_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- Assuming a reference to a users table
    trip_name VARCHAR(255) NOT NULL,
    trip_data JSONB,           -- Column for storing nested JSON data
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- ON UPDATE CURRENT_TIMESTAMP
);
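
If you actually want updated_at to mimic MySQL's ON UPDATE CURRENT_TIMESTAMP, the usual PostgreSQL recipe is a BEFORE UPDATE trigger. A minimal sketch, with function and trigger names of my own invention:

CREATE OR REPLACE FUNCTION trips_stamp_updated_at() RETURNS trigger AS
$$
BEGIN
    -- overwrite updated_at on every UPDATE, regardless of what the client sent
    NEW.updated_at := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trips_updated_at
BEFORE UPDATE ON trips
FOR EACH ROW EXECUTE FUNCTION trips_stamp_updated_at();  -- EXECUTE FUNCTION is PostgreSQL 11+; use EXECUTE PROCEDURE on 10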

Then Aesop provided a statement to stuff in some juicy trip data, exactly 2 rows as I had asked for, and it worked without any changes.


INSERT INTO trips (user_id, trip_name, trip_data)
VALUES
(
    1,
    'Summer Vacation',
    jsonb_build_object(
        'destination', jsonb_build_object(
            'city', 'Paris',
            'country', 'France'
        ),
        'activities', jsonb_build_array(
            jsonb_build_object('name', 'Eiffel Tower Visit', 'date', '2023-07-15'),
            jsonb_build_object('name', 'Louvre Museum Tour', 'date', '2023-07-16')
        ),
        'start_date', '2023-07-14',
        'end_date', '2023-07-20'
    )
),
(
    2,
    'Winter Getaway',
    jsonb_build_object(
        'destination', jsonb_build_object(
            'city', 'Zurich',
            'country', 'Switzerland'
        ),
        'activities', jsonb_build_array(
            jsonb_build_object('name', 'Lake Zurich Cruise', 'date', '2023-12-10'),
            jsonb_build_object('name', 'Museum Visits', 'date', '2023-12-11')
        ),
        'start_date', '2023-12-09',
        'end_date', '2023-12-15'
    )
);

But let me ask the same question again, because in all good software, when you ask a question that has several answers, you expect to get a different but correct answer each time.

And voila, what do we have here: a different way of inserting json data into a PostgreSQL table. Furthermore, maybe Aesop got a bit tired of being asked the same question, so it decided I really didn't need those user_id and timestamp columns I NEVER ASKED FOR in the first place.

>>> Create a PostgreSQL table with a jsonb column that has heavily nested data that has 2 rows covering trips and make sure to use GENERATED BY instead of serial.

Aesop:

Certainly! Below is an example of how you can create a PostgreSQL table using the `GENERATED ALWAYS AS IDENTITY` feature for automatic sequence generation, along with a highly nested `jsonb` column to store detailed trip data.

See, not even a nicely commented column; very minimalistic. Aesop is clearly tired of my repeated questions.


CREATE TABLE trips (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    trip_data JSONB NOT NULL
);

-- Inserting two rows into the table with heavily nested JSON data

INSERT INTO trips (trip_data) VALUES
('{
  "name": "Trip to Paris",
  "details": {
    "start_date": "2023-09-15",
    "end_date": "2023-09-20",
    "itinerary": [
      {
        "day": 1,
        "activities": ["Arrival at Charles de Gaulle Airport", "Check-in at Hotel"],
        "accommodation": {
          "hotel_name": "Le Meurice",
          "rating": 5
        }
      },
      {
        "day": 2,
        "activities": ["Visit Eiffel Tower", "Lunch at Café de Flore"],
        "notes": "Remember to book Eiffel Tower tickets in advance."
      },
      {
        "day": 3,
        "activities": ["Explore Louvre Museum"],
        "accommodation": {
          "hotel_name": "Le Meurice",
          "amenities": ["Spa", "Gym"]
        }
      }
    ]
  }
}');

INSERT INTO trips (trip_data) VALUES
('{
  "name": "Trip to Tokyo",
  "details": {
    "start_date": "2023-11-01",
    "end_date": "2023-11-10",
    "itinerary": [
      {
        "day": 1,
        "activities": ["Arrival at Narita Airport", "Check-in at Ryokan"],
        "accommodation": {
          "ryokan_name": "Tawaraya",
          "rating": 5
        }
      },
      {
        "day": 2,
        "activities": ["Visit Tokyo Tower", "Explore Asakusa Temple"],
        "notes": "Try some sushi for lunch."
      },
      {
        "day": 3,
        "activities": ["Shop at Shibuya Crossing", "Dinner at Ramen Street"],
        "accommodation": {
          "ryokan_name": "Tawaraya",
          "amenities": ["Hot Springs", "Private Baths"]
        }
      }
    ]
  }
}');

I got Aesop to create dense data; now to have it query some meaningful bits out of it.

Me:
Write a PostgreSQL SQL statement that returns the day and activity of each trip one row per activity.

And alas, an answer. But this is perhaps one of the uglier ways of doing it; more on that in a different article.


SELECT
    id AS trip_id,
    (jsonb_array_elements_text(trip_data->'details'->'itinerary')::jsonb ->> 'day')::integer AS day,
    jsonb_array_elements_text((jsonb_array_elements_text(trip_data->'details'->'itinerary')::jsonb -> 'activities')) AS activity
FROM
    trips;
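
One caution I'm fairly sure of: PostgreSQL 10 and later reject nesting one set-returning function inside another in the select list ("set-returning functions cannot be nested"), so a LATERAL rewrite of Aesop's answer is both cleaner and safer. A sketch:

SELECT t.id AS trip_id,
    (day_elem ->> 'day')::integer AS day,
    act.activity
FROM trips AS t
    CROSS JOIN LATERAL jsonb_array_elements(t.trip_data->'details'->'itinerary') AS day_elem
    CROSS JOIN LATERAL jsonb_array_elements_text(day_elem -> 'activities') AS act(activity);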

I also asked Aesop more complicated JSON query questions, which it failed at initially, but by feeding it the errors returned from PostgreSQL, it reworked its answers and gave me code that at least would run.


Basics

 

PG 17 new random functions Beginner



Have you ever wanted to get a random integer between 1 and 10 and been a little annoyed at the slightly cryptic code you had to write in PostgreSQL? PostgreSQL 17's new random functions make that simpler. Sometimes it's the small changes that bring the most joy.

Random over the years

The random() function largely stayed stagnant for a long time. In the before times, if you needed a random number, all you had to work with was a single function, random(), which took no arguments and returned a double precision number >= 0 and < 1. When have you ever needed a random double precision number >= 0 and < 1? Probably never, but it did provide good ammunition to get to your real task. So how do you create a random integer between 1 and 10 in pre-PG17?

One way to do it in the pre-PG17 era (taking care that the result stays in range and uniformly distributed) is:

SELECT 1 + floor(random()*10)::integer;

In PostgreSQL 16 we were blessed with yet another random-family function, random_normal, which you need a degree in statistics to fully appreciate. Luckily, a guy with a degree in statistics is standing by with Rolling dice with postgres random function to explain this marvelous function, complete with diagrams and pictures, because you really need pictures and diagrams to understand it.

Random in PG 17

Finally, in PG 17, we were blessed with a very intuitive and short way to answer the question that, 80% of the time, is why we wanted a random function in the first place: return a random integer between 1 and 10.

SELECT random(1,10);

Not only is this code shorter, it is also clearer in intent.

Now if that were not sweet enough, suppose you wanted to create random money amounts. Money, at least in the US, has only 2 decimals. How would you create a random amount between 1 and 10 dollars?

In the pre-PG 17 era, I'd do this:

SELECT (1 + random()*9)::numeric(4,2);

In PG 17, I can do this:

SELECT random(1.00,10.00);

This returns a number such as 5.79 that always has exactly 2 decimals. If your two bounds have a different number of decimals, the result carries the greater number of decimals of the two. So:

SELECT random(1.00, 10.000);

would return a number like 9.657.
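
To see both forms in action over several rows (PG 17 or newer), a quick sketch:

SELECT n,
    random(1, 10) AS dice,          -- random integer in [1, 10]
    random(1.00, 10.00) AS price    -- random numeric in [1.00, 10.00] with 2 decimals
FROM generate_series(1, 10) AS n;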


Basics

 

Converting JSON documents to relational tables Beginner



JSON is one of the most popular formats for disseminating data between systems, and it is probably the most common one offered by web services. PostgreSQL is a database perfectly suited for grabbing that data and transforming it into a more structured relational format, all directly in the database. We'll go over some ways to load and restructure json data.

Before starting, create a temporary table to store the json:

	CREATE TEMP TABLE data_json(data jsonb);

Loading Data using pg_read_file

My favorite way of reading text files I just want to stuff into a single field is pg_read_file. pg_read_file is a server-side function that allows you to read all of a file or a portion of it. There are a couple of caveats for its use.

  • Requires being a superuser or a member of the pg_read_server_files role
  • The file you are reading must be readable by the postgres server process
  • There is a limit on read size, and you are also restricted in how big a file you can stuff in a column. For most files I've worked with, this has not been an issue.

There is a companion function called pg_read_binary_file for reading data in binary format or for reading text in a specific encoding.
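
For instance, to read text stored in a known encoding you can pair it with convert_from; a minimal sketch (the path here is hypothetical):

-- read the raw bytes, then decode them explicitly as UTF8 text
SELECT convert_from(pg_read_binary_file('/tmp/data_utf8.json'), 'UTF8');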

To demonstrate, download: Boston Public Schools json format and put it in your C:/temp folder. If you are on Linux, use a path such as /tmp and change C:/temp to /tmp in the path references in this article.

Note that this is a GeoJSON file, which means if you had PostGIS installed you could do interesting things with this. But for this exercise, I'm going to treat it like any JSON file. I should also note that this approach doesn't work for big files that can't fit into a single column.

INSERT INTO data_json(data)
SELECT pg_read_file('C:/temp/public_schools.geojson.json')::jsonb;
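
A quick sanity check that the load worked; for a GeoJSON file you should see one row reporting an object of type FeatureCollection along with the feature count:

SELECT jsonb_typeof(data) AS doc_type,
    data->>'type' AS geojson_type,
    jsonb_array_length(data->'features') AS num_features
FROM data_json;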

Using jsonb_array_elements to expand rows

There are a couple of ways of expanding a JSON dataset into rows. The oldest is jsonb_array_elements, available since PostgreSQL 9.4 (9.3 for its json_array_elements counterpart). You can combine it with the ->> and -> json operators to select properties. Here is how we do it with the sample dataset, which has a GeoJSON structure:

CREATE TABLE boston_public_schools AS 
SELECT (je->'id')::bigint AS id, 
	(je->'geometry'->'coordinates'->>0)::float AS longitude,
	(je->'geometry'->'coordinates'->>1)::float AS latitude,
	je->'properties'->>'SCH_NAME' AS sch_name,
	je->'properties'->>'ADDRESS' AS address,
	je->'properties'->>'CITY' AS city,
	je->'properties'->>'ZIPCODE' AS zipcode
FROM data_json 
	CROSS JOIN jsonb_array_elements(data_json.data->'features') AS je;

There are three features being used in this example. First is the -> operator: applied to a jsonb or json value, it returns the requested property as a jsonb or json element. Note that you can burrow into a document by nesting these operator calls, as we do with je->'geometry'->'coordinates'->>1.

The companion to -> is the ->> operator, which returns text instead of json. You use this when you are done with your burrowing.

Both -> and ->> can take a text or an integer. The integer version is used only for json arrays and returns the nth element of the array. Counting of arrays in JavaScript, and by extension JSON, starts at 0.
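
A quick sketch of the integer form:

SELECT '["a","b","c"]'::jsonb ->> 0;  -- returns the text value 'a'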

So with these operators you can pick out pieces of a json document. But before doing that, we'll want to expand the json document into its separate rows. For GeoJSON documents, there is always a features property, which is an array with each element being a data row.

To break up these rows, you can use jsonb_array_elements, which is a set-returning function that works only on jsonb arrays and returns each element of the array as a jsonb object.
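
To see it in isolation, a tiny sketch:

SELECT elem
FROM jsonb_array_elements('[{"a": 1}, {"a": 2}]'::jsonb) AS elem;
-- returns two rows: {"a": 1} and {"a": 2}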

Your table should end up looking like this:

SELECT * FROM boston_public_schools LIMIT 3;
 id |     longitude      |     latitude      |       sch_name       |       address       |    city     | zipcode
----+--------------------+-------------------+----------------------+---------------------+-------------+---------
  1 | -71.00412000099993 | 42.38879000000003 | Guild Elementary     | 195 Leyden Street   | East Boston | 02128
  2 | -71.03047970999995 | 42.37853662100008 | Kennedy Patrick Elem | 343 Saratoga Street | East Boston | 02128
  3 | -71.03389000099997 | 42.37527000000006 | Otis Elementary      | 218 Marion Street   | East Boston | 02128
(3 rows)

Using JSON_TABLE to expand rows and columns

Introduced in PostgreSQL 17 is the ISO-SQL standard JSON_TABLE function which, if you are familiar with XML, follows more or less the same pattern as XMLTABLE. It utilizes json path syntax to parse out elements. Here is a repeat of the earlier exercise using JSON_TABLE.


DROP TABLE IF EXISTS boston_public_schools;
CREATE TABLE boston_public_schools AS
SELECT
    je.*
FROM
    data_json
CROSS JOIN
    JSON_TABLE (
        data_json.data,
        '$.features[*]' COLUMNS (
            id integer PATH '$.id',
            longitude float PATH '$.geometry.coordinates[0]',
            latitude float PATH '$.geometry.coordinates[1]',
            NESTED PATH '$.properties' COLUMNS (
                sch_name text PATH '$.SCH_NAME',
                address text PATH '$.ADDRESS',
                city text PATH '$.CITY',
                zipcode text PATH '$.ZIPCODE'
            )
        )
    ) AS je;

This is a pretty rich function, so you should read the docs to appreciate its full breadth. The fact that it's an ISO/SQL standard function means you are more likely to find it in other relational databases. If you have a heavily nested document, the NESTED PATH subclause comes in handy for saving some typing. The main thing I don't like about it is that I find it harder to comprehend and remember than the earlier syntax.