Learning PostgreSQL from AI and JSON exploration

After reading Paul's teasing article on Accessing Large Language Models with AI and testing out his code, I decided I should probably stop fighting this AI smothering. I still have a distrust of AI services, but downloading an AI model and running it on my own local desktop or server is nice, and I can break bread with that. One use I tried was having AI generate fake data, and it did a pretty decent job. I also learned a couple of things.

For these exercises I'm using my favorite AI model, at least for PostgreSQL work so far: phi4, which you can get here: Phi 4. The 14b 9.1GB variant seemed decent enough and answered most of my questions in 5-20 seconds on my 4-year-old graphics card, 20-CPU, 64GB RAM desktop. I learned a bunch of things from my excursion down this AI path.

Lessons Learned

I shall, at least for the rest of this article, call my side-kick Aesop, because calling it phi4 feels a bit alien.

Figure out the stretch of your AI's knowledge of PostgreSQL

If you are going to trust advice from anything, especially on a topic as important as PostgreSQL, you should get a sense of its breadth of knowledge. Just like how I run:

SELECT version();

on a PostgreSQL server to know what neat tricks are at my disposal, I do the same with Aesop.

Me:
What is the latest version of PostgreSQL

To which I am greeted with

Aesop:
As of my last update in October 2023, the latest stable release of PostgreSQL was version 15. However, please verify this information from an official source such as the PostgreSQL website or other trusted resources, as new versions might have been released since then. PostgreSQL frequently updates its software with improvements and new features, so checking their official announcements is always a good practice to ensure you have the most current information.

Creating fake data

First question I asked Aesop:

Create a PostgreSQL table with a jsonb column that has heavily nested data that has 2 rows covering trips and make sure to use GENERATED BY instead of serial.

I threw in the requirement about GENERATED BY because on the first pass, Aesop threw back a CREATE TABLE that used the old SERIAL syntax. With this GENERATED BY qualification it gave me this:

Aesop:
To create a PostgreSQL table where the primary key is generated using `GENERATED ALWAYS AS IDENTITY` instead of `SERIAL`, you can follow these steps. This approach, introduced in PostgreSQL 10, allows for more flexibility and control over identity columns.

CREATE TABLE trips (
    trip_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- Assuming a reference to a users table
    trip_name VARCHAR(255) NOT NULL,
    trip_data JSONB,           -- Column for storing nested JSON data
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

It was nice that Aesop warned me this will only work for PostgreSQL 10 and above, because if you are creating software that needs to work with PostgreSQL 9.6, this will not do. So there I was, excited, glancing at this ready-to-use code that, gasp, DOES NOT WORK. What is wrong with this code exactly? Try running it in PostgreSQL and you'll see the error. I mean, it's even nicely commented and formatted.

Poor Aesop was a little confused and threw in some MySQL goodness. Stuff that MySQL has that PostgreSQL doesn't, but which perhaps might be a nice feature for those folks too lazy to create an update trigger for updated_at.

Which begs the question: oh why oh why does my PostgreSQL not have an ON UPDATE clause for CREATE TABLE?

So, commenting out that piece, we have a working CREATE TABLE now:


CREATE TABLE trips (
    trip_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- Assuming a reference to a users table
    trip_name VARCHAR(255) NOT NULL,
    trip_data JSONB,           -- Column for storing nested JSON data
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- ON UPDATE CURRENT_TIMESTAMP
);
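If you actually want the MySQL-style auto-updating timestamp, the standard PostgreSQL workaround is the very trigger those lazy folks are avoiding. A minimal sketch against the trips table above (the function and trigger names are my own, not anything Aesop produced; EXECUTE FUNCTION needs PostgreSQL 11+, older versions use EXECUTE PROCEDURE):

```sql
-- Reusable trigger function that stamps updated_at on every UPDATE.
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
    NEW.updated_at := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to trips so updated_at behaves like MySQL's
-- ON UPDATE CURRENT_TIMESTAMP.
CREATE TRIGGER trips_set_updated_at
    BEFORE UPDATE ON trips
    FOR EACH ROW
    EXECUTE FUNCTION set_updated_at();
```

With that in place, any UPDATE on trips refreshes updated_at without the application having to remember to set it.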

Then Aesop provided a statement to stuff in some juicy trip data, exactly the 2 rows I had asked for, and it worked without any changes.


INSERT INTO trips (user_id, trip_name, trip_data)
VALUES
(
    1,
    'Summer Vacation',
    jsonb_build_object(
        'destination', jsonb_build_object(
            'city', 'Paris',
            'country', 'France'
        ),
        'activities', jsonb_build_array(
            jsonb_build_object('name', 'Eiffel Tower Visit', 'date', '2023-07-15'),
            jsonb_build_object('name', 'Louvre Museum Tour', 'date', '2023-07-16')
        ),
        'start_date', '2023-07-14',
        'end_date', '2023-07-20'
    )
),
(
    2,
    'Winter Getaway',
    jsonb_build_object(
        'destination', jsonb_build_object(
            'city', 'Zurich',
            'country', 'Switzerland'
        ),
        'activities', jsonb_build_array(
            jsonb_build_object('name', 'Lake Zurich Cruise', 'date', '2023-12-10'),
            jsonb_build_object('name', 'Museum Visits', 'date', '2023-12-11')
        ),
        'start_date', '2023-12-09',
        'end_date', '2023-12-15'
    )
);
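To sanity-check what landed in the table, a quick query of my own (not Aesop's) pulls the city out of the nested destination object:

```sql
-- ->  navigates to a nested jsonb object,
-- ->> extracts the leaf value as text.
SELECT trip_name,
       trip_data -> 'destination' ->> 'city' AS city
FROM trips;
```

Given the two rows above, that should show Paris for the Summer Vacation and Zurich for the Winter Getaway.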

But let me ask the same question again, because in all good software, when you ask the same question that has several answers, you expect to get a different but correct answer each time.

And voilà, what do we have here: a different way of inserting JSON data into a PostgreSQL table. Furthermore, maybe Aesop got a bit tired of being asked the same question, so it decided I really didn't need those user_id and timestamp columns I NEVER ASKED FOR in the first place.

>>> Create a PostgreSQL table with a jsonb column that has heavily nested data that has 2 rows covering trips and make sure to use GENERATED BY instead of serial.

Aesop:

Certainly! Below is an example of how you can create a PostgreSQL table using the `GENERATED ALWAYS AS IDENTITY` feature for automatic sequence generation, along with a highly nested `jsonb` column to store detailed trip data.

See, not even a nicely commented column; very minimalistic. Aesop is clearly tired of my repeated questions.


CREATE TABLE trips (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    trip_data JSONB NOT NULL
);

-- Inserting two rows into the table with heavily nested JSON data

INSERT INTO trips (trip_data) VALUES
('{
  "name": "Trip to Paris",
  "details": {
    "start_date": "2023-09-15",
    "end_date": "2023-09-20",
    "itinerary": [
      {
        "day": 1,
        "activities": ["Arrival at Charles de Gaulle Airport", "Check-in at Hotel"],
        "accommodation": {
          "hotel_name": "Le Meurice",
          "rating": 5
        }
      },
      {
        "day": 2,
        "activities": ["Visit Eiffel Tower", "Lunch at Café de Flore"],
        "notes": "Remember to book Eiffel Tower tickets in advance."
      },
      {
        "day": 3,
        "activities": ["Explore Louvre Museum"],
        "accommodation": {
          "hotel_name": "Le Meurice",
          "amenities": ["Spa", "Gym"]
        }
      }
    ]
  }
}');

INSERT INTO trips (trip_data) VALUES
('{
  "name": "Trip to Tokyo",
  "details": {
    "start_date": "2023-11-01",
    "end_date": "2023-11-10",
    "itinerary": [
      {
        "day": 1,
        "activities": ["Arrival at Narita Airport", "Check-in at Ryokan"],
        "accommodation": {
          "ryokan_name": "Tawaraya",
          "rating": 5
        }
      },
      {
        "day": 2,
        "activities": ["Visit Tokyo Tower", "Explore Asakusa Temple"],
        "notes": "Try some sushi for lunch."
      },
      {
        "day": 3,
        "activities": ["Shop at Shibuya Crossing", "Dinner at Ramen Street"],
        "accommodation": {
          "ryokan_name": "Tawaraya",
          "amenities": ["Hot Springs", "Private Baths"]
        }
      }
    ]
  }
}');

I got Aesop to create dense data; now to have it query some meaningful bits out of it.

Me:
Write a PostgreSQL SQL statement that returns the day and activity of each trip one row per activity.

And alas, an answer, but this is perhaps one of the uglier ways of doing this; more on that in a different article.


SELECT
    id AS trip_id,
    (jsonb_array_elements_text(trip_data->'details'->'itinerary')::jsonb ->> 'day')::integer AS day,
    jsonb_array_elements_text((jsonb_array_elements_text(trip_data->'details'->'itinerary')::jsonb -> 'activities')) AS activity
FROM
    trips;
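For comparison, a tidier rewrite of my own (not Aesop's) moves the set-returning functions out of the SELECT list and into LATERAL joins, so each array gets expanded exactly once per parent row:

```sql
-- Expand the itinerary array, then each day's activities array,
-- one row per activity, using LATERAL instead of SRFs in SELECT.
SELECT t.id AS trip_id,
       (day_elem ->> 'day')::integer AS day,
       activity
FROM trips AS t
CROSS JOIN LATERAL
    jsonb_array_elements(t.trip_data -> 'details' -> 'itinerary') AS day_elem
CROSS JOIN LATERAL
    jsonb_array_elements_text(day_elem -> 'activities') AS activity;
```

This also avoids the text-to-jsonb round trip, since jsonb_array_elements hands back jsonb elements directly.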

I also asked Aesop more complicated JSON query questions, which it failed at initially, but by feeding it the errors returned from PostgreSQL it reworked its answers and gave me code that at least would run.