Postgres OnLine Journal: Jan 2022 - December 2023
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

PostgreSQL Q & A
Basics
Using PostgreSQL Extensions

PostgreSQL Q & A

 

Creating cumulative sums by combining aggregation with windowing



When you want to include a running sum for each line of data in your query, you generally use a window clause with SUM. What if you don't want to list all line items of your data. You want a report that gives you a weekly sum and another that gives you the running sum for the whole year. What do you do then? We'll demonstrate how to do that.

The sample data

Let's start by creating a table with some sales data.

CREATE TABLE sales(id integer GENERATED ALWAYS AS IDENTITY,
		 product varchar(100),
		order_date date, quantity integer, 
		CONSTRAINT pk_sales PRIMARY KEY(id) 
		);
INSERT INTO sales(product, order_date, quantity)
WITH d AS( SELECT f.product, d.ts::date, 
	mod(date_part('epoch', d.ts)::integer, f.ord + 73) AS quantity
   FROM unnest(ARRAY['pen', 'book']) WITH ORDINALITY AS f(product,ord)
	 CROSS JOIN generate_series('2021-11-01'::timestamp, 
			'2022-01-07'::timestamp, interval '1.2 hours') AS d(ts)
   WHERE mod(date_part('epoch', d.ts)::integer,7) = 1
         )
SELECT *
FROM d WHERE quantity > 0;

Monthly sum

A basic monthly sum would look like this:

SELECT to_char(order_date, 'YYYY-MM') AS month, 
    product,  
    SUM(quantity)  As month_sales
FROM sales
GROUP BY product, month
ORDER BY product, month;

The Output of the above is:

  month  | product | month_sales
---------+---------+-------------
 2021-11 | book    |        2550
 2021-12 | book    |        2640
 2022-01 | book    |         510
 2021-11 | pen     |        3076
 2021-12 | pen     |        3134
 2022-01 | pen     |         614
(6 rows)

Cumulative Sum

But what if you want a sales to date as of the end of the month in addition to the monthly sum. How do you include that as an additional column?

SELECT to_char(order_date, 'YYYY-MM') AS month, 
    product,  
    SUM(quantity)  As month_sales, 
	SUM(SUM(quantity)) 
	  OVER(PARTITION BY product 
		ORDER BY to_char(order_date, 'YYYY-MM')) AS sales_to_date
FROM sales
GROUP BY product, month
ORDER BY product, month;

Which yields

  month  | product | month_sales | sales_to_date
---------+---------+-------------+---------------
 2021-11 | book    |        2550 |          2550
 2021-12 | book    |        2640 |          5190
 2022-01 | book    |         510 |          5700
 2021-11 | pen     |        3076 |          3076
 2021-12 | pen     |        3134 |          6210
 2022-01 | pen     |         614 |          6824
(6 rows)

Basics

 

VARIADIC Unnest



PostgreSQL keeps on adding new great stuff. It's hard to remember all the good stuff that has been added over the years. One of the neat ones from the past is the variadic Unnest function which I believe was introduced in PostgreSQL 9.4. It's rare that I ever had to use it, but today I was handed some data where this function was just literally what the doctor ordered. I can't do anything this sweet in other databases I have used.

Many people have seen the standard unnest used to unnest things like arrays or other component things. But did you know unnest is variadic so can take an unlimited number of arguments and keeps them balanced. If you have an array with 3 things and another with 2, you get a set with 3 things, not 6.

Observe the difference between this which gives you 6 rows


SELECT a, b
FROM unnest(ARRAY['a', 'b', 'c']) AS a, unnest(ARRAY[1,2]) AS b;

And this which gives you 3 rows


SELECT a, b
FROM unnest(ARRAY['a', 'b', 'c'], ARRAY[1,2] ) AS f(a, b);

I was handed a list of practices from a google spreadsheet where each row consisted of a single medical practice. And stuffed in those rows were all the doctors that worked in such a practice. But their names and titles were in different columns. It is not for me to ask why people enter data this way, but luckily PostgreSQL can make sense of it and explode it.

The raw data

My original table looked like this:


CREATE TABLE practices(practice_name text primary key, emp_name text, 
                                        emp_credentials text, emp_specialty text);
INSERT INTO practices(practice_name, emp_name, emp_credentials, emp_specialty)
VALUES 
('Seuss Pediatrics',  
'Theophrastus Seuss
Theodore Giesel', 
'MD
RN',
'Pediatrics
Oncology' ),
('County General',  
'Ben Casey
David Zorbal', 
'MD
MD PhD',
'Neurosurgery
Neurosurgery' ),
('Practice Of None', NULL, NULL, NULL);

Printed out looks like this:

 practice_name   |      emp_name      | emp_credentials | emp_specialty
------------------+--------------------+-----------------+---------------
 Seuss Pediatrics | Theophrastus Seuss+| MD             +| Pediatrics   +
                  | Theodore Giesel    | RN              | Oncology
 County General   | Ben Casey         +| MD             +| Neurosurgery +
                  | David Zorbal       | MD PhD          | Neurosurgery
 Practice Of None |                    |                 |
(3 rows)

Normalizing the data

And I needed to normalize it such that I have one person per row and have at least one row per practice even if no one works there.


SELECT p.practice_name, e.name, e.cred, e.spec
FROM practices AS p
    LEFT JOIN 
    LATERAL 
        UNNEST( string_to_array(emp_name, E'\n'),
                string_to_array(emp_credentials, E'\n'),
              string_to_array(emp_specialty, E'\n') )  AS e(name, cred, spec) ON true;

Which outputs a table that looks like this:

 practice_name   |        name        |  cred  |     spec
------------------+--------------------+--------+--------------
 Seuss Pediatrics | Theophrastus Seuss | MD     | Pediatrics
 Seuss Pediatrics | Theodore Giesel    | RN     | Oncology
 County General   | Ben Casey          | MD     | Neurosurgery
 County General   | David Zorbal       | MD PhD | Neurosurgery
 Practice Of None |                    |        |
(5 rows)

If you don't want to include practices with no staff, you can shorten the code a bit like so. Side note the , is short for CROSS JOIN. I like to spell it out, but people accuse me of being too pedantic about such things. Also note the below is still a LATERAL JOIN, but the word LATERAL is optional for functions.


SELECT p.practice_name, e.name, e.cred, e.spec
FROM practices AS p, 
        UNNEST( string_to_array(emp_name, E'\n'),
                string_to_array(emp_credentials, E'\n'),
              string_to_array(emp_specialty, E'\n') )  AS e(name, cred, spec) ;


Using PostgreSQL Extensions

 

PostgreSQL 15 64-bit for Windows FDWs



We are pleased to provide binaries for file_textarray_fdw and odbc_fdw for PostgreSQL 15 Windows 64-bit.

To use these, copy the files into your PostgreSQL 15 Windows 64-bit install folders in same named folders and then run CREATE EXTENSION as usual in the databases of your choice. More details in the packaged README.txt

This package contains the following FDWs:

  • odbc_fdw (version 0.5.2.3) for connecting to ODBC data sources such as SQL Server, Oracle, MS Access databases, and anything else that has a 64-bit ODBC driver. Note that since this is for PostgreSQL 64-bit, it can only use ODBC 64-bit connections.
  • file_textarray_fdw, great and fast for working with arbitrary and pesky delimited data. Especially where they weren't considerate enough to give you the same number of columns per row.

Note this package does not include ogr_fdw since ogr_fdw is packaged as part of PostGIS packages from EnterpriseDb Stackbuilder (for PostGIS >= 3.0) .

If you do not have PostGIS installed (and don't want to for some reason) and want to use ogr_fdw on windows, you can download from: Winnie's PG 15 Extras. ogr_fdw is a great FDW for querying not just spatial data, but also a ton of other file formats or relational (including odbc, dbase files, spreadsheets) since spatial is a superset.


Using PostgreSQL Extensions

 

PostgreSQL 16 64-bit for Windows FDWs



We are pleased to provide binaries for file_textarray_fdw and odbc_fdw for PostgreSQL 16 Windows 64-bit.

To use these, copy the files into your PostgreSQL 16 Windows 64-bit install folders in same named folders and then run CREATE EXTENSION as usual in the databases of your choice. More details in the packaged README.txt

These were compiled against PostgreSQL 16rc1, but should work fine against EDB PostgreSQL 16beta3.

This package contains the following FDWs:

  • odbc_fdw (version 0.5.2.3) for connecting to ODBC data sources such as SQL Server, Oracle, MS Access databases, and anything else that has a 64-bit ODBC driver. Note that since this is for PostgreSQL 64-bit, it can only use ODBC 64-bit connections.
  • file_textarray_fdw, great and fast for working with arbitrary and pesky delimited data. Especially where they weren't considerate enough to give you the same number of columns per row.

Note this package does not include ogr_fdw since ogr_fdw is packaged as part of PostGIS Bundle packages from EnterpriseDb Stackbuilder (for PostGIS >= 3.0) .

If you do not have PostGIS Bundle installed (and don't want to for some reason) and want to use ogr_fdw on windows, you can download from: Winnie's PG 16 Extras. ogr_fdw is a great FDW for querying not just spatial data, but also a ton of other file formats or relational (including odbc, dbase files, spreadsheets) since spatial is a superset.