PostgreSQL offers several options for displaying and querying tree like structures.
In Using Recursive Common Table Expressions (CTE) to represent tree structures
we demonstrated how to use common table expressions to display a tree like structure. Common Table Expressions required PostgreSQL 8.4 and above but was fairly ANSI standards compliant. In addition to that
approach you have the option of using recursive functions. There is yet another common approach for this which is specific to PostgreSQL. This is using the ltree contrib datatype
that has been supported for sometime in PostgreSQL. For one of our recent projects, we chose ltree over the other approaches because the performance is much better when you need to do ad-hoc queries over the tree since it can take advantage of btree and gist indexes
and also has built-in tree query expressions that make ad-hoc queries simpler to do; similar in concept to the tsearch query syntax for querying text.
In this article we'll demonstrate how to use ltree and along the way also show the PostgreSQL 9.0 new features conditional triggers and ordered aggregates.
Why use ltree?
The upsides are that:
- It has been supported for some time in PostgreSQL so can be used even in older versions such as PostgreSQL 8.1/8.2
- It will be generally faster than using a recursive CTE or recursive function that constantly needs to recalculate the branching
- Has built in query syntax and operators specifically designed for querying and navigating trees
The main downside of the ltree approach is that the tree structure needs to be maintained usually as an extra field to gain all the benefits it offers. This you can do fairly easily using triggers.
Support for Trees and Recursive structures in other vendor databases
The other downside is that ltree datatype is not as portable across different database vendor products, though you could simulate the behavior in other databases by maintaining the dot representation as a text field with a simple index and do basic text queries for other databases if portability is a concern.
Some other databases have similar types. For examples SQL Server 2008 has a datatype called HierarchyID which serves the same purpose as ltree but with different syntax. An example of the SQL Server hierarchyid type can be found in
Introduction to SQL Server 2008 HierarchyID. It uses path '/' notation for display instead of ltree '.' notation for display. SQL Server also uses object oriented
methods for navigation as opposed to plain old functions and operators that ltree employs. On the plus side, PostgreSQL ltree syntax is much more terse than SQL Server's HierarcyID methods but on the downside is more cryptic looking.
As a side note -- all the big three proprietary enterprise databases (Oracle 11G R2, IBM Db2, SQL Server 2005/2008) support recursive common table expressions. Prior versions of Oracle only support Oracle proprietary CONNECT BY syntax which in some cases is more flexible than the CTE standard.
To our knowledge the only Open source databases that support recursive common table expressions are PostgreSQL 8.4+ and Firebird 2.1+. Firebird's CTEs are explaind here.
Ltree at work
For these next exercises, we will use the same dataset we used on our earlier article on CTEs.
Installing ltree type
Before we can use the ltree datatype, we need to install it by running the ltree.sql in /share/contrib/ltree.sql. Once that is done we will create our table as before, but with an additional column to hold the node path. We also add two indexes to improve speed of queries.
ltree at work
CREATE TABLE supplyitem(si_id integer PRIMARY KEY, si_parentid integer, si_item varchar(100), node_path ltree);
CREATE UNIQUE INDEX idx_supplyitem_node_path_btree_idx ON supplyitem USING btree(node_path);
CREATE INDEX idx_supplyitem_node_path_gist_idx ON supplyitem USING gist(node_path);
To maintain the node_path value, we opted to use triggers so that we can insert data as easily as we did before. The below code creates a trigger and binds it to the table.
The key operator we are using is @>
which when use a @> b
returns a boolean whether b is a child or equal to a. Note there are many more operators and a rich query syntax we encourage you to explore in the official PostgreSQL ltree documentation.
CREATE OR REPLACE FUNCTION get_calculated_si_node_path(param_si_id integer)
RETURNS ltree AS
$$
SELECT CASE WHEN s.si_parentid IS NULL THEN s.si_id::text::ltree
ELSE get_calculated_si_node_path(s.si_parentid) || s.si_id::text END
FROM supplyitem As s
WHERE s.si_id = $1;
$$
LANGUAGE sql;
CREATE OR REPLACE FUNCTION trig_update_si_node_path() RETURNS trigger AS
$$
BEGIN
IF TG_OP = 'UPDATE' THEN
IF (COALESCE(OLD.si_parentid,0) != COALESCE(NEW.si_parentid,0) OR NEW.si_id != OLD.si_id) THEN
UPDATE supplyitem SET node_path = get_calculated_si_node_path(si_id)
WHERE OLD.node_path @> supplyitem.node_path;
END IF;
ELSIF TG_OP = 'INSERT' THEN
UPDATE supplyitem SET node_path = get_calculated_si_node_path(NEW.si_id) WHERE supplyitem.si_id = NEW.si_id;
END IF;
RETURN NEW;
END
$$
LANGUAGE 'plpgsql' VOLATILE;
CREATE TRIGGER trig01_update_si_node_path AFTER INSERT OR UPDATE OF si_id, si_parentid
ON supplyitem FOR EACH ROW
EXECUTE PROCEDURE trig_update_si_node_path();
CREATE TRIGGER trig01_update_si_node_path AFTER INSERT OR UPDATE
ON supplyitem FOR EACH ROW
EXECUTE PROCEDURE trig_update_si_node_path();
Now that we have the triggers in place, we can insert the data as we did before. Note that when we select, the trigger has filled in the node_path for us.
INSERT INTO supplyitem(si_id,si_parentid, si_item)
VALUES (1, NULL, 'Paper'),
(2,1, 'Recycled'),
(3,2, '20 lb'),
(4,2, '40 lb'),
(5,1, 'Non-Recycled'),
(6,5, '20 lb'),
(7,5, '40 lb'),
(8,5, 'Scraps');
SELECT si_id, si_item, si_parentid, node_path
FROM supplyitem;
si_id | si_item | si_parentid | node_path
-------+--------------+-------------+-----------
1 | Paper | | 1
2 | Recycled | 1 | 1.2
3 | 20 lb | 2 | 1.2.3
4 | 40 lb | 2 | 1.2.4
5 | Non-Recycled | 1 | 1.5
6 | 20 lb | 5 | 1.5.6
7 | 40 lb | 5 | 1.5.7
8 | Scraps | 5 | 1.5.8
To produce the same results as we did in our earlier article we have a couple of options. For PostgreSQL 9.0 we can use ordered aggs or the old ARRAY syntax and for older versions we use ARRAY subselect syntax.
SELECT s.si_id, array_to_string(array_agg(a.si_item ORDER BY a.node_path), '->') As si_item_fullname
FROM supplyitem As s INNER JOIN supplyitem As a
ON (a.node_path @> s.node_path)
GROUP BY s.si_id, s.node_path, s.si_item
ORDER BY si_item_fullname;
SELECT s.si_id, array_to_string(
ARRAY(SELECT a.si_item FROM supplyitem As a WHERE a.node_path @> s.node_path
ORDER BY a.node_path)
,
'->') As si_item_fullname
FROM supplyitem As s
ORDER BY si_item_fullname;
si_id | si_item_fullname
------+-----------------------------
1 | Paper
5 | Paper->Non-Recycled
6 | Paper->Non-Recycled->20 lb
7 | Paper->Non-Recycled->40 lb
8 | Paper->Non-Recycled->Scraps
2 | Paper->Recycled
3 | Paper->Recycled->20 lb
4 | Paper->Recycled->40 lb
To test the updating, we will make Recyled a higher level (not child of Paper). If this is working properly, it should update all the child records as well.
UPDATE supplyitem SET si_parentid = null where si_id = 2;
si_id | si_item | si_parentid | node_path
-------+--------------+-------------+-----------
1 | Paper | | 1
5 | Non-Recycled | 1 | 1.5
6 | 20 lb | 5 | 1.5.6
7 | 40 lb | 5 | 1.5.7
8 | Scraps | 5 | 1.5.8
2 | Recycled | | 2
3 | 20 lb | 2 | 2.3
4 | 40 lb | 2 | 2.4
si_id | si_item_fullname
------+-----------------------------
1 | Paper
5 | Paper->Non-Recycled
6 | Paper->Non-Recycled->20 lb
7 | Paper->Non-Recycled->40 lb
8 | Paper->Non-Recycled->Scraps
2 | Recycled
3 | Recycled->20 lb
4 | Recycled->40 lb