Using LTree to Represent and Query Hierarchy and Tree Structures

PostgreSQL offers several options for displaying and querying tree like structures. In Using Recursive Common Table Expressions (CTE) to represent tree structures we demonstrated how to use common table expressions to display a tree like structure. Common Table Expressions required PostgreSQL 8.4 and above but was fairly ANSI standards compliant. In addition to that approach you have the option of using recursive functions. There is yet another common approach for this which is specific to PostgreSQL. This is using the ltree contrib datatype that has been supported for sometime in PostgreSQL. For one of our recent projects, we chose ltree over the other approaches because the performance is much better when you need to do ad-hoc queries over the tree since it can take advantage of btree and gist indexes and also has built-in tree query expressions that make ad-hoc queries simpler to do; similar in concept to the tsearch query syntax for querying text.

In this article we'll demonstrate how to use ltree and along the way also show the PostgreSQL 9.0 new features conditional triggers and ordered aggregates.

Why use ltree?

The upsides are that:

It has been supported for some time in PostgreSQL so can be used even in older versions such as PostgreSQL 8.1/8.2
It will be generally faster than using a recursive CTE or recursive function that constantly needs to recalculate the branching
Has built in query syntax and operators specifically designed for querying and navigating trees

The main downside of the ltree approach is that the tree structure needs to be maintained usually as an extra field to gain all the benefits it offers. This you can do fairly easily using triggers.

Support for Trees and Recursive structures in other vendor databases

The other downside is that ltree datatype is not as portable across different database vendor products, though you could simulate the behavior in other databases by maintaining the dot representation as a text field with a simple index and do basic text queries for other databases if portability is a concern.

Some other databases have similar types. For examples SQL Server 2008 has a datatype called HierarchyID which serves the same purpose as ltree but with different syntax. An example of the SQL Server hierarchyid type can be found in Introduction to SQL Server 2008 HierarchyID. It uses path '/' notation for display instead of ltree '.' notation for display. SQL Server also uses object oriented methods for navigation as opposed to plain old functions and operators that ltree employs. On the plus side, PostgreSQL ltree syntax is much more terse than SQL Server's HierarcyID methods but on the downside is more cryptic looking.

As a side note -- all the big three proprietary enterprise databases (Oracle 11G R2, IBM Db2, SQL Server 2005/2008) support recursive common table expressions. Prior versions of Oracle only support Oracle proprietary CONNECT BY syntax which in some cases is more flexible than the CTE standard.

To our knowledge the only Open source databases that support recursive common table expressions are PostgreSQL 8.4+ and Firebird 2.1+. Firebird's CTEs are explaind here.

Ltree at work

For these next exercises, we will use the same dataset we used on our earlier article on CTEs.

Installing ltree type

Before we can use the ltree datatype, we need to install it by running the ltree.sql in /share/contrib/ltree.sql. Once that is done we will create our table as before, but with an additional column to hold the node path. We also add two indexes to improve speed of queries.

ltree at work

CREATE TABLE supplyitem(si_id integer PRIMARY KEY, si_parentid integer, si_item varchar(100), node_path ltree);
CREATE UNIQUE INDEX idx_supplyitem_node_path_btree_idx ON supplyitem USING btree(node_path);
CREATE INDEX idx_supplyitem_node_path_gist_idx ON supplyitem USING gist(node_path);

To maintain the node_path value, we opted to use triggers so that we can insert data as easily as we did before. The below code creates a trigger and binds it to the table. The key operator we are using is @> which when use a @> b returns a boolean whether b is a child or equal to a. Note there are many more operators and a rich query syntax we encourage you to explore in the official PostgreSQL ltree documentation.

CREATE OR REPLACE FUNCTION get_calculated_si_node_path(param_si_id integer)
  RETURNS ltree AS
$$
SELECT  CASE WHEN s.si_parentid IS NULL THEN s.si_id::text::ltree 
            ELSE get_calculated_si_node_path(s.si_parentid)  || s.si_id::text END
    FROM supplyitem As s
    WHERE s.si_id = $1;
$$
  LANGUAGE sql;


/** trigger to maintain node when parent or item id changes or record is added  
    We revised this since the first version because the order of the short-circuiting is not predictable so TG_OP needs a nested IF.
    Also some other minor simplifications
**/
CREATE OR REPLACE FUNCTION trig_update_si_node_path() RETURNS trigger AS
$$
BEGIN
  IF TG_OP = 'UPDATE' THEN
        IF (COALESCE(OLD.si_parentid,0) != COALESCE(NEW.si_parentid,0)  OR  NEW.si_id != OLD.si_id) THEN
            -- update all nodes that are children of this one including this one
            UPDATE supplyitem SET node_path = get_calculated_si_node_path(si_id) 
                WHERE OLD.node_path  @> supplyitem.node_path;
        END IF;
  ELSIF TG_OP = 'INSERT' THEN
        UPDATE supplyitem SET node_path = get_calculated_si_node_path(NEW.si_id) WHERE supplyitem.si_id = NEW.si_id;
  END IF;
  
  RETURN NEW;
END
$$
LANGUAGE 'plpgsql' VOLATILE;

/** for postgreSQL 9.0 -- you can use this syntax to save unnecessary check of trigger function **/
CREATE TRIGGER trig01_update_si_node_path AFTER INSERT OR UPDATE OF si_id, si_parentid
   ON supplyitem FOR EACH ROW
   EXECUTE PROCEDURE trig_update_si_node_path();

/** pre - postgreSQL 9.0 -- have to use this syntax **/
CREATE TRIGGER trig01_update_si_node_path AFTER INSERT OR UPDATE 
   ON supplyitem FOR EACH ROW
   EXECUTE PROCEDURE trig_update_si_node_path();

Now that we have the triggers in place, we can insert the data as we did before. Note that when we select, the trigger has filled in the node_path for us.

-- insert the data --
--load up the table (multirow constructor introduced in 8.2)
INSERT INTO supplyitem(si_id,si_parentid, si_item)
VALUES (1, NULL, 'Paper'),
(2,1, 'Recycled'),
(3,2, '20 lb'),
(4,2, '40 lb'),
(5,1, 'Non-Recycled'),
(6,5, '20 lb'),
(7,5, '40 lb'),
(8,5, 'Scraps');


SELECT si_id, si_item, si_parentid, node_path 
    FROM supplyitem;


-- result looks like --
 si_id |   si_item    | si_parentid | node_path
-------+--------------+-------------+-----------
     1 | Paper        |             | 1
     2 | Recycled     |           1 | 1.2
     3 | 20 lb        |           2 | 1.2.3
     4 | 40 lb        |           2 | 1.2.4
     5 | Non-Recycled |           1 | 1.5
     6 | 20 lb        |           5 | 1.5.6
     7 | 40 lb        |           5 | 1.5.7
     8 | Scraps       |           5 | 1.5.8

To produce the same results as we did in our earlier article we have a couple of options. For PostgreSQL 9.0 we can use ordered aggs or the old ARRAY syntax and for older versions we use ARRAY subselect syntax.

-- use a self join to find all the ancestors 
-- of selected nodes and concat and order them by ancestry level

-- 9.0 syntax   
SELECT s.si_id, array_to_string(array_agg(a.si_item ORDER BY a.node_path), '->') As si_item_fullname
FROM supplyitem As s INNER JOIN supplyitem As a
    ON (a.node_path @> s.node_path)
GROUP BY s.si_id, s.node_path, s.si_item
ORDER BY si_item_fullname;

-- Pre 9.0 syntax (can use for 9.0 as well)
SELECT s.si_id, array_to_string(
            ARRAY(SELECT a.si_item FROM supplyitem As a WHERE a.node_path @> s.node_path
                                ORDER BY a.node_path)
                                        , 
                        '->') As si_item_fullname
FROM supplyitem As s 
ORDER BY si_item_fullname;


-- output of both looks like our old recursive
si_id |      si_item_fullname
------+-----------------------------
    1 | Paper
    5 | Paper->Non-Recycled
    6 | Paper->Non-Recycled->20 lb
    7 | Paper->Non-Recycled->40 lb
    8 | Paper->Non-Recycled->Scraps
    2 | Paper->Recycled
    3 | Paper->Recycled->20 lb
    4 | Paper->Recycled->40 lb

To test the updating, we will make Recyled a higher level (not child of Paper). If this is working properly, it should update all the child records as well.

-- Detached recycled from Paper --
UPDATE supplyitem SET si_parentid = null where si_id = 2;


-- when we rerun our prior select queries we get
-- select from table

 si_id |   si_item    | si_parentid | node_path
-------+--------------+-------------+-----------
     1 | Paper        |             | 1
     5 | Non-Recycled |           1 | 1.5
     6 | 20 lb        |           5 | 1.5.6
     7 | 40 lb        |           5 | 1.5.7
     8 | Scraps       |           5 | 1.5.8
     2 | Recycled     |             | 2
     3 | 20 lb        |           2 | 2.3
     4 | 40 lb        |           2 | 2.4
     
-- pretty display query
si_id |      si_item_fullname
------+-----------------------------
    1 | Paper
    5 | Paper->Non-Recycled
    6 | Paper->Non-Recycled->20 lb
    7 | Paper->Non-Recycled->40 lb
    8 | Paper->Non-Recycled->Scraps
    2 | Recycled
    3 | Recycled->20 lb
    4 | Recycled->40 lb