Querying table, view, column and function descriptions

One of the biggest complaints about software and perhaps even more specifically FLOSS software is the lack of documentation. I'm not talking about those small little dialogs we throw in our code that rarely see the light of day. I'm talking about stuff you throw in user docs or specifications and so forth that an analyst or end-user reads.

The main reasons for this lack of documentation is that while everyone seems to want documentation, few really want to pay for it, and it's time consuming to keep documentation in synch with what the software actually does. Having documentation that is very inconsistent with the software is worse than not having any documentation at all. A good way to keep documentation up to date is to fold it into the process of developing and testing the software such that changes in software expose errors in the documentation and vice-versa or that the act of changing the software corrects the documentation.

Part of the way we try to do that on the PostGIS project is to require each function patch to include documentation. We also try to make our documentation executable by auto-generating PostgreSQL help and some test suites from the documentation. For example the process of installing documentation-generated function and type descriptions in PostgreSQL catches inconsistencies between the implementation and what we actually publish about how the functions work. Even the image generator that builds most of the images in the manual is built with PostGIS as a core of it so that when our build-bot is rebuilding the documentation it is exercising some PostGIS output functions.

I can't tell you how many times I've flagged changes in the current PostGIS 2.0 release simply by trying to install the docbook generated PostgreSQL comment descriptions and PostgreSQL complains that the function signature no longer exists that the documentations says should be there. So then I go back and correct the documentation or yell at the programmer if the documentation makes more sense than what they coded. On the other side, its easy to catch what we neglected to document simply by scanning the functions in PostgreSQL and seeing which ones don't have descriptions.

Okay as far as databases goes, many a DB Programmer/Analyst has waltzed into a project only to be befuddled about the meanings of all these tables used by the application. If you are like us, you don't even want to look at any database documentation that is not part of the definition of the database because you know 99% of the time it's so obsolete or was just a pipe dream of someone working in a vacuum that its more useless than not having any documentation at all. It is nice to have nicely formatted documentation you can read separate from the code, but even sweeter if you can query the documentation just as easily as you can query the data.

A good way of having up to date documentation is to weave it in as part of the process of defining the structure. For databases this means using foreign keys, primary keys, and using the commenting features that most relational databases offer these days. For example the databases we commonly work with, PostgreSQL, SQL Server, MS Access, and even MySQL all allow you to provide descriptions for tables, table columns, and sometimes other objects such as functions and stored procs right in the database. PostgreSQL even allows you to provide descriptions of columns in views though that's a bit messier to do. Sadly there isn't a consistent way of pulling these descriptions out of the database that will work for all of these. Each has differently defined meta tables it stores these descriptions in. For thise article, we'll demonstrate how to pull this information from PostgreSQL.

PostgreSQL does offer many useful switches in psql for querying this data, but we'll focus our attention on pulling this data via SQL. It's much easier to incorporate this information in auto-generated documentation with SQL because you can have more control what you include and how to format it.

Question: how do you get the description for all the tables and views in your database?

Answer:

The below will just list tables with descriptions. If you want all tables listed, you would just take out the d.description > '' criteria and you can even further filter to just get descriptions for tables in a particular schema.

SELECT c.relname As tname, CASE WHEN c.relkind = 'v' THEN 'view' ELSE 'table' END As type, 
    pg_get_userbyid(c.relowner) AS towner, t.spcname AS tspace, 
    n.nspname AS sname,  d.description
   FROM pg_class As c
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = 0)
   WHERE c.relkind IN('r', 'v') AND d.description > ''
   ORDER BY n.nspname, c.relname ;

The above will output somethings like this:

tname | type |  towner  | tspace |   sname    |            description

-------+------+----------+--------+------------+-----------------------------------
 edge  | view | postgres |        | tiger_topo | Contains edge topology primitives

Question: How do you get the description for all the table/view columns in a table?

For this example we just do it for one specific table.

SELECT a.attname As column_name,  d.description
   FROM pg_class As c
    INNER JOIN pg_attribute As a ON c.oid = a.attrelid
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = a.attnum)
   WHERE  c.relkind IN('r', 'v') AND  n.nspname = 'tiger_topo' AND c.relname = 'edge'
   ORDER BY n.nspname, c.relname, a.attname ;

Which outputs something like this:

   column_name   |                  description
-----------------+----------------------------------------------------------------------------
 edge_id         | Unique identifier of the edge
 end_node        | Unique identifier of the node at the end of the edge
 geom            | The geometry of the edge
 left_face       | Unique identifier of the face on the left side of the edge when looking in the direction from START_NODE to END_NODE
 next_left_edge  | Unique identifier of the next edge of the face on the left (when looking in the direction from START_NODE to END_NODE)
                   , moving counterclockwise around the face boundary
 next_right_edge | Unique identifier of the next edge of the face on the right (when looking in the direction from START_NODE to END_NODE)
                  , moving counterclockwise around the face boundary
 right_face      | Unique identifier of the face on the right side of the edge when looking in the direction from START_NODE to END_NODE
 start_node      | Unique identifier of the node at the start of the edge

Question: How do you get the descriptions for select functions?

This example pulls all the functions in the PostGIS 2.0 topology schema that have descriptions and have to do with creation.

SELECT p.proname AS funcname,  d.description
 FROM pg_proc p
   INNER JOIN pg_namespace n ON n.oid = p.pronamespace
   LEFT JOIN pg_description As d ON (d.objoid = p.oid )
     WHERE n.nspname = 'topology' and d.description ILIKE '%creat%'
   ORDER BY n.nspname, p.proname ;

Output looks like this:

     funcname     |                      description                                                                                                           
------------------+--------------------------------------------------------------------------------------------------------------------------------
 createtopogeom   | args: toponame, tg_type, layer_id, tg_objs - Creates a new topo geometry object from topo element array - 
                    tg_type: 1:[multi]point, 2:[multi]line, 3:[multi]poly, 4:collection
 createtopology   | args: topology_schema_name, srid, tolerance, hasz - Creates a new topology schema and 
                    registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name, srid, tolerance - Creates a new topology schema 
                       and registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name, srid - Creates a new topology schema and registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name - Creates a new topology schema and registers this new schema in the topology.topology table.
 st_addisonode    | args: atopology, aface, apoint - Adds an isolated node to a face in a topology and returns the nodeid of the new node. 
                    If face is null, the node is still created.
 st_inittopogeo   | args: topology_schema_name - Creates a new topology schema and registers this new schema 
                    in the topology.topology table and details summary of process.
 st_modedgesplit  | args: atopology, anedge, apoint - Split an edge by creating a new node along an existing edge, 
                    modifying the original edge and adding a new edge.
 st_newedgessplit | args: atopology, anedge, apoint - Split an edge by creating a new node along an existing edge, 
                    deleting the original edge 
                    and replacing it with two new edges. Returns the id of the new node created that joins the new edges.