PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Friday, March 13. 2009

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Recommended Books: Quick Python Book Beginning Python Python Pocket Reference

In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point. At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..

To make it up to Simon, we shall demonstrate 2 features we left out in our prior article.

Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)

If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.

Python meets PostgreSQL multi-column aggregates

In this article we shall demonstrate how to create multi-column aggregates with PL/Python. Note you can do this with any PL or SQL language in PostgreSQL, but Python happens to have a lot of nifty libraries we wanted to put to use. For this exercise, We shall create a function agg_svgschedule. What it will do is plot each grouping of data and return an svg schedule plot for each grouping. This aggregate will take 3 columns - a schedule item, a start date and an end date. The steps we covered in How to create multi-column aggregates, can be applied here.

Our basic approach is as follows:

Download and install this nifty MIT-Licensed SVG plot library we found for Python called py-svg http://sourceforge.net/projects/py-svg and borrow it.
Since PL/Python does not support arrays as input, our aggregator will form a pipe, | delimeted list inputs as a string using svgplot_scheduledatabuild function (which although we wrote it in python need not be and probably more efficient to write in plpgsql). It must be noted that this particular library takes a single paired array of data points. So the number of elements must be in multiples so we need only one delimiter as we had in our ascii plot example.
As the final step of the aggregation, this is fed to our svgplot_scheduledata function -- which converts this to an array of strings to be fed to the plotter
The other nice thing about this library is that it doesn't use print, so we don't need to redirect the stdout as we had to with the ascii plot
Note this SVG library contains other things such as pie charts, regular graphs, bar charts, but we thought the schedule one looked kind of nifty and made the most sense for aggregate svg plots

Install Py-SVG

Installing Py-SVG is pretty easy. At the command-line run the following:

Download the Py-SVG package py-svg http://sourceforge.net/projects/py-svg
Extract it
From command line either on linux or window run:
C:\Python25\python /path/to/pysvg/setup.py install

For linux users its usually:
/usr/bin/python2.5/python /path/to/pysvg/setup.py install

Code is shown below and final output

CREATE OR REPLACE FUNCTION svgplot_schedule(param_dataset text)
  RETURNS text AS
 $$
  from SVG import Schedule
  title = "Schedule"
  g = Schedule.Schedule(dict(
      width = 640,
      height = 480,
      graph_title = title,
      show_graph_title = True,
      key = False,
      scale_x_integers = True,
      scale_y_integers = True,
      show_data_labels = True,
      show_y_guidelines = True,
      show_x_guidelines = True,
      # show_x_title = True, # not yet implemented
      x_title = "Time",
      show_y_title = False,
      rotate_x_labels = True,
      rotate_y_labels = False,
      x_label_format = "%m/%d",
      timescale_divisions = "1 day",
      add_popups = True,
      popup_format = "%m/%d/%y",
      area_fill = True,
      min_y_value = 0,
      ))
  #convert our string of | delimited values to an array
  adataset = param_dataset.split('|')
  #pipe in converting each string to number
  g.add_data(dict(data=adataset, title="Data"))
  return g.burn()
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION plot_svgschedule_databuild(param_dataset text, param_schitem text, param_schstart date, param_schend date)
  RETURNS text AS
 $$
  if param_dataset is None : 
    return str(param_schitem) + '|' + str(param_schstart) + '|' + str(param_schend)
  return param_dataset + '|' + str(param_schitem) + '|' + str(param_schstart) + '|' + str(param_schend)
$$ LANGUAGE plpythonu;


CREATE AGGREGATE agg_svgschedule(text,date, date) (
  SFUNC=plot_svgschedule_databuild,
  STYPE=text,
  FINALFUNC=svgplot_schedule
);

##Plotting fake data but lets pretend it is real
CREATE TABLE mysvg_scheduledata(pt_id serial, pt_person text, pt_schitem text, pt_start date, pt_end date);
INSERT INTO mysvg_scheduledata(pt_person, pt_schitem, pt_start, pt_end)
    SELECT DISTINCT ON(pers, activity, startdate) pers, activity, startdate, enddate
    FROM
    (SELECT (ARRAY['Jack', 'Jill', 'Tom'])[i] As pers, 
        (ARRAY['Gym', 'Crew', 'Programming', 'Skiing'])[CAST(random()*10*j As integer) % 3 + 1] As activity, 
             CURRENT_DATE + CAST(CAST(k As text) || ' day' as interval) As startdate, 
                CURRENT_DATE + CAST(CAST(k + (CAST(random()*10*j As integer) % 3 + 1) As text) || ' day' as interval) As enddate
        FROM  generate_series(1,3) As i
                CROSS JOIN  generate_series(1,4) As j
                CROSS JOIN generate_series(1,3) As k
            ) As foo
    ORDER BY pers, activity, startdate;


#and now to test
SELECT pt_person,  agg_svgschedule(pt_schitem, pt_start, pt_end) As svggraph
 FROM (SELECT pt_person, pt_schitem, pt_start, pt_end 
        FROM mysvg_scheduledata 
            ORDER BY pt_person, pt_schitem, pt_start, pt_end) As foo
 GROUP BY pt_person
 ORDER BY pt_person;

Which gives us this:

pt_person	svggraph
Jack
Jill
Tom

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, advanced, pl programming, plpython, postgresql versions at 08:28 | Comments (3) | Trackbacks (0)

Trackbacks

Trackback specific URI for this entry

No Trackbacks

Comments

Display comments as (Linear | Threaded)

Regina,

I love the "intensity" of the colours!

A clever bit of work covering a number of areas (particularly like the aggregate work) with good looking output.

Gee, look, ascii ain't that bad: \(^_^)/

;-)

Simon Greener
(an old SVG fan from way back)

#1 Simon Greener (Homepage) on 2009-03-13 02:32 (Reply)

It is really great. I also used same technic in my web site and it works.

#1.1 davetiye (Homepage) on 2009-04-18 04:30 (Reply)

This code has changed a bit since it has been published. To get it running, replace the "from SVG import Schedule" line with "from svg.charts import scheduled as Schedule".

All the rest can remain the same.

#2 Jorge Godoy on 2010-07-11 07:57 (Reply)

Add Comment

Name
Email
Homepage
In reply to
Comment	E-Mail addresses will not be displayed and will only be used for E-Mail notifications. To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly. Enter the string from the spam-prevention image above: Phone* What is six minus five?
	Remember Information? Subscribe to this entry

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting

Friday, March 13. 2009

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Python meets PostgreSQL multi-column aggregates

Install Py-SVG

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Postgres OnLine Journal PostGIS in Action About the Authors Consulting

Friday, March 13. 2009

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Python meets PostgreSQL multi-column aggregates

Install Py-SVG

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting