Table Of Contents
PostGIS in Action - E-Book final version officially out
PostGIS in Action has started shipping
An Almost Idiot's Guide to Installing PostGIS 1.5 on PostgreSQL 9.0 via Yum Beginner
From the Editors
PostGIS News
PostGIS Mini Conference in Paris, France
There will be a one-day PostGIS mini conference in Paris on June 23rd, organized by Oslandia and Dalibo. For details: PostGIS mini conference English Details and PostGIS mini conference French Details. Speaker submissions are due April 22nd. The main focus will be the upcoming PostGIS 2.0, which, gathering from the newsgroup, a lot of people are already bouncing around.
PGCon 2011 - Paul will make a show
Paul Ramsey (OpenGeo) will be speaking at PGCon 2011: PostGIS knows where you are? And everyone will know where Paul is too.
PostGIS 2.0 when?
You can get a sense of all the new stuff coming in PostGIS 2.0 from the PostGIS 2.0 release notes.
The OSGeo C-like tribe, of which PostGIS is a member, had a nice code sprint in Montreal this month, where Paul nailed down most of the issues he was having with 3D spatial index support, so it looks like we can look forward to that. Jeff Adams from Azavea took care of quite a few shapefile loader (shp2pgsql) issues, and also added many regression tests for shp2pgsql that we were sorely missing. Pierre Racine (University of Laval), David Zwarg (Azavea), and Jorge Arévalo (Deimos Space) (working remotely but definitely pitching in a lot) worked out some raster memory issues and improved ST_MapAlgebra. Olivier Courtin worked on ironing out 3D serialization and handling of solids.
We haven't called feature freeze yet on PostGIS 2.0, but I would say most of the functionality you can expect is in place, though it will undergo some more polishing. We aren't quite as disciplined as the PostgreSQL core group, but then again we are a much smaller group, so we can afford to be less disciplined. Such discipline early on, I think, would be more stifling than beneficial. It's still too early to tell if KNN GiST will make it, but sadly it's nowhere to be seen yet, much to the chagrin of some.
Raster is really kicking up dust, with people jumping in to contribute and memory leaks and so forth being macheted down. Some people are already using it and we haven't even released 2.0 yet. It has been pretty exciting to see it take shape and to see the excitement around it.
I'm also happy to see a lot of people on the users list testing our Windows experimental builds. If only there were a Windows build of PostgreSQL 9.1 newer than Alpha 1. We've been hesitant to build for the newer alphas, since PostGIS compiled against them gives strange errors on certain functions when used with the 9.1 Alpha 1 VC builds. No, we still don't have 64-bit Windows builds of PostGIS, but we hope to focus on that after the book is completed. I think we will end up going the MinGW-w64 route, since VC sounds like it would be a great deal more effort with our current PostGIS toolchain.
Topology made a surprise showing as well, with many, many improvements from Sandro Santilli and Andrea Peri. Sandro also discovered the term 3D Topology. Once he calms down from the frightful thought, he may actually start working on that for the 2.1 release.
I've started to program in C again (tiptoeing in, anyway) to work on ST_AsX3D output functionality, and have really begun to smoke from the proverbial X3D pipe in preparation for HTML 5 enhancements. This work is funded by our new venture Arrival 3D for later use in our construction toolkits.
While 2D geometry and GIS have been fun, augmenting that experience with 3D has long been on my dream path. A true post-GIS adventure: what comes after GIS? Virtual space, the frontier you make of it. At least for me, anyway. I'm also looking forward to getting back, at least a tiny bit, to my engineering roots.
What about the PostGIS in Action book?
First of all, I'm happy to report that we are closer to having the book out -- hmm, how long have I been saying that? I am sad to report this has gone on for what feels like an eternity, and I'm hoping in the end it will all be well worth the wait. It could be any day now (so we are told) for the final PDF E-book version, then a week or so after for the hard-copy (which has already started printing), and then we're not quite sure when the ePub, Mobi, and other e-book versions become available, but we would guess a month or so after.
We got the final proof of the front and back cover, and all I can say is: ain't she pretty. She isn't quite the baby elephant with a unicorn horn and infrared eyes I hoped for, but she'll do this time around.
From the Editors
PostGIS in Action - E-Book final version officially out
I am happy to report that the final proof of the PostGIS in Action E-Book got released today. The printed version is scheduled for release April 11th, 2011 and should be available on Amazon and other locations around then. The other e-reader formats will come after that. You can buy it from here or download the two free chapters, if you haven't already.
Each hard-copy purchase comes with a free E-Book version; there is a coupon in the back of the book for getting the E-Book versions.
Yes, I know it's been a really, really long time. On the bright side, we produced twice as much content as we had set out to do, and that was while keeping things as concise as we could get away with, still managing to cover more than we set out to cover, and stripping out as many unnecessary words as we could muster. So 520 pages and almost 2 years later, this is where we are.
A good chunk of the additional bulk of the book is the appendices, which are about 150 pages total and focus strictly on PostgreSQL and SQL. After many comments from early reviewers, we thought it unfair not to have a good chunk of PostgreSQL and general relational database content to familiarize programmers and GIS folks with the RDBMS that PostGIS lives in. Most GIS folks unfortunately have the hardest time getting up to speed with SQL and standard RDBMS management.
Two free chapters and accompanying code for all chapters
We selectively picked the two free chapters because we thought they would be most beneficial to newcomers and people new to relational databases. The free chapters are:
- Chapter 1: What is a spatial database? Provides a fast-paced history of PostGIS, PostgreSQL, and spatial databases, and moves into an even faster journey of converting flat-file restaurant locations to spatial point geometries and loading an ESRI shapefile of roads. It then shows you how to write standard spatial queries and render the results.
- Appendix C: SQL Primer -- goes through querying the information_schema, the common points of writing SELECT, INSERT, UPDATE, and DELETE SQL statements, and the finer points of using aggregate functions, window constructs, and common table expressions, as well as a brief overview of how PostgreSQL stacks up against other relational databases (SQL Server, Oracle, IBM DB2, MySQL, Firebird) in SQL features.
- All the chapter code and accompanying data. It's a bit hefty at 57 MB.
So even if you don't buy our book, we hope you find the free chapters useful.
You can get a more detailed listing of all the chapters from the PostGIS in Action book site.
We'd like to thank all those who supported us through this long and unpredictable journey. Hopefully we'll have several more, though a bit less nerve-racking than this first one.
From the Editors
PostGIS in Action has started shipping
We just got our complimentary author hard-copies of PostGIS in Action today. Those who ordered directly from Manning should be getting theirs shortly too, if they haven't already. Amazon and other distributors should start shipping soon as well.
We'll be saving some copies for door prizes at the next event we present at.
Basics
An Almost Idiot's Guide to Installing PostgreSQL 9.0 via Yum Beginner
In this exercise, we'll go through installing PostgreSQL 9.0 on a CentOS 5.5 32-bit box. I'm testing this on a GoGrid cloud server so I can do parallel benchmarks between my Windows and Linux GoGrid servers.
Upgrading from PostgreSQL 8.* to PostgreSQL 9.0
If you are upgrading from PostgreSQL 8.4 to PostgreSQL 9.0, please refer to Devrim's article: Upgrading from 8.4 to 9.0 on Fedora / Red Hat / CentOS using RPMs.
For the rest of this article, we'll go over configuring yum to use the PostgreSQL PGDG Yum repository managed by Devrim Gunduz, which has the latest and greatest 9.0 as well as the latest 9.1 development release. We'll also demonstrate how to have two instances of PostgreSQL running, so you can experiment with the new features of PostgreSQL 9.1 while reminiscing about the soon-to-be old-hat features of PostgreSQL 9.0.
Installing PostgreSQL 9.0 from Yum repository
The list of latest PostgreSQL versions and repos for Fedora 7-14, Red Hat 4-6, and CentOS 4-6 are located at http://www.pgrpms.org/reporpms/repoview/letter_p.group.html.
- Figure out which OS you are on. Note the PGDG repository only works with CentOS and Red Hat 4 and above, or Fedora.
First run
uname -a
Mine happens to return:
i686 i386 GNU/Linux
so I know it's 32-bit.
vi /etc/redhat-release
Mine returns:
CentOS release 5.5 (Final)
- Download the right files from http://www.pgrpms.org/reporpms/repoview/letter_p.group.html. In our case we are running CentOS release 5.5, 32-bit, so we use the CentOS repo:
mkdir /pgbak
cd /pgbak
wget http://www.pgrpms.org/reporpms/9.0/pgdg-centos-9.0-2.noarch.rpm
- Install the rpms
rpm -ivh pgdg-centos-9.0-2.noarch.rpm
yum list | grep postgresql
You should see a bunch of entries; the ones we want are marked with pgdg90. If you see postgresql packages from other repositories besides PGDG, you may want to exclude them to minimize the risk of installing the wrong thing. This is optional, since these days the PostgreSQL package names include the version number as part of the name, so you are less likely to screw up. (I use emacs mostly for editing because I can never remember how to navigate with vim.)
- As root, cd /etc/yum.repos.d
- Edit your distro's .repo file:
  - On Fedora, edit fedora.repo and fedora-updates.repo, [fedora] sections.
  - On CentOS, edit CentOS-Base.repo, [base] and [updates] sections.
  - On Red Hat, edit /etc/yum/pluginconf.d/rhnplugin.conf, [main] section.
- Add to the bottom of each section: exclude=postgresql*
emacs CentOS-Base.repo
Once you've made the edits, Ctrl-x Ctrl-s to save and Ctrl-x Ctrl-c to exit. Then
yum list | grep postgresql
should give you entries that look something like:
postgresql90.i386                    9.0.3-1PGDG.rhel5       pgdg90
postgresql90-contrib.i386            9.0.3-1PGDG.rhel5       pgdg90
postgresql90-debuginfo.i386          9.0.3-1PGDG.rhel5       pgdg90
postgresql90-devel.i386              9.0.3-1PGDG.rhel5       pgdg90
postgresql90-docs.i386               9.0.3-1PGDG.rhel5       pgdg90
postgresql90-jdbc.i386               9.0.801-1PGDG.rhel5     pgdg90
postgresql90-jdbc-debuginfo.i386     9.0.801-1PGDG.rhel5     pgdg90
postgresql90-libs.i386               9.0.3-1PGDG.rhel5       pgdg90
postgresql90-odbc.i386               09.00.0200-1PGDG.rhel5  pgdg90
postgresql90-odbc-debuginfo.i386     09.00.0200-1PGDG.rhel5  pgdg90
postgresql90-plperl.i386             9.0.3-1PGDG.rhel5       pgdg90
postgresql90-plpython.i386           9.0.3-1PGDG.rhel5       pgdg90
postgresql90-pltcl.i386              9.0.3-1PGDG.rhel5       pgdg90
postgresql90-python.i386             4.0-2PGDG.rhel5         pgdg90
postgresql90-python-debuginfo.i386   4.0-2PGDG.rhel5         pgdg90
postgresql90-server.i386             9.0.3-1PGDG.rhel5       pgdg90
postgresql90-tcl.i386                1.8.0-2.rhel5           pgdg90
postgresql90-tcl-debuginfo.i386      1.8.0-2.rhel5           pgdg90
- I usually install the packages below. The devel package is optional but needed if you want to compile PostGIS or some other add-on later.
yum install postgresql90 postgresql90-server postgresql90-libs postgresql90-contrib postgresql90-devel
- You should get a prompt something like:
Dependencies Resolved
================================================================================
 Package               Arch   Version            Repository  Size
================================================================================
Installing:
 postgresql90          i386   9.0.3-1PGDG.rhel5  pgdg90      1.3 M
 postgresql90-contrib  i386   9.0.3-1PGDG.rhel5  pgdg90      443 k
 postgresql90-devel    i386   9.0.3-1PGDG.rhel5  pgdg90      1.5 M
 postgresql90-libs     i386   9.0.3-1PGDG.rhel5  pgdg90      220 k
 postgresql90-server   i386   9.0.3-1PGDG.rhel5  pgdg90      4.8 M
Installing for dependencies:
 uuid                  i386   1.5.1-4.rhel5      pgdg90      53 k

Transaction Summary
================================================================================
Install       6 Package(s)
Upgrade       0 Package(s)

Total download size: 8.3 M
Is this ok [y/N]:
Type: y. It should then download the packages and install them.
Creating data cluster and starting the service
With the new Yum setup, each version of PostgreSQL goes in its own separate folder, so you can easily have parallel installs of different versions of PostgreSQL, with as much ease as you could always do on Windows :). Devrim has the details of these changes in What-is-new-in-PostgreSQL-9.0-RPMs.html.
I'll reiterate the key elements and consequences of these changes you need to watch out for.
Just get me there fast
If you want to accept the default port and data cluster location, and want the service to start up on reboot, then simply do this:
su root
service postgresql-9.0 initdb
service postgresql-9.0 start
chkconfig --list   #(to see list of services)
chkconfig postgresql-9.0 on
The data cluster is created by default in /var/lib/pgsql/9.0/data and the contrib folder is in /usr/pgsql-9.0/share/contrib/.
To connect to your postgres server, simply do this:
su postgres
psql -p 5432
To be able to view logs and edit postgresql.conf from the comfort of PgAdmin, don't forget to install the admin pack:
su postgres
psql -p 5432 -f /usr/pgsql-9.0/share/contrib/adminpack.sql
If you need to configure things the way you want to configure things
If you plan to run multiple versions of PostgreSQL, e.g. stable 9.0 and dev 9.1, on the same server (so they must use different ports), or you just need to be in control of all your options, you'll want to tweak your service scripts a bit. The best way to do that is to edit/create a file in /etc/sysconfig/pgsql/postgresql-9.0 with your preferred changes:
Note that lines with # are remarks to you, so don't type them in.
su root
emacs /etc/sysconfig/pgsql/postgresql-9.0   # if the file doesn't exist it will be blank, which is okay
PGPORT=5433
PGDATA=/pgdata90
# Ctrl-x Ctrl-s to save, Ctrl-x Ctrl-c to exit

Only do these next steps if you set the data cluster location in the sysconfig file:

service postgresql-9.0 stop   # not necessary if you didn't init and start
service postgresql-9.0 initdb
service postgresql-9.0 start
chkconfig postgresql-9.0 on
cd /pgdata90
ls
You should see data cluster files. If you accidentally followed the initial default steps and didn't mean to, you can simply do a:
rm -rf /var/lib/pgsql/9.0/data
If you get a:
FATAL: could not open relation mapping file "global/pg_filenode.map": Permission denied
when you attempt to connect with psql, the problem is most likely caused by a leftover locked file. You could go through the effort of shutting things down and looking for what went wrong, but you are an Almost idiot with little patience for these things.
just do a:
reboot
and all should be fine.
On startup do a:
service postgresql-9.0 status
You should see a pid running ... message if all is good. If it says stopped, then check the last log file in /var/lib/pgsql/9.0/data/pg_log, which usually lists what went wrong. Often it's caused by a typo you introduced in pg_hba.conf or postgresql.conf.
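A quick way to peek at the newest log file, as a hedged sketch (assuming the default data cluster location):

cd /var/lib/pgsql/9.0/data/pg_log
ls -t | head -1                  # name of the most recent log file
tail -n 50 "$(ls -t | head -1)"  # last 50 lines of it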
Now connect and give yourself a superuser role:

su postgres
psql -p 5433   #or whatever port you chose
CREATE ROLE regina LOGIN PASSWORD 'whateveryouwant' SUPERUSER;
At the psql prompt, type the query below to verify the location of your cluster:
SELECT setting FROM pg_settings WHERE name = 'config_file';
which should return an answer of (if you kept the defaults):
             setting
-----------------------------------------
/var/lib/pgsql/9.0/data/postgresql.conf
or the location where you set up your cluster.
Then do a \q to exit the psql console.
After that:
cd /var/lib/pgsql/9.0/data   # or the location of your data cluster
emacs postgresql.conf
# make changes to memory settings etc. as needed
Change
#listen_addresses = 'localhost'
to
listen_addresses = '*'
if you want PostgreSQL to listen on all IPs. You can also set it to a specific IP if you have multiple IPs. If you need to change the port or data cluster location, don't do it in postgresql.conf -- remember, that belongs in /etc/sysconfig/pgsql/postgresql-9.0.
emacs pg_hba.conf
# make changes to access, e.g. add the line below if you want
# to be able to access PostgreSQL externally
host    all    all    0.0.0.0/0    md5
# Ctrl-x Ctrl-s to save, Ctrl-x Ctrl-c to exit
su root
service postgresql-9.0 restart
Whenever you change the pg_hba.conf file, you need to do a restart or a:
service postgresql-9.0 reload
for the changes to take effect.
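To double-check that your changes took effect, here's a quick hedged sketch (using the 5433 port we chose above; substitute your own):

su postgres
psql -p 5433 -c "SHOW listen_addresses;"   # should print *
exit
netstat -nlt | grep 543                    # postgres should be listening on your chosen port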
Turning off firewall or enabling PostgreSQL access
If after all those changes you still can't access your PostgreSQL server from an external pgAdmin, check whether you have a firewall running. The GoGrid image I grabbed had iptables on.
su root
service iptables status
Just for perspective, my rule table looked like this before. As you can see, there is no rule to enable postgres access.
Chain INPUT (policy ACCEPT)
num  target               prot opt source     destination
1    RH-Firewall-1-INPUT  all  --  0.0.0.0/0  0.0.0.0/0

Chain FORWARD (policy ACCEPT)
num  target               prot opt source     destination
1    RH-Firewall-1-INPUT  all  --  0.0.0.0/0  0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
num  target  prot opt source  destination

Chain RH-Firewall-1-INPUT (2 references)
num  target  prot opt source     destination
1    ACCEPT  all  --  0.0.0.0/0  0.0.0.0/0
2    ACCEPT  icmp --  0.0.0.0/0  0.0.0.0/0    icmp type 255
3    ACCEPT  esp  --  0.0.0.0/0  0.0.0.0/0
4    ACCEPT  ah   --  0.0.0.0/0  0.0.0.0/0
5    ACCEPT  udp  --  0.0.0.0/0  224.0.0.251  udp dpt:5353
6    ACCEPT  all  --  0.0.0.0/0  0.0.0.0/0    state RELATED,ESTABLISHED
7    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:22
8    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:80
9    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:443
10   REJECT  all  --  0.0.0.0/0  0.0.0.0/0    reject-with icmp-host-prohibited
service iptables stop
Verify you can connect after turning it off. Okay, now turn it back on. The example below will enable tcp ports 5432-5434 (which we will use for postgres services) and insert the rule into the RH-Firewall-1-INPUT chain at row 9. Note my 10th rule is the reject-all rule, so I want all my access rules to come before it.
service iptables start
iptables -I RH-Firewall-1-INPUT 9 -p tcp --dport 5432:5434 -j ACCEPT
service iptables save
service iptables restart
After I was done, it looked like this:
Chain INPUT (policy ACCEPT)
num  target               prot opt source     destination
1    RH-Firewall-1-INPUT  all  --  0.0.0.0/0  0.0.0.0/0

Chain FORWARD (policy ACCEPT)
num  target               prot opt source     destination
1    RH-Firewall-1-INPUT  all  --  0.0.0.0/0  0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
num  target  prot opt source  destination

Chain RH-Firewall-1-INPUT (2 references)
num  target  prot opt source     destination
1    ACCEPT  all  --  0.0.0.0/0  0.0.0.0/0
2    ACCEPT  icmp --  0.0.0.0/0  0.0.0.0/0    icmp type 255
3    ACCEPT  esp  --  0.0.0.0/0  0.0.0.0/0
4    ACCEPT  ah   --  0.0.0.0/0  0.0.0.0/0
5    ACCEPT  udp  --  0.0.0.0/0  224.0.0.251  udp dpt:5353
6    ACCEPT  all  --  0.0.0.0/0  0.0.0.0/0    state RELATED,ESTABLISHED
7    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:22
8    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:80
9    ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    tcp dpts:5432:5434
10   ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:443
11   REJECT  all  --  0.0.0.0/0  0.0.0.0/0    reject-with icmp-host-prohibited
Also running 9.1 beta
If you want to play with 9.1 as well, repeat the same steps as above, except with 9.1 and http://www.pgrpms.org/reporpms/9.1/pgdg-[your os here]-9.1-1.noarch.rpm. Note that if you plan to run both, you will need to use a sysconfig file as described above to prevent your two services from trying to run on the same port.
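For example, a minimal sketch of a 9.1 sysconfig file (assuming the 9.1 packages follow the same versioned service naming as 9.0; the /pgdata91 path is just a hypothetical choice, and 5434 is one of the ports we opened in iptables above):

su root
emacs /etc/sysconfig/pgsql/postgresql-9.1
PGPORT=5434
PGDATA=/pgdata91
# then init and start as before
service postgresql-9.1 initdb
service postgresql-9.1 start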
Basics
An Almost Idiot's Guide to Installing PostGIS 1.5 on PostgreSQL 9.0 via Yum Beginner
In the past I have always chosen to compile my own PostGIS, because the GEOS library that came with the regular PostgreSQL yum install was a bit antiquated. This has changed, so this time around I figured I'd give the Yum repository 1.5.2 release of PostGIS a go; it's available via the Yum Package List.
Before you can follow these directions, make sure you have PostgreSQL 9.0 set up via our An Almost Idiot's Guide to Installing PostgreSQL 9.0 via Yum.
Installing PostGIS package
Verify the postgis package is available with a:
yum list | grep postgis
You should see a list that looks like:
postgis90.i386            1.5.2-1.rhel5  pgdg90
postgis90-debuginfo.i386  1.5.2-1.rhel5  pgdg90
postgis90-docs.i386       1.5.2-1.rhel5  pgdg90
postgis90-utils.i386      1.5.2-1.rhel5  pgdg90
Now we do a:
yum install postgis90
Which should present a screen that looks like:
================================================================================
 Package    Arch   Version        Repository  Size
================================================================================
Installing:
 postgis90  i386   1.5.2-1.rhel5  pgdg90      1.6 M
Installing for dependencies:
 geos       i386   3.2.2-1.rhel5  pgdg90      582 k
 proj       i386   4.7.0-1.rhel5  pgdg90      162 k

Transaction Summary
================================================================================
Install       3 Package(s)
Upgrade       0 Package(s)

Total download size: 2.3 M
Is this ok [y/N]
Type y
If all goes well, you should get a screen that looks like this:
Downloading Packages:
(1/3): proj-4.7.0-1.rhel5.i386.rpm       | 162 kB  00:01
(2/3): geos-3.2.2-1.rhel5.i386.rpm       | 582 kB  00:02
(3/3): postgis90-1.5.2-1.rhel5.i386.rpm  | 1.6 MB  00:04
--------------------------------------------------------------------------------
Total                          253 kB/s  | 2.3 MB  00:09
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing  : geos       1/3
  Installing  : proj       2/3
  Installing  : postgis90  3/3

Installed:
  postgis90.i386 0:1.5.2-1.rhel5

Dependency Installed:
  geos.i386 0:3.2.2-1.rhel5   proj.i386 0:4.7.0-1.rhel5

Complete!
Creating a spatial database
Okay, now that we have that installed: all the PostGIS binaries and contribs are located by default in /usr/pgsql-9.0.
su postgres
cd /usr/pgsql-9.0/share/contrib/postgis-1.5
ls
Should show you a screen that looks like:
postgis.sql
postgis_upgrade_13_to_15.sql
postgis_upgrade_14_to_15.sql
postgis_upgrade_15_minor.sql
spatial_ref_sys.sql
uninstall_postgis.sql
The postgis_comments.sql file, which installs the psql/pgAdmin online help, is not listed. The installed PostGIS docs are located in /usr/share/doc/postgis90-1.5.2, but I couldn't find the postgis_comments.sql file there.
If you want it, it's located in the docs folder of the official PostGIS 1.5.2 source tar ball, and you install it after you have installed postgis.sql.
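Loading it into a spatially enabled database (like the template_postgis15 we create below) would look something like this sketch; the extraction path is hypothetical, so substitute wherever you unpacked the source:

cd postgis-1.5.2/doc   # hypothetical source location
psql -U postgres -p 5432 -d template_postgis15 -f postgis_comments.sql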
To create a spatial database, or to spatially enable a non-spatial database, we do the following. A note on the GRANTs: if you are not paranoid about security, you could set them to ALL. For geography_columns there isn't any reason to give update rights, since it's a view into the system tables.
psql -U postgres -p 5432
At the psql prompt do
create database template_postgis15;
\c template_postgis15
\i postgis.sql
\i spatial_ref_sys.sql
GRANT SELECT ON geometry_columns TO PUBLIC;
GRANT SELECT ON spatial_ref_sys TO PUBLIC;
GRANT SELECT ON geography_columns TO PUBLIC;
\c postgres
UPDATE pg_database SET datistemplate='true' WHERE datname='template_postgis15';
CREATE ROLE mygisuser LOGIN PASSWORD 'whatever' CREATEDB;
CREATE DATABASE mygisdb WITH ENCODING='UTF8' OWNER=mygisuser TEMPLATE=template_postgis15;
\q   #logs you out
Now we connect as our user:
psql -h localhost -U mygisuser -p 5433 -d mygisdb
SELECT postgis_full_version();

postgis_full_version
-------------------------------------------------------------------------------------------------------
POSTGIS="1.5.2" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.6.26" USE_STATS

SELECT ST_AsText(ST_Centroid('LINESTRING(1 2, 3 4)'::geometry));

st_astext
------------
POINT(2 3)
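As a further smoke test, you can create a small table with a geometry column. This is just a sketch with a hypothetical table and point; do it as postgres (or after granting more than SELECT on geometry_columns, since AddGeometryColumn writes to it):

psql -U postgres -p 5433 -d mygisdb
CREATE TABLE places(id serial PRIMARY KEY, name varchar(50));
SELECT AddGeometryColumn('places', 'geom', 4326, 'POINT', 2);
INSERT INTO places(name, geom)
VALUES ('somewhere', ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326));
SELECT name, ST_AsText(geom) FROM places;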
What about PostGIS 2.0?
Thought you would never ask. Well, we've been playing with it on Windows, but have yet to compile and install it on Linux. We will be doing that in the next week or so and will write up a how-to on that. I will say that the new features in PostGIS 2.0 are, to put it lightly, mind blowing. We've got true 3D surfaces, topology, lots more processing functions, and raster with map algebra and even export to JPEG etc., which looks like it will make it into PostGIS 2.0. We have a tiger geocoder with both geocoder and reverse geocoder functions to work with Tiger 2010 census data. We have to clean up the tiger loader for Linux a bit more and will do that before the PostGIS 2.0 release. We've got X3D output and GML for surfaces to target CityGML needs. We have the whole world in our hands. I would say we've got a good 2 more months before we will be ready to release.
PL Programming
Using RETURNS TABLE vs. OUT parameters Intermediate
In a prior article, Use of Out and InOut Parameters, we demonstrated how to use OUT and INOUT parameters to return a set of records from a PostgreSQL function. There is another approach to doing this: the ANSI standard RETURNS TABLE construct. If you come from a SQL Server or IBM DB2 background, the RETURNS TABLE construct is probably most familiar, but how you use it and what is legal in it is a little different than in SQL Server or IBM DB2. We'll save that contrast and compare as a topic for another article.
In terms of performance between OUT and RETURNS TABLE, we haven't noticed much of a difference. The main thing that is nice about RETURNS TABLE is that it's syntactically more pleasing, in the sense that the structure of what you are returning is clearer.
In these next examples, we'll demonstrate examples similar to those we showed in the aforementioned article, except using the RETURNS TABLE construct.
Be warned that the RETURNS TABLE construct is only available in PostgreSQL 8.4+, while the OUT approach has existed since PostgreSQL 8.1. With that said, if you need your code to work on 8.3 or lower, you can't use RETURNS TABLE.
When in doubt about a feature, and you are creating code that needs to support earlier versions of PostgreSQL (as we have to in the PostGIS development group), or you want to get stubborn users off old antiquated versions of PostgreSQL and need a bit of ammunition (as we have to on PostGIS development, including with our own developers - and you know who you are :) ), check the PostgreSQL feature matrix. It will save you a lot of grief.
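As a quick hedged check of whether the server you're connected to is new enough for RETURNS TABLE (the server_version_num setting has been around since 8.2):

SELECT current_setting('server_version_num')::int >= 80400 AS can_use_returns_table;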
SQL and PLPGSQL function RETURNS TABLE - return single record
--returning a single record using SQL function
CREATE OR REPLACE FUNCTION fn_sqltestout(param_subject text)
RETURNS TABLE(subject_scramble text, subject_char text)
AS
$$
SELECT substring($1, 1,CAST(random()*length($1) As integer)) ,
substring($1, 1,1) As subject_char;
$$
LANGUAGE 'sql' VOLATILE;
-- example use
SELECT (fn_sqltestout('This is a test subject')).subject_scramble;
SELECT subject_scramble, subject_char FROM fn_sqltestout('This is a test subject');
--Same function but written in plpgsql
--PLPGSQL example -- return one record
CREATE OR REPLACE FUNCTION fn_plpgsqltestout(param_subject text)
RETURNS TABLE(subject_scramble text, subject_char text)
AS
$$
BEGIN
subject_scramble := substring($1, 1,CAST(random()*length($1) As integer));
subject_char := substring($1, 1,1);
RETURN NEXT;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
-- example use
SELECT (fn_plpgsqltestout('This is a test subject')).subject_scramble;
SELECT subject_scramble, subject_char FROM fn_plpgsqltestout('This is a test subject');
SQL and PLPGSQL function RETURNS TABLE - return multiple records
-- test data to use --
CREATE TABLE testtable(id integer PRIMARY KEY, test text);
INSERT INTO testtable(id,test)
VALUES (1, 'Potato'), (2, 'Potato'), (3, 'Cheese'), (4, 'Cheese Dog');
--SQL function returning multiple records
CREATE OR REPLACE FUNCTION fn_sqltestmulti(param_subject varchar)
RETURNS TABLE(test_id integer, test_stuff text)
AS
$$
SELECT id, test
FROM testtable WHERE test LIKE $1;
$$
LANGUAGE 'sql' VOLATILE;
-- example use
SELECT (fn_sqltestmulti('Cheese%')).test_stuff;
SELECT test_stuff FROM fn_sqltestmulti('Cheese%');
--both return:
test_stuff
------------
Cheese
Cheese Dog
-- plpgsql function returning multiple records
-- note RETURN QUERY was introduced in 8.3
-- variant 1
CREATE OR REPLACE FUNCTION fn_plpgsqltestmulti(param_subject varchar)
RETURNS TABLE(test_id integer, test_stuff text)
AS
$$
BEGIN
RETURN QUERY SELECT id, test
FROM testtable WHERE test LIKE param_subject;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
-- variant 2 use this if you need to do something additional
-- or conditionally return values or more dynamic stuff
-- RETURN QUERY is generally more succinct and faster
CREATE OR REPLACE FUNCTION fn_plpgsqltestmulti(param_subject varchar)
RETURNS TABLE(test_id integer, test_stuff text)
AS
$$
DECLARE
var_r record;
BEGIN
FOR var_r IN(SELECT id, test
FROM testtable WHERE test LIKE param_subject) LOOP
test_id := var_r.id ; test_stuff := var_r.test;
RETURN NEXT;
END LOOP;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
-- example use
-- This is legal in PostgreSQL 8.4+
-- (prior versions plpgsql could not be called this way)
SELECT (fn_plpgsqltestmulti('Cheese%')).test_stuff;
SELECT test_stuff FROM fn_plpgsqltestmulti('Cheese%');
-- All return:
test_stuff
------------
Cheese
Cheese Dog
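To see where the loop form of variant 2 earns its keep, here is a hedged sketch of a hypothetical function (fn_plpgsqltestfiltered, with an arbitrary extra condition) that conditionally returns rows, which RETURN QUERY alone can't easily do:

-- hypothetical variant: only return rows meeting an extra condition
CREATE OR REPLACE FUNCTION fn_plpgsqltestfiltered(param_subject varchar)
RETURNS TABLE(test_id integer, test_stuff text)
AS
$$
DECLARE
    var_r record;
BEGIN
    FOR var_r IN(SELECT id, test
        FROM testtable WHERE test LIKE param_subject) LOOP
        IF length(var_r.test) > 6 THEN  -- arbitrary extra condition
            test_id := var_r.id; test_stuff := var_r.test;
            RETURN NEXT;
        END IF;
    END LOOP;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;

-- with the test data above, returns only: Cheese Dog
SELECT test_stuff FROM fn_plpgsqltestfiltered('Cheese%');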
Which is faster: OUT or RETURNS TABLE?
It's great that there are so many ways you can choose to hang yourself, but which way is the least painful and the quickest? I was wondering that myself, so I did some tests. The tests seemed to suggest there is no difference in speed between the two; however, they suggested something else I really wasn't expecting (about 9.0): what I thought was really bad form for speed (though shorter) becomes faster, or at least a non-issue, when you move to 9.0.
CREATE OR REPLACE FUNCTION testspeed_table(it numeric(20))
RETURNS TABLE(newit numeric(20), itprod numeric(20))
AS
$$
SELECT j::numeric(20), $1*j::numeric(20) As itprod
FROM generate_series(1,$1::bigint) As j;
$$
LANGUAGE 'sql' VOLATILE;
CREATE OR REPLACE FUNCTION testspeed_out(it numeric(20),
OUT newit numeric(20), OUT itprod numeric(20) )
RETURNS setof record
AS
$$
SELECT j::numeric(20), $1*j::numeric(20) As itprod
FROM generate_series(1,$1::bigint) As j;
$$
LANGUAGE 'sql' VOLATILE;
The results suggest that what I thought was the case, as described in Output parameters, custom data type gotchas, may not always be the case, or at least not for SQL-written functions. More analysis is called for. Also notice how the speed flips between 8.4 and 9.0 (or at least on Windows), where calling the ().* was worse in 8.4 but in 9.0 is actually faster than the FROM syntax. This is something I'll need to investigate with plpgsql functions, with real tests, and on the Linux platform as well, to make sure it's not an aberration of my development environment or an idiosyncrasy of the test example. I would be interested in the explanations of others regarding what we consider a bit surprising results, and under what conditions this manifests.
-- On a 9.0.2 Windows 32-bit
-- 1813 ms, 1750 ms
SELECT (testspeed_table(300000)).*;
-- 2453 ms, 2234ms, 2031ms
SELECT * FROM
testspeed_table(300000);
-- 1781ms, 1750ms
SELECT (testspeed_out(300000)).*;
-- 2766ms, 2015ms
SELECT *
FROM testspeed_out(300000);
-- On a 8.4 Windows 32-bit
-- 3062 ms, 3047 ms
SELECT (testspeed_table(300000)).*;
-- 2719ms, 2234ms, 2218ms
SELECT * FROM
testspeed_table(300000);
-- 3078 ms, 3078 ms, 3063 ms
SELECT (testspeed_out(300000)).*;
-- 2922 ms, 2860 ms,
SELECT *
FROM testspeed_out(300000);
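If you want to repeat the comparison with plpgsql, here is a sketch along the lines of the earlier examples; this function was not part of my benchmark, so treat any numbers you get as your own:

CREATE OR REPLACE FUNCTION testspeed_plpgsql(it numeric(20))
RETURNS TABLE(newit numeric(20), itprod numeric(20))
AS
$$
BEGIN
    -- RETURN QUERY pushes the whole result set out in one shot
    RETURN QUERY SELECT j::numeric(20), it*j::numeric(20)
        FROM generate_series(1, it::bigint) As j;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;

-- then time it both ways, as above
SELECT (testspeed_plpgsql(300000)).*;
SELECT * FROM testspeed_plpgsql(300000);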
Product / Service Contrast Compare
GoGrid and Amazon EC2 Cloud Servers compared
Some people have asked us our thoughts on which cloud hosting provider is best for them. The answer, as you would expect, is: it depends. I will say right off that our preferred provider at the moment is GoGrid, but that has more to do with our specific use-cases than with GoGrid being absolutely better than Amazon. The reasons we choose GoGrid most of the time over Amazon: we know we need the server on all the time anyway; we run mostly Windows servers; we like the real live e-mail, phone, and personalized support they offer free of charge; and we absolutely need multiple public IPs per server, since we have multiple SSL sites per server (and SSL, unless you go for the uber *.domain version, can't be done with one IP). GoGrid starts you off with 16 public IPs you can distribute any way you like. Amazon is stingy with IPs, and you basically only get one public IP per server, unless I misunderstood. In some cases, such as when we are developing for a client and they are playing around with various speeds on various OSes, Amazon EC2 is a better option, since you can just turn off the server and not incur charges. On GoGrid, you have to delete the server instead of just shutting it down.
The cloud landscape is getting bigger, with more players coming on board, which is good since it means you are less likely to be stuck with a provider and you have more bargaining options. We only have experience with GoGrid and Amazon EC2, so we can't speak for the others. Other providers we'd like to try are SkyGone (specifically for PostGIS and other GIS hosting), RackSpace Cloud, etc., but we haven't used those so can't speak for them; each has its own little gotchas and gems in its offerings that make it better suited for certain needs and out of the question for others. We are just talking about cloud server hosting here, not other services like cloud application services (like what Microsoft Azure offers), relational database services (like Amazon RDS (built on MySQL) or Microsoft SQL Azure (built on SQL Server 2008)), file server services, or SaaS clouds like SalesForce, though many cloud servers (e.g. both GoGrid and Amazon) include some cloud storage space pre-packaged with their cloud server hosting plans. I find all those other cloud offerings, like database-only hosting, a bit scary, mostly because we haven't experimented with them.
These are the key metrics we judge cloud server hosting plans by. Sure, there are more, but these are the ones that are particularly important to us and that control our decisions on which to deploy on. Keep in mind we work mostly with small ISVs, new dot-coms, and non-profits that work with other non-profits but need an external secure web application (SSL) to collect data. All that scaling stuff we haven't really had much of a need for, and our clients running much larger servers are still leery of trusting the cloud for that, because of lack of control of disk types, the pricing of larger servers, etc. For those types of clients, if we went with cloud, we'd probably choose GoGrid, since they offer a combo plan using real servers and cloud servers. I will say that for pretty intense PostGIS spatial queries with millions of records of a range of geometry types and sizes (anywhere from single points to multipolygons with 20 to 80,000 or more vertices), we've been using GoGrid and been surprised how good the performance is on a modest dual-core 2GHz server running Windows 2008 (32-bit). I'm talking queries that return 50-2000 records in a user-drawn spatial region (out of a selection of 3 million records), simplify, transform on the fly, and return spatial intersections, all usually in under 4-12 seconds (from generation of query to outputting on a web client). This is even with running the web server on the same box as the database server. We haven't run anything that intensive on an Amazon EC2 instance, so we can't compare.
Note that GoGrid has their own chart comparing EC2 and Rackspace with their offering, so you might want to check it out. I must also say that these are purely our opinions, and we were not influenced by any monetary compensation.
Feature | GoGrid | Amazon EC2 |
---|---|---|
# of Public IPs | 16 - you can have multiple IPs allocated to a server, which is nice if you are running multiple SSL (different main domain) sites on a single server. | 1 per server, which sucks for SSL - requires you to get an expensive *.domain SSL if you want to run SSL on different domains. Note it's fine for standard wildcard subdomains (a.myplace.com, b.myplace.com, etc.) since those can be done with a fairly cheap wildcard cert, but of course if you only need 2, it's probably cheaper to have separate IPs and separate single SSLs. |
24-hr real live support | Yes, and they'll try to help even if it is your fault, and they have fairly quick response times. You can have live chats with the support crew. | No, it's an almost completely automated system based on forms. Kind of scary if you are used to having correspondence with real people. |
Creating images | In the past, GoGrid only let you build images in what they called a "SandBox". This literally changed, I think, in the past 2 weeks, because when I went to build my next server recently, the sandbox was gone and in its place was a Save button for my regular server. This makes the process of saving images now about as easy as Amazon's. The main caveat is that you can't save a server that is greater than 50GB, so with that said you'll probably still want to keep a lightweight image. | Yes, Amazon set the bar by which all cloud providers offering this feature will be judged. |
Instance configuration and pricing Windows (Dual Core ~2GHz / core, 2GB RAM). (For both, if you want to run SQL Server Standard, expect to pay an additional $1 or more / hr :). MySQL Community or SQL Server Express is free on both and is installed on many images already. PostgreSQL you generally need to install yourself, which takes 5 minutes, and we much prefer setting up our own database server, thank you very much.) | ($0.16/hr; monthly Professional $200; pay as you go $0.39/hr) for Windows. Business Professional gets you dual-core 2GHz, 2GB RAM, plus you get an extra 1GB server, or you can put the extra toward a 4GB server and pay ~$230. | (Dual-core processor / 2 GB RAM) - ~$200/month on-demand ($0.29/hour), or ~$150/month reserved instance ($0.13/hour + one-time $450/yr). Amazon calls this configuration their High-CPU Medium Instance (c1.medium). |
Instance configuration and pricing Linux (Dual Core ~2GHz / core, 2GB RAM) | ~$200/month for Linux (they charge the same for CentOS Linux, Windows, or SUSE Linux) gets you dual-core 2GHz, 2GB RAM, a 120 gig drive + an extra 1.5 GB RAM-hour to spare. | ~$130/month on-demand ($0.17/hour), or ~$80/month reserved instance ($0.06/hr + one-time $450/yr), gets you a High-CPU Medium Instance (c1.medium). |
Pre-built images | They have a few, but nowhere near as many as Amazon, and they only recently launched their community images program to allow people to make their images public, like what Amazon has had for a while. | Lots, and some with PostgreSQL already pre-configured (at least for Unix). |
Can shut off server to not get charged | No - you have to delete the server. | Yes - we haven't played with the API features of each, so not sure if you can do this with an API call. At any rate, this does make Amazon more ideal for build farms. |
Hybrid model of cloud and dedicated servers | Yes - for high-end servers the dedicated pricing is cheaper too, and you can set up private networks between the two. This is because GoGrid is an off-shoot of ServePath, a standard dedicated hosting provider. | No |
Build wizards | Yes - they had theirs before Amazon, and I still find it easier to use than Amazon's; that is the main reason we chose them over Amazon in the beginning. | Yes, but they started off with an API mode and probably still have the strongest API deploy story, which is probably more attractive to Linux administrators than it is to Windows administrators. |
Cloud storage | Yes - 10GB free/month + $0.15 for each additional GB you use (charged at the maximum you use at any point in that month). We usually use around 60GB, mostly to store our images and simple backups, so we pay about $6/month. Note if you need more, you'd be better off adding one of their storage plans. Cloud storage is only available via your GoGrid private network, so it's really mostly for backup. Data transfer is free. | Yes, via Elastic Block Storage (EBS) or S3. EBS is actually what your data lives on, whereas in GoGrid's case the data is part of the server. EBS can be backed up to Simple Storage (S3), which can be accessed from anywhere. S3 pricing starts at $0.14/GB/month and goes down to $0.055 if you use over 5000TB. Data transfer out is $0.100/GB external, and free internal. |
Trial plans | They used to offer a 1-month trial which would cover a 1/2 GB RAM server, but that seems to have ended. | Yes, they currently have a deal - they offer a micro instance (basically a 613MB single-core server) for 750 hours of Linux/Unix each month for one year. |
Pricing Schemes
Sometimes I think all cloud providers (and it's probably true of most industries) are involved in a conspiracy to confuse you with their pricing, to get the most money they can out of you, and to ensure you can never exactly compare their pricing to any other cloud provider's pricing. So I have warned you. I'm not sure I quite understand even the GoGrid and Amazon pricings, except to say you can't compare them directly to know which one you are getting a better deal on. It really depends on how you use your servers and your comfort level with automated support. Plus, they keep changing their services. For example, Amazon is soon to release, if they haven't already, their real live support option, and GoGrid recently launched their community shared images and partner images and now allows saving servers (no longer a sandbox), presumably to compete with Amazon's community AMI model and save-server feature.
Amazon Reserved Instance Pricing vs. On Demand Pricing
For Elastic Compute Cloud (EC2) services, Amazon has what they call Reserved Instance vs. On Demand pricing, and I really wish I had paid attention earlier, because we could have saved some money.
Reserved Instance pricing is your best bet if you know you are going to be using Amazon for a year or more and you know the kind of server size and processing you need. How it works: you buy a Reserved Instance for a specific processor/RAM size and pay an immediate cost (let's call this insurance money; it's like health insurance because it allows you to pay what I will call a co-pay price for your usage, but doesn't cover all ailments), and from then on you get charged the lower usage rate (this is like your co-pay), which is usually around 50% off the On Demand pricing for an instance of the size you reserved. Keep in mind you can still turn your server off without incurring charges and so forth, but you don't get back the insurance. So if you are doing just development and have your instance turned off most of the time, it probably isn't worthwhile to do a Reserved Instance, since you lose what I'll call the year insurance you paid to reserve the instance. Our suggestion: if you don't know what kind of server will work for you, start with On Demand, but once you've found the sweet spot of price/functionality, add a reserved instance for that size.
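To make that concrete, here's a rough back-of-the-envelope break-even using the Windows c1.medium rates quoted in the table above (treat the exact figures as illustrative):

on-demand:   $0.29/hr
reserved:    $0.13/hr + $450 one-time (per year)
break-even:  $450 / ($0.29 - $0.13) = ~2813 hours = ~117 days

So if the instance will be running more than about 4 months of the year, the reserved instance comes out ahead.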
GoGrid Monthly Plan vs. Pay As You Go
GoGrid has a similar concept to Amazon's Reserved Instance / On Demand, but it's different in that they couldn't care less what kind of server you are deploying or how many you are deploying. They charge you based on what they call RAM hours: a 1 GB RAM server takes up 1*24*30 (720) RAM hours a month, a 2 GB RAM server would be 2*24*30 (1440), etc. To not get charged for a server, you have to delete the server (shutting it off is not good enough, as it is with Amazon). They have different monthly plans, with the Professional Cloud being the first tier and probably the most common; it buys you a prepaid 2500 RAM hours (= ~a 3.5 GB dual-quad core server). With that, they charge you $200/month and are guaranteed $200/month from you even if you don't completely utilize your allocated RAM hours, and they can charge you overage charges if you exceed the GB RAM hour allocation (for overage you pay $0.09/GB-hr instead of the prepaid $0.08/GB-hr, so it's still pretty reasonable). The overage charge is still much cheaper than Pay As You Go pricing. What you get is basically $0.08 per server RAM hour (so a 2GB server uses 2 RAM hours per hour, ~$0.16); if you were to compute the charge for this, you would find it costs ~$130/month, but you are paying $200. The extra can get you another RAM hour per hour, which you can either use to build a 1 GB RAM / 1 core server or add in. Unfortunately, the way they scale the servers, you can only bump up a 2GB server in 2GB/2-core increments, which means you either lose the rest of the $200 you are paying, use it toward a 1GB RAM server, have a sandbox of some other computer running half the time, or bump your server up to 4GB and pay a $0.09 overage per hour for the additional 0.5 GB RAM hour you are using (that will put you at around ~$230 for a 4GB RAM / 4-core server). Within the past year they did introduce the option of bumping the GB RAM of your server up and down as needed, so you can bump your server up to 4GB RAM when you need a bit more processing power and bump back down when done. This option is only available with newly built servers, so long-time customers will have to rebuild their server to get this feature.
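Worked out with the plan numbers above (again, illustrative arithmetic only):

2 GB server:  2 GB x 24 hr x 30 days = 1440 RAM hours/month
plan:         2500 prepaid RAM hours for $200 (effectively $0.08/RAM hour)
left over:    2500 - 1440 = 1060 RAM hours, enough for a 1 GB server (720)
4 GB server:  2880 RAM hours; the 380 hours over the plan bill at $0.09 = ~$34 overage (~$230 total)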
Comparing servers
Each provider measures their server configurations differently. I find GoGrid's a bit easier to navigate, since they use more standard terminology and have fewer permutations. With Amazon you need to do a bit more translation, and they use terminology like compute units and medium processor high memory, which is all terminology specific to them.
Amazon Servers
Amazon has what they call instance types. They measure their processors in what they call EC2 compute units: 1 EC2 Compute Unit = a 1-1.2 GHz Xeon/Opteron processor. The High-CPU Medium instance (basically 350 GB local storage, ~2 virtual cores (2.4 GHz), 1.7 GB RAM) is the only one we've tried, and it seems to work well for the workloads we have tried it on. We'll soon be experimenting with the micro instances for development testing purposes. The pricing on Amazon servers is different for Linux vs. Windows, with Linux being much cheaper. So you generally get a better deal on Linux with Amazon, and about the same deal on Amazon / GoGrid for Windows, all else being the same. I haven't paid too much attention to the transfer charges, since we usually seem to be within reasonable limits, such that we haven't noticed a difference. If you use a lot of data transfer or disk space, you'll want to scrutinize that more closely, since I think GoGrid and Amazon differ a lot in terms of when they start charging you for transfer, what they charge, and allotted disk space. We haven't scrutinized to this level of detail.
GoGrid Servers
GoGrid server configurations are much easier to figure out, but that of course means fewer options. The processors are about the same kind of processor, each core being about ~2GHz, and you get charged per GB RAM; the more GB RAM you have per server, the more cores and the more storage you get. So if you have 2GB RAM you get 2 cores, 4GB RAM gets you 4 cores, and so on. You can use their pricing calculator to get a better sense.
Service Level Agreements (SLAS)
GoGrid has a 100% uptime SLA while Amazon has a 99.5% SLA, and GoGrid's compensation is very generous relative to other cloud hosting providers. I tend to ignore these most of the time; I'm more concerned with actual uptime, since $100 is not going to compensate you for loss of business. I will say that early on, when GoGrid was starting out, there were more outages than I would have liked, such that it did make us consider switching, but things have gotten a thousand times better, such that we experience very few if any outages these days. Here is an interesting article someone posted on Reddit: Do SLAs really matter? A 1 year case study of 38 cloud services.