Thursday, February 14. 2013
Continue reading "Saying Happy Valentine in PostGIS"
A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text.
Picture this: What if someone has got a road or a parcel of land and they want to stamp
the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly
such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice
and we've got raster functionality in the database already. Can we cut out the middleman?
Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker.
strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.
Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article
and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.
Warning this article will have a hint of Rube Goldbergishness
If you are easily offended by seeing stupid tricks done with SQL, stop reading now.
We are going to take a natural vector product and rasterize it just so
we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there
thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.
So here are the steps for creating your own font set you can
overlay on your geometries and rasters.
See if you can spot the use of window functions and CTEs in these examples.
Sunday, July 15. 2012
Continue reading "Foreign Data Wrap (FDW) Text Array, hstore, and Jagged Arrays"
As we discussed in file_textarray_fdw Foreign Data Wrapper, Andrew Dunstan's text array foreign data wrapper works great for bringing in a delimited file and not having to worry about the column names until they are in.
We had demonstrated one way to tag the field names to avoid having to keep track of index locations, by using hstore and the header column in conjunction.
The problem with that is it doesn't work for jagged arrays. Jagged arrays are when not all rows have the same number of columns. I've jury rigged a small example
to demonstrate the issue. Luckily with the power of PostgreSQL arrays you can usually get around this issue and still have nice names for your columns. We'll demonstrate that too.
Tuesday, May 10. 2011
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few
that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction,
you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things
where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record.
One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function
which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is
the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes
the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get
a sense of what a problem address looks like. So I wrote this query:
the_time - COALESCE(lag(the_time) OVER(ORDER BY the_time), CURRENT_TIMESTAMP) As process_time,
the_time - CURRENT_TIMESTAMP As diff_from_start
FROM (SELECT address_1, city, state, zip,
pprint_addy(normalize_address(coalesce(address_1,'') || ', ' || coalesce(city || ' ','') || state || ' ' || zip)) As pp_addr,
clock_timestamp() As the_time
FROM testgeocode LIMIT 1000) As foo )
WHERE process_time > '00:00:00.016'::interval;
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start
48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032
15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Wednesday, March 30. 2011
I am happy to report, that the final proof of the PostGIS in Action E-Book got released today
and the printed version is scheduled for release Aprill 11th, 2011 and should be available on Amazon and other locations around then. The other e-Reader formats will come after that.
You can buy from here or download the two free chapters, if you haven't already.
Each hard-copy purchase comes with a free E-Book version. There is a coupon in the back of the book when you get it to get the E-Book versions.
Yes, I know it's been a really really long time.
On the bright side, we produced twice as much content as we had set out to do and that was with keeping things as concise as we
could get away with, still managing to cover more than we set out to cover, and stripping out as many unnecessary words as we could muster.
So 520 pages and almost 2 years later, this is where we are.
A good chunk of the additional bulk of the book was the appendices which are about 150 pages
total and focus strictly on PostgreSQL and SQL. After many comments from early reviewers, we thought it unfair not to have a good chunk of PostgreSQL
and just general relational database content to familiarize programmers and GIS folks with the RDBMS that PostGIS lives in. Most GIS folk unfortunately
have the hardest time with getting up to speed with SQL and just standard RDBMS management.
Two free chapters and accompanying code for all chapters
The two free chapters we selectively picked because we thought they would be most beneficial to newcomers and people new to relational databases.
So the free chapters are:
- Chapter 1: What is a spatial database? Which provides a fast paced history of PostGIS, PostgreSQL, Spatial Databases and moves into
an even faster journey into converting flat file restaurant locations to spatial point geometries, loading in an ESRI shapefile of roads. Then shows you how to write standard
spatial queries and render the results.
- Appendix C: SQL Primer -- goes through querying information_schemas, the common points of writing SELECT, INSERT, UPDATE, DELETE SQL statements and the finer points of using aggregate functions, Windowing constructs and common table expressions as well
as a brief overview of how PostgreSQL stacks up with other relational databases (SQL Server, Oracle, IBM DB2, MySQL, Firebird) in SQL features.
- All the chapter code and accompanying data. It's a bit hefty at 57 MB.
So even if you don't buy our book, we hope you find the free chapters useful.
You can get a more detailed listing of all the chapters from the PostGIS in Action book site.
We'd like to thank all those who supported us through this long and unpredictable journey. Hopefully we'll have several more, though hopefully
a bit less nerve-racking than this first one.
Friday, December 24. 2010
Continue reading "String Aggregation in PostgreSQL, SQL Server, and MySQL"
Question: You have a table of people and a table that specifies the activities each person is involved
in. You want to return a result that has one record per person and a column that has a listing of activities for each person
separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by person's name.
This is a question we are always asked and since we mentor on various flavors of databases,
we need to be able to switch gears and provide an answer that works on the client's database. Most
often the additional requirement is that you can't install new functions in the database. This means that
for PostgreSQL/SQL Server that both support defining custom aggregates, that is out as an option.
Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in
most is to push the problem off to the client front end and throw up your hands and proclaim -- "This ain't something that should be
done in the database and is a reporting problem." That is in fact what many database purists do, and all I can say to them is wake up and smell the coffee before you are out of a job.
We feel that data
transformation is an important function of a database, and if your database is incapable of massaging the data into a format
your various client apps can easily digest, WELL THAT's A PROBLEM.
We shall now document this answer rather than trying to answer for the nteenth time. For starter's
PostgreSQL has a lot of answers to this question, probably more so than any other, though some are easier to execute than others
and many depend on the version of PostgreSQL you are using. SQL Server has 2 classes of answers neither of which is terribly appealing,
but we'll go over the ones that don't require you to be able to install .NET stored functions in your database since we said that is often a requirement.
MySQL has a fairly
simple, elegant and very portable way that it has had for a really long time.
Monday, October 05. 2009
Continue reading "Allocating People into Groups with SQL the Sequel"
In our prior story about allocating people with the power of window aggregation, we saw our valiant hero and heroine trying
to sort people into elevators
to ensure that each elevator ride was not over capacity. All was good in the world until someone named Frank came along and spoiled the party.
Frank rightfully pointed out that our algorithm was flawed because should Charlie double his weight, then we could have one elevator ride over capacity.
We have a plan.
Sunday, August 16. 2009
Continue reading "Using Recursive Common table expressions to represent Tree structures"
A very long time ago, we wrote an article on how to use PostgreSQL to show the fully qualified name of an item in an inventory tree.
Basically we were modeling a paper products tree. The original article can be found here Using PostgreSQL User-Defined Functions to solve the Tree Problem and was based on PostgreSQL 7.4 technology.
We'll repeat the text here for completeness and demonstrate the PostgreSQL 8.4 that solves this and more efficiently.
Thursday, July 16. 2009
Continue reading "PostgresQL 8.4: Common Table Expressions (CTE), performance improvement, precalculated functions revisited"
Common table expressions are perhaps our favorite feature in PostgreSQL 8.4 even more so than windowing functions. Strangely enough I find myself using them more in SQL Server too now that PostgreSQL supports it.
CTEs are not only nice syntactic sugar, but they also produce better more efficient queries. To our knowledge only Firebird (see note below), PostgreSQL,SQL Server, and IBM DB2 support this, though I heard somewhere
that Oracle does too or is planning too UPDATE: As noted below Oracle as of version 9 supports non-recursive CTEs. For recursion you need to use the Oracle proprietary corresponding by syntax.
As far as CTEs go, the syntax between PostgreSQL, SQL Server 2005/2008, IBM DB2 and Firebird
is pretty much the same when not using recursive queries. When using recursive queries, PostgreSQL and Firebird use WITH RECURSIVE to denote a recursive CTE where as SQL Server and IBM DB2 its just WITH.
All 4 databases allow you to have multiple table expressions within one WITH clause anda RECURSIVE CTE expression can have both recursive and non-recursive CTEs. This makes writing complex queries especially where you have the same expressions used multiple times in the query,
a lot easier to debug and also more performant.
In our article on How to force PostgreSQL to use a pre-calculated value
we talked about techniques for forcing PostgreSQL to cache a highly costly function. For PostgreSQL 8.3 and below, the winning solution was using OFFSET which is not terribly cross platform and has the disadvantage of
materializing the subselect. David Fetter had suggested
for 8.4, why not try CTEs. Yes CTEs not only are syntactically nice, more portable, but they help you write more efficient queries. To demonstrate, we shall repeat the same exercise we did in that
article, but using CTEs instead.