A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text.
Picture this: What if someone has got a road or a parcel of land and they want to stamp
the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly
such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice
and we've got raster functionality in the database already. Can we cut out the middleman?
Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker.
strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.
Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article
and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.
Warning this article will have a hint of
Rube Goldbergishness in it.
If you are easily offended by seeing stupid tricks done with SQL, stop reading now.
We are going to take a natural vector product and rasterize it just so
we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there
thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.
So here are the steps for creating your own font set you can
overlay on your geometries and rasters.
See if you can spot the use of window functions and CTEs in these examples.
Step 1: Find a font you like
I haven't tried, but in theory you can just write out letters evenly spaced on a piece of paper and scan the paper so you have a single image.
This would allow you to automate your own writing in the database thus allowing your database be able to impersonate you. A very exhilarating, but slightly disturbing idea.
Too many years of using a keyboard has made my technical drawing skills less than legible, so instead:
- I grabbed a free font set called Kankin
- Installed it on my computer
- Opened up a spreadsheet and in each cell typed in a different letter across with the font.
- Took a snapshot image of masterpiece spreadsheet. Which gave this image:
- Imported this image into PostGIS with raster2pgsql using command-line packaged tool:
raster2pgsql -F -Y pics/fonts/kankin.png font_pics | psql -U postgres -d testpostgis210 -h localhost -p 5432
For the rest of these steps you are going to need to be running a fairly recent PostGIS 2.1.0 development version which has the ubber-cool ST_Tile function which we demonstrated here
and is just well super cool.
Step 2: Chop your image into letter bits, reclass, vectorize
For this exercise I'm assuming the letters are evenly spaced. One of the changes in 2.1.0 is that ST_Polygon function ALWAYS returns a multipolygon.
Prior versions sometimes returned Polygons.
- Create a table to hold new letter vectors:
CREATE TABLE font_set(letter char(1), geom geometry(Multipolygon), font_name varchar(50)
, CONSTRAINT pk_font_set PRIMARY KEY (letter,font_name));
Do a quick histogram using the raster ST_Histogram function to get a sense of where the pixel values that repesent letters. In a simple world every thing would be as clear as black and white
. You'd only have 2 values: that which represents
the drawing of a letter and that which represents whitespace. But our world has all this noise caused by differences in pressure you put on your pen, the ink running out and so forth,
and the imperfectness of your snapshotting. Our histogram demonstates the issue.
WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast
FROM font_pics LIMIT 1 )
SELECT (h).*
FROM (SELECT ST_Histogram(rast) As h
FROM fp) As foo;
The output of the above query is shown below and was so fast I didn't even bother timing it - so as you can see our letter drawing probably falls in the 0-30 range.
min | max | count | percent
--------+---------+-------+----------------------
0 | 2.9375 | 1984 | 0.11372234323054
2.9375 | 5.875 | 23 | 0.00131835377737017
5.875 | 8.8125 | 32 | 0.00183423134242806
8.8125 | 11.75 | 16 | 0.000917115671214032
11.75 | 14.6875 | 18 | 0.00103175513011579
14.6875 | 17.625 | 35 | 0.00200619053078069
17.625 | 20.5625 | 16 | 0.000917115671214032
20.5625 | 23.5 | 22 | 0.00126103404791929
23.5 | 26.4375 | 15 | 0.000859795941763155
26.4375 | 29.375 | 16 | 0.000917115671214032
29.375 | 32.3125 | 0 | 0
32.3125 | 35.25 | 0 | 0
35.25 | 38.1875 | 36 | 0.00206351026023157
38.1875 | 41.125 | 36 | 0.00206351026023157
41.125 | 44.0625 | 541 | 0.0310099736329245
44.0625 | 47 | 14656 | 0.840077954832053
Looking at the above and eyeballing looks like we want to keep 0 - 38 and chuck the rest. What we need is a way to reclassify our world into black and white.
Luckily PostGIS 2+
has a function ST_Reclass that allows us to chuck the pixel values we don't want.
Now we reclassify and vectorize. See the Seamless Vector / Raster operations flying around like a lab experiment gone nuts.
We leave it as an exercise to the reader to figure out what we are doing in this snippet -- or you can read the Second edition of PostGIS in Action book in our upcoming Chapter 7.
Chapter 7 first draft will be available on MEAP next month and we hope to explain some of these operations.
INSERT INTO font_set(letter, geom, font_name)
WITH fp AS(SELECT ST_Tile(ST_Band(rast,1),ST_Width(rast)/36, ST_Height(rast)) As rast
FROM font_pics WHERE filename = 'kankin.png' )
, fpi As (SELECT (ROW_NUMBER() OVER())::integer As pos, rast
FROM fp)
SELECT substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', pos,1) As letter
, ST_Polygon(
ST_SetUpperLeft(ST_SetBandNoDataValue(
ST_Reclass(ST_Band(rast,1)
,'[0-38]:1,(38-255]:0'::text, '1BB'::text)
,0),0,0)
) As geom
, 'kankin'
FROM fpi;
Putting our fontset to work
So now we've got a table of vectorized letters. To make our new vectorized fontset easier to use, we create a word function.
CREATE OR REPLACE FUNCTION get_word_vector(param_word text, param_font_name text)
RETURNS geometry
AS
$$
WITH letters As
(SELECT geom, width, sum(width + 1) OVER (ORDER By pos) As run_length
FROM (SELECT geom, letter,ROW_NUMBER() OVER() As pos, ST_XMax(geom) As width
, ST_YMax(geom) As height
FROM font_set CROSS JOIN regexp_split_to_table($1, E'') As l
WHERE letter = l AND font_name = $2) As foo
)
SELECT ST_Union(
ST_Translate(geom, run_length - width,0)
) As word_geom
FROM letters;
$$
language 'sql';
And now we use it to write a somewhat complicated looking SQL statement that can probably be wrapped into a function and be significantly shorter.
This utilizes the ST_Union aggregate function for raster, which is much improved in speed in upcoming 2.1.
WITH the_words AS
(SELECT pos, ST_Translate(get_word_vector(word, 'kankin'), 0, -pos*100) As geom
FROM (VALUES (0,'HAPPY'),(1,'POSTGIS'),(2,'VALENTINE') ) AS my(pos, word)
)
, canvas
AS (SELECT ST_MakeEmptyRaster((ST_XMAX(ext) - ST_XMin(ext))::integer,
(ST_YMAx(ext) - ST_YMin(ext))::integer,0, 0, 1, -1, 0, 0, 0) As rast
FROM (SELECT ST_Extent(geom) As ext FROM the_words) As foo
)
SELECT ST_AsPNG(
ST_Resize(ST_Union(ST_AsRaster(geom
, canvas.rast
, ARRAY['8BUI','8BUI', '8BUI']::text[]
, ARRAY[pos*75,6,0], ARRAY[0,0,0])
), 0.3,0.3))
FROM the_words CROSS JOIN canvas;
Which of course outputs this: