In this article we'll demonstrate how to install JSON:select and conclude with how to use it to manipulate the GeoJSON Feature Collection we discussed in Creating GeoJSON Feature Collections with JSON and PostGIS.
Creating an in-db JSON selector function
We're stealing this example
from Will's presentation described in his slide 71 and the documented json selector code from: https://github.com/lloyd/JSONSelect/blob/master/src/jsonselect.js. For client side javascript, you'd want to use the minified version, but since we are using
this for server-side in-db code we don't need to worry about code weight and prefer readability over size. You'll need to install PL/V8 in your database before you can install the select code which means you have to have the binaries installed and then in database to plv8 enable:
CREATE EXTENSION plv8;
Remember if you are on Windows, we just made fresh compiled binaries for windows PostgreSQL 9.2, and if you are not, you can compile your own or beg your package maintainer to offer PL/V8 extension as an option if they don't have it already.
The JSON selector code is pretty long, but here is the basic structure of the function.
CREATE OR REPLACE FUNCTION
json_select(selector text, data json)
returns json as $$
exports = {};
//copy the contents of jsonselect.js here
return JSON.stringify(
exports.match(selector,
JSON.parse(data)));
$$ LANGUAGE plv8 IMMUTABLE STRICT;
Now let's test this out with our GeoJSON example:
SELECT json_select('.properties .loc_name', '{"type":"FeatureCollection",
"features":[
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
"properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
"properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
"properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}');
--- output is huh JSON array object? ---
["Waltham, MA","Manchester, NH","TI Blvd, TX"]
If you are like us you prefer sets over JSON thingies, so how do we go from this JSON thing to a row set we are more familiar with working with?
Calling PLV8 functions from other PL/V8 functions
It turns out calling PL/V8 (javascript) functions from others is easy to do as documented in PLV8 Wiki. Soooo:
CREATE OR REPLACE FUNCTION json_select_multi(selector text, data json)
returns SETOF text as $$
var func = plv8.find_function("json_select");
var o = JSON.parse(func(selector,data));
for(var i=0; i < o.length; i++){
if (typeof o[i] == 'object') {
if (o[i].length > 0){
for (var j=0; j < o[i].length; j++) {
plv8.return_next(JSON.stringify(o[i][j]) )
}
}
else {
plv8.return_next(JSON.stringify(o[i]));
}
}
else {
plv8.return_next(o[i]);
}
}
$$ LANGUAGE plv8 IMMUTABLE STRICT;
Testing the functions
Test our new set function.
SELECT json_select_multi('.properties .loc_name', '{"type":"FeatureCollection",
"features":[
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
"properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
"properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
"properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}');
--output is a set of text much better :) ---
json_select_multi
-------------------
Waltham, MA
Manchester, NH
TI Blvd, TX
Repurposing relational technology
Now if I were a blasphemous document loving person, with little respect for relational databases, I may not be satisfied with just being able to parse JSON documents, but would also want to store them in the database?
CREATE TABLE myfeatures(id serial primary key, data json);
INSERT INTO myfeatures(data)
SELECT json_select_multi('.features', '{"type":"FeatureCollection",
"features":[
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
"properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
"properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
"properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}')::json;
Which would result in 3 table rows of features. There are varying degrees of blaphemy one can achieve, we can only stand this much.
Querying Spatial feature collection
If we had PostGIS installed in our database as we could get with:
CREATE EXTENSION postgis;
We could even do something as insane as this:
CREATE OR REPLACE FUNCTION json_select_geog(selector text, data json)
returns geography as $$
SELECT ST_Union(geom)::geography
FROM (SELECT ST_GeomFromGeoJSON(json_select_multi(selector, data)) As geom) As foo ;
$$ LANGUAGE sql IMMUTABLE COST 1;
CREATE INDEX idx_myfeatures_gist_geog
ON myfeatures USING gist (json_select_geog('.geometry',data));
If we had more than a paltry sampling of data, our spatial json soup index would be used when we write a query something like:
SELECT json_select_multi('.loc_id', data)::integer As loc_id
, json_select_multi('.loc_name', data) As loc_name
FROM myfeatures
WHERE ST_DWithin(ST_GeogFromText('POINT(42.40 -71.257)')
, json_select_geog('.geometry', data),100000 );
If you had a lot of spatial data and bigger geometry types like long linestrings or polygons, we suspect, it would probably be more efficient to not go crazier than:
DROP TABLE IF EXISTS myfeatures;
CREATE TABLE myfeatures(id serial primary key, properties json, geog geography(POINT,4326));
INSERT INTO myfeatures(properties, geog)
SELECT json_select_multi('.properties .loc_id',feat)::integer As id, json_select_multi('.properties', feat)::json As properties
, json_select_geog('.geometry', feat) As geog
FROM ( SELECT json_select_multi('.features', '{"type":"FeatureCollection",
"features":[
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.400469999999999,-71.2577]},
"properties":{"loc_id":1,"loc_name":"Waltham, MA"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[42.990189999999998,-71.462590000000006]},
"properties":{"loc_id":2,"loc_name":"Manchester, NH"}},
{"type":"Feature","geometry":{"type":"Point","coordinates":[-96.757239999999996,32.909770000000002]},
"properties":{"loc_id":3,"loc_name":"TI Blvd, TX"}}
]
}')::json As feat ) AS foo;
When we check the table:
SELECT * FROM myfeatures;
id | properties | geog
----+------------------------------------------+--------------------------------
1 | {"loc_id":1,"loc_name":"Waltham, MA"} | 0101000020E6100000B...
2 | {"loc_id":2,"loc_name":"Manchester, NH"} | 0101000020E6100000C...
3 | {"loc_id":3,"loc_name":"TI Blvd, TX"} | 0101000020E61000004...
In short, for good or bad, PostgreSQL 9.2 provides more food for thought as to how you can organize your data. It's hard not to be a glutton about it and easy to lose sight of the fundamentals.