FWTools GIS Toolkit is a freely available open source toolkit for Windows and Linux that can do more than GIS tricks. It is a precompiled bundle of Open Source GIS tools. The FW comes from the initials of Frank Warmerdam, the originator of the toolkit and current President of the Open Source Geospatial Foundation (OSGEO).
One key component of the GIS Toolkit is the GDAL/OGR library. Parts of the library have been enhanced by several in the OSGEO community. GDAL is a basic foundation of countless Open source GIS as well as commercial GIS applications. Here are Listings of commercial and open source software that use it and GDAL sponsors.
This is a library which historically has been developed and maintained by Frank Warmerdam, but has started to garner quite a few developers. GDAL is X/MIT licensed (similar to BSD license), therefore the licensing is very generous for commercial use. The toolkit can be downloaded from http://fwtools.maptools.org/
In this article we shall talk a little bit about the OGR2OGR commandline data loader tool packaged in here, which should be of interest to many even those who do not know or care about GIS data. We shall demonstrate some non-GIS uses of this light-weight tool.
If you want to see some GIS example uses of this tool such as loading data into Postgis tables from various GIS formats or exporting data out and brief install instructions, check out our OGR2OGR Tips page.
Although GDAL/OGR was designed to deal with GIS data, a side-effect of the design is that it also handles regular attribute and relational data. The reason is simple. Spatial Data is not an island; it most often accompanies standard relational and attribute data to make it more meaningful. If you load mapping objects and geometries, you want to load the attributes of them as well.
When using OGR2OGr for loading non-spatial data ,which it really wasn't designed for, it does have some additional annoying consequences, but we feel the almost fit in your pocket feel of it makes it a convenient swiss army knife even for non-spatial data.
I would say we are not alone in being addicted to this tool. The ability to load data from various data sources with a simple line of script code is breath-taking. I think Bill Dollins says it well in his A One-Touch (Almost) Data Loader With FWTools and PostGIS. One only needs to look at the OGR supported formats to get a sense of just how powerful this tool is. Enough talk about the greatness of this tool; I'm beginning to make myself nauseous. On with the show.
OGR2OGR can read from any read supported format and write to any write supported format. You can imagine the permutations
of data loading you can do here. To get a list of the current drivers your install supports. Launch your FW Tools command line and type:
ogrinfo --formats
(NOTE: some drivers such as the Oracle Spatial and IBM Informix Spatial Datablade require proprietary libraries so you must compile yourself)
Some Caveats - Rought Spots
ALTER DATABASE pgdbname SET client_encoding=latin1;
For the following examples we will be a bit verbose and spell out the full connection strings to the PostgreSQL database. Keep in mind any arguments left out will use their default parameters. For example if no host or user or password or port is specified, OGR will use localhost and logged in user and 5432 port. For us this is usually wrong since our PostgreSQL server is on a separate box from our OGR2OGR scripts (e.g. we are using PgAgent that is sitting on a separate box) or is running on a non-standard port so we spell it out. We spell it out too because few tutorials spell these out, leaving many to read deeply in the docs to figure this basic stuff out. Other note - for the below - we put these on multiple lines so they fit on a page. In practice, each ogr2ogr call should be a one liner.
To get a listing of tables/views (layers in GIS lingo), the ogrinfo command-line tool comes in handy. Simply pass in the connection info for the database. Below is an example of querying the tables of an ODBC datasource, PostgreSQL, Dbase:
ogrinfo "ODBC:some_dsn"
ogrinfo PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"
ogrinfo "/path/to/somefile.csv"
ogrinfo "/path/to/somefile.dbf"
For starters, lets say you need to do a nightly or one-time load from MySQL to PostgreSQL. In the past, writing to MySQL with OGR was not possible, but as of OGR 1.3.2, read and write to MySQL is possible (so MySQLers 4+ are now in luck and can now load ESRI shape files among other data directly into MySQL). The PostgreSQL driver in OGR has always supported both directions. Below is a simple example of dumping general non-spatial table from MySQL to PostgreSQL. To dump a spatial MYSQL table, the command is pretty much the same.
The below statement will copy the tables customers and orders from a MySQL database to a PostgreSQL database into the schema called mysqldump.. All the below should be on a single line. I wrapped to save space.
ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword"
MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306"
-lco OVERWRITE=yes -lco SCHEMA=mysqldump customers orders
ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword"
MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" customers orders
--Pull all the tables from mysql into the mysqldump schema
ogr2ogr -overwrite -update -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword"
MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306"
--For nightly loads where you already have the structure of the table -- you can do a truncate table with a psql script and then append with OGR2OGR -- note: nln means new name for layer. In this case we want to take the mysql table orders and load into our mysqldump.orders table.
psql -U pgloginname -d pgdbname -h pghost -c "TRUNCATE TABLE orders"
ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword"
MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" -nln "mysqldump.orders" orders
OGR has a driver that supports ODBC DSN connections. Unfortunately this will only work on Windows. Below is an example that imports all tables from a SQL Server database to PostgreSQL. In this example, the DSN I have is using NT Authentication or has the name and password encoded in the System DSN. For this you need to setup a DSN connection to the SQL Server database in ODBC Manager (we described setting up a DSN in Using MS Access with PostgreSQL, except the ODBC driver seems to only work with System DSNs not File DSNs. For more details on referencing the ODBC DSN - check http://www.gdal.org/ogr/drv_odbc.html
ogr2ogr -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword"
"ODBC:system_dsn_goes_here" -update -lco OVERWRITE=yes -nlt NONE
As noted, geometry fields seem to be created and right field types are not necessarily created. SQL Server text come in as VARCHARS and VARCHARS come in as CHARS etc. One way to get around this is to create your table structures before hand and use OGR2OGR to just load the data.
E.g.
--This will append to a table called orders reading from the SQL Server (ODBC) table orders and customers. It will only append
like-named fields.
psql -h pghost -p 5432 -U pguser -d pgdbname -c "TRUNCATE TABLE orders;TRUNCATE TABLE customers;"
ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"
"ODBC:system_dsn_goes_here" orders customers
Or correct the structures afterword by dropping unnecessary fields using a script generation technique similar to what we described in DML to Generate DDL and DCL. So your script builder SQL to drop all the ogc_fid and wkb_geometry fields created by OGR2OGR would look something like this.
SELECT 'ALTER TABLE ' || t.table_schema || '.' || t.table_name || ' DROP COLUMN ogc_fid;'
FROM information_schema.tables t
INNER JOIN information_schema.columns c
ON (t.table_name = c.table_name AND t.table_schema = c.table_schema AND c.column_name = 'ogc_fid')
UNION
SELECT 'ALTER TABLE ' || t.table_schema || '.' || t.table_name || ' DROP COLUMN wkb_geometry;'
FROM information_schema.tables t
INNER JOIN information_schema.columns c
ON (t.table_name = c.table_name AND t.table_schema = c.table_schema AND c.column_name = 'wkb_geometry';
Below is an example of loading a single dbase file. Although the formats do not list DBase as a supported format, the ESRI Shapefile format includes several files (.shp, .dbf, .shx), so the OGR utility uses the ESRI shapefile driver to accomplish this.
ogr2ogr -f "PostgreSQL"
PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"
somefile.dbf -nln mynewschema.mynewtable
For loading Microsoft Access, if you have a regular old Access database, it is best to use the ODBC driver. While the PGeo driver (ESRI Personal Geodatabase) does work against an MS Access Database, it relies on certain ESRI meta data tables to work so is not really useable for pure Access database use. One advantage of using OGR2OGR to export MS Access tables is that, unlike the MS Access export which requires you to export each table individually, one can do a bulk export of all the Microsoft Access tables or a subset of tables. The downside of using it over the Microsoft Access export feature is that its implementation of datatypes as mentioned, is impoverished, therefore things that should be text or varchar come in as char or varchar. Integers come in as numerics. Below is an example of copying Microsoft Access tables directly into PostgreSQL. To set it up, you would register a System DSN as we did for our SQL Server example. We'll show a slightly different feature here
This example apppends data from a table called company from an MS Access database into a table in accessdump.company in PostgreSQL, but only copies the records with company_group=2
ogr2ogr -append -f "PostgreSQL" PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432"
ODBC:"someaccessdsn" -nln "accessdump.company" -where "company_group=2" company
As mentioned, OGR2OGR can be used to create a data dump in some supported writable format. Below is a simple example to export a table to DBase.
This example will create a table called test in the folder C:/datadbf and in addition to the dbf file an annoying .shp and a .shx that you can discard if you know you are not outputing geometries.
ogr2ogr -overwrite -f "ESRI Shapefile" "C:/datadbf/"
PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" -nln test somepgtable
If you want to output only a subset of the data in a table you can use the -sql, -select, -where properties of OGR2OGR - such as shown below
ogr2ogr -overwrite -f "ESRI Shapefile" "C:/datadbf/" -select "company,sector_id" -where "company_group=2"
PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" "msaccessdump.company"
If you needed to use an SQL statement to rename fields and so forth, you would do it something like the below. The main issue we have with this is that the OGR tool is not smart enough to figure out the data types and lengths to output the fields so lengths etc are wrong. In general if we need to rename fields, we create a view in PostgreSQL and output the view.
ogr2ogr -overwrite -f "ESRI Shapefile" "C:\datadbf"
-sql "SELECT company_name As company,sector_id FROM accessdump.company"
PG:"host=pghost user=pgloginname dbname=pgdbname password=pgpassword port=5432" -nln compsec2