Sunday, January 27. 2013
This exercise is a continuation of our How to bulk export tables from MS Access. While this approach will work for other databases besides PostgreSQL, you'll probably need to fiddle with the subroutine to make it work for some of them. PostgreSQL is fairly ANSI-SQL compliant, so not too much fiddling should be required to port it to SQL Server, MySQL, Oracle, etc.
Unlike the prior Visual Basic subroutine we showed that exports the tables, this one just creates an SQL script that you run against the already-created PostgreSQL database containing the exported data. We didn't test the quote option, though we coded it into the subroutine, since, as we said, we hate having to quote fields. If perchance you are one of those folks who likes to put spaces in field names to make them read more like English, then you'll need to quote, or revise the other routine to convert your spaces to _ or something else.
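To give a sense of the output, the generated script is just plain DDL along these lines (a hypothetical sketch for an exported table called employees; the actual statements depend on your Access schema):
ALTER TABLE employees ADD CONSTRAINT pk_employees PRIMARY KEY (employee_id);
CREATE INDEX idx_employees_last_name ON employees (last_name);
CREATE UNIQUE INDEX idx_employees_ssn ON employees (ssn);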
Continue reading "How to recreate MS Access primary keys and indexes in PG"
Saturday, November 10. 2012
We've been fighting to get clients who are thinking of upgrading or creating new apps to also choose PostgreSQL in the process.
Here I'll just itemize some of the obstacles we've run into in making the sale. All of these fall under the It is not Database X line item.
By Database X I mean SQL Server, MySQL, and Oracle, for us in exactly that order. Our obstacle focus is probably a bit different from others' since
we are consultants mostly to Windows shops, or to ISVs who have to sell their applications to U.S. government agencies or units of agencies.
Continue reading "Biggest Obstacle to PostgreSQL Adoption: It is not Database X"
Sunday, December 11. 2011
Our article The Pure Relational database is dead generated
a lot of misunderstandings as a result of our poor choice of words. People thought we were bashing the relational model, because in their minds that was what
pure meant. I got hit with a lot of poetic insults. I still can't think of an alternative word for what I meant. Simple doesn't really do it, as even relational databases with just standard types
were far from simple when you consider the planner and all the other machinery under the hood protecting you from the underlying storage structure. What I was trying to say is that in the beginning most relational databases
supported only a standard set of types which you could not expand on, and most people, when they think relational today, still think just that. That type of relational database is, in my book, dead or almost dead.
How did this all start? Well, whenever we use something like PostgreSQL to store anything complex -- take your pick: geometry data, tree-like structures (which we use
ltree for), full-text query constructs, and yes, XML -- we get bashed by some know-it-all with a very narrow view of what a relational database should be doing, who suggests we use a NoSQL database, a graph engine, or a full-text engine, or normalize our data more. I have also learned XML is a dirty word to many people. I mistakenly thought XML was a complex type people could
relate to, but it turns out they can relate to it so well that it brings up tragic memories I can only equate to the Post Traumatic Stress Disorder suffered by war veterans or (early or wrong) technology adopters. That was not
my intent either. XML was merely an example. I will not say you should use XML in your tables, but I will also not say you should steer clear of it, as many people wanted me to. I will say its use is rare, but it has its place, just as any other complex type does, and it has its own special needs for navigation, indexing, etc., which many relational databases handle well enough.
Continue reading "The Relational Model is very much alive"
Saturday, December 03. 2011
A lot of redditors took offense at our article XPathing XML data with PostgreSQL,
the general consensus being: if you are going to be stuffing XML in a relational database, where will you stop? That is not what relational databases are designed for.
We had committed a sacrilegious sin and, worse yet, encouraged bad habits by forcing people to think more about the different options they have for storing data in a relational database and, god forbid, demonstrating how to query such columns with XML-specific functions. What were we thinking? How dare we try to query XML data with SQL? Perhaps we were thinking like this guy or this guy,
both equally misguided spatial relational database folk.
Of course we stepped one foot further by actually defining a column as xml and daring to store data in it for later consumption rather than just as an intermediary step.
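For the record, the sort of thing we were demonstrating is no more exotic than this (a minimal sketch with a hypothetical table, not the exact example from that article):
CREATE TABLE feeds (id serial PRIMARY KEY, doc xml);
INSERT INTO feeds (doc) VALUES ('<entry><title>PostGIS in Action</title></entry>');
SELECT (xpath('/entry/title/text()', doc))[1] AS title FROM feeds;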
If I want to store documents that are navigable, I should be using a document database like MongoDB, CouchDB, etc. designed for that kind of stuff. If I've got graphs, I should be using a graph database. This got me thinking
that the "Pure Relational Database" is dead, and I'm surprised most people don't seem to realize it.
So while relational databases have changed over the last 25 years, most people's notions of them have not kept pace with that change.
First let me define what I mean by pure. A pure relational database is one with standard meat-and-potatoes types like text, dates, and numbers well suited for counting money and computing how close the world is to total bankruptcy, which you store as fields in a row of a table and on which you then define foreign keys / constraints / primary keys to relate them to other tables.
You reconstitute your real-world objects by stitching these tables together with joins, return sets using where conditions, and summarize using group bys and other math-like constructs. Don't get me wrong; these are very beautiful things because they allow for easy slicing of dimensions without having to think about all the dimensions that make up an object at once. In fact it was so beautiful that some people thought, "wow, that's cool, but it would be even cooler if I could
store more complex objects in those columns with their own specific needs for querying," and so was born what some people call the object-relational database, built by people who thought relational but also understood that different types have their own unique needs for querying, storage, indexing, etc.
Nowadays most, if not all, relational-like databases have standardized on some variant of SQL.
In essence, though, the pure relational database doesn't allow you to define new types or use exotic types such as arrays, XML, graphs, geometries, rasters, sparse matrices, etc.
There is much less thinking involved, and you are less likely to shoot yourself in the foot by dumping a bunch of XML in a field and trying to do something with it.
When it is used to store more complex things such as spreadsheets and other user documents, these are stored as blobs and just retrieved. Even such use is frowned upon.
Well, most relational databases I can think of nowadays have richer types: e.g. PostgreSQL, Oracle, and Firebird all support arrays as a column type. Some even allow you to define custom types and functions to support those custom types: PostgreSQL (I could go on forever), Oracle, which has rich user-defined type support too, and SQL Server 2005+, with each version getting better and better at user-defined custom types and introducing more exotic types and supporting infrastructure. Even MySQL/Drizzle offers some of this (mostly in the form of different storage engines). Even my favorite lightweight SQLite has some tricks under the hood that aren't what I would call relational; e.g. SpatiaLite/RasterLite is a whole geometry type library built on SQLite with functions you can call from SQL, and I'm sure there are lots of middleware tools you don't know about
using the SQLite and Firebird engines for more than relational tasks (HTML5 anyone? CAD anyone?).
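As a quick illustration of what we mean by richer types, here is a small PostgreSQL sketch (hypothetical table, just to show an array column being defined, populated, and queried):
CREATE TABLE pets (pet_name text PRIMARY KEY, nick_names text[]);
INSERT INTO pets VALUES ('Kitty', ARRAY['Kit', 'KitKat']);
SELECT pet_name FROM pets WHERE 'KitKat' = ANY(nick_names);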
Sunday, September 04. 2011
We've been working on converting some of our SQL Server apps to PostgreSQL. In this
article we'll describe some things to watch out for and provide a function we wrote to automate some of
the conversion.
Although both databases are fairly ANSI-SQL compliant, there are still differences
in their CREATE TABLE statements, data types, and how they handle other things that make porting applications not so trivial.
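To give a flavor of the differences (a hedged sketch, not the conversion function from the article), a SQL Server table such as the first statement below would typically be rewritten for PostgreSQL along the lines of the second:
-- SQL Server
CREATE TABLE orders (order_id int IDENTITY(1,1) PRIMARY KEY, customer nvarchar(100) NOT NULL, placed_at datetime DEFAULT getdate());
-- PostgreSQL
CREATE TABLE orders (order_id serial PRIMARY KEY, customer varchar(100) NOT NULL, placed_at timestamp DEFAULT now());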
Continue reading "SQL Server to PostgreSQL: Converting table structure"
Saturday, April 30. 2011
We like to enforce business rules at the database level wherever
we can, for the simple reason that, particularly in the business we are in, most database updates happen
outside the end-user application layer.
That is not to say you shouldn't enforce rules at the application level too, but the database is the last
line of defense, is usually more self-documenting than application code can be, and also protects you from your
programmers, even when that programmer is you.
Domains are objects that you will find in many high-end
standards-compliant databases. They exist in SQL Server, Oracle, IBM Db2, Firebird, and PostgreSQL to name a few.
Domains have existed for a really long time in PostgreSQL. In PostGIS topology, Sandro Santilli (usually known as strk) takes advantage of them for fleshing out the topology support, and I got turned on to them by him.
With that said, let's dive into domains.
What are domains?
Domains are essentially a reusable packaging of check constraints. You use them as if they were a custom data type.
The nice thing about them is that they are usually transparent to applications that
don't understand them.
Example 1: Enforce that pay ending/pay day happens only on certain days of the week
Here is an example -- suppose you had a payment system, and you had a rule that the pay-through end date has to
fall on a Friday. You could create a domain such as the following:
CREATE DOMAIN dom_payday
AS date
CONSTRAINT check_dow CHECK (trim(to_char(VALUE, 'day')) = 'friday');
COMMENT ON DOMAIN dom_payday IS 'Company payday rules';
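You can then use the domain anywhere you would use a regular data type. A quick hypothetical example of putting it to work (not from the article itself):
CREATE TABLE paychecks (paycheck_id serial PRIMARY KEY, pay_thru dom_payday);
INSERT INTO paychecks (pay_thru) VALUES ('2011-04-29');  -- a Friday, so accepted
INSERT INTO paychecks (pay_thru) VALUES ('2011-04-30');  -- a Saturday, so rejected by check_dow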
Continue reading "Using Domains to Enforce Business Rules"
Wednesday, March 30. 2011
I am happy to report that the final proof of the PostGIS in Action E-Book was released today,
and the printed version is scheduled for release April 11th, 2011 and should be available on Amazon and other locations around then. The other e-reader formats will come after that.
You can buy from here or download the two free chapters, if you haven't already.
Each hard-copy purchase comes with a free E-Book version. There is a coupon in the back of the book for getting the E-Book version.
Yes, I know it's been a really really long time.
On the bright side, we produced twice as much content as we had set out to, and that was while keeping things as concise as we
could get away with, still managing to cover more than we had planned, and stripping out as many unnecessary words as we could muster.
So 520 pages and almost 2 years later, this is where we are.
A good chunk of the additional bulk of the book is the appendices, which are about 150 pages
in total and focus strictly on PostgreSQL and SQL. After many comments from early reviewers, we thought it unfair not to have a good chunk of PostgreSQL
and general relational database content to familiarize programmers and GIS folks with the RDBMS that PostGIS lives in. Most GIS folk unfortunately
have the hardest time getting up to speed with SQL and standard RDBMS management.
Two free chapters and accompanying code for all chapters
The two free chapters we selectively picked because we thought they would be most beneficial to newcomers and people new to relational databases.
So the free chapters are:
- Chapter 1: What is a spatial database? It provides a fast-paced history of PostGIS, PostgreSQL, and spatial databases, then moves into
an even faster journey of converting flat-file restaurant locations to spatial point geometries and loading an ESRI shapefile of roads. It then shows you how to write standard
spatial queries and render the results.
- Appendix C: SQL Primer -- goes through querying information_schema, the common points of writing SELECT, INSERT, UPDATE, and DELETE SQL statements, and the finer points of using aggregate functions, windowing constructs, and common table expressions, as well
as a brief overview of how PostgreSQL stacks up against other relational databases (SQL Server, Oracle, IBM DB2, MySQL, Firebird) in SQL features.
- All the chapter code and accompanying data. It's a bit hefty at 57 MB.
So even if you don't buy our book, we hope you find the free chapters useful.
You can get a more detailed listing of all the chapters from the PostGIS in Action book site.
We'd like to thank all those who supported us through this long and unpredictable journey. Hopefully we'll have several more, though hopefully
a bit less nerve-racking than this first one.
Friday, February 25. 2011
Many of our customers ask us this question so we thought we'd lay down our thoughts.
Our last couple of articles have been about how to do this and that in PostgreSQL, SQL Server, or MySQL, or about having PostgreSQL coexist with an existing SQL Server install.
A major reason for that is that in many of our projects we have a choice of which database to use for a new piece of an application, as long as it plays nicely with the existing infrastructure.
Our core database competencies are still PostgreSQL, SQL Server, and MySQL, with the balance leaning
more toward PostgreSQL each day. We are perhaps somewhat unique in the PostgreSQL community in that Oracle never comes into our equation of decisions (though Oracle and PostgreSQL are perhaps more similar to each other than to the others).
Oracle is too expensive for most of our clientele,
so it's a non-issue, and when our clients do have Oracle, it's been thrust upon them by their ERP/CRM vendor and is essentially off limits to them.
Continue reading "Why choose or not choose PostgreSQL?"
Friday, December 24. 2010
Question: You have a table of people and a table that specifies the activities each person is involved
in. You want to return a result that has one record per person and a column that has a listing of activities for each person
separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by person's name.
This is a question we are always asked, and since we mentor on various flavors of databases,
we need to be able to switch gears and provide an answer that works on the client's database. Most
often the additional requirement is that you can't install new functions in the database. This means that
for PostgreSQL and SQL Server, which both support defining custom aggregates, that option is out.
Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in
most is to push the problem off to the client front end, throw up your hands, and proclaim: "This ain't something that should be
done in the database; it's a reporting problem." That is in fact what many database purists do, and all I can say to them is wake up and smell the coffee before you are out of a job.
We feel that data
transformation is an important function of a database, and if your database is incapable of massaging the data into a format
your various client apps can easily digest, WELL, THAT'S A PROBLEM.
We shall now document this answer rather than trying to answer it for the umpteenth time. For starters,
PostgreSQL has a lot of answers to this question, probably more than any other database, though some are easier to execute than others
and many depend on the version of PostgreSQL you are using. SQL Server has two classes of answers, neither of which is terribly appealing,
but we'll go over the ones that don't require you to install .NET stored functions in your database, since, as we said, not being able to install functions is often a constraint.
MySQL has a fairly
simple, elegant, and very portable way that it has had for a really long time.
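As a taste of the kind of answers the article walks through (a sketch with a hypothetical people/activities schema; the PostgreSQL form assumes 9.0+ for string_agg with ORDER BY, and older versions have other options):
-- PostgreSQL 9.0+
SELECT p.person_name, string_agg(a.activity, ';' ORDER BY a.activity) AS activities
FROM people AS p INNER JOIN person_activities AS a ON a.person_id = p.person_id
GROUP BY p.person_name ORDER BY p.person_name;
-- MySQL
SELECT p.person_name, GROUP_CONCAT(a.activity ORDER BY a.activity SEPARATOR ';') AS activities
FROM people AS p INNER JOIN person_activities AS a ON a.person_id = p.person_id
GROUP BY p.person_name ORDER BY p.person_name;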
Continue reading "String Aggregation in PostgreSQL, SQL Server, and MySQL"
Sunday, November 14. 2010
Many people have been concerned with Oracle's stewardship of past Sun Microsystems open source projects:
Java, MySQL, and OpenSolaris, to name a few.
Why are people concerned? Perhaps the abandoning of projects such as OpenSolaris, the suing of Google over Java infringements, the exodus of many frontline contributors to core open source projects from Oracle, or the idea of handing over license rights to a single company so they can relicense your code.
We have no idea.
All we know is that there is an awful lot of forking going on.
In Oracle's defense, many do feel that they have done a good job advancing some of the open source projects they have shepherded --
for example, getting MySQL patches in place more quickly. For projects without much of a monetary incentive, such as OpenSolaris, many feel they have at best been neglectful.
Perhaps it's Oracle's size, and the size Sun was before the takeover, that has made people take notice that no open source project
is in stable hands when its ecosystem is predominantly controlled by the whims of one big gorilla.
One new fork we were quite interested to hear about is LibreOffice, which is a fork of OpenOffice.
In addition to the fork, there is a new organization
called the Document Foundation to cradle the new project. The Document Foundation is backed by many OpenOffice developers and corporate entities (Google, Novell, and Canonical, to name a few).
The Document Foundation mission statement is outlined here. There is even a Document Foundation planet for LibreOfficerians to call home.
The LibreOffice starter screen looks similar to the OpenOffice starter screen, except that instead of the flashy Oracle logo we have come to love and fear, it has simple Document Foundation text below the basic multi-colored LibreOffice title. Much the same tools
found in OpenOffice are present. The project has not yet forked too much in a user-centric way from its OpenOffice ancestor. The main changes so far are the promise of not having to hand over license assignment rights to a single company, as described in
LibreOffice - A fresh page for OpenOffice, as well as some general cleanup and the introduction of plugins that had copyright assignment issues, such as some from Red Hat and Go-OO. My favorite quote
from the above article is that Oracle feels like "a mother who loves her child but is not aware that her child wants to walk alone" (André Schnabel). So perhaps Oracle's greatest contribution and legacy to open source, and perhaps the biggest that any for-profit company
can make to an open source project, is to force its offspring to grow feet and walk away.
In later posts we'll test drive LibreOffice with PostgreSQL to see how it compares to its OpenOffice ancestor and what additional surprises it has in store.
Though if in the future Oracle does donate the trademarked OpenOffice name to the foundation, then
LibreOffice may go back to being called OpenOffice. Personally I like LibreOffice better, and I like the fact that the name change signals a change in governance.
Friday, November 05. 2010
Problem: You have a set of numbers, or characters, or whatever, and you are trying to find the max or min of this set.
If the values are separate records in a table or query, the answer is well known and respected across all relational databases -- use the aggregate MAX and MIN functions.
But what if you have a set of free-wheeling numbers or text not in separate records, and you want the max or min of the set? Here is where the
least and greatest functions come in handy.
PostgreSQL has had these functions for as far back as I can remember and is not the only database to sport these marvelous functions. Our beloved MySQL and Oracle databases have them as well. Sadly, our more beloved SQL Server, even in the SQL Server 2008
variant, lacks these functions.
Okay, how do you use these functions? Like this:
SELECT least(1,-2,5) As num_least, greatest('Bobby', 'Catty', 'Kitty') As greatest_cat;
Yields:
 num_least | greatest_cat
-----------+--------------
        -2 | Kitty
We would classify these functions along the lines of COALESCE. They are like COALESCE because they take an arbitrary number of arguments, and the datatype that is returned
is the highest datatype that all arguments in the function can be autocast to. If there is no such autocast, well, you get an error. To demonstrate, guess what happens when you do this:
SELECT least(-1, 'Kitty');
Do this in PostgreSQL, at least in 8.3+, and you get a nice slap if you haven't installed any deprecated autocasts:
ERROR: invalid input syntax for integer: "Kitty"
LINE 1: SELECT least(-1, 'Kitty')
Do this in MySQL -- so friendly and forgiving, and such a great reader of minds -- and you get:
-1
OF COURSE
I apologize for the ambiguous sarcasm; it's just that sometimes I want my mind read and sometimes I don't, and I just can't figure out whether today is one of those days or the other day.
Monday, October 11. 2010
One of the neat changes already present in PgAdmin III 1.13dev is the change in plugin architecture. Version 1.13dev allows for multiple plugin*.ini files. How does this work?
If you have a plugins.d folder in your PgAdmin III version folder, it will read all the .ini files in that folder and load them as plugins.
Recall that in PgAdmin III Plug-in Registration: PostGIS Shapefile and DBF Loader
we demonstrated how to load the PostGIS shapefile and DBF loader as a plugin in PgAdmin III. This time we will demonstrate how to do it using PgAdmin 1.13. Better yet, we'll show you the new and improved
PgAdmin III Shapefile and DBF Loader in the works for PostGIS 2.0.
Continue reading "PgAdmin III 1.13 - change in plugin architecture and PostGIS Plugins"
Sunday, August 22. 2010
PostgreSQL offers several options for displaying and querying tree-like structures.
In Using Recursive Common Table Expressions (CTE) to represent tree structures
we demonstrated how to use common table expressions to display a tree-like structure. Common table expressions require PostgreSQL 8.4 and above but are fairly ANSI standards compliant. In addition to that
approach, you have the option of using recursive functions. There is yet another common approach, specific to PostgreSQL: using the ltree contrib datatype,
which has been supported for some time in PostgreSQL. For one of our recent projects, we chose ltree over the other approaches because performance is much better when you need to do ad hoc queries over the tree, since ltree can take advantage of btree and GiST indexes,
and it also has built-in tree query expressions that make ad hoc queries simpler to write; similar in concept to the tsearch query syntax for querying text.
In this article we'll demonstrate how to use ltree and, along the way, also show the new PostgreSQL 9.0 features of conditional triggers and ordered aggregates.
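For a quick taste of what ltree queries look like (a minimal sketch with a hypothetical org table, assuming the ltree contrib module is already installed; the article builds a much fuller example):
CREATE TABLE org (id serial PRIMARY KEY, path ltree);
CREATE INDEX idx_org_path ON org USING gist (path);
INSERT INTO org (path) VALUES ('company.sales.east'), ('company.sales.west'), ('company.it');
SELECT path FROM org WHERE path <@ 'company.sales';          -- all nodes at or under company.sales
SELECT path FROM org WHERE path ~ 'company.*.east'::lquery;  -- lquery pattern match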
Continue reading "Using LTree to Represent and Query Hierarchy and Tree Structures"
Friday, July 23. 2010
When it comes to naming things in databases and languages, there are various common standards. For many languages the
camel family of namings is very popular. For unix-based databases,
UPPER_CASE or lower_case with underscores is usually the choice, and for databases such as SQL Server and MySQL, which allow you to name your columns with mixed casing
but couldn't care less what case you use in selects, you get a mish-mash of styles depending on what camp the database user originated from.
So, to summarize the key styles and the families of people who use them:
- camelCase: lastName -- employed by Smalltalk, Java, Flex, C++, and various other C-derivative languages.
- PascalCase (a variant of camelCase): LastName -- employed by C#, VB.NET, Pascal (and Delphi), and SQL Server (and some MySQL Windows converts). Also often used for class names by languages that use standard camelCase for function names.
- lower case with underscores: last_name -- often found in C, and a favorite among PostgreSQL database users (and some MySQL users).
- upper case with underscores: LAST_NAME -- a favorite among Oracle users (and some MySQL Oracle defectors).
Being at the crossroads of all the above, we often have to deal with all of these conventions, as well as with internal schizophrenic strife and external fights.
The internal turmoil is the worst, worse than an ambidextrous person trying to figure out which hand to use in battle. For these exercises, we'll demonstrate one way to convert between the various conventions. These
are the first thoughts that came to our minds, so they may not be the most elegant.
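As a rough illustration of the kind of conversion involved (a sketch using PostgreSQL's regexp_replace and initcap; the functions in the article itself may differ):
-- camelCase / PascalCase to lower case underscore
SELECT lower(regexp_replace('LastName', '([a-z0-9])([A-Z])', E'\\1_\\2', 'g'));  -- last_name
-- lower case underscore to PascalCase
SELECT replace(initcap(replace('last_name', '_', ' ')), ' ', '');                -- LastName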
Continue reading "Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case"
Monday, June 28. 2010
A while ago we demonstrated how to use Open Office Base to connect to a PostgreSQL server using both the native PostgreSQL SDBC driver and the PostgreSQL JDBC driver.
The routine for doing the same in Open Office Base 3.2 is pretty much the same as it was in the 2.3 incarnation. In this excerpt, we'll demonstrate how to import data into PostgreSQL using Open Office Base, as we had promised to do in
Database Administration, Reporting, and Light Application Development, and some stumbling blocks to watch out for.
Use Case
Command-line lovers are probably scratching their heads wondering why you would want to do this. After all, stumbling your way through a command line and typing stuff is much more fun, and you can automate it after you are done.
For our needs, we get stupid Excel or some other kind of tab-delimited data
from somebody, and we just want to cut and paste that data into our database. These files are usually small (under 5,000 records) and the column names are never consistent. We don't want to fiddle with writing code for these one-off type exercises.
For other people who are used to using GUIs, or who train people afraid of command lines, the use cases are painfully obvious, so we won't bore you.
Importing Data with Open Office Base Using copy and paste
Open Office has this fantastic feature called Copy and Paste (no kidding), and we will demonstrate in a bit why their copy and paste is better than Microsoft Access's Copy and Paste, particularly when you want to paste into some database other than a Microsoft one.
It is worthy of a medal, if I dare say.
Continue reading "Importing data into PostgreSQL using Open Office Base 3.2"