Question: What is the answer to SELECT 3/2?
Answer: In integer math, it is 1. A lot of people, especially those coming from MySQL or MS Access backgrounds, are surprised to find out that in PostgreSQL 3/2 = 1. They view this as some sort of bug.
In actuality, the fact that 3/2 = 1 and 1/3 = 0 is part of the ANSI/ISO SQL standard, which states that the result of a mathematical operation between two values must be of the same data type as one of the values (though not necessarily with the same scale and precision). This is not some idiosyncrasy specific to PostgreSQL.
If you try the same operation in SQL Server, SQLite, Firebird, and some other ANSI/ISO SQL compliant databases, you will get the same results. So it seems MySQL and MS Access are the oddballs in this arena, though arguably the more practical ones.
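If you want to check the SQLite part of this yourself, a minimal sketch using Python's built-in sqlite3 module and an in-memory database is enough. The helper name scalar is our own, not part of any API:

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


int_div = scalar("SELECT 3/2")     # both operands integers -> integer result
tiny = scalar("SELECT 1/3")        # the remainder is discarded
real_div = scalar("SELECT 3.0/2")  # one operand is real -> real result

print(int_div, tiny, real_div)     # 1 0 1.5
```

Writing one operand with a decimal point is enough to switch SQLite from integer to floating-point division.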
Why is this an SQL standard? We haven't found any definitive answer to that, but we have our guesses. Our guess is that it is less ambiguous (more precise) and faster, processor-wise, to offer only the level of accuracy specifically requested. In terms of standards and many domains (e.g. engineering), precision is more important than accuracy.
In the case of 3/2 the benefit is not obvious, but say you have 1/3. MySQL displays that as .3333 (although internally it is probably storing 0.333333...), while MS Access displays it as 0.333333333333333. Is MS Access more right? Neither is completely accurate, and it is ambiguous how inaccurate they are. In the case of PostgreSQL and other ANSI/ISO databases, it is quite clear how accurate the result is: they very precisely discard the remainder.
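To make "precisely discard the remainder" concrete, here is a small Python sketch of truncating integer division. Python's own // operator floors rather than truncates, so sql_int_div (a hypothetical helper) adjusts the sign itself. It assumes that PostgreSQL, SQL Server and SQLite truncate toward zero the way C does, so -7/2 gives -3:

```python
from typing import Tuple


def sql_int_div(x: int, y: int) -> Tuple[int, int]:
    """Integer division truncating toward zero; returns (quotient, remainder).

    Python's -7 // 2 is -4 (floor); truncating division gives -3.
    """
    if y == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    r = x - q * y  # the part that integer division throws away
    return q, r


for x, y in [(3, 2), (1, 3), (-7, 2)]:
    q, r = sql_int_div(x, y)
    # Nothing is lost silently: x can always be rebuilt from q and r.
    assert x == q * y + r
    print(f"{x}/{y} = {q} remainder {r}")
```

The quotient is exact about what it keeps. The only information dropped is the remainder r, and you can recover it with the modulo operator if you need it.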
There is one particular behavior in PostgreSQL that seems somewhat contradictory to the above, and that is the way it treats averages. It returns averages in much the same way as MySQL and SQLite, whereas SQL Server returns a truncated integer average when averaging integers.
For example, let's say you have a table of integers and you take an average:
--Here we are using a more portable example
--instead of our preferred generate_series approach
--so it can be tested on multiple database platforms
CREATE TABLE dumnum(num integer);
INSERT INTO dumnum(num)
VALUES(1);
INSERT INTO dumnum(num)
VALUES(2);
SELECT AVG(num) as theavg, AVG(CAST(num As numeric(10,3))) as theavgm,
SUM(num)/COUNT(num) As intavg,
4/7 As intmath, 4./7 As floatmath,
CAST(4./7 As numeric(10,6)) as precmath,
4.000/7 As floatmath2,
CAST(4./7 As integer) As precintmath
FROM dumnum;
--For MySQL the implementation of
--CAST is a little peculiar.
--Although MySQL happily accepts numeric, integer, and int(11) as data types in table creation
--(storing numeric as decimal), it doesn't appear to do the same in CAST
--(e.g. you can't use numeric or integer in CAST),
--so the above example doesn't work.
--Use this instead:
SELECT AVG(num) as theavg, AVG(CAST(num As decimal(10,3))) as theavgm,
SUM(num)/COUNT(num) As intavg,
4/7 As intmath, 4./7 As floatmath,
CAST(4./7 As decimal(10,6)) as precmath,
4.000/7 As floatmath2,
CAST(4./7 As SIGNED) As precintmath
FROM dumnum;
Speaking of other databases, has anyone seen the Firefox extension for browsing and creating SQLite databases?
It is extremely cute. We ran the following SQLite tests using this Firefox SQLite management tool.
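Running the above on PostgreSQL, SQL Server 2005, SQLite, Firebird, and MySQL yields the following (values are listed in column order: theavg; theavgm; intavg; intmath; floatmath; precmath; floatmath2; precintmath):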
- PostgreSQL 8.2/8.3 RC1: 1.5000000000000000; 1.5000000000000000; 1; 0; 0.57142857142857142857; 0.571429; 0.57142857142857142857; 1 (Note: when casting back to integer, Postgres rounds instead of truncating.)
- SQL Server 2005: 1; 1.500000; 1; 0; 0.571428; 0.571429; 0.571428; 0 (Casting back to integer SQL Server truncates)
- SQLite: 1.5; 1.5; 1; 0; 0.5714285714285714; 0.5714285714285714; 0.5714285714285714; 0 (The CAST to numeric is bizarre, but it follows from SQLite's lax attitude toward data types: it simply ignores any CAST it doesn't understand. For example, you can write CAST(1 AS boo) and it will happily do nothing. When casting back to integer, SQLite truncates, like Microsoft SQL Server.)
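- Firebird: 1; 1.500; 1; 0; 0; 0.000000; 0.571; 0
Evidently Firebird pays attention to the number of decimal places you write in your literals, whereas the others do not: 4./7 has no decimal places and yields 0, while 4.000/7 yields 0.571. When casting back to integer, Firebird also returned 0, the same result as the truncating SQL Server 2005 and SQLite. However, since 4./7 is already 0 in Firebird, this particular test cannot really tell truncating apart from rounding.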
- MySQL 5: 1.5000; 1.5000000; 1.5000; 0.5714; 0.5714; 0.571429; 0.5714286; 1 (MySQL averages the same way as Postgres, with fewer significant digits, and its CAST also rounds just as Postgres does. It violates the 3/2 rule stated previously, but its behavior when casting to decimal is in line with the other databases, except SQLite.)
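The three casting behaviors in these results can be modeled roughly with Python's decimal module. This is a sketch, not how any of these databases is implemented. ROUND_HALF_UP stands in for the rounding casts seen in PostgreSQL and MySQL, and ROUND_DOWN stands in for the truncating casts seen in SQL Server and SQLite. firebird_style_div is a hypothetical helper based on our reading of Firebird's dialect 3 rule: the quotient of two exact numerics gets the sum of the operands' scales, and any extra digits are truncated.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

exact = Decimal(4) / Decimal(7)  # 0.5714285714... (28 significant digits)

# Converting to an integer: round to nearest vs. truncate toward zero.
rounded = exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
truncated = exact.quantize(Decimal("1"), rounding=ROUND_DOWN)


def firebird_style_div(a: Decimal, b: Decimal) -> Decimal:
    """Hypothetical model: the quotient keeps scale(a) + scale(b) decimal
    places, and extra digits are truncated rather than rounded."""
    scale = max(0, -a.as_tuple().exponent) + max(0, -b.as_tuple().exponent)
    return (a / b).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_DOWN)


print(rounded, truncated)                                  # 1 0
print(firebird_style_div(Decimal("4."), Decimal("7")))     # 0
print(firebird_style_div(Decimal("4.000"), Decimal("7")))  # 0.571
```

The model's outputs match the Firebird row above: 0 for 4./7 and 0.571 for 4.000/7. That is consistent with the scale-sum rule, though it doesn't prove it.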
The number of significant digits displayed is more a presentation issue than a reflection of what is actually stored, which is all the more reason to stay away from floating-point values.
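As a rough analogy, Python's float is an IEEE 754 double and its decimal module does base-10 arithmetic, much like SQL double precision versus numeric/decimal. The sketch below shows how a float's displayed digits can hide, or suddenly reveal, a representation error that a decimal type does not have:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum carries a tiny representation error.
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Decimal stores base-10 digits exactly, like SQL NUMERIC/DECIMAL.
dec_sum = Decimal("0.1") + Decimal("0.2")
print(dec_sum)                    # 0.3
print(dec_sum == Decimal("0.3"))  # True
```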
One can argue that PostgreSQL, SQLite, and MySQL are not really in violation of the standard when it comes to averaging. To our knowledge, the ANSI/ISO standard talks about operations between numbers, not functions, so presumably averaging, as a function, is left to the discretion of the database vendor. Nevertheless, it is still a bit disconcerting to witness these conflicting behaviors.
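Here is a minimal sketch that reruns the example query against an in-memory SQLite database via Python's sqlite3 module. It shows that SQLite's AVG returns a real number even on an integer column, while SUM(num)/COUNT(num) on the same data still does integer division. Exact output may vary slightly across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dumnum(num integer)")
conn.executemany("INSERT INTO dumnum(num) VALUES (?)", [(1,), (2,)])

row = conn.execute(
    """
    SELECT AVG(num) AS theavg,
           AVG(CAST(num AS numeric(10,3))) AS theavgm,
           SUM(num)/COUNT(num) AS intavg,
           4/7 AS intmath,
           4./7 AS floatmath,
           CAST(4./7 AS numeric(10,6)) AS precmath,
           4.000/7 AS floatmath2,
           CAST(4./7 AS integer) AS precintmath
    FROM dumnum
    """
).fetchone()

names = ["theavg", "theavgm", "intavg", "intmath",
         "floatmath", "precmath", "floatmath2", "precintmath"]
results = dict(zip(names, row))

# Print each column with its Python type made visible via repr().
for name in names:
    print(f"{name:12} {results[name]!r}")
```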
Given these disparities between databases, the best thing to do when dealing with operations between numbers is to be very precise, and there are a couple of ways of doing this.
Here are some guidelines.
- When you care about precision, don't cast to or use floats and doubles.
Those introduce rounding errors, not to mention that their precision and representation in each database are probably all over the place. Use the numeric or decimal data type instead. Decimal and numeric are more or less the same in most databases, and in SQL Server and Postgres decimal is just a synonym for numeric. MySQL accepts numeric in table definitions (storing it as decimal) but, as noted above, not in CAST. According to Celko, the distinction in the SQL-92 standard between the two is that "DECIMAL(s,p) must be exactly as precise as declared, while NUMERIC(s,p) must be at least as precise as declared".
So I guess decimal would be preferable, if it is supported and there is actually a difference.
It's not perfect, but it's less subject to the whims of the database vendor, except in the bizarre case of SQLite.
- To avoid losing data when dealing with integers, first do a CAST or multiply by 1. (or, for optimum portability, by 1.0000, with as many decimal places as the precision you want).
- Do a final cast or round of your value after the initial cast to make sure you have the precision you want. It seems that PostgreSQL, for example, throws out the precision/scale information when you CAST and then apply an operation, so a second cast is needed to get the right precision. Keep in mind that when casting, PostgreSQL appears to round rather than truncate like the other databases (except MySQL). The example below demonstrates this.
--PostgreSQL-specific (generate_series is a PostgreSQL function)
SELECT CAST(x*1.0000/y As numeric(10,4)) As thepreciseavg,
x*1.00000/y As lessprecisebutmoreaccurate
FROM generate_series(1,4) As x, generate_series(3,10) As y;
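You can try the guidelines above in SQLite with a short Python sketch. One caveat: SQLite computes 2*1.0000/3 as a double, so the 1.0000 only switches it to floating point, and its four decimal places carry no meaning there. In Firebird, per the results above, the number of decimal places does matter. The scalar helper is our own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


lost = scalar("SELECT 2/3")                        # integer division drops the fraction
kept = scalar("SELECT 2*1.0000/3")                 # promote before dividing
fixed = scalar("SELECT ROUND(2*1.0000/3, 4)")      # final rounding to 4 places
cast_int = scalar("SELECT CAST(2*1.0000/3 AS INTEGER)")  # CAST truncates
round_int = scalar("SELECT ROUND(2*1.0000/3)")     # ROUND goes to the nearest value

print(lost, kept, fixed, cast_int, round_int)
```

The last two values show the truncate-versus-round distinction from the guideline. If you want a rounded integer in SQLite, ask for ROUND explicitly rather than relying on CAST.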
Needless to say, the differing behaviors of databases trying to conform to a not-so-well-defined standard leave one feeling a little woozy.