Restore of functional indexes gotcha

Thursday, June 18. 2009

This has been bugging me for a long time and I finally complained about it and Tom Lane kindly gave a reason for the problem and that its by design and not a bug.

So I thought I would post the situation here without getting into too many embarassing specifics in case others have suffered from a similar fate and can learn from this.

The situation:

You create a function lets call it myniftyfunc() in the public schema.
Then you create another function that depends on myniftyfunc(), lets call it mysuperniftyfunc() also in public schema.
Then because your function is such a super nifty function, you decide to create a functional index with that super function on your table that sits in mysuperdata schema - mysuperdata.mysupertable

Your super nifty function is doing its thing; your table is happy; the planner is spitting out your queries lightning fast using the super nifty index on your super table; The world is good.

One day you decide to restore your nifty database backup and to your chagrin, your nifty index is not there. The planner is no longer happily spitting out your queries lighting fast and everything has come to a painful crawl. Your super nifty index is gone. What happened to super nifty functional index?

I have to admit that I'm the type of person that assumes the public schema is always there and always in search_path and that my assumption is a flawed one. After all the public schema is there by default on new databases for convenience, but one can change it not to be in the search_path and in fact pg_dump does just that. So if everything you have is kept in public schema -- you don't run into this particular misfortune. If however you have your functions in public and your tables in different schemas, during restore -- the search path is changed to the schema being restored and your super functional indexes based on super functions that depend on other super functions fail because public is no longer in the search_path.

Below is a simple script to recreate the issue so its clear:

CREATE DATABASE superdata;
CREATE OR REPLACE FUNCTION myniftyfunc(myint integer) RETURNS integer AS
 $$ SELECT 1 + $1;$$
LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION mysuperniftyfunc(myint integer) RETURNS integer AS
$$ SELECT myniftyfunc($1); $$
LANGUAGE 'sql' IMMUTABLE;

CREATE SCHEMA mysuperdata;
CREATE TABLE mysuperdata.mysupertable(sid integer PRIMARY KEY, super_key integer);
CREATE INDEX idx_mysupertable_super_index
   ON mysuperdata.mysupertable USING btree (mysuperniftyfunc(super_key));

INSERT INTO mysuperdata.mysupertable(sid,super_key)
VALUES(1,1);

--Backup superdata
"C:\Program files\postgresql\8.3\bin\pg_dump" --host=localhost --port=5432 --username=postgres --format=plain --verbose --file="C:\superdata.sql" superdata

--Restore
"C:\Program files\postgresql\8.3\bin\psql" -U postgres -h localhost -p 5432 -d superdata2 -f "C:\superduper.sql"

--Get non-super cool error
psql:C:/superduper.sql:99: ERROR:  function myniftyfunc(integer) does not exist
LINE 1:  SELECT myniftyfunc($1);
				^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:   SELECT myniftyfunc($1);
CONTEXT:  SQL function "mysuperniftyfunc" during startup

Normally I do my backup in compressed format, but did it in plain to demonstrate the problem and here is what pg_dump produces.

--
-- PostgreSQL database dump
--

-- Started on 2009-06-18 21:45:59

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

--
-- TOC entry 6 (class 2615 OID 1086067)
-- Name: mysuperdata; Type: SCHEMA; Schema: -; Owner: postgres
--

CREATE SCHEMA mysuperdata;


ALTER SCHEMA mysuperdata OWNER TO postgres;

SET search_path = public, pg_catalog;

--
-- TOC entry 21 (class 1255 OID 1086065)
-- Dependencies: 3
-- Name: myniftyfunc(integer); Type: FUNCTION; Schema: public; Owner: postgres
--

CREATE FUNCTION myniftyfunc(myint integer) RETURNS integer
	LANGUAGE sql IMMUTABLE
	AS $_$ SELECT 1 + $1;$_$;


ALTER FUNCTION public.myniftyfunc(myint integer) OWNER TO postgres;

--
-- TOC entry 22 (class 1255 OID 1086066)
-- Dependencies: 3
-- Name: mysuperniftyfunc(integer); Type: FUNCTION; Schema: public; Owner: postgres
--

CREATE FUNCTION mysuperniftyfunc(myint integer) RETURNS integer
	LANGUAGE sql IMMUTABLE
	AS $_$ SELECT myniftyfunc($1); $_$;


ALTER FUNCTION public.mysuperniftyfunc(myint integer) OWNER TO postgres;

SET search_path = mysuperdata, pg_catalog;
-- this is not a bug, but would be super
cool if public or whatever the default search path of the database was in here.

SET default_tablespace = '';

SET default_with_oids = false;

--
-- TOC entry 1465 (class 1259 OID 1086068)
-- Dependencies: 6
-- Name: mysupertable; Type: TABLE; Schema: mysuperdata; Owner: postgres; Tablespace:
--

CREATE TABLE mysupertable (
	sid integer NOT NULL,
	super_key integer
);


ALTER TABLE mysuperdata.mysupertable OWNER TO postgres;

--
-- TOC entry 1735 (class 0 OID 1086068)
-- Dependencies: 1465
-- Data for Name: mysupertable; Type: TABLE DATA; Schema: mysuperdata; Owner: postgres
--

COPY mysupertable (sid, super_key) FROM stdin;
1	1
\.

--
-- TOC entry 1734 (class 2606 OID 1086072)
-- Dependencies: 1465 1465
-- Name: mysupertable_pkey; Type: CONSTRAINT; Schema: mysuperdata; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY mysupertable
	ADD CONSTRAINT mysupertable_pkey PRIMARY KEY (sid);


--
-- TOC entry 1732 (class 1259 OID 1086073)
-- Dependencies: 22 1465
-- Name: idx_mysupertable_super_index; Type: INDEX; Schema: mysuperdata; Owner: postgres; Tablespace:
--

CREATE INDEX idx_mysupertable_super_index ON mysupertable USING btree (public.mysuperniftyfunc(super_key));


--
-- TOC entry 1740 (class 0 OID 0)
-- Dependencies: 3
-- Name: public; Type: ACL; Schema: -; Owner: postgres
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;


-- Completed on 2009-06-18 21:45:59

--
-- PostgreSQL database dump complete
--

Solution (workaround):

The work around for this situation is to either explicitly qualify the functions you are using within another or for PostgreSQL 8.3 -- add to your function set search_path=public

Solution 1:

CREATE OR REPLACE FUNCTION mysuperniftyfunc(myint integer) RETURNS integer AS
$$ SELECT public.myniftyfunc($1); $$
LANGUAGE 'sql' IMMUTABLE;

Solution 2: (only works for PostgreSQL 8.3+)

CREATE OR REPLACE FUNCTION mysuperniftyfunc(myint integer) RETURNS integer AS
$$ SELECT myniftyfunc($1); $$
LANGUAGE 'sql' IMMUTABLE;

ALTER FUNCTION mysuperniftyfunc(integer) SET search_path=public;

Of course neither of these solutions is particularly satisfying if you are a package author. If you are and that is how this mess started. You want people to be able to install your functions in whatever schema they like and if they wanted to use it globally they would add it to their database default search path. Though that is arguable. Perhaps all packages should live in specifically named schemas.

Posted by Leo Hsu and Regina Obe in intermediate, q&a at 23:04 | Comments (8) | Trackbacks (0)

Trackbacks

Trackback specific URI for this entry

No Trackbacks

Comments

Display comments as (Linear | Threaded)

wouldn't a workaround , be forcing pg_dump to dump schema.functionname() always ?

#1 gregj on 2009-06-19 04:28 (Reply)

If I understand you correctly, no that doesn't help. It is installing the functions in the right schema and the index can see the function perfectly fine. It is even smart enough to prefix the function with public in the index. In fact it works if you have no data in your table.

This particular issue happens because when pg_restore goes to build the index on the table in another schema, the search_path = yourtable_schema,pg_catalog

So since the functions above live in public (both functions actually do get installed before the tables but they are not in the same schema as the table or in pg_catalog), and the indexing process starts, the second function can no longer see the first function since its no longer in the search path. So the error message aside from not being super cool is kinda confusing because it makes you think the first function was never created. Its there just not visible by the second during the indexing process.

#1.1 Regina on 2009-06-19 07:45 (Reply)

but than, if function name _in index_ was prefixed with schema name, it would find it!

#1.1.1 gregj on 2009-06-19 08:04 (Reply)

Ah but it is - look at the pg_backup output:

CREATE INDEX idx_mysupertable_super_index ON mysupertable USING btree (public.mysuperniftyfunc(super_key));

but its when the create index process tries to call mysuperniftyfunc

mysuperniftyfunc can't find myniftyfunc

and breaks right here

SELECT myniftyfunc($1)

#1.1.1.1 Regina on 2009-06-19 08:28 (Reply)

hmm, that's odd.
I would honestly think, that when you do specify schema explicitly - it doesn't need search paths at all. Kinda like FSs.

#1.1.1.1.1 gregj on 2009-06-19 08:41 (Reply)

It doesn't unless thecalled function calls another function that is not schema qualified. So its a somewhat isolated issue. Except in normal database workload the function works fine since the schemas in use by the function are part of the the search_path of the db. Restore changes that so the default schemas are not necessarily in the search_path

#1.1.1.1.1.1 Regina on 2009-06-19 11:45 (Reply)

In your example above, add the following function:

create or replace function mysuperdata.myniftyfunc(myint integer) returns integer as
$_$
select mod($1*200,1000);
$_$
LANGUAGE sql immutable;

And lets give it some more data to make it interesting:
insert into mysupertable (sid, super_key) values(1,1), (2,200), (3,300);

Now, go ahead and pg_dump and restore to a database named tmp2. It works!

Now, ready for a head-slapper? Execute the following:
tmp2=# set search_path to public, mysuperdata;

SET

tmp2=# select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;

sid | super_key

-----+-----------

2 | 200

(1 row)

tmp2=# explain select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;

QUERY PLAN

------------------------------------------------------------

Seq Scan on mysupertable (cost=0.00..1.04 rows=1 width=8)

Filter: ((1 + super_key) = 201)

(2 rows)

tmp2=# set search_path to mysuperdata, public;

SET

tmp2=# select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;

sid | super_key

-----+-----------

(0 rows)

tmp2=# explain select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;

QUERY PLAN

------------------------------------------------------------

Seq Scan on mysupertable (cost=0.00..1.05 rows=1 width=8)

Filter: (mod((super_key * 200), 1000) = 201)

(2 rows)

WOOOPS! This is not a build issue, it is a design flaw. Once you introduce schemas, you have to assume that search paths will be different. Your supernifty proc is therefore dependent on the user's search path. What I wonder is, if the data is inserted by different users with different search paths, what happens to the index? I imagine it is functionally corrupt.

So I changed my search_path, and inserted a bunch of records:

insert into mysupertable (sid, super_key) select sid, sid*100 from generate_series(4,500) ser(sid);

Then I changed it again and inserted another bunch of records:

insert into mysupertable (sid, super_key) select sid, sid*100 from generate_series(501,1000) ser(sid);

Then I VACUUM ANALYZE the table and did an explain. Now that the table is larger, and the cost of a sequence scan is more than using an index, the index showed up in the plan:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);

QUERY PLAN

-------------------------------------------------------------

Seq Scan on mysupertable (cost=0.00..22.50 rows=5 width=8)

Filter: (mod((super_key * 200), 1000) = 0)

(2 rows)

Wait! that is the wrong function! Change search_paths and try again:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);

QUERY PLAN

-------------------------------------------------------------

Seq Scan on mysupertable (cost=0.00..17.49 rows=4 width=8)

Filter: ((1 + super_key) = 201)

(2 rows)

Right function, wrong plan!

Lets do a VACUUM ANALYZE again, try the explain again:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);

QUERY PLAN

-------------------------------------------------------------------------------------------------

Index Scan using idx_mysupertable_super_index on mysupertable (cost=0.00..8.27 rows=1 width=8)

Index Cond: ((1 + super_key) = 201)

(2 rows)

Do you follow what is happening? the statistics are being influenced by the search path. (So are the search results, by the way - try it!).

If you are a package author, you have bigger problems than PostgreSQL not rebuilding properly. You have created a situation where someone else's package and the unpredictability of any given user's search path will result in a) bad performance; b) the wrong data coming back; c) a functionally corrupt index(?)

The problem is that the design above implements schema, but only partially - you left a big gaping design hole in mysuperniftyfunc. There ARE design scenarios where it makes sense to NOT use explicit schema within functions, but they are generally the exception, and not the rule. In this scenario, I used a foil (mysuperdata.myniftyfunc) to illustrate the design flaw - but it was there from the beginning. The failure to restore is just one of the problems with the code.

#2 matt peters on 2009-07-06 22:32 (Reply)

Matt,
Very good points. I'm aware of these, I guess I was looking for a having my cake and eating it too kind of solution.

Normally and I'm sure I am different from other users. I use schemas to logically segregate my data (I don't go around changing search paths willy nilly and yes users can have their search paths individually set -- but for global functions I always make sure users have public or whatever in their paths just like pgcatalog is always there -- you can't get rid of it) and I define a specific search path for my database. Adding in the paths I want where I want tables to be used without schema qualification -- because its annoying to schema qualify stuff all the time and not terribly portable and hard to explain to users.

So I guess my basic point is if my database works happily given the search_paths I define, I expect it to work happily when I restore it as well. Yes I am a stupid user for expecting these things and I see very well the flaw in my logic for wanting these things. But it doesn't change the fact that this is not a solution just more problems.

I don't really want to have to force people to install functions in a specific schema. Now of course if PostgreSQL had something like a
"this function references other functions in the schema it is stored in" which is a quite common scenario especially for packages (without having to explicitly define the schema these functions should be stored in), that would solve all my problems and I'm sure many package authors as well and would be logically consistent and not break your nicely elaboret example.

But PostgreSQL doesn't to my knowledge support this idea of the schema that the function is stored in can reference things in the schema it is in and that is the main problem I have and why I'm a very frustrated user.

#2.1 Regina on 2009-07-07 12:10 (Reply)

Add Comment

Name
Email
Homepage
In reply to
Comment	E-Mail addresses will not be displayed and will only be used for E-Mail notifications. To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly. Enter the string from the spam-prevention image above: Phone* What is nine minus six?
	Remember Information? Subscribe to this entry

Restore of functional indexes gotcha

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting

Thursday, June 18. 2009

Restore of functional indexes gotcha

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

Restore of functional indexes gotcha

Postgres OnLine Journal PostGIS in Action About the Authors Consulting

Thursday, June 18. 2009

Restore of functional indexes gotcha

Entry's Links

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration

Postgres OnLine Journal

PostGIS in Action About the Authors Consulting