Thursday, June 18, 2009
Restore of functional indexes gotcha
Comments
Wouldn't a workaround be forcing pg_dump to always dump schema.functionname()?
If I understand you correctly, no, that doesn't help. It installs the functions in the right schema, and the index can see the function perfectly fine. It is even smart enough to prefix the function with public in the index. In fact it works if you have no data in your table.
This particular issue happens because when pg_restore goes to build the index on the table in another schema, the search_path is set to yourtable_schema, pg_catalog. Since the functions above live in public (both functions actually do get installed before the tables, but they are not in the same schema as the table or in pg_catalog), when the indexing process starts the second function can no longer see the first function, since it's no longer in the search path. So the error message, aside from not being super cool, is kinda confusing, because it makes you think the first function was never created. It's there, just not visible to the second function during the indexing process.
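The failure mode is easier to see end to end. Here is a minimal reconstruction, assuming the function bodies implied by the query plans later in this thread (myniftyfunc adds 1; the original post's exact definitions are not shown here, so treat these as stand-ins):

```sql
CREATE OR REPLACE FUNCTION public.myniftyfunc(myint integer) RETURNS integer AS
$_$ SELECT 1 + $1; $_$ LANGUAGE sql IMMUTABLE;

-- The inner call below is NOT schema qualified -- that is the whole gotcha.
CREATE OR REPLACE FUNCTION public.mysuperniftyfunc(myint integer) RETURNS integer AS
$_$ SELECT myniftyfunc($1); $_$ LANGUAGE sql IMMUTABLE;

CREATE SCHEMA mysuperdata;
CREATE TABLE mysuperdata.mysupertable (sid integer PRIMARY KEY, super_key integer);
INSERT INTO mysuperdata.mysupertable VALUES (1, 1);  -- with zero rows, the index builds fine

-- pg_restore effectively does this before building the index:
SET search_path = mysuperdata, pg_catalog;

-- public.mysuperniftyfunc is found (pg_dump qualifies it), but when it executes,
-- its body looks up the unqualified myniftyfunc -- which is no longer visible:
CREATE INDEX idx_mysupertable_super_index ON mysuperdata.mysupertable
    USING btree (public.mysuperniftyfunc(super_key));
-- ERROR:  function myniftyfunc(integer) does not exist
```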
But then, if the function name _in the index_ was prefixed with the schema name, it would find it!
Ah but it is - look at the pg_backup output:
CREATE INDEX idx_mysupertable_super_index ON mysupertable USING btree (public.mysuperniftyfunc(super_key));

But it's when the create index process calls mysuperniftyfunc that mysuperniftyfunc can't find myniftyfunc, and it breaks right here:

SELECT myniftyfunc($1)
hmm, that's odd.
I would honestly think that when you specify the schema explicitly, it doesn't need search paths at all. Kinda like filesystems.
It doesn't, unless the called function calls another function that is not schema qualified. So it's a somewhat isolated issue. Except that in a normal database workload the function works fine, since the schemas in use by the function are part of the search_path of the db. Restore changes that, so the default schemas are not necessarily in the search_path.
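There are two possible ways out of that, sketched here with the names used in this thread; the second relies on the per-function SET clause added in PostgreSQL 8.3, so treat the version requirement as an assumption:

```sql
-- Fix 1: schema-qualify the inner call, so resolution never touches search_path.
CREATE OR REPLACE FUNCTION public.mysuperniftyfunc(myint integer) RETURNS integer AS
$_$ SELECT public.myniftyfunc($1); $_$ LANGUAGE sql IMMUTABLE;

-- Fix 2 (8.3+): pin a search_path onto the function itself, so it sees public
-- no matter what the caller's (or pg_restore's) search_path happens to be.
ALTER FUNCTION public.mysuperniftyfunc(integer) SET search_path = public;
```

Note that attaching a SET clause prevents the planner from inlining a SQL function, so Fix 1 is usually the better performer for index expressions.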
In your example above, add the following function:
create or replace function mysuperdata.myniftyfunc(myint integer) returns integer as
$_$ select mod($1*200,1000); $_$ LANGUAGE sql immutable;

And let's give it some more data to make it interesting:

insert into mysupertable (sid, super_key) values (1,1), (2,200), (3,300);

Now, go ahead and pg_dump and restore to a database named tmp2. It works! Now, ready for a head-slapper? Execute the following:

tmp2=# set search_path to public, mysuperdata;
SET
tmp2=# select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;
 sid | super_key
-----+-----------
   2 |       200
(1 row)

tmp2=# explain select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on mysupertable  (cost=0.00..1.04 rows=1 width=8)
   Filter: ((1 + super_key) = 201)
(2 rows)

tmp2=# set search_path to mysuperdata, public;
SET
tmp2=# select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;
 sid | super_key
-----+-----------
(0 rows)

tmp2=# explain select * from mysupertable where public.mysuperniftyfunc(super_key) = 201;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on mysupertable  (cost=0.00..1.05 rows=1 width=8)
   Filter: (mod((super_key * 200), 1000) = 201)
(2 rows)

WOOOPS! This is not a build issue, it is a design flaw. Once you introduce schemas, you have to assume that search paths will be different. Your supernifty proc is therefore dependent on the user's search path. What I wonder is: if the data is inserted by different users with different search paths, what happens to the index? I imagine it is functionally corrupt.
So I changed my search_path and inserted a bunch of records:

insert into mysupertable (sid, super_key) select sid, sid*100 from generate_series(4,500) ser(sid);

Then I changed it again and inserted another bunch of records:

insert into mysupertable (sid, super_key) select sid, sid*100 from generate_series(501,1000) ser(sid);

Then I ran VACUUM ANALYZE on the table and did an explain. Now that the table is larger, and the cost of a sequence scan is more than using an index, the index showed up in the plan:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on mysupertable  (cost=0.00..22.50 rows=5 width=8)
   Filter: (mod((super_key * 200), 1000) = 0)
(2 rows)

Wait! That is the wrong function! Change search_paths and try again:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on mysupertable  (cost=0.00..17.49 rows=4 width=8)
   Filter: ((1 + super_key) = 201)
(2 rows)

Right function, wrong plan! Let's do a VACUUM ANALYZE again and try the explain again:

explain select * from mysupertable where mysuperniftyfunc(super_key) = mysuperniftyfunc(200);
                                            QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Scan using idx_mysupertable_super_index on mysupertable  (cost=0.00..8.27 rows=1 width=8)
   Index Cond: ((1 + super_key) = 201)
(2 rows)

Do you follow what is happening? The statistics are being influenced by the search path. (So are the search results, by the way -- try it!) If you are a package author, you have bigger problems than PostgreSQL not rebuilding properly. You have created a situation where someone else's package and the unpredictability of any given user's search path will result in a) bad performance; b) the wrong data coming back; c) a functionally corrupt index(?)
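One way to see why the plans flip around like this: the index itself is pinned to a specific function at CREATE INDEX time, and only the query side is re-resolved through the search_path on each plan. Assuming the index from this thread exists in the current database:

```sql
-- The index expression stores a function OID, not a name, so pg_get_indexdef()
-- prints it schema qualified regardless of the current search_path:
SELECT pg_get_indexdef('idx_mysupertable_super_index'::regclass);
```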
The problem is that the design above implements schemas, but only partially - you left a big gaping design hole in mysuperniftyfunc. There ARE design scenarios where it makes sense NOT to use explicit schemas within functions, but they are generally the exception, not the rule. In this scenario, I used a foil (mysuperdata.myniftyfunc) to illustrate the design flaw - but it was there from the beginning. The failure to restore is just one of the problems with the code.
Matt,
Very good points. I'm aware of these; I guess I was looking for a having-my-cake-and-eating-it-too kind of solution.

Normally (and I'm sure I am different from other users) I use schemas to logically segregate my data. I don't go around changing search paths willy-nilly, and yes, users can have their search paths individually set -- but for global functions I always make sure users have public (or whatever) in their paths, just like pg_catalog is always there; you can't get rid of it. I define a specific search path for my database, adding in the paths where I want tables to be usable without schema qualification -- because it's annoying to schema-qualify stuff all the time, it's not terribly portable, and it's hard to explain to users.

So I guess my basic point is: if my database works happily given the search_paths I define, I expect it to work happily when I restore it as well. Yes, I am a stupid user for expecting these things, and I see very well the flaw in my logic for wanting them. But it doesn't change the fact that this is not a solution, just more problems. I don't really want to have to force people to install functions in a specific schema.

Now, of course, if PostgreSQL had something like a "this function references other functions in the schema it is stored in" feature -- a quite common scenario, especially for packages -- without having to explicitly define the schema those functions are stored in, that would solve all my problems (and I'm sure many package authors' as well), would be logically consistent, and would not break your nicely elaborate example. But to my knowledge PostgreSQL doesn't support this idea that a function can reference things in the schema it is stored in, and that is the main problem I have and why I'm a very frustrated user.
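For what it's worth, PostgreSQL 8.3's SET ... FROM CURRENT clause on CREATE FUNCTION gets close to that "use the environment I was installed with" behavior: it snapshots the search_path in effect at creation time into the function. A sketch, assuming the function names from this thread and an 8.3-or-later server:

```sql
-- Whatever search_path the package installer had at CREATE time is captured,
-- so later callers' (or pg_restore's) settings no longer affect the inner lookup:
CREATE OR REPLACE FUNCTION public.mysuperniftyfunc(myint integer) RETURNS integer AS
$_$ SELECT myniftyfunc($1); $_$ LANGUAGE sql IMMUTABLE
SET search_path FROM CURRENT;
```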