Tom Lane [Tue, 6 Feb 2007 06:50:33 +0000 (06:50 +0000)]
Fix a performance regression in 8.2: optimization of MIN/MAX into indexscans
had stopped working for tables buried inside views or sub-selects. This is
because I had gotten rid of the simplify_jointree() preprocessing step, and
optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical
FromExpr. Per gripe from Bill Howe.
Tom Lane [Sun, 4 Feb 2007 20:00:49 +0000 (20:00 +0000)]
Don't MAXALIGN in the checks to decide whether a tuple is over TOAST's
threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code
creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this
number is not itself maxaligned, so if heap_insert maxaligns t_len before
comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to
tuptoaster.c, wasting cycles. (It turns out that this does not happen on
8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the
TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples
will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really
incorrect, but we can't remove it now, see below.) There isn't any particular
value in maxaligning before comparing to the thresholds, so just don't do
that, which saves a small number of cycles in itself.
These numbers should be rejiggered to minimize wasted space on toast-relation
pages, but we can't do that in the back branches because changing
TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast
tables). We can move the toast decision thresholds a bit, though, which is
what this patch effectively does.
Thanks to Pavan Deolasee for discovering the unintended recursion.
Back-patch into 8.2, but not further, pending more testing. (HEAD is about
to get a further patch modifying the thresholds, so it won't help much
for testing this form of the patch.)
Tom Lane [Fri, 2 Feb 2007 00:07:28 +0000 (00:07 +0000)]
Repair failure to check that a table is still compatible with a previously
made query plan. Use of ALTER COLUMN TYPE creates a hazard for cached
query plans: they could contain Vars that claim a column has a different
type than it now has. Fix this by checking during plan startup that Vars
at relation scan level match the current relation tuple descriptor. Since
at that point we already have at least AccessShareLock, we can be sure the
column type will not change underneath us later in the query. However,
since a backend's locks do not conflict against itself, there is still a
hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
while a query is in progress in the current backend. Seal that hole by
rejecting ALTER TABLE whenever the target relation is already open in
the current backend.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Tom Lane [Fri, 2 Feb 2007 00:03:17 +0000 (00:03 +0000)]
Repair insufficiently careful type checking for SQL-language functions:
we should check that the function code returns the claimed result datatype
every time we parse the function for execution. Formerly, for simple
scalar result types we assumed the creation-time check was sufficient, but
this fails if the function selects from a table that's been redefined since
then, and even more obviously fails if check_function_bodies had been OFF.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Tom Lane [Thu, 1 Feb 2007 19:23:00 +0000 (19:23 +0000)]
Fix plpgsql so that when a local variable has no initial-value expression,
an error will be thrown correctly if the variable is of a NOT NULL domain.
Report and almost-correct fix from Sergiy Vyshnevetskiy (bug #2948).
Bruce Momjian [Tue, 30 Jan 2007 22:29:40 +0000 (22:29 +0000)]
Update documentation for backslashes to mention escape string syntax
more, and standard_conforming_strings less, because in the future non-E
strings will not treat backslashes specially.
Also use E'' strings where backslashes are used in examples. (The
existing examples would have drawn warnings.)
Tom Lane [Tue, 30 Jan 2007 22:05:20 +0000 (22:05 +0000)]
Repair oversights in the mechanism used to store compiled plpgsql functions.
The original coding failed (tried to access deallocated memory) if there were
two active call sites (fn_extra pointers) for the same function and the
function definition was updated. Also, if an update of a recursive function
was detected upon nested entry to the function, the existing compiled version
was summarily deallocated, resulting in crash upon return to the outer
instance. Problem observed while studying a bug report from Sergiy
Vyshnevetskiy.
Bug does not exist before 8.1 since older versions just leaked the memory of
obsoleted compiled functions, rather than trying to reclaim it.
Tom Lane [Tue, 30 Jan 2007 18:02:28 +0000 (18:02 +0000)]
Add SPI_push/SPI_pop calls so that datatype input and output functions called
by plpgsql can themselves use SPI --- possibly indirectly, as in the case
of domain_in() invoking plpgsql functions in a domain check constraint.
Per bug #2945 from Sergiy Vyshnevetskiy.
Somewhat arbitrarily, I've chosen to back-patch this as far as 8.0. Given
the lack of prior complaints, it doesn't seem critical for 7.x.
Tom Lane [Sun, 28 Jan 2007 18:50:48 +0000 (18:50 +0000)]
Repair oversight in creation of "append relations": we should set up
rel->tuples as well as rel->rows, since some estimation functions expect both
to be valid in every baserel. Per report from Dave Dutcher.
Tom Lane [Sun, 28 Jan 2007 16:15:58 +0000 (16:15 +0000)]
Fix up plpgsql's "simple expression" evaluation mechanism so that it behaves
safely in the presence of subtransactions. To ensure that any ExprContext
shutdown callbacks are called at the right times, we have to have a separate
EState for each level of subtransaction. Per "TupleDesc reference leak" bug
report from Stefan Kaltenbrunner.
Although I'm convinced the code is wrong as far back as 8.0, it doesn't seem
that there are any ways for the problem to really manifest before 8.2: AFAICS,
8.0 and 8.1 only use the ExprContextCallback mechanism to handle set-returning
functions, which cannot usefully be executed in a "simple expression" anyway.
Hence, no backpatch before 8.2 --- the risk of unforeseen breakage seems
to outweigh the chance of fixing something.
Tom Lane [Sun, 28 Jan 2007 02:53:42 +0000 (02:53 +0000)]
Dept of second thoughts: the IQ of estimate_array_length() needs to be
kept on par with that of scalararraysel(), else estimates that should
track might not. Hence teach it about binary-compatible cases, too.
Tom Lane [Sun, 28 Jan 2007 01:37:45 +0000 (01:37 +0000)]
Fix scalararraysel() to cope with binary-compatible cases, such as text[]
versus varchar[]. This oversight probably explains Ryan Holmes' recent
complaint --- he was getting a generic selectivity estimate instead of
anything intelligent.
Tom Lane [Sat, 27 Jan 2007 20:53:36 +0000 (20:53 +0000)]
Correct an old logic error in btree page splitting: when considering a split
exactly at the point where we need to insert a new item, the calculation used
the wrong size for the "high key" of the new left page. This could lead to
choosing an unworkable split, resulting in "PANIC: failed to add item to the
left sibling" (or "right sibling") failure. Although this bug has been there
a long time, it's very difficult to trigger a failure before 8.2, since there
was generally a lot of free space on both sides of a chosen split. In 8.2,
where the user-selected fill factor determines how much free space the code
tries to leave, an unworkable split is much more likely. Report by Joe
Conway, diagnosis and fix by Heikki Linnakangas.
Tom Lane [Sat, 27 Jan 2007 20:15:47 +0000 (20:15 +0000)]
Back-port changes of Jan 16 and 17 to "revoke" pending fsync requests during
DROP TABLE and DROP DATABASE. Should prevent unexpected "permission denied"
failures on Windows, and is cleaner on other platforms too since we no longer
have to take it on faith that ENOENT is okay during an fsync attempt.
Patched as far back as 8.1; per recent discussion I think we are not going
to worry about Windows-specific issues in 8.0 anymore.
Tom Lane [Fri, 26 Jan 2007 20:07:01 +0000 (20:07 +0000)]
On Windows, use pgwin32_waitforsinglesocket() instead of select() to wait for
input in the stats collector. Our select() emulation is apparently buggy
for UDP sockets :-(. This should resolve problems with stats collection
(and hence autovacuum) failing under more than minimal load. Diagnosis
and patch by Magnus Hagander.
Patch probably needs to be back-ported to 8.1 and 8.0, but first let's
see if it makes the buildfarm happy...
Tom Lane [Wed, 24 Jan 2007 17:12:23 +0000 (17:12 +0000)]
Get pg_utf_mblen(), pg_utf2wchar_with_len(), and utf2ucs() all on the same
page about the maximum UTF8 sequence length we support (4 bytes since 8.1,
3 before that). pg_utf2wchar_with_len never got updated to support 4-byte
characters at all, and in any case had a buffer-overrun risk in that it
could produce multiple pg_wchars from what mblen claims to be just one UTF8
character. The only reason we don't have a major security hole is that most
callers allocate worst-case output buffers; the sole exception in released
versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed
due to zeroing out its return address --- but AFAICS that can't be exploited
for anything more than a crash, due to inability to control what gets written
there. Per report from James Russell and Michael Fuhr.
Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's
behavior given an incomplete final character risks buffer overrun, so
back-patch that logic change anyway.
This patch also makes sure that UTF8 sequences exceeding the supported
length (whichever it is) are consistently treated as error cases, rather
than being treated like a valid shorter sequence in some places.
Tom Lane [Wed, 24 Jan 2007 01:25:51 +0000 (01:25 +0000)]
Relax an Assert() that has been found to be too strict in some situations
involving unions of types having typmods. Variants of the failure are known
to occur in 8.1 and up; not sure if it's possible in 8.0 and 7.4, but since
the code exists that far back, I'll just patch 'em all. Per report from
Brian Hurt.
Bruce Momjian [Sat, 20 Jan 2007 21:30:35 +0000 (21:30 +0000)]
Update documentation about postgresqlconf to mention default units that
match the postgresql.conf file. Also add units to descriptions that
lacked them. Wording improvements. Mention pg_settings.unit as the way
to find the default units for setting.
Tom Lane [Tue, 16 Jan 2007 18:32:32 +0000 (18:32 +0000)]
Fix incorrect permissions check in information_schema.key_column_usage view:
it was checking a pg_constraint OID instead of pg_class OID, resulting in
"relation with OID nnnnn does not exist" failures for anyone who wasn't
owner of the table being examined. Per bug #2848 from Laurence Rowe.
Note: for existing 8.2 installations a simple version update won't fix this;
the easiest fix is to CREATE OR REPLACE this view with the corrected
definition.
Tom Lane [Fri, 12 Jan 2007 23:35:04 +0000 (23:35 +0000)]
Fix handling of CC (century) format spec in to_date/to_char. According to
standard convention the 21st century runs from 2001-2100, not 2000-2099,
so make it work like that. Per bug #2885 from Akio Iwaasa.
Backpatch to 8.2, but no further, since this is really a definitional
change; users of older branches are probably more interested in stability.
Tom Lane [Thu, 11 Jan 2007 23:06:09 +0000 (23:06 +0000)]
Fix a performance problem in databases with large numbers of tables
(or other types of pg_class entry): the function pgstat_vacuum_tabstat,
invoked during VACUUM startup, had runtime proportional to the number of
stats table entries times the number of pg_class rows; in other words
O(N^2) if the stats collector's information is reasonably complete.
Replace list searching with a hash table to bring it back to O(N)
behavior. Per report from kim at myemma.com.
Back-patch as far as 8.1; 8.0 and before use different coding here.
Tatsuo Ishii [Wed, 10 Jan 2007 01:20:10 +0000 (01:20 +0000)]
Backport patch.
Call srandom() instead of srand().
pgbench calls random() later, so it should have called srandom().
On most platforms except Windows srandom() is actually identical
to srand(), so the bug only bites Windows users.
per bug report from Akio Ishida.
Tom Lane [Mon, 8 Jan 2007 16:47:35 +0000 (16:47 +0000)]
Tweak joinlist creation to avoid generating useless one-element subproblems
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance
a list of 11 LEFT JOINs with limit 8 now produces something like
((1 2 3 4 5 6 7 8) 9 10 11 12)
instead of
(((1 2 3 4 5 6 7 8) (9)) 10 11 12)
The latter structure is really only required for a FULL JOIN.
Noted while studying an example from Shane Ambler.
Tom Lane [Mon, 8 Jan 2007 16:09:31 +0000 (16:09 +0000)]
Remove cost_hashjoin's very ancient hack to discourage (once, entirely forbid)
hash joins with the estimated-larger relation on the inside. There are
several cases where doing that makes perfect sense, and in cases where it
doesn't, the regular cost computation really ought to be able to figure that
out. Make some marginal tweaks in said computation to try to get results
approximating reality a bit better. Per an example from Shane Ambler.
Also, fix an oversight in the original patch to add seq_page_cost: the costs
of spilling a hash join to disk should be scaled by seq_page_cost.
Tom Lane [Sun, 7 Jan 2007 01:56:24 +0000 (01:56 +0000)]
Fix oversight in handling of row-comparison index keys: if the row comparison
doesn't exactly match the index, we may have to change our initial positioning
strategy. For example, given an index on (f1,f2,f3) and a WHERE condition
"ROW(f1,f3) > ROW(2,3)", the code extracted the initial-positioning condition
"f1 > 2", which is wrong ... it has to be "f1 >= 2", else some rows matching
the WHERE condition may fail to be returned.
Applying patch to 8.2 only --- I'll fix it in HEAD later as part of the
planned index improvements (reverse-sort and NULLS FIRST/LAST work).
Tom Lane [Sat, 6 Jan 2007 19:14:27 +0000 (19:14 +0000)]
Fix filtered_base_yylex() to save and restore base_yylval and base_yylloc
properly when doing a lookahead. The lack of this was causing various
interesting misbehaviors when one tries to use "with" as a plain identifier.
Tom Lane [Thu, 4 Jan 2007 17:49:42 +0000 (17:49 +0000)]
Tweak pg_dumpall to add GRANT CONNECT ON DATABASE ... TO PUBLIC when dumping
database privileges from a pre-8.2 server. This ensures that the reloaded
database will maintain the same behavior it had in the previous installation,
ie, everybody has connect privilege. Per gripe from L Bayuk.
Tom Lane [Thu, 4 Jan 2007 00:58:01 +0000 (00:58 +0000)]
Fix erroneous implementation of -s in postmaster.c (the switch doesn't take
an optarg). Add some comments noting that code in three different files has
to be kept in sync. Fix erroneous description of -S switch (it sets work_mem
not silent_mode), and do some light copy-editing elsewhere in postgres-ref.
Tom Lane [Wed, 3 Jan 2007 22:39:35 +0000 (22:39 +0000)]
Fix regex_fixed_prefix() to cope reasonably well with regex patterns of the
form '^(foo)$'. Before, these could never be optimized into indexscans.
The recent changes to make psql and pg_dump generate such patterns (for \d
commands and -t and related switches, respectively) therefore represented
a big performance hit for people with large pg_class catalogs, as seen in
recent gripe from Erik Jones. While at it, be more paranoid about
case-sensitivity checking in multibyte encodings, and fix some other
corner cases in which a regex might be interpreted too liberally.
Tom Lane [Thu, 28 Dec 2006 01:09:04 +0000 (01:09 +0000)]
Add a defense to prevent core dumps if 8.2 version of rank_cd() is used with
the 8.1 SQL function definition for it. Per report from Rajesh Kumar Mallah,
such a DBA error doesn't seem at all improbable, and the cost of checking for
it is not very high compared to the cost of running this function. (It would
have been better to change the C name of the function so it wouldn't be called
by the old SQL definition, but it's too late for that now in the 8.2 branch.)
Tom Lane [Wed, 27 Dec 2006 22:31:59 +0000 (22:31 +0000)]
Modify local buffer management to request memory for local buffers in blocks
of increasing size, instead of one at a time. This reduces the memory
management overhead when num_temp_buffers is large: in the previous coding
we would actually waste 50% of the space used for temp buffers, because aset.c
would round the individual requests up to 16K. Problem noted while studying
a performance issue reported by Steven Flatt.
Back-patch as far as 8.1 --- older versions used few enough local buffers
that the issue isn't significant for them.
Tom Lane [Wed, 27 Dec 2006 19:45:51 +0000 (19:45 +0000)]
Print combining characters (those reported as having zero width by
PQdsplen()) normally, instead of replacing them by \uXXXX sequences.
Assume that they in fact occupy zero screen space for formatting purposes.
Per gripe from Michael Fuhr and ensuing discussion.
Tom Lane [Wed, 27 Dec 2006 16:07:42 +0000 (16:07 +0000)]
Use FROM clause in example UPDATE commands where appropriate. Also
remove long-obsolete statement that there isn't a check for infinite
recursion in view rules.
Tom Lane [Tue, 26 Dec 2006 21:37:28 +0000 (21:37 +0000)]
Fix failure due to accessing an already-freed tuple descriptor in a plan
involving HashAggregate over SubqueryScan (this is the known case, there
may well be more). The bug is only latent in releases before 8.2 since they
didn't try to access tupletable slots' descriptors during ExecDropTupleTable.
The least bogus fix seems to be to make subqueries share the parent query's
memory context, so that tupdescs they create will have the same lifespan as
those of the parent query. There are comments in the code envisioning going
even further by not having a separate child EState at all, but that will
require rethinking executor access to range tables, which I don't want to
tackle right now. Per bug report from Jean-Pierre Pelletier.
Tom Lane [Tue, 26 Dec 2006 19:26:56 +0000 (19:26 +0000)]
Repair bug #2839: the various ExecReScan functions need to reset
ps_TupFromTlist in plan nodes that make use of it. This was being done
correctly in join nodes and Result nodes but not in any relation-scan nodes.
Bug would lead to bogus results if a set-returning function appeared in the
targetlist of a subquery that could be rescanned after partial execution,
for example a subquery within EXISTS(). Bug has been around forever :-(
... surprising it wasn't reported before.
Tom Lane [Tue, 26 Dec 2006 16:56:22 +0000 (16:56 +0000)]
Repair bug #2836: SPI_execute_plan returned zero if none of the querytrees
were marked canSetTag. While it's certainly correct to return the result
of the last one that is marked canSetTag, it's less clear what to do when
none of them are. Since plpgsql will complain if zero is returned, the
8.2.0 behavior isn't good. I've fixed it to restore the prior behavior of
returning the physically last query's result code when there are no
canSetTag queries.
Tom Lane [Fri, 15 Dec 2006 18:42:35 +0000 (18:42 +0000)]
Fix some planner bugs exposed by reports from Arjen van der Meijden. These
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
(IN-clauses) or amortization of indexscan costs across repeated indexscans
on the inside of a nestloop. In particular:
Fix some logic errors in the estimation for multiple scans induced by a
ScalarArrayOpExpr indexqual.
Include a small cost component in bitmap index scans to reflect the costs of
manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
appearing to have the same cost as a plain indexscan for fetching a single
tuple.
Also add a per-index-scan-startup CPU cost component; while prior releases
were clearly too pessimistic about the cost of repeated indexscans, the
original 8.2 coding allowed the cost of an indexscan to effectively go to zero
if repeated often enough, which is overly optimistic.
Pay some attention to index correlation when estimating costs for a nestloop
inner indexscan: this is significant when the plan fetches multiple heap
tuples per iteration, since high correlation means those tuples are probably
on the same or adjacent heap pages.
Tom Lane [Tue, 12 Dec 2006 21:31:09 +0000 (21:31 +0000)]
Fix planner to do the right thing when a degenerate outer join (one whose
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be
created. The normal heuristic to avoid joins with no joinclause has to be
overridden in that case. Problem is new in 8.2; before that we forced the
outer join order anyway. Per example from Teodor.
Tom Lane [Fri, 8 Dec 2006 00:40:33 +0000 (00:40 +0000)]
Avoid double free of _SPI_current->tuptable. AtEOSubXact_SPI() now tries to
release it in a subtransaction abort, but this neglects possibility that
someone outside SPI already did. Fix is for spi.c to forget about a tuptable
as soon as it's handed it back to the caller.
Per bug #2817 from Michael Andreen.
Tom Lane [Thu, 7 Dec 2006 19:33:48 +0000 (19:33 +0000)]
Repair incorrect placement of WHERE clauses when there are multiple,
rearrangeable outer joins and the WHERE clause is non-strict and mentions
only nullable-side relations. New bug in 8.2, caused by new logic to allow
rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff
Davis for producing a usable test case.
Tom Lane [Wed, 6 Dec 2006 19:40:08 +0000 (19:40 +0000)]
Fix planning of SubLinks to ensure that Vars generated from transformation of
a sublink's test expression have the correct vartypmod, rather than defaulting
to -1. There's at least one place where this is important because we're
expecting these Vars to be exactly equal() to those appearing in the subplan
itself. This is a pretty klugy solution --- it would likely be cleaner to
change Param nodes to include a typmod field --- but we can't do that in the
already-released 8.2 branch.
Per bug report from Hubert Fongarnand.
Tom Lane [Sun, 3 Dec 2006 21:40:13 +0000 (21:40 +0000)]
Fix LIMIT/OFFSET for null limit values. This worked before 8.2 but was broken
by the change to make limit values int8 instead of int4. (Specifically, you
can do DatumGetInt32 safely on a null value, but not DatumGetInt64.) Per
bug #2803 from Greg Johnson.