Tom Lane [Thu, 26 Apr 2007 23:24:46 +0000 (23:24 +0000)]
Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan
is in progress on the same hashtable. This seems the least invasive way to
fix the recently-recognized problem that a split could cause the scan to
visit entries twice or (with much lower probability) miss them entirely.
The only field-reported problem caused by this is the "failed to re-find
shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
which was caused by multiply visited entries. However, it seems certain
that mdsync() is vulnerable to missing required fsync's due to missed
entries, and I am fearful that RelationCacheInitializePhase2() might be at
risk as well. Because of that and the generalized hazard presented by this
bug, back-patch all the supported branches.
Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
that the hashtables they are examining will stay static between calls.
This is risky regardless of the newly noted dynahash problem, because
hash_seq_search() has never promised to cope with deletion of table entries
other than the just-returned one. There may be no bug here because the only
supported way to call these functions is via ExecMakeTableFunctionResult()
which will cycle them to completion before doing anything very interesting,
but it seems best to get rid of the assumption. This affects 8.2 and HEAD
only, since those functions weren't there earlier.
Neil Conway [Thu, 26 Apr 2007 22:25:56 +0000 (22:25 +0000)]
Another tweak for tab completion of CREATE TEMP. Instead of only
completing CREATE { TEMP | TEMPORARY } TABLE, we should also suggest
VIEW and SEQUENCE. Per Greg Sabino Mullane.
Neil Conway [Thu, 26 Apr 2007 18:10:28 +0000 (18:10 +0000)]
Minor enhancement to psql tab completion. If we see "CREATE TEMPORARY",
we can complete "TABLE". The previous coding only looked for "CREATE TEMP".
Note that I didn't add TEMPORARY to the list of suggested completions
after we've seen "CREATE", since TEMP is equivalent and more concise. But
if the user has already manually typed TEMPORARY, we may as well
complete TABLE for them.
Neil Conway [Thu, 26 Apr 2007 16:13:15 +0000 (16:13 +0000)]
Rename the newly-added commands for discarding session state.
RESET SESSION, RESET PLANS, and RESET TEMP are now DISCARD ALL,
DISCARD PLANS, and DISCARD TEMP, respectively. This is to avoid
confusion with the pre-existing RESET variants: the DISCARD
commands are not actually similar to RESET. Patch from Marko
Kreen, with some minor editorialization.
Tom Lane [Sun, 22 Apr 2007 03:52:40 +0000 (03:52 +0000)]
Remove some of the most blatant brain-fade in the recent guc patch
(it's so nice to have a buildfarm member that actively rejects naked
uses of strcasecmp). This coding is still pretty awful, though, since
it's going to be O(N^2) in the number of guc variables. May I direct
your attention to bsearch?
Tom Lane [Sat, 21 Apr 2007 21:01:45 +0000 (21:01 +0000)]
Some further performance tweaks for planning large inheritance trees that
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner. The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
Tom Lane [Sat, 21 Apr 2007 05:56:41 +0000 (05:56 +0000)]
Tweak make_inh_translation_lists() to check the common case wherein parent and
child attnums are the same, before it grovels through each and every child
column looking for a name match. Saves some time in large inheritance trees,
per example from Greg.
Tom Lane [Sat, 21 Apr 2007 04:49:20 +0000 (04:49 +0000)]
Improve the way in which CatalogCacheComputeHashValue combines multiple key
values: don't throw away perfectly good hash bits, and increase the shift
distances so as to provide more separation in the common case where some of
the key values are small integers (and so their hashes are too, because
hashfunc.c doesn't try all that hard). This reduces the runtime of
SearchCatCache by a factor of 4 in an example provided by Greg Stark,
in which the planner spends a whole lot of time searching the two-key
STATRELATT cache. It seems unlikely to hurt in other cases, but maybe
we could do even better?
Tom Lane [Sat, 21 Apr 2007 04:10:53 +0000 (04:10 +0000)]
Adjust pgstat_initstats() to avoid repeated searches of the TabStat arrays
when a relation is opened multiple times in the same transaction. This is
particularly useful for system catalogs, which we may heap_open or index_open
many times in a transaction, and it doesn't really cost anything extra even
if the rel is touched but once. Motivated by study of an example from Greg
Stark, in which pgstat_initstats() accounted for an unreasonably large
fraction of the runtime.
Tom Lane [Sat, 21 Apr 2007 02:41:13 +0000 (02:41 +0000)]
Tweak set_rel_width() to avoid redundant executions of getrelid().
In very large queries this accounts for a noticeable fraction of
planning time. Per an example from Greg Stark.
Tom Lane [Fri, 20 Apr 2007 02:37:38 +0000 (02:37 +0000)]
Support explicit placement of the temporary-table schema within search_path.
This is needed to allow a security-definer function to set a truly secure
value of search_path. Without it, a malicious user can use temporary objects
to execute code with the privileges of the security-definer function. Even
pushing the temp schema to the back of the search path is not quite good
enough, because a function or operator at the back of the path might still
capture control from one nearer the front due to having a more exact datatype
match. Hence, disable searching the temp schema altogether for functions and
operators.
Tom Lane [Thu, 19 Apr 2007 20:24:04 +0000 (20:24 +0000)]
Repair PANIC condition in hash indexes when a previous index extension attempt
failed (due to lock conflicts or out-of-space). We might have already
extended the index's filesystem EOF before failing, causing the EOF to be
beyond what the metapage says is the last used page. Hence the invariant
maintained by the code needs to be "EOF is at or beyond last used page",
not "EOF is exactly the last used page". Problem was created by my patch
of 2006-11-19 that attempted to repair bug #2737. Since that was
back-patched to 7.4, this needs to be as well. Per report and test case
from Vlastimil Krejcir.
Tom Lane [Thu, 19 Apr 2007 16:33:24 +0000 (16:33 +0000)]
Fix plpgsql to avoid reference to already-freed memory when returning a
pass-by-reference data type and the RETURN statement is within an EXCEPTION
block. Bug introduced by my fix of 2007-01-28 to use per-subtransaction
ExprContexts/EStates; since that wasn't back-patched into older branches,
only 8.2 and HEAD are affected. Per report from Gary Winslow.
Bruce Momjian [Wed, 18 Apr 2007 00:17:56 +0000 (00:17 +0000)]
Document that the COPY delimiter must be an ASCII byte, rather than a
multi-byte value. It can also be a single-byte encoded character if
the client and server versions match.
Bruce Momjian [Tue, 17 Apr 2007 20:50:34 +0000 (20:50 +0000)]
Add warning about TODO item:
< Currently all schemas are owned by the super-user because they are
< copied from the template1 database.
> Currently all schemas are owned by the super-user because they are copied
> from the template1 database. However, since all objects are inherited
> from the template database, it is not clear that setting schemas to the db
> owner is correct.
Tom Lane [Tue, 17 Apr 2007 20:49:39 +0000 (20:49 +0000)]
Don't assume rd_smgr stays open across all of a rewriteheap operation;
doing so can result in crash if an sinval reset occurs meanwhile.
I believe this explains intermittent buildfarm failures in cluster test.
Tom Lane [Tue, 17 Apr 2007 20:03:03 +0000 (20:03 +0000)]
Rewrite choose_bitmap_and() to make it more robust in the presence of
competing alternatives for indexes to use in a bitmap scan. The former
coding took estimated selectivity as an overriding factor, causing it to
sometimes choose indexes that were much slower to scan than ones with a
slightly worse selectivity. It was also too narrow-minded about which
combinations of indexes to consider ANDing. The rewrite makes it pay more
attention to index scan cost than selectivity; this seems sane since it's
impossible to have very bad selectivity with low cost, whereas the reverse
isn't true. Also, we now consider each index alone, as well as adding
each index to an AND-group led by each prior index, for a total of about
O(N^2) rather than O(N) combinations considered. This makes the results
much less dependent on the exact order in which the indexes are
considered. It's still a lot cheaper than an O(2^N) exhaustive search.
A prefilter step eliminates all but the cheapest of those indexes using
the same set of WHERE conditions, to keep the effective value of N down in
scenarios where the DBA has created lots of partially-redundant indexes.
Tom Lane [Mon, 16 Apr 2007 18:42:10 +0000 (18:42 +0000)]
Fix pg_dump to not crash if -t or a similar switch is used to select a serial
sequence for dumping without also selecting its owning table. Make it not try
to emit ALTER SEQUENCE OWNED BY in this situation.
Per report from Michael Nolan.
Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously. Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.
The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.
Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature. Workers are "balanced" so that the total I/O consumption does
not exceed the established limit. This part of the patch was contributed by
ITAGAKI Takahiro.
Tom Lane [Mon, 16 Apr 2007 18:21:07 +0000 (18:21 +0000)]
Make plancache store cursor options so it can pass them to planner during
a replan. I had originally thought this was not necessary, but the new
SPI facilities create a path whereby queries planned with non-default
options can get into the cache, so it is necessary.
Tom Lane [Mon, 16 Apr 2007 01:14:58 +0000 (01:14 +0000)]
Expose more cursor-related functionality in SPI: specifically, allow
access to the planner's cursor-related planning options, and provide new
FETCH/MOVE routines that allow access to the full power of those commands.
Small refactoring of planner(), pg_plan_query(), and pg_plan_queries()
APIs to make it convenient to pass the planning options down from SPI.
This is the core-code portion of Pavel Stehule's patch for scrollable
cursor support in plpgsql; I'll review and apply the plpgsql changes
separately.
Tom Lane [Sun, 15 Apr 2007 20:09:28 +0000 (20:09 +0000)]
Avoid running build_index_pathkeys() in situations where there cannot
possibly be any useful pathkeys --- to wit, queries with neither any
join clauses nor any ORDER BY request. It's nearly free to check for
this case and it saves a useful fraction of the planning time for simple
queries.
Bruce Momjian [Fri, 13 Apr 2007 23:23:22 +0000 (23:23 +0000)]
Update TODO:
< o Consider reducing on-disk varlena length from four to two
< because a heap row cannot be more than 64k in length
> o Consider reducing on-disk varlena length from four bytes to
> two because a heap row cannot be more than 64k in length
Andrew Dunstan [Fri, 13 Apr 2007 18:50:01 +0000 (18:50 +0000)]
Enable building contrib/xml2 if configured using --with-libxml.
If this breaks things due to missing libxslt, then I'll have to
revert it, but let's see if it breaks the buildfarm.
Workarounds in case libxslt is missing include:
. don't configure with libxml, or
. don't build contrib modules from the contrib Makefile (use the individual module Makefiles instead), or
. change the xml2 Makefile
Neil Conway [Thu, 12 Apr 2007 22:39:21 +0000 (22:39 +0000)]
Minor fixes for the EXPLAIN reference page. Mention the fact that
EXPLAIN ANALYZE can sometimes be significantly slower than running
the same query normally, and make some minor markup improvements.
Tom Lane [Thu, 12 Apr 2007 17:10:55 +0000 (17:10 +0000)]
Rearrange mdsync() looping logic to avoid the problem that a sufficiently
fast flow of new fsync requests can prevent mdsync() from ever completing.
This was an unforeseen consequence of a patch added in Mar 2006 to prevent
the fsync request queue from overflowing. Problem identified by Heikki
Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
Takahiro-san, Heikki, and Tom.
Back-patch as far as 8.1 because a previous back-patch introduced the problem
into 8.1 ...
Tom Lane [Thu, 12 Apr 2007 15:04:35 +0000 (15:04 +0000)]
Cancel pending fsync requests during WAL replay of DROP DATABASE, per bug
report from David Darville. Back-patch as far as 8.1, which may or may not
have the problem but it seems a safe change anyway.
Neil Conway [Thu, 12 Apr 2007 06:53:49 +0000 (06:53 +0000)]
RESET SESSION, plus related new DDL commands. Patch from Marko Kreen,
reviewed by Neil Conway. This patch adds the following DDL command
variants: RESET SESSION, RESET TEMP, RESET PLANS, CLOSE ALL, and
DEALLOCATE ALL. RESET SESSION is intended for use by connection
pool software and the like, in order to reset a client session
to something close to its initial state.
Note that while most of these command variants can be executed
inside a transaction block (but are not transaction-aware!),
RESET SESSION cannot. While this is inconsistent, it is intended
to catch programmer mistakes: RESET SESSION in an open transaction
block is probably unintended.
Tom Lane [Wed, 11 Apr 2007 20:47:38 +0000 (20:47 +0000)]
Code review for btree page split WAL reduction patch. Make it actually work
(original code *always* created a full-page image for the left page, thus
leaving the intended savings unrealized), avoid risk of not having enough room
on the page during xlog restore, squeeze out another couple bytes in the xlog
record, clean up neglected comments.