Tom Lane [Tue, 2 Jun 2009 17:37:55 +0000 (17:37 +0000)]
Remove the old advice to keep from_collapse_limit less than geqo_threshold,
instead just pointing out that a larger value may trigger use of GEQO.
Per Robert Haas.
In passing, do a bit of wordsmithing on the Genetic Query Optimizer section.
Only recycle normal files in pg_xlog as WAL segments. pg_standby creates
symbolic links with the -l option, and as Fujii Masao pointed out we ended up
overwriting files in the archive directory before this patch. Patch by
Aidan Van Dyk, Fujii Masao and me.
Backpatch to 8.3, where pg_standby was introduced.
Joe Conway [Tue, 2 Jun 2009 03:21:56 +0000 (03:21 +0000)]
Fix dblink_get_result() as reported by Oleksiy Shchukin. Refactor a bit
while we're at it per request by Tom Lane. Specifically, don't try to
perform dblink_send_query() via dblink_record_internal() -- it was
inappropriate and ugly.
Tom Lane [Mon, 1 Jun 2009 23:55:15 +0000 (23:55 +0000)]
Change AdjustIntervalForTypmod to not discard higher-order field values on the
grounds that they don't fit into the specified interval qualifier (typmod).
This behavior, while of long standing, is clearly wrong per spec --- for
example the value INTERVAL '999' SECOND means 999 seconds and should not be
reduced to less than 60 seconds.
In some cases there could be grounds to raise an error if higher-order field
values are not given as zero; for example '1 year 1 month'::INTERVAL MONTH
should arguably be taken as an error rather than equivalent to 13 months.
However our internal representation doesn't allow us to do that in a fashion
that would consistently reject all and only the cases that a strict reading
of the spec would suggest. Also, seeing that for example INTERVAL '13' MONTH
will print out as '1 year 1 mon', we have to be careful not to create a
situation where valid data will fail to dump and reload. The present patch
therefore takes the attitude of not throwing an error in any such case.
We might want to revisit that in future but it would take more redesign
than seems prudent in late beta.
Per a complaint from Sebastien Flaesch and subsequent discussion. While
at other times we might have just postponed such an issue to the next
development cycle, 8.4 already has changed the parsing of interval literals
quite a bit in an effort to accept all spec-compliant cases correctly.
This seems like a change that should be part of that rather than coming
along later.
Tom Lane [Mon, 1 Jun 2009 16:55:11 +0000 (16:55 +0000)]
Fix DecodeInterval to report an error for multiple occurrences of DAY, WEEK,
YEAR, DECADE, CENTURY, or MILLENIUM fields, just as it always has done for
other types of fields. The previous behavior seems to have been a hack to
avoid defining bit-positions for all these field types in DTK_M() masks,
rather than something that was really considered to be desired behavior.
But there is room in the masks for these, and we really need to tighten up
at least the behavior of DAY and YEAR fields to avoid unexpected behavior
associated with the 8.4 changes to interpret ambiguous fields based on the
interval qualifier (typmod) value. Per my example and proposed patch.
Tom Lane [Sun, 31 May 2009 20:55:37 +0000 (20:55 +0000)]
Update obsolete comment in index_drop(). When the comment was written,
queries frequently took no lock at all on individual indexes. That's not
true any more, but we still need lock on the parent table to make it safe
to use cached lists of index OIDs.
Tom Lane [Wed, 27 May 2009 22:12:53 +0000 (22:12 +0000)]
Improve release note explanation of the change in libpq's handling of
default usernames versus Kerberos tickets. Per confusion about what
bug #4824 was really about.
Magnus Hagander [Wed, 27 May 2009 21:08:22 +0000 (21:08 +0000)]
Properly return the usermap result when doing gssapi authentication. Without
this, the username was in practice never matched against the kerberos principal
used to log in.
Tom Lane [Wed, 27 May 2009 20:42:29 +0000 (20:42 +0000)]
Ignore RECHECK in CREATE OPERATOR CLASS, just throwing a NOTICE, instead of
throwing an error as 8.4 had been doing. The error interfered with porting
old database definitions (particularly for pg_migrator) without really buying
any safety. Per bug #4817 and subsequent discussion.
Tom Lane [Wed, 27 May 2009 01:18:06 +0000 (01:18 +0000)]
Improve documentation about function volatility: mention the snapshot
visibility effects in a couple of places where people are likely to look
for it. Per discussion of recent question from Karl Nack.
Tom Lane [Tue, 26 May 2009 17:36:05 +0000 (17:36 +0000)]
Allow the second argument of pg_get_expr() to be just zero when deparsing
an expression that's not supposed to contain variables. Per discussion
with Gevik Babakhani, this eliminates the need for an ugly kluge (namely,
specifying some unrelated relation name). Remove one such kluge from
pg_dump.
Tom Lane [Tue, 26 May 2009 02:17:50 +0000 (02:17 +0000)]
Remove the useless and rather inconsistent return values of EncodeDateOnly,
EncodeTimeOnly, EncodeDateTime, EncodeInterval. These don't have any good
reason to fail, and their callers were mostly not checking anyway.
Tom Lane [Tue, 26 May 2009 01:29:09 +0000 (01:29 +0000)]
Add range checks to time_recv() and timetz_recv(), to prevent binary input
of time values that would not be accepted via textual input.
Per gripe from Andrew McNamara.
This is potentially a back-patchable bug fix, but for the moment it doesn't
seem sufficiently high impact to justify doing that.
Tom Lane [Sun, 24 May 2009 18:10:38 +0000 (18:10 +0000)]
Fix LIKE's special-case code for % followed by _. I'm not entirely sure that
this case is worth a special code path, but a special code path that gets
the boundary condition wrong is definitely no good. Per bug #4821 from
Andrew Gierth.
In passing, clean up some minor code formatting issues (excess parentheses
and blank lines in odd places).
Teodor Sigaev [Thu, 21 May 2009 20:09:36 +0000 (20:09 +0000)]
Resort tsvector's lexemes in tsvectorrecv instead of emmiting an error.
Basically, it's needed to support binary dump from 8.3 because ordering rule
was changed.
Update relpages and reltuples estimates in stand-alone ANALYZE, even if
there's no analyzable attributes or indexes. We also used to report 0 live
and dead tuples for such tables, which messed with autovacuum threshold
calculations.
This fixes bug #4812 reported by George Su. Backpatch back to 8.1.
Peter Eisentraut [Mon, 18 May 2009 08:59:29 +0000 (08:59 +0000)]
Some documentation cleanup for the addition of the KOI8U encoding. Change
all (remaining) mentions of KOI8 to the new canonical form KOI8R. Add
information about the available conversions for KOI8U.
Tom Lane [Fri, 15 May 2009 15:56:39 +0000 (15:56 +0000)]
Fix all the server-side SIGQUIT handlers (grumble ... why so many identical
copies?) to ensure they really don't run proc_exit/shmem_exit callbacks,
as was intended. I broke this behavior recently by installing atexit
callbacks without thinking about the one case where we truly don't want
to run those callback functions. Noted in an example from Dave Page.
Add recovery_end_command option to recovery.conf. recovery_end_command
is run at the end of archive recovery, providing a chance to do external
cleanup. Modify pg_standby so that it no longer removes the trigger file,
that is to be done using the recovery_end_command now.
Provide a "smart" failover mode in pg_standby, where we don't fail over
immediately, but only after recovering all unapplied WAL from the archive.
That gives you zero data loss assuming all WAL was archived before
failover, which is what most users of pg_standby actually want.
recovery_end_command by Simon Riggs, pg_standby changes by Fujii Masao and
myself.
Tom Lane [Wed, 13 May 2009 22:32:55 +0000 (22:32 +0000)]
Add checks to DefineQueryRewrite() to prohibit attaching rules to relations
that aren't RELKIND_RELATION or RELKIND_VIEW, and to disallow attaching rules
to system relations unless allowSystemTableMods is on. This is to make the
behavior of CREATE RULE more like CREATE TRIGGER, which disallows the
comparable cases. Per discussion of bug #4808.
Tom Lane [Wed, 13 May 2009 20:27:17 +0000 (20:27 +0000)]
Rewrite xml.c's memory management (yet again). Give up on the idea of
redirecting libxml's allocations into a Postgres context. Instead, just let
it use malloc directly, and add PG_TRY blocks as needed to be sure we release
libxml data structures in error recovery code paths. This is ugly but seems
much more likely to play nicely with third-party uses of libxml, as seen in
recent trouble reports about using Perl XML facilities in pl/perl and bug
#4774 about contrib/xml2.
I left the code for allocation redirection in place, but it's only
built/used if you #define USE_LIBXMLCONTEXT. This is because I found it
useful to corral libxml's allocations in a palloc context when hunting
for libxml memory leaks, and we're surely going to have more of those
in the future with this type of approach. But we don't want it turned on
in a normal build because it breaks exactly what we need to fix.
I have not re-indented most of the code sections that are now wrapped
by PG_TRY(); that's for ease of review. pg_indent will fix it.
This is a pre-existing bug in 8.3, but I don't dare back-patch this change
until it's gotten a reasonable amount of field testing.
Tom Lane [Tue, 12 May 2009 20:17:40 +0000 (20:17 +0000)]
Fix intratransaction memory leaks in xml_recv, xmlconcat, xmlroot, and
xml_parse, all arising from the same sloppy usage of parse_xml_decl.
The original coding had that function returning its output string
parameters in the libxml context, which is long-lived, and all but one
of its callers neglected to free the strings afterwards. The easiest
and most bulletproof fix is to return the strings in the local palloc
context instead, since that's short-lived. This was only costing a
dozen or two bytes per function call, but that adds up fast if the
function is called repeatedly ...
Noted while poking at the more general problem of what to do with our
libxml memory allocation hooks. Back-patch to 8.3, which has the
identical coding.
Tom Lane [Tue, 12 May 2009 16:43:32 +0000 (16:43 +0000)]
Fix LOCK TABLE to eliminate the race condition that could make it give weird
errors when tables are concurrently dropped. To do this we must take lock
on each relation before we check its privileges. The old code was trying
to do that the other way around, which is a bit pointless when there are lots
of other commands that lock relations before checking privileges. I did keep
it checking each relation's privilege before locking the next relation, which
is a detail that ALTER TABLE isn't too picky about.
Tom Lane [Tue, 12 May 2009 03:11:02 +0000 (03:11 +0000)]
Modify find_inheritance_children() and find_all_inheritors() to add the
ability to lock relations as they scan pg_inherits, and to ignore any
relations that have disappeared by the time we get lock on them. This
makes uses of these functions safe against concurrent DROP operations
on child tables: we will effectively ignore any just-dropped child,
rather than possibly throwing an error as in recent bug report from
Thomas Johansson (and similar past complaints). The behavior should
not change otherwise, since the code was acquiring those same locks
anyway, just a little bit later.
An exception is LockTableCommand(), which is still behaving unsafely;
but that seems to require some more discussion before we change it.
Tom Lane [Tue, 12 May 2009 00:56:05 +0000 (00:56 +0000)]
Do some minor code refactoring in preparation for changing the APIs of
find_inheritance_children() and find_all_inheritors(). I got annoyed that
these are buried inside the planner but mostly used elsewhere. So, create
a new file catalog/pg_inherits.c and put them there, along with a couple
of other functions that search pg_inherits.
The code that modifies pg_inherits is (still) in tablecmds.c --- it's
kind of entangled with unrelated code that modifies pg_depend and other
stuff, so pulling it out seemed like a bigger change than I wanted to make
right now. But this file provides a natural home for it if anyone ever
gets around to that.
This commit just moves code around; it doesn't change anything, except
I succumbed to the temptation to make a couple of trivial optimizations
in typeInheritsFrom().
Tom Lane [Mon, 11 May 2009 17:56:08 +0000 (17:56 +0000)]
Partially revert my patch of 2008-11-12 that installed a limit on the number
of AND/OR clause branches that predtest.c would attempt to deal with. As
noted in bug #4721, that change disabled proof attempts for sizes of problems
that people are actually expecting it to work for. The original complaint
it was trying to solve was O(N^2) behavior for long IN-lists, so let's try
applying the limit to just ScalarArrayOpExprs rather than everything.
Another case of "foolish consistency" I fear.
Back-patch to 8.2, same as the previous patch was.
Tom Lane [Sun, 10 May 2009 22:45:28 +0000 (22:45 +0000)]
Make a marginal performance improvement in predicate_implied_by and
predicate_refuted_by: if either top-level input is a single-element list,
reduce it to its lone member before proceeding. This avoids
a useless level of AND-recursion within the recursive proof routines.
It's worth doing because, for example, if the clause is a 100-element
list and the predicate is a 1-element list then we'd otherwise strip
the predicate's list structure 100 times as we iterate through the clause.
It's only needed at top level because there won't be any trivial ANDs below
that --- this situation is an artifact of the decision to represent even
single-item conditions as Lists in the "implicit AND" format, and that format
is only used at the top level of any predicate or restriction condition.
Tom Lane [Sun, 10 May 2009 02:51:44 +0000 (02:51 +0000)]
Adjust pg_dumpall so that it emits ENCODING, LC_COLLATE, and LC_CTYPE options
in its CREATE DATABASE commands only for databases that have settings
different from the installation defaults. This is a low-tech method of
avoiding unnecessary platform dependencies in dump files. Eventually we ought
to have a platform-independent way of specifying LC_COLLATE and LC_CTYPE, but
that's not going to happen for 8.4, and this patch at least avoids the issue
for people who aren't setting up per-database locales. ENCODING doesn't have
the platform dependency problem, but it seems consistent to make it act the
same as the locale settings.
Tom Lane [Sat, 9 May 2009 22:51:41 +0000 (22:51 +0000)]
Fix cost_nestloop and cost_hashjoin to model the behavior of semi and anti
joins a bit better, ie, understand the differing cost functions for matched
and unmatched outer tuples. There is more that could be done in cost_hashjoin
but this already helps a great deal. Per discussions with Robert Haas.
Add alternative expected output files for cs_CZ locale for btree_gist and
tsearch2 tests. This should make 'comet_moth' buildfarm member pass
contrib check. Zdenek Kotala.
Tom Lane [Thu, 7 May 2009 22:58:28 +0000 (22:58 +0000)]
Add an option to AlterTableCreateToastTable() to allow its caller to force
a toast table to be built, even if the sum-of-column-widths calculation
indicates one isn't needed. This is needed by pg_migrator because if the
old table has a toast table, we have to migrate over the toast table since
it might contain some live data, even though subsequent column drops could
mean that no recently-added rows could require toasting.
Tom Lane [Thu, 7 May 2009 22:01:18 +0000 (22:01 +0000)]
Change pgbench to use the table names pgbench_accounts, pgbench_branches,
pgbench_history, and pgbench_tellers, rather than just accounts, branches,
history, and tellers. This is to prevent accidental conflicts with real
application tables, as has been reported to happen at least once. Also
remove the automatic "SET search_path = public" that it did at startup,
as this seems to restrict testing flexibility without actually buying much.
Per proposal by Joshua Drake and ensuing discussion.
Tom Lane [Thu, 7 May 2009 20:13:09 +0000 (20:13 +0000)]
Ooops ... make_outerjoininfo wasn't actually enforcing the join order
restrictions specified for semijoins in optimizer/README, to wit that
you can't reassociate outer joins into or out of the RHS of a semijoin.
Per report from Heikki.
Request XLOG switch before writing checkpoint in pg_start_backup(). Otherwise
you can end up with an unrecoverable backup if you start a new base backup
right after finishing archive recovery. In that scenario, the redo pointer of
the checkpoint that pg_start_backup() writes points to the XLOG segment where
the timeline-changing end-of-archive-recovery checkpoint is. The beginning
of that segment contains pages with the old timeline ID, and we don't accept
that in recovery unless we find a history file covering the old timeline ID.
If you omit pg_xlog from the base backup and clear the archive directory
before starting the backup, there will be no such history file available.
The bug is present in all versions since PITR was introduced in 8.0, but I'm
back-patching only back to 8.2. Earlier versions didn't have XLOG switch
records, making this fix unfeasible. Given the lack of reports until now,
it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1.
Tom Lane [Wed, 6 May 2009 20:31:18 +0000 (20:31 +0000)]
Tweak distribute_qual_to_rels so that when we decide a pseudoconstant qual
can be pushed to the top of the join tree, we update both the relids and
qualscope variables to keep them in sync. This prevents a possible later
failure of an Assert clause, and affects nothing else since qualscope isn't
used later except for that Assert. At the moment the Assert shouldn't be
reachable when we've pushed the qual up; but this is cheap insurance, and
it's more sensible anyway in terms of the overall logic of the routine.
Per analysis of a bug report from Stefan Huehner.
I'm not back-patching this since it's just future-proofing; but if anyone
gets tempted to change check_outerjoin_delay again in the back branches,
this might be needed.
Tom Lane [Wed, 6 May 2009 16:15:21 +0000 (16:15 +0000)]
Modify CREATE DATABASE to enforce that the source database's encoding setting
must be used for the new database, except when copying from template0.
This is the same rule that we now enforce for locale settings, and it has
the same motivation: databases other than template0 might contain data that
would be invalid according to a different setting. This represents another
step in a continuing process of locking down ways in which encoding violations
could occur inside the backend. Per discussion of a few days ago.
In passing, fix pre-existing breakage of mbregress.sh, and fix up a couple
of ereport() calls in dbcommands.c that failed to specify sqlstate codes.
Tom Lane [Tue, 5 May 2009 20:06:07 +0000 (20:06 +0000)]
Install an atexit(2) callback that ensures that proc_exit's cleanup processing
will still be performed if something in a backend process calls exit()
directly, instead of going through proc_exit() as we prefer. This is a second
response to the issue that we might load third-party code that doesn't know it
should not call exit(). Such a call will now cause a reasonably graceful
backend shutdown, if possible. (Of course, if the reason for the exit() call
is out-of-memory or some such, we might not be able to recover, but at least
we will try.)
Tom Lane [Tue, 5 May 2009 19:59:00 +0000 (19:59 +0000)]
Install a "dead man switch" to allow the postmaster to detect cases where
a backend has done exit(0) or exit(1) without having disengaged itself
from shared memory. We are at risk for this whenever third-party code is
loaded into a backend, since such code might not know it's supposed to go
through proc_exit() instead. Also, it is reported that under Windows
there are ways to externally kill a process that cause the status code
returned to the postmaster to be indistinguishable from a voluntary exit
(thank you, Microsoft). If this does happen then the system is probably
hosed --- for instance, the dead session might still be holding locks.
So the best recovery method is to treat this like a backend crash.
The dead man switch is armed for a particular child process when it
acquires a regular PGPROC, and disarmed when the PGPROC is released;
these should be the first and last touches of shared memory resources
in a backend, or close enough anyway. This choice means there is no
coverage for auxiliary processes, but I doubt we need that, since they
shouldn't be executing any user-provided code anyway.
This patch also improves the management of the EXEC_BACKEND
ShmemBackendArray array a bit, by reducing search costs.
Although this problem is of long standing, the lack of field complaints
seems to mean it's not critical enough to risk back-patching; at least
not till we get some more testing of this mechanism.
Tom Lane [Tue, 5 May 2009 19:36:32 +0000 (19:36 +0000)]
Insert CHECK_FOR_INTERRUPTS() calls into btree and hash index scans at the
points where we step right or left to the next page. This should ensure
reasonable response time to a query cancel request during an unsuccessful
index scan, as seen in recent gripe from Marc Cousin. It's a bit trickier
than it might seem at first glance, because CHECK_FOR_INTERRUPTS() is a no-op
if executed while holding a buffer lock. So we have to do it just at the
point where we've dropped one page lock and not yet acquired the next.
Remove CHECK_FOR_INTERRUPTS calls at the top level of btgetbitmap and
hashgetbitmap, since they're pointless given the added checks.
I think that GIST is okay already --- at least, there's a CHECK_FOR_INTERRUPTS
at a plausible-looking place in gistnext(). I don't claim to know GIN well
enough to try to poke it for this, if indeed it has a problem at all.
This is a pre-existing issue, but in view of the lack of prior complaints
I'm not going to risk back-patching.
Tom Lane [Tue, 5 May 2009 18:02:11 +0000 (18:02 +0000)]
Avoid integer overflow in the loop that extracts histogram entries from
ANALYZE's total sample. The original coding is at risk of overflow for
statistics targets exceeding about 2675; this was not a problem before
8.4 but it is now. Per bug #4793 from Dennis Noordsij.
Magnus Hagander [Tue, 5 May 2009 09:48:51 +0000 (09:48 +0000)]
Make the win32 shared memory code try 10 times instead of one if
it fails because the shared memory segment already exists. This
means it can take up to 10 seconds before it reports the error
if it *does* exist, but hopefully it will make the system capable
of restarting even when the server is under high load.
Magnus Hagander [Mon, 4 May 2009 08:36:40 +0000 (08:36 +0000)]
Call SetLastError(0) before calling the file mapping functions
to make sure that the error code is reset, as a precaution in
case the API doesn't properly reset it on success. This could
be necessary, since we check the error value even if the function
doesn't fail for specific success cases.
Alvaro Herrera [Mon, 4 May 2009 02:24:17 +0000 (02:24 +0000)]
Avoid a memory allocation in the backend startup code, to avoid having to check
whether it failed. Modelled after catcache.c's usage of DlList, per suggestion
from Tom.