Install recycled WAL segments with current timeline ID during recovery.
This is a follow-up to the earlier fix, which changed the recycling logic
to recycle WAL segments under the current recovery target timeline. That
turns out to be a bad idea, because installing a recycled segment with
a TLI higher than what we're recovering at the moment means that the recovery
logic will find the recycled WAL segment and try to replay it. It will fail,
but but the mere presence of such a WAL segment will mask any other, real,
file with the same log/seg, but smaller TLI.
Per report from Mitsumasa Kondo. Apply to 9.1 and 9.2, like the previous
fix. Master was already doing this differently; this patch makes 9.1 and
9.2 to do the same thing as master.
Kevin Grittner [Mon, 29 Apr 2013 18:06:28 +0000 (13:06 -0500)]
Ensure ANALYZE phase is not skipped because of canceled truncate.
Patch b19e4250b45e91c9cbdd18d35ea6391ab5961c8d attempted to
preserve existing behavior regarding statistics generation in the
case that a truncation attempt was canceled due to lock conflicts.
It failed to do this accurately in two regards: (1) autovacuum had
previously generated statistics if the truncate attempt failed to
initially get the lock rather than having started the attempt, and
(2) the VACUUM ANALYZE command had always generated statistics.
Both of these changes were unintended, and are reverted by this
patch. On review, there seems to be consensus that the previous
failure to generate statistics when the truncate was terminated
was more an unfortunate consequence of how that effort was
previously terminated than a feature we want to keep; so this
patch generates statistics even when an autovacuum truncation
attempt terminates early. Another unintended change which is kept
on the basis that it is an improvement is that when a VACUUM
command is truncating, it will the new heuristic for avoiding
blocking other processes, rather than keeping an
AccessExclusiveLock on the table for however long the truncation
takes.
Per multiple reports, with some renaming per patch by Jeff Janes.
Joe Conway [Fri, 26 Apr 2013 18:54:59 +0000 (11:54 -0700)]
Ensure that user created rows in extension tables get dumped if the table is explicitly requested, either with a -t/--table switch of the table itself, or by -n/--schema switch of the schema containing the extension table. Patch reviewed by Vibhor Kumar and Dimitri Fontaine.
Backpatched to 9.1 when the extension management facility was added.
Tom Lane [Thu, 25 Apr 2013 20:58:14 +0000 (16:58 -0400)]
Avoid deadlock between concurrent CREATE INDEX CONCURRENTLY commands.
There was a high probability of two or more concurrent C.I.C. commands
deadlocking just before completion, because each would wait for the others
to release their reference snapshots. Fix by releasing the snapshot
before waiting for other snapshots to go away.
Per report from Paul Hinze. Back-patch to all active branches.
Tom Lane [Sat, 20 Apr 2013 20:59:31 +0000 (16:59 -0400)]
Fix longstanding race condition in plancache.c.
When creating or manipulating a cached plan for a transaction control
command (particularly ROLLBACK), we must not perform any catalog accesses,
since we might be in an aborted transaction. However, plancache.c busily
saved or examined the search_path for every cached plan. If we were
unlucky enough to do this at a moment where the path's expansion into
schema OIDs wasn't already cached, we'd do some catalog accesses; and with
some more bad luck such as an ill-timed signal arrival, that could lead to
crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya.
Fortunately, there's no real need to consider the search path for such
commands, so we can just skip the relevant steps when the subject statement
is a TransactionStmt. This is somewhat related to bug #5269, though the
failure happens during initial cached-plan creation rather than
revalidation.
This bug has been there since the plan cache was invented, so back-patch
to all supported branches.
Calculate # of semaphores correctly with --disable-spinlocks.
The old formula didn't take into account that each WAL sender process needs
a spinlock. We had also already exceeded the fixed number of spinlocks
reserved for misc purposes (10). Bump that to 30.
Backpatch to 9.0, where WAL senders were introduced. If I counted correctly,
9.0 had exactly 10 predefined spinlocks, and 9.1 exceeded that, but bump the
limit in 9.0 too because 10 is uncomfortably close to the edge.
Tom Lane [Wed, 3 Apr 2013 01:15:50 +0000 (21:15 -0400)]
Minor robustness improvements for isolationtester.
Notice and complain about PQcancel() failures. Also, don't dump core if
an error PGresult doesn't contain severity and message subfields, as it
might not if it was generated by libpq itself. (We have a longstanding
TODO item to improve that, but in the meantime isolationtester had better
cope.)
I tripped across the latter item while investigating a trouble report on
buildfarm member spoonbill. As for the former, there's no evidence that
PQcancel failure is actually involved in spoonbill's problem, but it still
seems like a bad idea to ignore an error return code.
Tom Lane [Mon, 1 Apr 2013 18:01:04 +0000 (14:01 -0400)]
Fix insecure parsing of server command-line switches.
An oversight in commit e710b65c1c56ca7b91f662c63d37ff2e72862a94 allowed
database names beginning with "-" to be treated as though they were secure
command-line switches; and this switch processing occurs before client
authentication, so that even an unprivileged remote attacker could exploit
the bug, needing only connectivity to the postmaster's port. Assorted
exploits for this are possible, some requiring a valid database login,
some not. The worst known problem is that the "-r" switch can be invoked
to redirect the process's stderr output, so that subsequent error messages
will be appended to any file the server can write. This can for example be
used to corrupt the server's configuration files, so that it will fail when
next restarted. Complete destruction of database tables is also possible.
Fix by keeping the database name extracted from a startup packet fully
separate from command-line switches, as had already been done with the
user name field.
The Postgres project thanks Mitsumasa Kondo for discovering this bug,
Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing
the full extent of the danger.
Tom Lane [Mon, 1 Apr 2013 17:09:35 +0000 (13:09 -0400)]
Make REPLICATION privilege checks test current user not authenticated user.
The pg_start_backup() and pg_stop_backup() functions checked the privileges
of the initially-authenticated user rather than the current user, which is
wrong. For example, a user-defined index function could successfully call
these functions when executed by ANALYZE within autovacuum. This could
allow an attacker with valid but low-privilege database access to interfere
with creation of routine backups. Reported and fixed by Noah Misch.
Tom Lane [Sun, 31 Mar 2013 22:33:07 +0000 (18:33 -0400)]
Ignore extra subquery outputs in set_subquery_size_estimates().
In commit 0f61d4dd1b4f95832dcd81c9688dac56fd6b5687, I added code to copy up
column width estimates for each column of a subquery. That code supposed
that the subquery couldn't have any output columns that didn't correspond
to known columns of the current query level --- which is true when a query
is parsed from scratch, but the assumption fails when planning a view that
depends on another view that's been redefined (adding output columns) since
the upper view was made. This results in an assertion failure or even a
crash, as per bug #8025 from lindebg. Remove the Assert and instead skip
the column if its resno is out of the expected range.
Bruce Momjian [Sun, 31 Mar 2013 02:20:53 +0000 (22:20 -0400)]
pg_upgrade: don't copy/link files for invalid indexes
Now that pg_dump no longer dumps invalid indexes, per commit 683abc73dff549e94555d4020dae8d02f32ed78b, have pg_upgrade also skip
them. Previously pg_upgrade threw an error if invalid indexes existed.
Backpatch to 9.2, 9.1, and 9.0 (where pg_upgrade was added to git)
I changed this in commit fd15dba543247eb1ce879d22632b9fdb4c230831, but
missed the fact that the SGML documentation of the function specified
exactly what it did. Well, one of the two places where it's specified
documented that --- probably I looked at the other place and thought
nothing needed to be done. Sync the two places where encode() and
decode() are described.
Tom Lane [Wed, 27 Mar 2013 22:50:29 +0000 (18:50 -0400)]
Reset OpenSSL randomness state in each postmaster child process.
Previously, if the postmaster initialized OpenSSL's PRNG (which it will do
when ssl=on in postgresql.conf), the same pseudo-random state would be
inherited by each forked child process. The problem is masked to a
considerable extent if the incoming connection uses SSL encryption, but
when it does not, identical pseudo-random state is made available to
functions like contrib/pgcrypto. The process's PID does get mixed into any
requested random output, but on most systems that still only results in 32K
or so distinct random sequences available across all Postgres sessions.
This might allow an attacker who has database access to guess the results
of "secure" operations happening in another session.
To fix, forcibly reset the PRNG after fork(). Each child process that has
need for random numbers from OpenSSL's generator will thereby be forced to
go through OpenSSL's normal initialization sequence, which should provide
much greater variability of the sequences. There are other ways we might
do this that would be slightly cheaper, but this approach seems the most
future-proof against SSL-related code changes.
This has been assigned CVE-2013-1900, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
In a heap update, if the old and new tuple were on different pages, and the
new page no longer existed (because it was subsequently truncated away by
vacuum), heap_xlog_update forgot to release the pin on the old buffer. This
bug was introduced by the "Fix multiple problems in WAL replay" patch,
commit 3bbf668de9f1bc172371681e80a4e769b6d014c8 (on master branch).
With full_page_writes=off, this triggered an "incorrect local pin count"
error later in replay, if the old page was vacuumed.
This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the
commit that introduced this bug.
Tom Lane [Tue, 26 Mar 2013 21:43:26 +0000 (17:43 -0400)]
Ignore invalid indexes in pg_dump.
Dumping invalid indexes can cause problems at restore time, for example
if the reason the index creation failed was because it tried to enforce
a uniqueness condition not satisfied by the table's data. Also, if the
index creation is in fact still in progress, it seems reasonable to
consider it to be an uncommitted DDL change, which pg_dump wouldn't be
expected to dump anyway.
Back-patch to all active versions, and teach them to ignore invalid
indexes in servers back to 8.2, where the concept was introduced.
In base backup, only include our own tablespace version directory.
If you have clusters of different versions pointing to the same tablespace
location, we would incorrectly include all the data belonging to the other
versions, too.
Add a server version check to pg_basebackup and pg_receivexlog.
These programs don't work against 9.0 or earlier servers, so check that when
the connection is made. That's better than a cryptic error message you got
before.
Also, these programs won't work with a 9.3 server, because the WAL streaming
protocol was changed in a non-backwards-compatible way. As a general rule,
we don't make any guarantee that an old client will work with a new server,
so check that. However, allow a 9.1 client to connect to a 9.2 server, to
avoid breaking environments that currently work; a 9.1 client happens to
work with a 9.2 server, even though we didn't make any great effort to
ensure that.
This patch is for the 9.1 and 9.2 branches, I'll commit a similar patch to
master later. Although this isn't a critical bug fix, it seems safe enough
to back-patch. The error message you got when connecting to a 9.3devel
server without this patch was cryptic enough to warrant backpatching.
Tom Lane [Sat, 23 Mar 2013 23:16:46 +0000 (19:16 -0400)]
Update time zone abbreviation lists for changes missed since 2006.
Most (all?) of Russia has moved to what's effectively year-round daylight
savings time, so that the "standard" zone names now mean an hour later
than they used to. Update that, notably changing MSK as per recent
complaint from Sergey Konoplev, but also CHOT, GET, IRKT, KGT, KRAT,
MAGT, NOVT, OMST, VLAT, YAKT, YEKT. The corresponding DST abbreviations
are presumably now obsolete, but I left them in place with their old
definitions, just to reduce any possible breakage from this change.
Also add VOLT (Europe/Volgograd), which for some reason we never had
before, as well as MIST (Antarctica/Macquarie), and fix obsolete
definitions of MAWT, TKT, and WST.
Tom Lane [Sat, 23 Mar 2013 18:06:40 +0000 (14:06 -0400)]
Don't put <indexterm> before <term> in <varlistentry> items.
Doing that results in a broken index entry in PDF output. We had only
a few like that, which is probably why nobody noticed before.
Standardize on putting the <term> first.
Tom Lane [Mon, 18 Mar 2013 17:34:27 +0000 (13:34 -0400)]
Improve documentation of EXTRACT(WEEK).
The docs showed that early-January dates can be considered part of the
previous year for week-counting purposes, but failed to say explicitly
that late-December dates can also be considered part of the next year.
Fix that, and add a cross-reference to the "isoyear" field. Per bug
#7967 from Pawel Kobylak.
Tom Lane [Sun, 10 Mar 2013 23:18:49 +0000 (19:18 -0400)]
Fix race condition in DELETE RETURNING.
When RETURNING is specified, ExecDelete would return a virtual-tuple slot
that could contain pointers into an already-unpinned disk buffer. Another
process could change the buffer contents before we get around to using the
data, resulting in garbage results or even a crash. This seems of fairly
low probability, which may explain why there are no known field reports of
the problem, but it's definitely possible. Fix by forcing the result slot
to be "materialized" before we release pin on the disk buffer.
Back-patch to 9.0; in earlier branches there is no bug because
ExecProcessReturning sent the tuple to the destination immediately. Also,
this is already fixed in HEAD as part of the writable-foreign-tables patch
(where the fix is necessary for DELETE RETURNING to work at all with
postgres_fdw).
Tom Lane [Thu, 7 Mar 2013 16:51:13 +0000 (11:51 -0500)]
Fix infinite-loop risk in fixempties() stage of regex compilation.
The previous coding of this function could get into situations where it
would never terminate, because successive passes would re-add EMPTY arcs
that had been removed by the previous pass. Rewrite the function
completely using a new algorithm that is guaranteed to terminate, and
also seems to be usually faster than the old one. Per Tcl bugs 3604074
and 3606683.
Tom Lane [Tue, 5 Mar 2013 18:02:38 +0000 (13:02 -0500)]
Fix to_char() to use ASCII-only case-folding rules where appropriate.
formatting.c used locale-dependent case folding rules in some code paths
where the result isn't supposed to be locale-dependent, for example
to_char(timestamp, 'DAY'). Since the source data is always just ASCII
in these cases, that usually didn't matter ... but it does matter in
Turkish locales, which have unusual treatment of "i" and "I". To confuse
matters even more, the misbehavior was only visible in UTF8 encoding,
because in single-byte encodings we used pg_toupper/pg_tolower which
don't have locale-specific behavior for ASCII characters. Fix by providing
intentionally ASCII-only case-folding functions and using these where
appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active
branches, since it's been like this for a long time.
Tom Lane [Mon, 4 Mar 2013 20:13:31 +0000 (15:13 -0500)]
Fix overflow check in tm2timestamp (this time for sure).
I fixed this code back in commit 841b4a2d5, but didn't think carefully
enough about the behavior near zero, which meant it improperly rejected
1999-12-31 24:00:00. Per report from Magnus Hagander.
Tom Lane [Sat, 2 Mar 2013 02:33:42 +0000 (21:33 -0500)]
Eliminate memory leaks in plperl's spi_prepare() function.
Careless use of TopMemoryContext for I/O function data meant that repeated
use of spi_prepare and spi_freeplan would leak memory at the session level,
as per report from Christian Schröder. In addition, spi_prepare
leaked a lot of transient data within the current plperl function's SPI
Proc context, which would be a problem for repeated use of spi_prepare
within a single plperl function call; and it wasn't terribly careful
about releasing permanent allocations in event of an error, either.
In passing, clean up some copy-and-pasteos in query-lookup error messages.
Tom Lane [Wed, 27 Feb 2013 15:40:14 +0000 (10:40 -0500)]
Add missing error check in regexp parser.
parseqatom() failed to check for an error return (NULL result) from its
recursive call to parsebranch(), and in consequence could crash with a
null-pointer dereference after an error return. This bug has been there
since day one, but wasn't noticed before, probably because most error cases
in parsebranch() didn't actually lead to returning NULL. Add the missing
error check, and also tweak parsebranch() to exit in a less indirect
fashion after a call to parseqatom() fails.
If a database name contained a '=' character, pg_dumpall failed. The problem
was in the way pg_dumpall passes the database name to pg_dump on the
command line. If it contained a '=' character, pg_dump would interpret it
as a libpq connection string instead of a plain database name.
To fix, pass the database name to pg_dump as a connection string,
"dbname=foo", with the database name escaped if necessary.
Tom Lane [Wed, 13 Feb 2013 19:07:17 +0000 (14:07 -0500)]
Fix contrib/pg_trgm's similarity() function for trigram-free strings.
Cases such as similarity('', '') produced a NaN result due to computing
0/0. Per discussion, make it return zero instead.
This appears to be the basic cause of bug #7867 from Michele Baravalle,
although it remains unclear why her installation doesn't think Cyrillic
letters are letters.
Since a backend adds itself to the global listener array during
Exec_ListenPreCommit, it's inappropriate for it to remove itself during
Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure
when committing a transaction that did UNLISTEN then LISTEN, since we end
up not registered though we should be. (This leads to missing later
notifications, or to Assert failures in assert-enabled builds.) Instead
deal with deregistering at the bottom of AtCommit_Notify, when we know the
final state of the listenChannels list.
Also, simplify the representation of registration status by replacing the
transient backendHasExecutedInitialListen flag with an amRegisteredListener
flag.
Per report from Greg Sabino Mullane. Back-patch to 9.0, where the problem
was introduced during the LISTEN/NOTIFY rewrite.
Tom Lane [Sun, 10 Feb 2013 21:21:37 +0000 (16:21 -0500)]
Further cleanup of gistsplit.c.
After further reflection I was unconvinced that the existing coding is
guaranteed to return valid union datums in every code path for multi-column
indexes. Fix that by forcing a gistunionsubkey() call at the end of the
recursion. Having done that, we can remove some clearly-redundant calls
elsewhere. This should be a little faster for multi-column indexes (since
the previous coding would uselessly do such a call for each column while
unwinding the recursion), as well as much harder to break.
Also, simplify the handling of cases where one side or the other of a
primary split contains only don't-care tuples. The previous coding used a
very ugly hack in removeDontCares() that essentially forced one random
tuple to be treated as non-don't-care, providing a random initial choice of
seed datum for the secondary split. It seems unlikely that that method
will give better-than-random splits. Instead, treat such a split as
degenerate and just let the next column determine the split, the same way
that we handle fully degenerate cases where the two sides produce identical
union datums.
This LOG message was put in over five years ago with the evident
expectation that we'd make all GiST opclasses support secondary split
directly. However, no such thing ever happened, and indeed the number of
opclasses supporting it decreased to zero in 9.2. The reason is that
improving on the default implementation isn't that easy --- the
opclass-specific code that did exist, before 9.2, doesn't appear to have
been any improvement over the default.
Hence, remove the message altogether. There's certainly no point in
nagging users about this in released branches, but I doubt that we'll
ever implement complete opclass-specific support anyway.
Tom Lane [Sun, 10 Feb 2013 16:58:28 +0000 (11:58 -0500)]
Document and clean up gistsplit.c.
Improve comments, rename some variables and functions, slightly simplify
a couple of APIs, in an attempt to make this code readable by people other
than its original author.
Even though this is essentially just cosmetic, back-patch to all active
branches, because otherwise it's going to make back-patching future fixes
in this file very painful.
Tom Lane [Fri, 8 Feb 2013 23:03:28 +0000 (18:03 -0500)]
Fix gist_box_same and gist_point_consistent to handle fuzziness correctly.
While there's considerable doubt that we want fuzzy behavior in the
geometric operators at all (let alone as currently implemented), nobody is
stepping forward to redesign that stuff. In the meantime it behooves us
to make sure that index searches agree with the behavior of the underlying
operators. This patch fixes two problems in this area.
First, gist_box_same was using fuzzy equality, but it really needs to use
exact equality to prevent not-quite-identical upper index keys from being
treated as identical, which for example would prevent an existing upper
key from being extended by an amount less than epsilon. This would result
in inconsistent indexes. (The next release notes will need to recommend
that users reindex GiST indexes on boxes, polygons, circles, and points,
since all four opclasses use gist_box_same.)
Second, gist_point_consistent used exact comparisons for upper-page
comparisons in ~= searches, when it needs to use fuzzy comparisons to
ensure it finds all matches; and it used fuzzy comparisons for point <@ box
searches, when it needs to use exact comparisons because that's what the
<@ operator (rather inconsistently) does.
The added regression test cases illustrate all three misbehaviors.
Back-patch to all active branches. (8.4 did not have GiST point_ops,
but it still seems prudent to apply the gist_box_same patch to it.)
Tom Lane [Fri, 8 Feb 2013 00:14:13 +0000 (19:14 -0500)]
Make contrib/btree_gist's GiST penalty function a bit saner.
The previous coding supposed that the first differing bytes in two varlena
datums must have the same sign difference as their overall comparison
result. This is obviously bogus for text strings in non-C locales, and
probably wrong for numeric, and even for bytea I think it was wrong on
machines where char is signed. When the assumption failed, the function
could deliver a zero or negative penalty in situations where such a result
is quite ridiculous, leading the core GiST code to make very bad page-split
decisions.
To fix, take the absolute values of the byte-level differences. Also,
switch the code to using unsigned char not just char, so that the behavior
will be consistent whether char is signed or not.
Per investigation of a trouble report from Tomas Vondra. Back-patch to all
supported branches.
Tom Lane [Thu, 7 Feb 2013 23:22:32 +0000 (18:22 -0500)]
Fix erroneous range-union logic for varlena types in contrib/btree_gist.
gbt_var_bin_union() failed to do the right thing when the existing range
needed to be widened at both ends rather than just one end. This could
result in an invalid index in which keys that are present would not be
found by searches, because the searches would not think they need to
descend to the relevant leaf pages. This error affected all the varlena
datatypes supported by btree_gist (text, bytea, bit, numeric).
Per investigation of a trouble report from Tomas Vondra. (There is also
an issue in gbt_var_penalty(), but that should only result in inefficiency
not wrong answers. I'm committing this separately so that we have a git
state in which it can be tested that bad penalty results don't produce
invalid indexes.) Back-patch to all supported branches.
Tom Lane [Thu, 7 Feb 2013 22:44:16 +0000 (17:44 -0500)]
Repair bugs in GiST page splitting code for multi-column indexes.
When considering a non-last column in a multi-column GiST index,
gistsplit.c tries to improve on the split chosen by the opclass-specific
pickSplit function by considering penalties for the next column. However,
there were two bugs in this code: it failed to recompute the union keys for
the leftmost index columns, even though these might well change after
reassigning tuples; and it included the old union keys in the recomputation
for the columns it did recompute, so that those keys couldn't get smaller
even if they should. The first problem could result in an invalid index
in which searches wouldn't find index entries that are in fact present;
the second would make the index less efficient to search.
Both of these errors were caused by misuse of gistMakeUnionItVec, whose
API was designed in a way that just begged such errors to be made. There
is no situation in which it's safe or useful to compute the union keys for
a subset of the index columns, and there is no caller that wants any
previous union keys to be included in the computation; so the undocumented
choice to treat the union keys as in/out rather than pure output parameters
is a waste of code as well as being dangerous.
Hence, rather than just making a minimal patch, I've changed the API of
gistMakeUnionItVec to remove the "startkey" parameter (it now always
processes all index columns) and treat the attr/isnull arrays as purely
output parameters.
In passing, also get rid of a couple of unnecessary and dangerous uses
of static variables in gistutil.c. It's remarkable that the one in
gistMakeUnionKey hasn't given us portability troubles before now, because
in addition to posing a re-entrancy hazard, it was unsafely assuming that
a static char[] array would have at least Datum alignment.
Per investigation of a trouble report from Tomas Vondra. (There are also
some bugs in contrib/btree_gist to be fixed, but that seems like material
for a separate patch.) Back-patch to all supported branches.
Tom Lane [Thu, 7 Feb 2013 19:44:15 +0000 (14:44 -0500)]
Fix possible failure to send final transaction counts to stats collector.
Normally, we suppress sending a tabstats message to the collector unless
there were some actual table stats to send. However, during backend exit
we should force out the message if there are any transaction commit/abort
counts to send, else the session's last few commit/abort counts will never
get reported at all. We had logic for this, but the short-circuit test
at the top of pgstat_report_stat() ignored the "force" flag, with the
consequence that session-ending transactions that touched no database-local
tables would not get counted. Seems to be an oversight in my commit 641912b4d17fd214a5e5bae4e7bb9ddbc28b144b, which added the "force" flag.
That was back in 8.3, so back-patch to all supported versions.
Tom Lane [Mon, 4 Feb 2013 21:25:15 +0000 (16:25 -0500)]
Prevent execution of enum_recv() from SQL.
This function was misdeclared to take cstring when it should take internal.
This at least allows crashing the server, and in principle an attacker
might be able to use the function to examine the contents of server memory.
The correct fix is to adjust the system catalog contents (and fix the
regression tests that should have caught this but failed to). However,
asking users to correct the catalog contents in existing installations
is a pain, so as a band-aid fix for the back branches, install a check
in enum_recv() to make it throw error if called with a cstring argument.
We will later revert this in HEAD in favor of correcting the catalogs.
Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue.
Alvaro Herrera [Fri, 1 Feb 2013 15:00:40 +0000 (12:00 -0300)]
Fix typo in freeze_table_age implementation
The original code used freeze_min_age instead of freeze_table_age. The
main consequence of this mistake is that lowering freeze_min_age would
cause full-table scans to occur much more frequently, which causes
serious issues because the number of writes required is much larger.
That feature (freeze_min_age) is supposed to affect only how soon tuples
are frozen; some pages should still be skipped due to the visibility
map.
Backpatch to 8.4, where the freeze_table_age feature was introduced.
Magnus Hagander [Thu, 31 Jan 2013 14:08:05 +0000 (15:08 +0100)]
Properly zero-pad the day-of-year part of the win32 build number
This ensure the version number increases over time. The first three digits
in the version number is still set to the actual PostgreSQL version
number, but the last one is intended to be an ever increasing build number,
which previosly failed when it changed between 1, 2 and 3 digits long values.
Tom Lane [Wed, 30 Jan 2013 19:16:34 +0000 (14:16 -0500)]
Fix grammar for subscripting or field selection from a sub-SELECT result.
Such cases should work, but the grammar failed to accept them because of
our ancient precedence hacks to convince bison that extra parentheses
around a sub-SELECT in an expression are unambiguous. (Formally, they
*are* ambiguous, but we don't especially care whether they're treated as
part of the sub-SELECT or part of the expression. Bison cares, though.)
Fix by adding a redundant-looking production for this case.
This is a fine example of why fixing shift/reduce conflicts via
precedence declarations is more dangerous than it looks: you can easily
cause the parser to reject cases that should work.
This has been wrong since commit 3db4056e22b0c6b2adc92543baf8408d2894fe91
or maybe before, and apparently some people have been working around it
by inserting no-op casts. That method introduces a dump/reload hazard,
as illustrated in bug #7838 from Jan Mate. Hence, back-patch to all
active branches.
Alvaro Herrera [Mon, 28 Jan 2013 20:46:47 +0000 (17:46 -0300)]
DROP OWNED: don't try to drop tablespaces/databases
My "fix" for bugs #7578 and #6116 on DROP OWNED at fe3b5eb08a1 not only
misstated that it applied to REASSIGN OWNED (which it did not affect),
but it also failed to fix the problems fully, because I didn't test the
case of owned shared objects. Thus I created a new bug, reported by
Thomas Kellerer as #7748, which would cause DROP OWNED to fail with a
not-for-user-consumption error message. The code would attempt to drop
the database, which not only fails to work because the underlying code
does not support that, but is a pretty dangerous and undesirable thing
to be doing as well.
This patch fixes that bug by having DROP OWNED only attempt to process
shared objects when grants on them are found, ignoring ownership.
Backpatch to 8.3, which is as far as the previous bug was backpatched.
Tom Lane [Fri, 25 Jan 2013 21:59:05 +0000 (16:59 -0500)]
Fix plpython's handling of functions used as triggers on multiple tables.
plpython tried to use a single cache entry for a trigger function, but it
needs a separate cache entry for each table the trigger is applied to,
because there is table-dependent data in there. This was done correctly
before 9.1, but commit 46211da1b84bc3537e799ee1126098e71c2428e8 broke it
by simplifying the lookup key from "function OID and triggered table OID"
to "function OID and is-trigger boolean". Go back to using both OIDs
as the lookup key. Per bug report from Sandro Santilli.
Tom Lane [Thu, 24 Jan 2013 23:34:08 +0000 (18:34 -0500)]
Fix SPI documentation for new handling of ExecutorRun's count parameter.
Since 9.0, the count parameter has only limited the number of tuples
actually returned by the executor. It doesn't affect the behavior of
INSERT/UPDATE/DELETE unless RETURNING is specified, because without
RETURNING, the ModifyTable plan node doesn't return control to execMain.c
for each tuple. And we only check the limit at the top level.
While this behavioral change was unintentional at the time, discussion of
bug #6572 led us to the conclusion that we prefer the new behavior anyway,
and so we should just adjust the docs to match rather than change the code.
Accordingly, do that. Back-patch as far as 9.0 so that the docs match the
code in each branch.
Simon Riggs [Thu, 24 Jan 2013 14:24:48 +0000 (14:24 +0000)]
Fix rare missing cancellations in Hot Standby.
The machinery around XLOG_HEAP2_CLEANUP_INFO failed
to correctly pass through the necessary information
on latestRemovedXid, avoiding cancellations in some
infrequent concurrent update/cleanup scenarios.
Backpatchable fix to 9.0
Detailed bug report and fix by Noah Misch,
backpatchable version by me.
Kevin Grittner [Wed, 23 Jan 2013 19:40:06 +0000 (13:40 -0600)]
Fix performance problems with autovacuum truncation in busy workloads.
In situations where there are over 8MB of empty pages at the end of
a table, the truncation work for trailing empty pages takes longer
than deadlock_timeout, and there is frequent access to the table by
processes other than autovacuum, there was a problem with the
autovacuum worker process being canceled by the deadlock checking
code. The truncation work done by autovacuum up that point was
lost, and the attempt tried again by a later autovacuum worker. The
attempts could continue indefinitely without making progress,
consuming resources and blocking other processes for up to
deadlock_timeout each time.
This patch has the autovacuum worker checking whether it is
blocking any other thread at 20ms intervals. If such a condition
develops, the autovacuum worker will persist the work it has done
so far, release its lock on the table, and sleep in 50ms intervals
for up to 5 seconds, hoping to be able to re-acquire the lock and
try again. If it is unable to get the lock in that time, it moves
on and a worker will try to continue later from the point this one
left off.
While this patch doesn't change the rules about when and what to
truncate, it does cause the truncation to occur sooner, with less
blocking, and with the consumption of fewer resources when there is
contention for the table's lock.
The only user-visible change other than improved performance is
that the table size during truncation may change incrementally
instead of just once.
Tom Lane [Mon, 21 Jan 2013 04:43:56 +0000 (23:43 -0500)]
Fix one-byte buffer overrun in PQprintTuples().
This bug goes back to the original Postgres95 sources. Its significance
to modern PG versions is marginal, since we have not used PQprintTuples()
internally in a very long time, and it doesn't seem to have ever been
documented either. Still, it *is* exposed to client apps, so somebody
out there might possibly be using it.
Tom Lane [Sat, 19 Jan 2013 22:20:32 +0000 (17:20 -0500)]
Modernize string literal syntax in tutorial example.
Un-double the backslashes in the LIKE patterns, since
standard_conforming_strings is now the default. Just to be sure, include
a command to set standard_conforming_strings to ON in the example.
Back-patch to 9.1, where standard_conforming_strings became the default.
Andrew Dunstan [Sat, 19 Jan 2013 19:54:29 +0000 (14:54 -0500)]
Make pgxs build executables with the right suffix.
Complaint and patch from Zoltán Böszörményi.
When cross-compiling, the native make doesn't know
about the Windows .exe suffix, so it only builds with
it when explicitly told to do so.
The native make will not see the link between the target
name and the built executable, and might this do unnecesary
work, but that's a bigger problem than this one, if in fact
we consider it a problem at all.
Tom Lane [Fri, 18 Jan 2013 23:06:32 +0000 (18:06 -0500)]
Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s)
being sought could be updated by concurrently-committing transactions.
CREATE DATABASE and DROP DATABASE are particularly exposed because they do
heavyweight filesystem operations during their scans of pg_tablespace,
so that the scans run for a very long time compared to most. Furthermore,
the potential consequences of a missed or twice-visited row are nastier
than average:
* createdb() could fail with a bogus "file already exists" error, or
silently fail to copy one or more tablespace's worth of files into the
new database.
* remove_dbtablespaces() could miss one or more tablespaces, thus failing
to free filesystem space for the dropped database.
* check_db_file_conflict() could likewise miss a tablespace, leading to an
OID conflict that could result in data loss either immediately or in
future operations. (This seems of very low probability, though, since a
duplicate database OID would be unlikely to start with.)
Hence, it seems worth fixing these three places to use MVCC snapshots, even
though this will someday be superseded by a generic solution to SnapshotNow
race conditions.
On second thought, use an empty string instead of "none" when not connected.
"none" could mislead to think that you're connected a database with that
name. Also, it needs to be translated, which might be hard without some
context. So in back-branches, use empty string, so that the message is
(currently ""), which is at least unambiguous and doens't require
translation. In master, it's no problem to add translatable strings, so use
a different fix there.
Tom Lane [Mon, 14 Jan 2013 20:19:48 +0000 (15:19 -0500)]
Reject out-of-range dates in to_date().
Dates outside the supported range could be entered, but would not print
reasonably, and operations such as conversion to timestamp wouldn't behave
sanely either. Since this has the potential to result in undumpable table
data, it seems worth back-patching.
Tom Lane [Mon, 14 Jan 2013 19:45:40 +0000 (14:45 -0500)]
Add new timezone abbrevation "FET".
This seems to have been invented in 2011 to represent GMT+3, non daylight
savings rules, as now used in Europe/Kaliningrad and Europe/Minsk.
There are no conflicts so might as well add it to the Default list.
Per bug #7804 from Ruslan Izmaylov.
Tom Lane [Sun, 23 Dec 2012 19:07:36 +0000 (14:07 -0500)]
Prevent failure when RowExpr or XmlExpr is parse-analyzed twice.
transformExpr() is required to cope with already-transformed expression
trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However,
some of its newer subroutines hadn't gotten the memo. This accounts for
bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the
previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING
INDEXES. Additional investigation showed that transformXmlExpr had the
same kind of problem, but all the other cases seem to be safe.
Fix race condition if a file is removed while pg_basebackup is running.
If a relation file was removed when the server-side counterpart of
pg_basebackup was just about to open it to send it to the client, you'd
get a "could not open file" error. Fix that.
Backpatch to 9.1, this goes back to when pg_basebackup was introduced.
Tom Lane [Thu, 20 Dec 2012 21:31:10 +0000 (16:31 -0500)]
Fix pg_extension_config_dump() to handle update cases more sanely.
If pg_extension_config_dump() is executed again for a table already listed
in the extension's extconfig, the code was blindly making a new array entry.
This does not seem useful. Fix it to replace the existing array entry
instead, so that it's possible for extension update scripts to alter the
filter conditions for configuration tables.
In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig
entry for the target table, and remove it if present. This is not a 100%
solution because it's allowed for an extension update script to just
summarily DROP a member table, and that code path doesn't go through
ExecAlterExtensionContentsStmt. We could probably make that case clean
things up if we had to, but it would involve sticking a very ugly wart
somewhere in the guts of dependency.c. Since on the whole it seems quite
unlikely that extension updates would want to remove pre-existing
configuration tables, making the case possible with an explicit command
seems sufficient.
Per bug #7756 from Regina Obe. Back-patch to 9.1 where extensions were
introduced.
Fix recycling of WAL segments after changing recovery target timeline.
After the recovery target timeline is changed, we would still recycle and
preallocate WAL segments on the old target timeline. Those WAL segments
created for the old timeline are a waste of space, although otherwise
harmless.
The problem is that when installing a recycled WAL segment as a future one,
ThisTimeLineID is used to construct the filename. ThisTimeLineID is
initialized in the checkpointer process to the recovery target timeline at
startup, but it was not updated when the startup process chooses a new
target timeline (recovery_target_timeline='latest'). To fix, always update
ThisTimeLineID before recycling WAL segments at a restartpoint.
This still leaves a small window where we might install WAL segments under
wrong timeline ID, if the target timeline is changed just as we're about to
start recycling. Also, when we're not on the target timeline yet, but still
replaying some older timeline, we'll install WAL segments to the newer
timeline anyway and they will still go wasted. We'll just live with the
waste in that situation.
Commit to 9.2 and 9.1. Older versions didn't change recovery target timeline
after startup, and for master, I'll commit a slightly different variant of
this.
Tom Lane [Tue, 18 Dec 2012 21:22:24 +0000 (16:22 -0500)]
Ignore libedit/libreadline while probing for standard functions.
Some versions of libedit expose bogus definitions of setproctitle(),
optreset, and perhaps other symbols that we don't want configure to pick up
on. There was a previous report of similar problems with strlcpy(), which
we addressed in commit 59cf88da91bc88978b05275ebd94ac2d980c4047, but the
problem has evidently grown in scope since then. In hopes of not having to
deal with it again in future, rearrange configure's tests for supplied
functions so that we ignore libedit/libreadline except when probing
specifically for functions we expect them to provide.
Per report from Christoph Berg, though this is slightly more aggressive
than his proposed patch.
Tom Lane [Tue, 18 Dec 2012 01:15:45 +0000 (20:15 -0500)]
Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away. Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein. This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema. That worked fine in the past, but was broken by commit debcec7dc31a992703911a9953e299c8d730c778 which incorrectly removed the
rd_islocaltemp relcache flag. Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag. Per trouble report from Heikki Linnakangas.
Back-patch to 9.1 where the erroneous change was made. In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
Tom Lane [Sun, 16 Dec 2012 20:02:07 +0000 (15:02 -0500)]
Fix filling of postmaster.pid in bootstrap/standalone mode.
We failed to ever fill the sixth line (LISTEN_ADDR), which caused the
attempt to fill the seventh line (SHMEM_KEY) to fail, so that the shared
memory key never got added to the file in standalone mode. This has been
broken since we added more content to our lock files in 9.1.
To fix, tweak the logic in CreateLockFile to add an empty LISTEN_ADDR
line in standalone mode. This is a tad grotty, but since that function
already knows almost everything there is to know about the contents of
lock files, it doesn't seem that it's any better to hack it elsewhere.
It's not clear how significant this bug really is, since a standalone
backend should never have any children and thus it seems not critical
to be able to check the nattch count of the shmem segment externally.
But I'm going to back-patch the fix anyway.
This problem had escaped notice because of an ancient (and in hindsight
pretty dubious) decision to suppress LOG-level messages by default in
standalone mode; so that the elog(LOG) complaint in AddToDataDirLockFile
that should have warned of the problem didn't do anything. Fixing that
is material for a separate patch though.
Tom Lane [Wed, 12 Dec 2012 03:09:20 +0000 (22:09 -0500)]
Add defenses against integer overflow in dynahash numbuckets calculations.
The dynahash code requires the number of buckets in a hash table to fit
in an int; but since we calculate the desired hash table size dynamically,
there are various scenarios where we might calculate too large a value.
The resulting overflow can lead to infinite loops, division-by-zero
crashes, etc. I (tgl) had previously installed some defenses against that
in commit 299d1716525c659f0e02840e31fbe4dea3, but that covered only one
call path. Moreover it worked by limiting the request size to work_mem,
but in a 64-bit machine it's possible to set work_mem high enough that the
problem appears anyway. So let's fix the problem at the root by installing
limits in the dynahash.c functions themselves.
Bruce Momjian [Tue, 11 Dec 2012 20:09:22 +0000 (15:09 -0500)]
Fix pg_upgrade for invalid indexes
All versions of pg_upgrade upgraded invalid indexes caused by CREATE
INDEX CONCURRENTLY failures and marked them as valid. The patch adds a
check to all pg_upgrade versions and throws an error during upgrade or
--check.
Backpatch to 9.2, 9.1, 9.0. Patch slightly adjusted.
Consistency check should compare last record replayed, not last record read.
EndRecPtr is the last record that we've read, but not necessarily yet
replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the
last replayed record instead. This caused recovery to think it's reached
consistency too early.
Now that we do the check in CheckRecoveryConsistency correctly, we have to
move the call of that function to after redoing a record. The current place,
after reading a record but before replaying it, is wrong. In particular, if
there are no more records after the one ending at minRecoveryPoint, we don't
enter hot standby until one extra record is generated and read by the
standby, and CheckRecoveryConsistency is called. These two bugs conspired
to make the code appear to work correctly, except for the small window
between reading the last record that reaches minRecoveryPoint, and
replaying it.
In the passing, rename recoveryLastRecPtr, which is the last record
replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing
with replayEndRecPtr, which is the last record read that we're about to
replay.
Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao.
Backpatch to 9.0, where Hot Standby subtly changed the test from
"minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The
former works because where the test is performed, we have always read one
more record than we've replayed.
Andrew Dunstan [Tue, 11 Dec 2012 16:51:51 +0000 (11:51 -0500)]
Add mode where contrib installcheck runs each module in a separately named database.
Normally each module is tested in a database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This new mode, enabled by adding USE_MODULE_DB=1 to the make command
line, runs most modules in a database with the module name embedded in
it.
This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.
Second attempt at this, this time accomodating make versions older
than 3.82.
Still to be done: adapt to the MSVC build system.
Backpatch to 9.0, which is the earliest version it is reasonably
possible to test upgrading from.