Update High Availability docs. Clarify terms master/primary standby/slave,
move two paragraphs that apply to log shipping in general from the
"Alternative method for log shipping" section to the earlier sections.
Add varname tags where missing. Some small wording changes.
Tom Lane [Fri, 28 May 2010 01:14:03 +0000 (01:14 +0000)]
Rejigger mergejoin logic so that a tuple with a null in the first merge column
is treated like end-of-input, if nulls sort last in that column and we are not
doing outer-join filling for that input. In such a case, the tuple cannot
join to anything from the other input (because we assume mergejoinable
operators are strict), and neither can any tuple following it in the sort
order. If we're not interested in doing outer-join filling we can just
pretend the tuple and its successors aren't there at all. This can save a
great deal of time in situations where there are many nulls in the join
column, as in a recent example from Scott Marlowe. Also, since the planner
tends to not count nulls in its mergejoin scan selectivity estimates, this
is an important fix to make the runtime behavior more like the estimate.
I regard this as an omission in the patch I wrote years ago to teach mergejoin
that tuples containing nulls aren't joinable, so I'm back-patching it. But
only to 8.3 --- in older versions, we didn't have a solid notion of whether
nulls sort high or low, so attempting to apply this optimization could break
things.
Tom Lane [Thu, 27 May 2010 19:19:38 +0000 (19:19 +0000)]
Change ps_status.c to explicitly track the current logical length of ps_buffer.
This saves cycles in get_ps_display() on many popular platforms, and more
importantly ensures that get_ps_display() will correctly return an empty
string if init_ps_display() hasn't been called yet. Per trouble report
from Ray Stell, in which log_line_prefix %i produced junk early in backend
startup.
Back-patch to 8.0. 7.4 doesn't have %i and its version of get_ps_display()
makes no pretense of avoiding pad junk anyhow.
Tom Lane [Thu, 27 May 2010 16:20:11 +0000 (16:20 +0000)]
Fix the volatility marking of textanycat() and anytextcat(): they were marked
immutable, but that is wrong in general because the cast from the polymorphic
argument to text could be stable or even volatile. Mark them volatile for
safety. In the typical case where the cast isn't volatile, the planner will
deduce the correct expression volatility after inlining the function, so
performance is not lost. The just-committed fix in CREATE INDEX also ensures
this won't break any indexing cases that ought to be allowed.
Per discussion, I'm not bumping catversion for this change, as it doesn't
seem critical enough to force an initdb on beta testers.
Tom Lane [Thu, 27 May 2010 15:59:10 +0000 (15:59 +0000)]
Make CREATE INDEX run expression preprocessing on a proposed index expression
before it checks whether the expression is immutable. This covers two cases
that were previously handled poorly:
1. SQL function inlining could reduce the apparent volatility of the
expression, allowing an expression to be accepted where it previously would
not have been. As an example, polymorphic functions must be marked with the
worst-case volatility they have for any argument type, but for specific
argument types they might not be so volatile, so indexing could be allowed.
(Since the planner will refuse to inline functions in cases where the
apparent volatility of the expression would increase, this won't break
any cases that were accepted before.)
2. A nominally immutable function could have default arguments that are
volatile expressions. In such a case insertion of the defaults will increase
both the apparent and actual volatility of the expression, so it is
*necessary* to check this before allowing the expression to be indexed.
Back-patch to 8.4, where default arguments were introduced.
Make it more clear that you need to release savepoint with
RELEASE SAVEPOINT to make an older savepoint with the same name
accessible. It's also possible to implicitly release the savepoint by
rolling back to an earlier savepoint, but mentioning that too would make
the note just more verbose and confusing.
Thinko in previous commit: ensure that MAX_SEND_SIZE is always greater
than XLOG_BLCKSZ, by defining it as 16 * XLOG_BLCKSZ rather than directly
as 128k bytes.
In walsender, don't sleep if there's outstanding WAL waiting to be sent,
otherwise we effectively rate-limit the streaming as pointed out by
Simon Riggs. Also, send the WAL in smaller chunks, to respond to signals
more promptly.
Tom Lane [Wed, 26 May 2010 21:39:27 +0000 (21:39 +0000)]
Rearrange libpq's SSL initialization to simplify it and make it handle some
additional cases correctly. The original coding failed to load additional
(chain) certificates from the client cert file, meaning that indirectly signed
client certificates didn't work unless one hacked the server's root.crt file
to include intermediate CAs (not the desired approach). Another problem was
that everything got loaded into the shared SSL_context object, which meant
that concurrent connections trying to use different sslcert settings could
well fail due to conflicting over the single available slot for a keyed
certificate.
To fix, get rid of the use of SSL_CTX_set_client_cert_cb(), which is
deprecated anyway in the OpenSSL documentation, and instead just
unconditionally load the client cert and private key during connection
initialization. This lets us use SSL_CTX_use_certificate_chain_file(),
which does the right thing with additional certs, and is lots simpler than
the previous hacking about with BIO-level access. A small disadvantage is
that we have to load the primary client cert a second time with
SSL_use_certificate_file, so that that one ends up in the correct slot
within the connection's SSL object where it can get paired with the key.
Given the other overhead of making an SSL connection, that doesn't seem
worth worrying about.
Tom Lane [Wed, 26 May 2010 20:47:13 +0000 (20:47 +0000)]
Fix bogus error message for SSL-cert authentication, due to lack of
a uaCert entry in auth_failed(). Put the switch entries into a sane
order, namely the one the enum is declared in.
Simon Riggs [Wed, 26 May 2010 19:52:52 +0000 (19:52 +0000)]
HS Defer buffer pin deadlock check until deadlock_timeout has expired.
During Hot Standby we need to check for buffer pin deadlocks when the
Startup process begins to wait, in case it never wakes up again. We
previously made the deadlock check immediately on the basis it was
cheap, though clearer thinking and prima facie evidence shows that
was too simple. Refactor existing code to make it easy to add in
deferral of deadlock check until deadlock_timeout allowing a good
reduction in deadlock checks since far few buffer pins are held for
that duration. It's worth doing anyway, though major goal is to
prevent further reports of context switching with high numbers of
users on occasional tests.
Tom Lane [Wed, 26 May 2010 15:52:37 +0000 (15:52 +0000)]
Tell openssl to include the names of the root certs the server trusts in
requests for client certs. This lets a client with a keystore select the
appropriate client certificate to send. In particular, this is necessary
to get Java clients to work in all but the most trivial configurations.
Per discussion of bug #5468.
Robert Haas [Wed, 26 May 2010 12:32:41 +0000 (12:32 +0000)]
More fixes for shutdown during recovery.
1. If we receive a fast shutdown request while in the PM_STARTUP state,
process it just as we would in PM_RECOVERY, PM_HOT_STANDBY, or PM_RUN.
Without this change, an early fast shutdown followed by Hot Standby causes
the database to get stuck in a state where a shutdown is pending (so no new
connections are allowed) but the shutdown request is never processed unless
we end Hot Standby and enter normal running.
2. Avoid removing the backup label file when a smart or fast shutdown occurs
during recovery. It makes sense to do this once we've reached normal running,
since we must be taking a backup which now won't be valid. But during
recovery we must be recovering from a previously taken backup, and any backup
label file is needed to restart recovery from the right place.
Tom Lane [Tue, 25 May 2010 17:44:41 +0000 (17:44 +0000)]
Fix oversight in construction of sort/unique plans for UniquePaths.
If the original IN operator is cross-type, for example int8 = int4,
we need to use int4 < int4 to sort the inner data and int4 = int4
to unique-ify it. We got the first part of that right, but tried to
use the original IN operator for the equality checks. Per bug #5472
from Vlad Romascanu.
Backpatch to 8.4, where the bug was introduced by the patch that unified
SortClause and GroupClause. I was able to take out a whole lot of on-the-fly
calls of get_equality_op_for_ordering_op(), but failed to realize that
I needed to put one back in right here :-(
Robert Haas [Fri, 21 May 2010 17:37:44 +0000 (17:37 +0000)]
Unbreak \h; can't do strlen(NULL).
This was broken by the following commmit. Although the original commit was
backpatched all the way to 7.4, this particular bug exists only in the version
applied to HEAD.
Michael Meskes [Thu, 20 May 2010 22:10:46 +0000 (22:10 +0000)]
Ecpg now accepts "long long" datatypes even if "long" is 64bit wide. This used to cover the equally long "long long" type. This patch closes bug #5464.
Magnus Hagander [Thu, 20 May 2010 14:13:11 +0000 (14:13 +0000)]
Change the "N. Central Asia Standard Time" timezone to map to
Asia/Novosibirsk on Windows.
Microsoft changed the behaviour of this zone in the timezone update
from KB976098. The zones differ in handling of DST, and the old
zone was just removed.
Bruce Momjian [Wed, 19 May 2010 18:27:43 +0000 (18:27 +0000)]
For pg_upgrade, update template0's datfrozenxid and its relfrozenxids to
match the behavior of autovacuum, which does this as the xid advances
even if autovacuum is turned off.
Robert Haas [Sun, 16 May 2010 04:35:04 +0000 (04:35 +0000)]
Insert line breaks in two places in SQL functions documentation.
This avoids a formatting problem in the PDF output. In the HTML output this
isn't necessary, but we've done similar things elsewhere in the documentation
so I think it's OK to do it here, too. I've refrained from breaking a longish
error message which also causes problems for the PDF output, because that would
make the HTML output look wrong.
Tom Lane [Sat, 15 May 2010 21:41:16 +0000 (21:41 +0000)]
Ensure that pg_restore -l will output DATABASE entries whether or not -C
is specified. Per bug report from Russell Smith and ensuing discussion.
Since this is a corner case behavioral change, I'm going to be conservative
and not back-patch it.
In passing, also rename the RestoreOptions field for the -C switch to
something less generic than "create".
Tom Lane [Sat, 15 May 2010 18:11:07 +0000 (18:11 +0000)]
Improve documentation of pg_restore's -l and -L switches to point out their
interactions with filtering switches, such as -n and -t. Per a complaint
from Russell Smith.
Simon Riggs [Sat, 15 May 2010 07:14:43 +0000 (07:14 +0000)]
Fix bug in processing of checkpoint time for max_standby_delay. Latest
log time was incorrectly set, typically leading to dates in the past,
which would cause more cancellations in Hot Standby on a quiet server.
Tom Lane [Thu, 13 May 2010 18:29:12 +0000 (18:29 +0000)]
Prevent PL/Tcl from loading the "unknown" module from pltcl_modules unless
that is a regular table or view owned by a superuser. This prevents a
trojan horse attack whereby any unprivileged SQL user could create such a
table and insert code into it that would then get executed in other users'
sessions whenever they call pltcl functions.
Worse yet, because the code was automatically loaded into both the "normal"
and "safe" interpreters at first use, the attacker could execute unrestricted
Tcl code in the "normal" interpreter without there being any pltclu functions
anywhere, or indeed anyone else using pltcl at all: installing pltcl is
sufficient to open the hole. Change the initialization logic so that the
"unknown" code is only loaded into an interpreter when the interpreter is
first really used. (That doesn't add any additional security in this
particular context, but it seems a prudent change, and anyway the former
behavior violated the principle of least astonishment.)
Andrew Dunstan [Thu, 13 May 2010 16:39:43 +0000 (16:39 +0000)]
Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is
fundamentally insecure. Instead apply an opmask to the whole interpreter that
imposes restrictions on unsafe operations. These restrictions are much harder
to subvert than is Safe.pm, since there is no container to be broken out of.
Backported to release 7.4.
In releases 7.4, 8.0 and 8.1 this also includes the necessary backporting of
the two interpreters model for plperl and plperlu adopted in release 8.2.
In versions 8.0 and up, the use of Perl's POSIX module to undo its locale
mangling on Windows has become insecure with these changes, so it is
replaced by our own routine, which is also faster.
Nice side effects of the changes include that it is now possible to use perl's
"strict" pragma in a natural way in plperl, and that perl's $a and
$b variables now work as expected in sort routines, and that function
compilation is significantly faster.
Tim Bunce and Andrew Dunstan, with reviews from Alex Hunsaker and
Alexey Klyukin.
Magnus Hagander [Thu, 13 May 2010 15:58:15 +0000 (15:58 +0000)]
Assorted fixes to make pg_upgrade build on MSVC.
* There is no chmod() on Windows.
* Must always use the 3-parameter version of open()
* There is no dynloader.h - but it also appears unnecessary on all platforms
* Don't include shlobj.h because it causes compile errors, and from what I can
see it's not actually used. This may need to be added back for mingw
and/or cygwin in the worst case.
Simon Riggs [Thu, 13 May 2010 11:39:30 +0000 (11:39 +0000)]
Ensure that top level aborts call XLogSetAsyncCommit(). Not doing
so simply leads to data waiting in wal_buffers which then causes
later commits to potentially do emergency writes and for all forms
of replication to be potentially delayed without need or benefit.
Issue pointed out exactly by Fujii Masao, following bug report
by Robert Haas on a separate though related topic.
Simon Riggs [Thu, 13 May 2010 11:15:38 +0000 (11:15 +0000)]
Cleanup initialization of Hot Standby. Clarify working with reanalysis
of requirements and documentation on LogStandbySnapshot(). Fixes
two minor bugs reported by Tom Lane that would lead to an incorrect
snapshot after transaction wraparound. Also fix two other problems
discovered that would give incorrect snapshots in certain cases.
ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor
refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().