Tom Lane [Thu, 29 Apr 2010 21:36:19 +0000 (21:36 +0000)]
Rename the parameter recovery_connections to hot_standby, to reduce possible
confusion with streaming-replication settings. Also, change its default
value to "off", because of concern about executing new and poorly-tested
code during ordinary non-replicating operation. Per discussion.
In passing do some minor editing of related documentation.
Tom Lane [Thu, 29 Apr 2010 16:32:41 +0000 (16:32 +0000)]
Install a workaround for 'TeX capacity exceeded' problem
when building PDF output for recent versions of the documentation.
There is probably a better answer out there somewhere, but
we need something now so we can build beta releases.
Tom Lane [Wed, 28 Apr 2010 19:38:49 +0000 (19:38 +0000)]
Minor editorializing on pg_controldata and pg_resetxlog: adjust some message
wording, deal explicitly with some fields that were being silently left zero.
Tom Lane [Wed, 28 Apr 2010 16:54:16 +0000 (16:54 +0000)]
Modify ShmemInitStruct and ShmemInitHash to throw errors internally,
rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.
We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.
Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.
Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.
Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
Tom Lane [Wed, 28 Apr 2010 02:04:16 +0000 (02:04 +0000)]
Modify the built-in text search parser to handle URLs more nearly according
to RFC 3986. In particular, these characters now terminate the path part
of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'. The previous behavior
was inconsistent and depended on whether a "?" was present in the path.
Per gripe from Donald Fraser and spec research by Kevin Grittner.
This is a pre-existing bug, but not back-patching since the risks of
breaking existing applications seem to outweigh the benefits.
Tom Lane [Wed, 28 Apr 2010 00:09:05 +0000 (00:09 +0000)]
Replace the KnownAssignedXids hash table with a sorted-array data structure,
and be more tense about the locking requirements for it, to improve performance
in Hot Standby mode. In passing fix a few bugs and improve a number of
comments in the existing HS code.
If a base backup is cancelled by server shutdown or crash, throw an error
in WAL recovery when it sees the shutdown checkpoint record. It's more
user-friendly to find out about it at that point than at the end of
recovery, and you're not left wondering why your hot standby server never
opens up for read-only connections.
Robert Haas [Mon, 26 Apr 2010 10:52:00 +0000 (10:52 +0000)]
When we're restricting who can connect, don't allow new walsenders.
Normal superuser processes are allowed to connect even when the database
system is shutting down, or when fewer than superuser_reserved_connection
slots remain. This is intended to make sure an administrator can log in
and troubleshoot, so don't extend these same courtesies to users connecting
for replication.
Simon Riggs [Fri, 23 Apr 2010 22:23:39 +0000 (22:23 +0000)]
Add missing optimizer hooks for function cost and number of rows.
Closely follow design of other optimizer hooks: if hook exists
retrieve value from plugin; if still not set then get from cache.
Simon Riggs [Fri, 23 Apr 2010 19:57:19 +0000 (19:57 +0000)]
Make CheckRequiredParameterValues() depend upon correct combination
of parameters. Fix bug report by Robert Haas that error message and
hint was incorrect if wrong mode parameters specified on master.
Internal changes only. Proposals for parameter simplification on
master/primary still under way.
Simon Riggs [Thu, 22 Apr 2010 08:04:25 +0000 (08:04 +0000)]
Optimise btree delete processing when no active backends.
Clarify comments, downgrade a message to DEBUG and remove some
debug counters. Direct from ideas by Heikki Linnakangas.
Simon Riggs [Thu, 22 Apr 2010 02:15:45 +0000 (02:15 +0000)]
Further reductions in Hot Standby conflict processing. These
come from the realistion that HEAP2_CLEAN records don't
always remove user visible data, so conflict processing for
them can be skipped. Confirm validity using Assert checks,
clarify circumstances under which we log heap_cleanup_info
records. Tuning arises from bug fixing of earlier safety
check failures.
Fix encoding issue when lc_monetary or lc_numeric are different encoding
from lc_ctype, that could happen on Windows. We need to change lc_ctype
together with lc_monetary or lc_numeric, and convert strings in lconv
from lc_ctype encoding to the database encoding.
The bug reported by Mikko, original patch by Hiroshi Inoue,
with changes by Bruce and me.
Tom Lane [Wed, 21 Apr 2010 20:54:19 +0000 (20:54 +0000)]
Enforce superuser permissions checks during ALTER ROLE/DATABASE SET, rather
than during define_custom_variable(). This entails rejecting an ALTER
command if the target variable doesn't have a known (non-placeholder)
definition, unless the calling user is superuser. When the variable *is*
known, we can correctly apply the rule that only superusers can issue ALTER
for SUSET parameters. This allows define_custom_variable to apply ALTER's
values for SUSET parameters at module load time, secure in the knowledge
that only a superuser could have set the ALTER value. This change fixes a
longstanding gotcha in the usage of SUSET-level custom parameters; which
is a good thing to fix now that plpgsql defines such a parameter.
Simon Riggs [Wed, 21 Apr 2010 19:53:24 +0000 (19:53 +0000)]
Only send cleanup_info messages if VACUUM removes any tuples.
There is no other purpose for this message type than to report
the latestRemovedXid of removed tuples, prior to index scans.
Removes overlooked path for sending invalid latestRemovedXid.
Fixes buildfarm failure on centaur.
Simon Riggs [Wed, 21 Apr 2010 19:08:14 +0000 (19:08 +0000)]
Relax locking during GetCurrentVirtualXIDs(). Earlier improvements
to handling of btree delete records mean that all snapshot
conflicts on standby now have a valid, useful latestRemovedXid.
Our earlier approach using LW_EXCLUSIVE was useful when we didnt
always have a valid value, though is no longer useful or necessary.
Asserts added to code path to prove and ensure this is the case.
This will reduce contention and improve performance of larger Hot
Standby servers.
Simon Riggs [Wed, 21 Apr 2010 17:20:56 +0000 (17:20 +0000)]
Fix oversight in collecting values for cleanup_info records.
vacuum_log_cleanup_info() now generates log records with a valid
latestRemovedXid set in all cases. Also be careful not to zero the
value when we do a round of vacuuming part-way through lazy_scan_heap().
Incidentally, this reduces frequency of conflicts in Hot Standby.
Tom Lane [Wed, 21 Apr 2010 03:32:53 +0000 (03:32 +0000)]
Fix pg_hba.conf matching so that replication connections only match records
with database = replication. The previous coding would allow them to match
ordinary records too, but that seems like a recipe for security breaches.
Improve the messages associated with no-such-pg_hba.conf entry to report
replication connections as such, since that's now a critical aspect of
whether the connection matches. Make some cursory improvements in the related
documentation, too.
Tom Lane [Wed, 21 Apr 2010 00:51:57 +0000 (00:51 +0000)]
Move the check for whether walreceiver has authenticated as a superuser
from walsender.c, where it didn't really belong, to postinit.c where it does
belong (and is essentially free, too).
Tom Lane [Tue, 20 Apr 2010 23:48:47 +0000 (23:48 +0000)]
Arrange for client authentication to occur before we select a specific
database to connect to. This is necessary for the walsender code to work
properly (it was previously using an untenable assumption that template1 would
always be available to connect to). This also gets rid of a small security
shortcoming that was introduced in the original patch to eliminate the flat
authentication files: before, you could find out whether or not the requested
database existed even if you couldn't pass the authentication checks.
The changes needed to support this are mainly just to treat pg_authid and
pg_auth_members as nailed relations, so that we can read them without having
to be able to locate real pg_class entries for them. This mechanism was
already debugged for pg_database, but we hadn't recognized the value of
applying it to those catalogs too.
Since the current code doesn't have support for accessing toast tables before
we've brought up all of the relcache, remove pg_authid's toast table to ensure
that no one can store an out-of-line toasted value of rolpassword. The case
seems quite unlikely to occur in practice, and was effectively unsupported
anyway in the old "flatfiles" implementation.
Update genbki.pl to actually implement the same rules as bootstrap.c does for
not-nullability of catalog columns. The previous coding was a bit cheesy but
worked all right for the previous set of bootstrap catalogs. It does not work
for pg_authid, where rolvaliduntil needs to be nullable.
Initdb forced due to minor catalog changes (mainly the toast table removal).
Robert Haas [Tue, 20 Apr 2010 11:15:06 +0000 (11:15 +0000)]
Rename standby_keep_segments to wal_keep_segments.
Also, make the name of the GUC and the name of the backing variable match.
Alnong the way, clean up a couple of slight typographical errors in the
related docs.
Tom Lane [Tue, 20 Apr 2010 01:38:52 +0000 (01:38 +0000)]
Move the responsibility for calling StartupXLOG into InitPostgres, for
those process types that go through InitPostgres; in particular, bootstrap
and standalone-backend cases. This ensures that we have set up a PGPROC
and done some other basic initialization steps (corresponding to the
if (IsUnderPostmaster) block in AuxiliaryProcessMain) before we attempt to
run WAL recovery in a standalone backend. As was discovered last September,
this is necessary for some corner-case code paths during WAL recovery,
particularly end-of-WAL cleanup.
Moving the bootstrap case here too is not necessary for correctness, but it
seems like a good idea since it reduces the number of distinct code paths.
Robert Haas [Tue, 20 Apr 2010 00:26:06 +0000 (00:26 +0000)]
Update docs as to when WAL logging can be skipped.
In 8.4 and prior, WAL-logging could potentially be skipped whenever
archive_mode=off. With streaming replication, this is now true only
if max_wal_senders=0.
Simon Riggs [Mon, 19 Apr 2010 18:03:38 +0000 (18:03 +0000)]
Check RecoveryInProgress() while holding ProcArrayLock during snapshots.
This prevents a rare, yet possible race condition at the exact moment
of transition from recovery to normal running.
Tom Lane [Mon, 19 Apr 2010 17:54:48 +0000 (17:54 +0000)]
Fix uninitialized local variables. Not sure why gcc doesn't complain about
these --- maybe because they're effectively unused? MSVC does complain though,
per buildfarm.
Magnus Hagander [Mon, 19 Apr 2010 14:10:45 +0000 (14:10 +0000)]
Add wrapper function libpqrcv_PQexec() in the walreceiver that uses async
libpq to send queries, making the waiting for responses interruptible on
platforms where PQexec() can't normally be interrupted by signals, such
as win32.
Robert Haas [Mon, 19 Apr 2010 00:55:26 +0000 (00:55 +0000)]
Add an 'enable_material' GUC.
The logic for determining whether to materialize has been significantly
overhauled for 9.0. In case there should be any doubt about whether
materialization is a win in any particular case, this should provide a
convenient way of seeing what happens without it; but even with enable_material
turned off, we still materialize in cases where it is required for
correctness.
Simon Riggs [Sun, 18 Apr 2010 18:44:53 +0000 (18:44 +0000)]
Improve sequence and sense of messages from pg_stop_backup().
Now doesn't report it is waiting until it actually is waiting,
plus message doesn't appear until at least 5 seconds wait, so
we avoid reporting the wait before we've given the archiver
a reasonable time to wake up and archive the file we just
created earlier in the function.
Also add new unconditional message to confirm safe completion.
Now a normal, healthy execution does not report waiting at
all, just safe completion.
Simon Riggs [Sun, 18 Apr 2010 18:06:07 +0000 (18:06 +0000)]
Tune GetSnapshotData() during Hot Standby by avoiding loop
through normal backends. Makes code clearer also, since we
avoid various Assert()s. Performance of snapshots taken
during recovery no longer depends upon number of read-only
backends.
On Windows, syslogger runs in two threads. The main thread processes config
reload and rotation signals, and a helper thread reads messages from the
pipe and writes them to the log file. However, server code isn't generally
thread-safe, so if both try to do e.g palloc()/pfree() at the same time,
bad things will happen. To fix that, use a critical section (which is like
a mutex) to enforce that only one the threads are active at a time.
In standby mode, suppress repeated LOG messages about a corrupt record,
which just indicates that we've reached the end of valid WAL found in
the standby.
Tom Lane [Wed, 14 Apr 2010 23:52:10 +0000 (23:52 +0000)]
Fix plpgsql's exec_eval_expr() to ensure it returns a sane type OID
even when the expression is a query that returns no rows.
So far as I can tell, the only caller that actually fails when a garbage
OID is returned is exec_stmt_case(), which is new in 8.4 --- in all other
cases, we might make a useless trip through casting logic, but we won't
fail since the isnull flag will be set. Hence, backpatch only to 8.4,
just in case there are apps out there that aren't expecting an error to
be thrown if the query returns more or less than one column. (Which seems
unlikely, since the error would be thrown if the query ever did return a
row; but it's possible there's some never-exercised code out there.)
Tom Lane [Wed, 14 Apr 2010 21:31:11 +0000 (21:31 +0000)]
Fix a problem introduced by my patch of 2010-01-12 that revised the way
relcache reload works. In the patched code, a relcache entry in process of
being rebuilt doesn't get unhooked from the relcache hash table; which means
that if a cache flush occurs due to sinval queue overrun while we're
rebuilding it, the entry could get blown away by RelationCacheInvalidate,
resulting in crash or misbehavior. Fix by ensuring that an entry being
rebuilt has positive refcount, so it won't be seen as a target for removal
if a cache flush occurs. (This will mean that the entry gets rebuilt twice
in such a scenario, but that's okay.) It appears that the problem can only
arise within a transaction that has previously reassigned the relfilenode of
a pre-existing table, via TRUNCATE or a similar operation. Per bug #5412
from Rusty Conover.
Back-patch to 8.2, same as the patch that introduced the problem.
I think that the failure can't actually occur in 8.2, since it lacks the
rd_newRelfilenodeSubid optimization, but let's make it work like the later
branches anyway.
Magnus Hagander [Tue, 13 Apr 2010 08:16:09 +0000 (08:16 +0000)]
Only try to do a graceful disconnect if we've successfully loaded the
shared library with the disconnect function in it. Fixes segmentation
fault reported by Jeff Davis.
Update the location of last removed WAL segment in shared memory only
after actually removing one, so that if we can't remove segments because
WAL archiving is lagging behind, we don't unnecessarily forbid streaming
the old not-yet-archived segments that are still perfectly valid. Per
suggestion from Fujii Masao.
Change the logic to decide when to delete old WAL segments, so that it
doesn't take into account how far the WAL senders are. This way a hung
WAL sender doesn't prevent old WAL segments from being recycled/removed
in the primary, ultimately causing the disk to fill up. Instead add
standby_keep_segments setting to control how many old WAL segments are
kept in the primary. This also makes it more reliable to use streaming
replication without WAL archiving, assuming that you set
standby_keep_segments high enough.
Magnus Hagander [Thu, 8 Apr 2010 11:25:58 +0000 (11:25 +0000)]
Proceed to look for the next timezone when matching a localized
Windows timezone name where the information in the registry is
incomplete, instead of aborting.
This fixes cases when the registry information is incomplete for
a timezone that is alphabetically before the one that is in use.
Robert Haas [Thu, 8 Apr 2010 01:39:37 +0000 (01:39 +0000)]
Make smart shutdown work in combination with Hot Standby/Streaming Replication.
At present, killing the startup process does not release any locks it holds,
so we must wait to stop the startup and walreceiver processes until all
read-only backends have exited. Without this patch, the startup and
walreceiver processes never exit, so the server gets permanently stuck in
a half-shutdown state.
Fujii Masao, with review, docs, and comment adjustments by me.
Tom Lane [Wed, 7 Apr 2010 21:41:53 +0000 (21:41 +0000)]
Fix to_char YYY, YY, Y format codes so that FM zero-suppression really works,
rather than only sort-of working as the previous attempt had left it.
Clean up some unnecessary differences between the way these were coded and
the way the YYYY case was coded. Update the regression test cases that
proved that it wasn't working.
Allow quotes to be escaped in recovery.conf, by doubling them. This patch
also makes the parsing a little bit stricter, rejecting garbage after the
parameter value and values with missing ending quotes, for example.
Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset() during
recovery. We might want to relax this in the future, but ThisTimeLineID
isn't currently correct in backends during recovery, so the filename
returned was wrong.
Magnus Hagander [Tue, 6 Apr 2010 20:35:11 +0000 (20:35 +0000)]
Log the actual timezone name that we fail to look up the values for in
case the registry data doesn't follow the format we expect, to facilitate
debugging.
Add missing completions for:
- ALTER SEQUENCE name OWNER TO
- ALTER TYPE name RENAME TO
- ALTER VIEW name ALTER COLUMN
- ALTER VIEW name OWNER TO
- ALTER VIEW name SET SCHEMA
Fix wrong completions for:
- ALTER FUNCTION/AGGREGATE name (arguments) ...
"(arguments)" has been ignored.
- ALTER ... SET SCHEMA
"SCHEMA" has been considered as a variable name.
Andrew Dunstan [Mon, 5 Apr 2010 03:09:09 +0000 (03:09 +0000)]
Exclude unwanted typedef symbols in pgindent, including FD_SET which is found on some Windows platforms. Also, silence unnecessary messages and make awk happier about literal '*' on some platforms.