Simon Riggs [Thu, 13 May 2010 11:39:30 +0000 (11:39 +0000)]
Ensure that top level aborts call XLogSetAsyncCommit(). Not doing
so simply leads to data waiting in wal_buffers which then causes
later commits to potentially do emergency writes and for all forms
of replication to be potentially delayed without need or benefit.
Issue pointed out exactly by Fujii Masao, following bug report
by Robert Haas on a separate though related topic.
Simon Riggs [Thu, 13 May 2010 11:15:38 +0000 (11:15 +0000)]
Cleanup initialization of Hot Standby. Clarify working with reanalysis
of requirements and documentation on LogStandbySnapshot(). Fixes
two minor bugs reported by Tom Lane that would lead to an incorrect
snapshot after transaction wraparound. Also fix two other problems
discovered that would give incorrect snapshots in certain cases.
ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor
refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().
Tom Lane [Wed, 12 May 2010 16:50:00 +0000 (16:50 +0000)]
Clean up unnecessary unportability and compiler warnings by removing the
cmp parameter for pg_scandir(). The code failed to support this anyway
for Sun/Windows, so pretending we could accept a parameter other than
NULL was just asking for trouble.
Tom Lane [Tue, 11 May 2010 23:01:27 +0000 (23:01 +0000)]
Update time zone data files to tzdata release 2010j: DST law changes in
Argentina, Australian Antarctic, Bangladesh, Mexico, Morocco, Pakistan,
Palestine, Russia, Syria, Tunisia. Historical corrections for Taiwan.
Tom Lane [Tue, 11 May 2010 22:36:52 +0000 (22:36 +0000)]
Add PKST to the default set of timezone abbreviations.
Per discussion, if we have PKT in there then PKST should be too.
Also, fix mistaken claim that these abbrevs are not known to zic.
Tom Lane [Tue, 11 May 2010 16:42:28 +0000 (16:42 +0000)]
Cause the archiver process to adopt new postgresql.conf settings (particularly
archive_command) as soon as possible, namely just before issuing a new call
of archive_command, even when there is a backlog of files to be archived.
The original coding would only absorb new settings after clearing the backlog
and returning to the outer loop. Per discussion.
Back-patch to 8.3. The logic in prior versions is a bit different and it
doesn't seem worth taking any risks of breaking it.
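The reload ordering described above can be sketched roughly as follows. This is an illustrative Python model, not the archiver's C code: the names archive_backlog, load_settings and run_command are hypothetical, and the sketch re-reads settings unconditionally rather than only when a reload has been requested.

```python
def archive_backlog(files, load_settings, run_command):
    """Archive every file, re-reading settings before each command call."""
    results = []
    for path in files:
        # Pick up configuration just before each call, instead of only
        # after the whole backlog has been cleared.
        settings = load_settings()
        results.append(run_command(settings["archive_command"], path))
    return results


# Simulate a configuration change that lands while a backlog exists.
versions = iter(["cp-old", "cp-old", "cp-new"])
used = archive_backlog(
    ["segment-1", "segment-2", "segment-3"],
    load_settings=lambda: {"archive_command": next(versions)},
    run_command=lambda command, path: command,  # report which command ran
)
```

With the reload inside the loop, the third file is already handled by the new command even though the backlog never emptied.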
Tom Lane [Tue, 11 May 2010 15:31:37 +0000 (15:31 +0000)]
Fix incorrect patch that removed permission checks on inheritance child
tables --- the parent table no longer got checked, either. Per bug #5458
from Takahiro Itagaki.
Itagaki Takahiro [Tue, 11 May 2010 04:52:28 +0000 (04:52 +0000)]
Set per-function GUC settings while validating the function.
Now validators work properly even when the settings contain
parameters that affect behavior of the function, like search_path.
Tom Lane [Mon, 10 May 2010 16:25:46 +0000 (16:25 +0000)]
When adding a "target IS NOT NULL" indexqual to the plan for an index-optimized
MIN or MAX, we must take care to insert the added qual in a legal place among
the existing indexquals, if any. The btree index AM requires the quals to
appear in index-column order. We didn't have to worry about this before
because "target IS NOT NULL" was just treated as a plain scan filter condition;
but as of 9.0 it can be an index qual and then it has to follow the rule.
Per report from Ian Barwick.
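A minimal sketch of the ordering rule, assuming each qual is modelled as a (column number, description) pair. The function name and the data layout are hypothetical and do not reflect the planner's real structures.

```python
def insert_qual_in_column_order(quals, new_qual):
    """Return a new list with new_qual placed so column numbers stay sorted.

    Each qual is an (index_column_number, description) tuple.
    """
    position = len(quals)
    for i, (column, _) in enumerate(quals):
        if column > new_qual[0]:
            position = i
            break
    return quals[:position] + [new_qual] + quals[position:]


existing = [(1, "a = 1"), (3, "c < 10")]
ordered = insert_qual_in_column_order(existing, (2, "b IS NOT NULL"))
```

Simply appending the new qual would have put column 2 after column 3, which is the arrangement the btree rule forbids.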
Tom Lane [Sun, 9 May 2010 02:16:00 +0000 (02:16 +0000)]
Adjust comments about avoiding use of printf's %.*s.
My initial impression that glibc was measuring the precision in characters
(which is what the Linux man page says it does) was incorrect. It does take
the precision to be in bytes, but it also tries to truncate the string at a
character boundary. The bottom line remains the same: it will mess up
if the string is not in the encoding it expects, so we need to avoid %.*s
anytime there's a significant risk of that. Previous code changes are still
good, but adjust the comments to reflect this knowledge. Per research by
Hernan Gonzalez.
Tom Lane [Sat, 8 May 2010 16:39:53 +0000 (16:39 +0000)]
Work around a subtle portability problem in use of printf %s format.
Depending on which spec you read, field widths and precisions in %s may be
counted either in bytes or characters. Our code was assuming bytes, which
is wrong at least for glibc's implementation, and in any case libc might
have a different idea of the prevailing encoding than we do. Hence, for
portable results we must avoid using anything more complex than just "%s"
unless the string to be printed is known to be all-ASCII.
This patch fixes the cases I could find, including the psql formatting
failure reported by Hernan Gonzalez. In HEAD only, I also added comments
to some places where it appears safe to continue using "%.*s".
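The byte-versus-character distinction can be illustrated with a short Python sketch. It only models the counting problem and is not the C code touched by the patch; clip_utf8 is a hypothetical helper name.

```python
def clip_utf8(text, max_bytes):
    """Clip text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing partial multibyte sequence.
    return encoded.decode("utf-8", errors="ignore")


word = "na\u00efve"                 # 5 characters, 6 bytes in UTF-8
raw_cut = word.encode("utf-8")[:3]  # a byte-counted cut lands mid-character
clipped = clip_utf8(word, 3)        # a boundary-aware cut keeps whole characters
```

A precision of 3 therefore means different things depending on whether it counts bytes or characters, and a plain byte cut leaves an invalid sequence behind.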
Tom Lane [Thu, 6 May 2010 19:28:25 +0000 (19:28 +0000)]
On Linux, use --enable-new-dtags when specifying -rpath to linker.
This should allow LD_LIBRARY_PATH to work as desired. Per trouble
report from Andy Colson.
Tom Lane [Wed, 5 May 2010 22:18:56 +0000 (22:18 +0000)]
Fix psql to not go into infinite recursion when expanding a variable that
refers to itself (directly or indirectly). Instead, print a message when
recursion is detected, and don't expand the repeated reference. Per bug
#5448 from Francis Markham.
Back-patch to 8.0. Although the issue exists in 7.4 as well, it seems
impractical to fix there because of the lack of any state stack that
could be used to track active expansions.
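A rough sketch of recursion detection using a stack of active expansions, written in Python rather than psql's C. The function name, the message text and the simplified ":name" pattern are assumptions made for illustration only.

```python
import re


def expand(text, variables, active=(), messages=None):
    """Expand :name references, refusing to re-expand a variable in progress."""
    if messages is None:
        messages = []

    def replace(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        if name in active:
            # Recursion detected: report it and leave the reference as-is.
            messages.append(f"skipping recursive expansion of variable {name}")
            return match.group(0)
        # Push the name onto the stack of active expansions.
        expanded, _ = expand(variables[name], variables, active + (name,), messages)
        return expanded

    return re.sub(r":(\w+)", replace, text), messages


result, notes = expand("SELECT :a;", {"a": ":b + 1", "b": ":a"})
```

Here :a refers to :b, which refers back to :a; the inner :a is left unexpanded and a single message is recorded instead of recursing forever.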
Need to hold ControlFileLock while updating control file. Update
minRecoveryPoint in control file when replaying a parameter change record,
to ensure that we don't allow hot standby on WAL generated without
wal_level='hot_standby' after a standby restart.
Add cross-reference from wal_level to hot_standby setting. Update
the PITR documentation to mention that you need to set wal_level to
'archive' or 'hot_standby', to enable WAL archiving. Per Simon's request.
Tom Lane [Sun, 2 May 2010 22:28:05 +0000 (22:28 +0000)]
Fix replay of XLOG_HEAP_NEWPAGE WAL records to pay attention to the forknum
field of the WAL record. The previous coding always wrote to the main fork,
resulting in data corruption if the page was meant to go into a non-default
fork.
At present, the only operation that can produce such WAL records is
ALTER TABLE/INDEX SET TABLESPACE when executed with archive_mode = on.
Data corruption would be observed on standby slaves, and could occur on the
master as well if a database crash and recovery occurred after committing
the ALTER and before the next checkpoint. Per report from Gordon Shannon.
Back-patch to 8.4; the problem doesn't exist in earlier branches because
we didn't have a concept of multiple relation forks then.
Simon Riggs [Sun, 2 May 2010 11:32:53 +0000 (11:32 +0000)]
Mention that max_standby_delay has units of milliseconds. Units are mentioned
for all other parameters where the default is expressed in a different unit.
Tom Lane [Sun, 2 May 2010 02:10:33 +0000 (02:10 +0000)]
Clean up some awkward, inaccurate, and inefficient processing around
MaxStandbyDelay. Use the GUC units mechanism for the value, and choose more
appropriate timestamp functions for performing tests with it. Make the
ps_activity manipulation in ResolveRecoveryConflictWithVirtualXIDs have
behavior similar to ps_activity code elsewhere, notably not updating the
display when update_process_title is off and not truncating the display
contents at an arbitrarily-chosen length. Improve the docs to be explicit
about what MaxStandbyDelay actually measures, viz the difference between
primary and standby servers' clocks, and the possible hazards if their clocks
aren't in sync.
Tom Lane [Sat, 1 May 2010 22:46:30 +0000 (22:46 +0000)]
Add code to InternalIpcMemoryCreate() to handle the case where shmget()
returns EINVAL for an existing shared memory segment. Although it's not
terribly sensible, that behavior does meet the POSIX spec because EINVAL
is the appropriate error code when the existing segment is smaller than the
requested size, and the spec explicitly disclaims any particular ordering of
error checks. Moreover, it does in fact happen on OS X and probably other
BSD-derived kernels. (We were able to talk NetBSD into changing their code,
but purging that behavior from the wild completely seems unlikely to happen.)
We need to distinguish collision with a pre-existing segment from invalid size
request in order to behave sensibly, so it's worth some extra code here to get
it right. Per report from Gavin Kistner and subsequent investigation.
Back-patch to all supported versions, since any of them could get used
with a kernel having the debatable behavior.
Tom Lane [Sat, 1 May 2010 21:31:17 +0000 (21:31 +0000)]
Install hack workaround for failure of 'make all' in VPATH builds.
It appears that gmake gets confused if postgres.sgml is not present in
the working directory, and instantiates some default rule or other that
would let postgres.sgml be built from postgres.xml. I haven't been able
to track down exactly where that's coming from, but the problem can be
dodged by specifying srcdir explicitly in the rule for postgres.xml.
Per report from Vladimir Kokovic.
Tom Lane [Sat, 1 May 2010 18:15:07 +0000 (18:15 +0000)]
Adjust postgres.xml rule so that make will notice a failure exit from osx.
The previous coding had it in a pipe, which on most shells won't report
the error. Per experimentation with a bug report from Vladimir Kokovic.
This doesn't actually fix his problem, but it does explain why make
didn't report that there was a problem.
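The effect of a pipe on exit status can be reproduced with a small experiment. This sketch assumes a POSIX shell with no pipefail option enabled and uses stand-in commands, not the actual osx invocation from the makefile.

```python
import subprocess

# In a pipeline the shell reports only the status of the last command,
# so the failure of the first command is hidden.
piped = subprocess.run("(exit 3) | cat", shell=True).returncode

# Run as separate steps joined by &&, the failure stops the sequence
# and becomes the overall exit status.
sequential = subprocess.run("(exit 3) && cat /dev/null", shell=True).returncode
```

A make rule sees only the final status, which is why the failing first stage went unnoticed while it sat in a pipe.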
Tom Lane [Fri, 30 Apr 2010 22:24:50 +0000 (22:24 +0000)]
Update our information about OS X shared memory configuration: it's now
possible to set most of the SHM kernel parameters without a reboot.
Also, reorder the paragraph to explain the modern configuration method first.
There are probably not too many people who still care about how to do it on
OS X 10.3 or older.
Tom Lane [Fri, 30 Apr 2010 19:15:45 +0000 (19:15 +0000)]
Fix multiple memory leaks in PLy_spi_execute_fetch_result: it would leak
memory if the result had zero rows, and also if there was any sort of error
while converting the result tuples into Python data. Reported and partially
fixed by Andres Freund.
Back-patch to all supported versions. Note: I haven't tested the 7.4 fix.
7.4's configure check for python is so obsolete it doesn't work on my
current machines :-(. The logic change is pretty straightforward though.
Tom Lane [Fri, 30 Apr 2010 17:09:13 +0000 (17:09 +0000)]
Fix a couple of places where the result of fgets() wasn't checked.
This is mostly to suppress compiler warnings, although in principle
the cases could result in undesirable behavior.
Fix handling of b-tree reuse WAL records when hot standby is disabled,
and add missing code in btree_desc for them. This fixes the bug
with "btree_redo: unknown op code 208" error reported by Jaime Casanova.
Tom Lane [Thu, 29 Apr 2010 21:49:03 +0000 (21:49 +0000)]
Adjust error checks in pg_start_backup and pg_stop_backup to make it possible
to perform a backup without archive_mode being enabled. This gives up some
user-error protection in order to improve usefulness for streaming-replication
scenarios. Per discussion.
Tom Lane [Thu, 29 Apr 2010 21:36:19 +0000 (21:36 +0000)]
Rename the parameter recovery_connections to hot_standby, to reduce possible
confusion with streaming-replication settings. Also, change its default
value to "off", because of concern about executing new and poorly-tested
code during ordinary non-replicating operation. Per discussion.
In passing do some minor editing of related documentation.
Tom Lane [Thu, 29 Apr 2010 16:32:41 +0000 (16:32 +0000)]
Install a workaround for 'TeX capacity exceeded' problem
when building PDF output for recent versions of the documentation.
There is probably a better answer out there somewhere, but
we need something now so we can build beta releases.
Tom Lane [Wed, 28 Apr 2010 19:38:49 +0000 (19:38 +0000)]
Minor editorializing on pg_controldata and pg_resetxlog: adjust some message
wording, deal explicitly with some fields that were being silently left zero.
Tom Lane [Wed, 28 Apr 2010 16:54:16 +0000 (16:54 +0000)]
Modify ShmemInitStruct and ShmemInitHash to throw errors internally,
rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.
We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.
Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.
Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_level setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment; they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.
Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
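The restart-time comparison described above might look roughly like this. It is a Python model under stated assumptions: the control data is a plain dict, the WAL is a list, and the record tag "PARAMETER_CHANGE" and the function name are hypothetical labels.

```python
TRACKED = ("wal_level", "max_connections", "max_prepared_xacts", "max_locks_per_xact")


def sync_control_settings(control, current, wal):
    """Log a parameter-change record and update control data if settings differ."""
    changed = any(control.get(key) != current[key] for key in TRACKED)
    if changed:
        snapshot = {key: current[key] for key in TRACKED}
        wal.append(("PARAMETER_CHANGE", snapshot))  # the standby replays this
        control.update(snapshot)
    return changed


control = {"wal_level": "archive", "max_connections": 100,
           "max_prepared_xacts": 0, "max_locks_per_xact": 64}
current = dict(control, wal_level="hot_standby")
wal = []
first = sync_control_settings(control, current, wal)
second = sync_control_settings(control, current, wal)  # nothing changed now
```

Because the record is written at the moment of the change, a standby replaying the log learns about it immediately rather than at the next checkpoint.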
Tom Lane [Wed, 28 Apr 2010 02:04:16 +0000 (02:04 +0000)]
Modify the built-in text search parser to handle URLs more nearly according
to RFC 3986. In particular, these characters now terminate the path part
of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'. The previous behavior
was inconsistent and depended on whether a "?" was present in the path.
Per gripe from Donald Fraser and spec research by Kevin Grittner.
This is a pre-existing bug, but not back-patching since the risks of
breaking existing applications seem to outweigh the benefits.
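As an illustration of the terminator set listed above, here is a small Python sketch. It is not the parser's state machine; url_path_prefix is a hypothetical helper, and treating whitespace as a terminator is an added assumption.

```python
# The nine characters named in the commit message.
PATH_TERMINATORS = set('"<>\\^`{|}')


def url_path_prefix(text):
    """Return the leading part of text up to the first terminating character."""
    for i, ch in enumerate(text):
        if ch in PATH_TERMINATORS or ch.isspace():
            return text[:i]
    return text


with_query = url_path_prefix('/a/b?x=1">rest')
without_query = url_path_prefix('/a/b">rest')
```

Both inputs stop at the double quote, whether or not a "?" appears earlier in the path.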
Tom Lane [Wed, 28 Apr 2010 00:09:05 +0000 (00:09 +0000)]
Replace the KnownAssignedXids hash table with a sorted-array data structure,
and be more tense about the locking requirements for it, to improve performance
in Hot Standby mode. In passing fix a few bugs and improve a number of
comments in the existing HS code.
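The general idea of a sorted array with binary-search lookup can be sketched as below. This is a simplified Python model, not the actual KnownAssignedXids layout: it ignores transaction ID wraparound and locking, and the class and method names are hypothetical.

```python
import bisect


class KnownXids:
    """Sorted list of transaction ids with binary-search lookup."""

    def __init__(self):
        self._xids = []

    def add(self, xid):
        i = bisect.bisect_left(self._xids, xid)
        if i == len(self._xids) or self._xids[i] != xid:
            self._xids.insert(i, xid)  # cheap when xids arrive in increasing order

    def contains(self, xid):
        i = bisect.bisect_left(self._xids, xid)
        return i < len(self._xids) and self._xids[i] == xid

    def remove(self, xid):
        i = bisect.bisect_left(self._xids, xid)
        if i < len(self._xids) and self._xids[i] == xid:
            del self._xids[i]

    def oldest(self):
        return self._xids[0] if self._xids else None


known = KnownXids()
for xid in (100, 101, 103):
    known.add(xid)
known.remove(100)
```

Keeping the entries ordered makes range-style questions, such as finding the oldest known ID, a matter of reading one end of the array.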
If a base backup is cancelled by server shutdown or crash, throw an error
in WAL recovery when it sees the shutdown checkpoint record. It's more
user-friendly to find out about it at that point than at the end of
recovery, and you're not left wondering why your hot standby server never
opens up for read-only connections.
Robert Haas [Mon, 26 Apr 2010 10:52:00 +0000 (10:52 +0000)]
When we're restricting who can connect, don't allow new walsenders.
Normal superuser processes are allowed to connect even when the database
system is shutting down, or when fewer than superuser_reserved_connections
slots remain. This is intended to make sure an administrator can log in
and troubleshoot, so don't extend these same courtesies to users connecting
for replication.
Simon Riggs [Fri, 23 Apr 2010 22:23:39 +0000 (22:23 +0000)]
Add missing optimizer hooks for function cost and number of rows.
Closely follow design of other optimizer hooks: if hook exists
retrieve value from plugin; if still not set then get from cache.
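The hook-then-cache pattern described above can be sketched as follows. This is a Python illustration only; the hooks registry, the function name and the cache layout are hypothetical and are not the optimizer's real interface.

```python
# Hypothetical registry; None means no plugin has installed a hook.
hooks = {"function_cost": None}


def get_function_cost(func_id, cache):
    """Ask the plugin hook first; fall back to the cache if no value is set."""
    cost = None
    hook = hooks["function_cost"]
    if hook is not None:
        cost = hook(func_id)
    if cost is None:
        cost = cache[func_id]
    return cost


cache = {"f": 1.0, "g": 2.0}
default_cost = get_function_cost("f", cache)

# A plugin that overrides only one function and declines the rest.
hooks["function_cost"] = lambda func_id: 50.0 if func_id == "f" else None
plugin_cost = get_function_cost("f", cache)
fallback_cost = get_function_cost("g", cache)
```

Letting the hook return "no value" keeps the cached figure in play for every function the plugin does not care about.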
Simon Riggs [Fri, 23 Apr 2010 19:57:19 +0000 (19:57 +0000)]
Make CheckRequiredParameterValues() depend upon correct combination
of parameters. Fix bug reported by Robert Haas: the error message and
hint were incorrect if the wrong mode parameters were specified on the master.
Internal changes only. Proposals for parameter simplification on
master/primary still under way.
Simon Riggs [Thu, 22 Apr 2010 08:04:25 +0000 (08:04 +0000)]
Optimise btree delete processing when no active backends.
Clarify comments, downgrade a message to DEBUG and remove some
debug counters. Direct from ideas by Heikki Linnakangas.
Simon Riggs [Thu, 22 Apr 2010 02:15:45 +0000 (02:15 +0000)]
Further reductions in Hot Standby conflict processing. These
come from the realisation that HEAP2_CLEAN records don't
always remove user visible data, so conflict processing for
them can be skipped. Confirm validity using Assert checks,
clarify circumstances under which we log heap_cleanup_info
records. Tuning arises from bug fixing of earlier safety
check failures.
Fix encoding issue when lc_monetary or lc_numeric use a different encoding
from lc_ctype, which can happen on Windows. We need to change lc_ctype
together with lc_monetary or lc_numeric, and convert strings in lconv
from lc_ctype encoding to the database encoding.
Bug reported by Mikko, original patch by Hiroshi Inoue,
with changes by Bruce and me.
Tom Lane [Wed, 21 Apr 2010 20:54:19 +0000 (20:54 +0000)]
Enforce superuser permissions checks during ALTER ROLE/DATABASE SET, rather
than during define_custom_variable(). This entails rejecting an ALTER
command if the target variable doesn't have a known (non-placeholder)
definition, unless the calling user is superuser. When the variable *is*
known, we can correctly apply the rule that only superusers can issue ALTER
for SUSET parameters. This allows define_custom_variable to apply ALTER's
values for SUSET parameters at module load time, secure in the knowledge
that only a superuser could have set the ALTER value. This change fixes a
longstanding gotcha in the usage of SUSET-level custom parameters; which
is a good thing to fix now that plpgsql defines such a parameter.
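The permission rule above reduces to a small decision function. This Python sketch is a simplification: it models only unknown, SUSET and user-settable variables, and the function name, variable names and context labels are hypothetical.

```python
def may_alter_set(variable, is_superuser, known_variables):
    """Decide whether ALTER ROLE/DATABASE SET is allowed for this variable."""
    context = known_variables.get(variable)  # None: unknown or placeholder
    if context is None:
        # No real definition yet, so only a superuser may store a value.
        return is_superuser
    if context == "SUSET":
        return is_superuser
    return True


known = {"mymodule.limit": "SUSET", "mymodule.style": "USERSET"}
unknown_by_user = may_alter_set("othermodule.flag", False, known)
suset_by_user = may_alter_set("mymodule.limit", False, known)
suset_by_superuser = may_alter_set("mymodule.limit", True, known)
```

Since every stored value for an unknown or SUSET variable must have come from a superuser, it can be applied at module load time without a further check.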
Simon Riggs [Wed, 21 Apr 2010 19:53:24 +0000 (19:53 +0000)]
Only send cleanup_info messages if VACUUM removes any tuples.
There is no other purpose for this message type than to report
the latestRemovedXid of removed tuples, prior to index scans.
Removes overlooked path for sending invalid latestRemovedXid.
Fixes buildfarm failure on centaur.