Simon Riggs [Tue, 8 Feb 2011 19:39:08 +0000 (19:39 +0000)]
Named restore points in recovery. Users can record named points, then
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.
Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
Simon Riggs [Tue, 8 Feb 2011 18:30:22 +0000 (18:30 +0000)]
Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.
Simon Riggs [Tue, 8 Feb 2011 14:38:02 +0000 (14:38 +0000)]
Remove rare corner case for data loss when triggering standby server.
If the standby was streaming when trigger file arrives, check also in the
archive for additional WAL files. This is a corner case since it is
unlikely that we would trigger a failover while the master is still
available and sending data to standby, while at the same time running in
archive mode and also while the streaming standby has fallen behind archive.
Someone would eventually be unlucky; we must plug all gaps however small.
Simon Riggs [Tue, 8 Feb 2011 12:23:20 +0000 (12:23 +0000)]
Extend ALTER TABLE to allow Foreign Keys to be added without initial validation.
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
Robert Haas [Tue, 8 Feb 2011 03:04:29 +0000 (22:04 -0500)]
Avoid having autovacuum workers wait for relation locks.
Waiting for relation locks can lead to starvation - it pins down an
autovacuum worker for as long as the lock is held. But if we're doing
an anti-wraparound vacuum, then we still wait; maintenance can no longer
be put off.
To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
we log whenever an autovacuum or autoanalyze is skipped for this reason.
Per a gripe by Josh Berkus, and ensuing discussion.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
Andrew Dunstan [Sun, 6 Feb 2011 22:29:26 +0000 (17:29 -0500)]
Force strings passed to and from plperl to be in UTF8 encoding.
String are converted to UTF8 on the way into perl and to the
database encoding on the way back. This avoids a number of
observed anomalies, and ensures Perl a consistent view of the
world.
Bruce Momjian [Sun, 6 Feb 2011 15:46:15 +0000 (10:46 -0500)]
Rename macro DECIMAL to DECIMAL_T to help pgindent; this is already
done for a few other macros in that file, for other reasons. I also
remove pgindent/README mention of the file.
Robert Haas [Sun, 6 Feb 2011 05:26:27 +0000 (00:26 -0500)]
Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks.
If the foreign table's rowtype is being used as the type of a column in
another table, we can't just up and change its data type. This was
already checked for composite types and ordinary tables, but we
previously failed to enforce it for foreign tables.
Robert Haas [Fri, 4 Feb 2011 21:14:54 +0000 (16:14 -0500)]
Clarify comment in ATRewriteTable().
Make sure it's clear that the prohibition on adding a column with a default
when the rowtype is used elsewhere is intentional, and be a bit more
explicit about the other cases where we perform this check.
Robert Haas [Fri, 4 Feb 2011 14:28:06 +0000 (09:28 -0500)]
Make handling of errcodes.h more consistent with other generated headers.
This fixes make distprep, and seems more robust in other ways as well.
Some special handling is required because errcodes.txt is needed by
some stuff in src/port, but just by src/backend as is the case for the
other generated headers.
While I'm at it, fix a few other things that were overlooked in the
original patch.
Robert Haas [Fri, 4 Feb 2011 03:32:49 +0000 (22:32 -0500)]
Avoid maintaining three separate copies of the error codes list.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.
Magnus Hagander [Thu, 3 Feb 2011 12:46:23 +0000 (13:46 +0100)]
Include more status information in walsender results
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.
Robert Haas [Thu, 3 Feb 2011 02:08:53 +0000 (21:08 -0500)]
Log restartpoints in the same fashion as checkpoints.
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can. This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically. It also adjusts up
the documentation of the parameter to mention restartpoints.
Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.
Bruce Momjian [Tue, 1 Feb 2011 21:43:51 +0000 (16:43 -0500)]
Clarify documentation to state that "zero_damaged_pages" does not force
data to disk, so the table or index should be recreated before the
parameter is turned off again.
Andrew Dunstan [Tue, 1 Feb 2011 14:43:25 +0000 (09:43 -0500)]
Set up PLPerl trigger data using C code instead of Perl code.
This is an efficiency change, and means we now no longer have to run
"out $_TD; local $_TD = shift;", which was especially pointless in the case of
non-trigger functions where the passed value was always undef anyway.
A tiny open issue is whether we should get rid of the $prolog argument of
mkfunc, and the corresponding pushed value, which is now just a constant "false".
Magnus Hagander [Tue, 1 Feb 2011 12:19:18 +0000 (13:19 +0100)]
Undefine setlocale() macro on Win32
New versions of libintl redefine setlocale() to a macro
which causes problems when the backend and libintl are
linked against different versions of the runtime, which
is often the case in msvc builds.
Tom Lane [Tue, 1 Feb 2011 02:33:55 +0000 (21:33 -0500)]
Support LIKE and ILIKE index searches via contrib/pg_trgm indexes.
Unlike Btree-based LIKE optimization, this works for non-left-anchored
search patterns. The effectiveness of the search depends on how many
trigrams can be extracted from the pattern. (The worst case, with no
trigrams, degrades to a full-table scan, so this isn't a panacea. But
it can be very useful.)
Simon Riggs [Tue, 1 Feb 2011 00:20:53 +0000 (00:20 +0000)]
Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.
Bruce Momjian [Mon, 31 Jan 2011 20:21:51 +0000 (15:21 -0500)]
Update pg_upgrade docs to mention its use in a less risk-warning way,
and update the pg_upgrade docs to mention its reliance on no changes to
the storage format (the later based on Robert Haas's patch).
Simon Riggs [Mon, 31 Jan 2011 19:20:23 +0000 (19:20 +0000)]
Fix error code for canceling statement due to conflict with recovery.
All retryable conflict errors now have an error code that indicates that
a retry is possible, correcting my incomplete fix of 2010/05/12
Tatsuo Ishii and Simon Riggs, input from Robert Haas and Florian Pflug
Andrew Dunstan [Mon, 31 Jan 2011 18:40:45 +0000 (13:40 -0500)]
Update docs on building for Windows to accomodate current reality.
Document how to build 64 bit Windows binaries using the MinGW64 tool set.
Remove recommendation against using Mingw as a build platform.
Be more specific about when Cygwin is useful and when it's not, in
particular note its usefulness for running psql, and
add a note about building on Cygwin in non-C locales.
Bruce Momjian [Mon, 31 Jan 2011 17:32:03 +0000 (12:32 -0500)]
Move upgrade instructions into its own section under "Server Setup and
Operation", merged from upgrade sections in "Installation from Source
Code" and "Backup and Restore". This now gives a single place for all
upgrade information.
Support multiple concurrent pg_basebackup backups.
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.
Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
Andrew Dunstan [Mon, 31 Jan 2011 00:56:46 +0000 (19:56 -0500)]
Enable building with the Mingw64 compiler.
This can be used to build 64 bit Windows binaries, not only on 64 bit
Windows but on supported cross-compiling hosts including 32 bit Windows,
Cygwin, Darwin and Linux.
Tom Lane [Sun, 30 Jan 2011 22:04:31 +0000 (17:04 -0500)]
Make reduce_outer_joins() smarter about semijoins.
reduce_outer_joins() mistakenly treated a semijoin like a left join for
purposes of deciding whether not-null constraints created by the join's
quals could be passed down into the join's left-hand side (possibly
resulting in outer-join simplification there). Actually, semijoin works
like inner join for this purpose, ie, we do not need to see any rows that
can't possibly satisfy the quals. Hence, two-line fix to treat semi and
inner joins alike. Per observation by Andres Freund about a performance
gripe from Yazan Suleiman.
Back-patch to 8.4, since this oversight has been there since the current
handling of semijoins was implemented.
Magnus Hagander [Sun, 30 Jan 2011 20:30:09 +0000 (21:30 +0100)]
Add option to include WAL in base backup
When included, this makes the base backup a complete working
"clone" of the initial database, ready to have a postmaster
started against it without the need to set up any log archiving
or similar.
Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas
Tom Lane [Sat, 29 Jan 2011 17:51:44 +0000 (12:51 -0500)]
Make installation.sgml build standalone again.
We must not try to link to sections that aren't part of the standalone
"make INSTALL" build. Corrects build failure introduced in commit 159e3d86292cfec2a2828f9f69ac7a6cb1be242d.
Robert Haas [Sat, 29 Jan 2011 13:08:41 +0000 (08:08 -0500)]
Try to avoid running with a full fsync request queue.
When we need to insert a new entry and the queue is full, compact the
entire queue in the hopes of making room for the new entry. Doing this
on every insertion might worsen contention on BgWriterCommLock, but
when the queue it's full, it's far better than allowing the backend to
perform its own fsync, per testing by Greg Smith as reported in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php
Original idea from Greg Smith. Patch by me. Review by Chris Browne
and Greg Smith
Tom Lane [Fri, 28 Jan 2011 19:34:42 +0000 (14:34 -0500)]
Copy-edit a paragraph in the contrib/seg documentation.
Although this improves the style, an ulterior motive is to keep the two
table links from breaking across lines in PDF output, per complaint from
Josh Kupershmidt.
Tom Lane [Thu, 27 Jan 2011 23:42:12 +0000 (18:42 -0500)]
Rephrase pg_conversion description to avoid splitting link across page.
The link to the CREATE CONVERSION manual page was split across a page
boundary in the PDF output, leading to "\pdfendlink ended up in different
nesting level than \pdfstartlink" error while building PDFs.
It wouldn't be worth changing text that's undergoing active editing to
avoid this, since other editing might result in moving the link away from
the page end anyway. But this paragraph has been static for a long time,
so might as well fix it to prevent it from being an issue in future.
Tom Lane [Thu, 27 Jan 2011 22:41:41 +0000 (17:41 -0500)]
Prevent buffer overrun while parsing an integer in a "query_int" value.
contrib/intarray's gettoken() uses a fixed-size buffer to collect an
integer's digits, and did not guard against overrunning the buffer.
This is at least a backend crash risk, and in principle might allow
arbitrary code execution. The code didn't check for overflow of the
integer value either, which while not presenting a crash risk was still
bad.
Thanks to Apple Inc's security team for reporting this issue and supplying
the fix.
Tom Lane [Thu, 27 Jan 2011 21:27:27 +0000 (16:27 -0500)]
Don't include <asm/ia64regs.h> unnecessarily.
We only need that header when compiling with icc, since the gcc variant of
ia64_get_bsp() uses in-line assembly code. Per report from Frank Brendel,
the header doesn't exist on all IA64 platforms; so don't include it unless
we need it.
Robert Haas [Thu, 27 Jan 2011 13:35:34 +0000 (08:35 -0500)]
Restore ALTER TABLE .. ADD COLUMN w/DEFAULT restriction.
This reverts commit a06e41deebdf74b8b5109329dc75b2e9d9057962 of 2011-01-26.
Per discussion, this behavior is not wanted, as it would need to change if
we ever made composite types support DEFAULT.
Tom Lane [Thu, 27 Jan 2011 00:33:50 +0000 (19:33 -0500)]
Change inv_truncate() to not repeat its systable_getnext_ordered() scan.
In the case where the initial call of systable_getnext_ordered() returned
NULL, this function would nonetheless call it again. That's undefined
behavior that only by chance failed to not give visibly incorrect results.
Put an if-test around the final loop to prevent that, and in passing
improve some comments. No back-patch since there's no actual failure.