Send status updates back from standby server to master, indicating how far
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.
Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
Magnus Hagander [Thu, 10 Feb 2011 14:09:35 +0000 (15:09 +0100)]
Track last time for statistics reset on databases and bgwriter
Tracks one counter for each database, which is reset whenever
the statistics for any individual object inside the database is
reset, and one counter for the background writer.
Tom Lane [Thu, 10 Feb 2011 04:27:07 +0000 (23:27 -0500)]
Fix improper matching of resjunk column names for FOR UPDATE in subselect.
Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns. Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure. This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.
Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code). Back-patch to 9.0.
Tom Lane [Thu, 10 Feb 2011 00:17:33 +0000 (19:17 -0500)]
Fix pg_upgrade to handle extensions.
This follows my proposal of yesterday, namely that we try to recreate the
previous state of the extension exactly, instead of allowing CREATE
EXTENSION to run a SQL script that might create some entirely-incompatible
on-disk state. In --binary-upgrade mode, pg_dump won't issue CREATE
EXTENSION at all, but instead uses a kluge function provided by
pg_upgrade_support to recreate the pg_extension row (and extension-level
pg_depend entries) without creating any member objects. The member objects
are then restored in the same way as if they weren't members, in particular
using pg_upgrade's normal hacks to preserve OIDs that need to be preserved.
Then, for each member object, ALTER EXTENSION ADD is issued to recreate the
pg_depend entry that marks it as an extension member.
In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't
fix it when the noise word VALUE got added to ALTER TYPE ADD. Also,
rationalize parsetree representation of COMMENT ON DOMAIN and fix
get_object_address() to allow OBJECT_DOMAIN.
Tom Lane [Wed, 9 Feb 2011 19:05:34 +0000 (14:05 -0500)]
Rethink order of operations for dumping extension member objects.
My original idea of doing extension member identification during
getDependencies() didn't work correctly: we have to mark member tables as
not-to-be-dumped rather earlier than that, else their subsidiary objects
like indexes get dumped anyway. Rearrange code to mark them early enough.
Tom Lane [Wed, 9 Feb 2011 16:55:32 +0000 (11:55 -0500)]
Implement "ALTER EXTENSION ADD object".
This is an essential component of making the extension feature usable;
first because it's needed in the process of converting an existing
installation containing "loose" objects of an old contrib module into
the extension-based world, and second because we'll have to use it
in pg_dump --binary-upgrade, as per recent discussion.
Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE
patch.
Fix allocation of RW-conflict pool in the new predicate lock manager, and
also take the RW-conflict pool into account in the PredicateLockShmemSize()
estimate.
Magnus Hagander [Wed, 9 Feb 2011 09:59:53 +0000 (10:59 +0100)]
Implement NOWAIT option for BASE_BACKUP command
Specifying this option makes the server not wait for the
xlog to be archived, or emit a warning that it can't,
instead leaving the responsibility with the client.
This is useful when the log is being streamed using
the streaming protocol in parallel with the backup,
without having log archiving enabled.
Tom Lane [Tue, 8 Feb 2011 23:12:17 +0000 (18:12 -0500)]
Suppress some compiler warnings in recent commits.
Older versions of gcc tend to throw "variable might be clobbered by
`longjmp' or `vfork'" warnings whenever a variable is assigned in more than
one place and then used after the end of a PG_TRY block. That's reasonably
easy to work around in execute_extension_script, and the overhead of
unconditionally saving/restoring the GUC variables seems unlikely to be a
serious concern.
Also clean up logic in ATExecValidateConstraint to make it easier to read
and less likely to provoke "variable might be used uninitialized in this
function" warnings.
Tom Lane [Tue, 8 Feb 2011 21:08:41 +0000 (16:08 -0500)]
Core support for "extensions", which are packages of SQL objects.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
Simon Riggs [Tue, 8 Feb 2011 19:39:08 +0000 (19:39 +0000)]
Named restore points in recovery. Users can record named points, then
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.
Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
Simon Riggs [Tue, 8 Feb 2011 18:30:22 +0000 (18:30 +0000)]
Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.
Simon Riggs [Tue, 8 Feb 2011 14:38:02 +0000 (14:38 +0000)]
Remove rare corner case for data loss when triggering standby server.
If the standby was streaming when trigger file arrives, check also in the
archive for additional WAL files. This is a corner case since it is
unlikely that we would trigger a failover while the master is still
available and sending data to standby, while at the same time running in
archive mode and also while the streaming standby has fallen behind archive.
Someone would eventually be unlucky; we must plug all gaps however small.
Simon Riggs [Tue, 8 Feb 2011 12:23:20 +0000 (12:23 +0000)]
Extend ALTER TABLE to allow Foreign Keys to be added without initial validation.
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
Robert Haas [Tue, 8 Feb 2011 03:04:29 +0000 (22:04 -0500)]
Avoid having autovacuum workers wait for relation locks.
Waiting for relation locks can lead to starvation - it pins down an
autovacuum worker for as long as the lock is held. But if we're doing
an anti-wraparound vacuum, then we still wait; maintenance can no longer
be put off.
To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
we log whenever an autovacuum or autoanalyze is skipped for this reason.
Per a gripe by Josh Berkus, and ensuing discussion.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
Andrew Dunstan [Sun, 6 Feb 2011 22:29:26 +0000 (17:29 -0500)]
Force strings passed to and from plperl to be in UTF8 encoding.
String are converted to UTF8 on the way into perl and to the
database encoding on the way back. This avoids a number of
observed anomalies, and ensures Perl a consistent view of the
world.
Bruce Momjian [Sun, 6 Feb 2011 15:46:15 +0000 (10:46 -0500)]
Rename macro DECIMAL to DECIMAL_T to help pgindent; this is already
done for a few other macros in that file, for other reasons. I also
remove pgindent/README mention of the file.
Robert Haas [Sun, 6 Feb 2011 05:26:27 +0000 (00:26 -0500)]
Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks.
If the foreign table's rowtype is being used as the type of a column in
another table, we can't just up and change its data type. This was
already checked for composite types and ordinary tables, but we
previously failed to enforce it for foreign tables.
Robert Haas [Fri, 4 Feb 2011 21:14:54 +0000 (16:14 -0500)]
Clarify comment in ATRewriteTable().
Make sure it's clear that the prohibition on adding a column with a default
when the rowtype is used elsewhere is intentional, and be a bit more
explicit about the other cases where we perform this check.
Robert Haas [Fri, 4 Feb 2011 14:28:06 +0000 (09:28 -0500)]
Make handling of errcodes.h more consistent with other generated headers.
This fixes make distprep, and seems more robust in other ways as well.
Some special handling is required because errcodes.txt is needed by
some stuff in src/port, but just by src/backend as is the case for the
other generated headers.
While I'm at it, fix a few other things that were overlooked in the
original patch.
Robert Haas [Fri, 4 Feb 2011 03:32:49 +0000 (22:32 -0500)]
Avoid maintaining three separate copies of the error codes list.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.
Magnus Hagander [Thu, 3 Feb 2011 12:46:23 +0000 (13:46 +0100)]
Include more status information in walsender results
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.
Robert Haas [Thu, 3 Feb 2011 02:08:53 +0000 (21:08 -0500)]
Log restartpoints in the same fashion as checkpoints.
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can. This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically. It also adjusts up
the documentation of the parameter to mention restartpoints.
Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.
Bruce Momjian [Tue, 1 Feb 2011 21:43:51 +0000 (16:43 -0500)]
Clarify documentation to state that "zero_damaged_pages" does not force
data to disk, so the table or index should be recreated before the
parameter is turned off again.
Andrew Dunstan [Tue, 1 Feb 2011 14:43:25 +0000 (09:43 -0500)]
Set up PLPerl trigger data using C code instead of Perl code.
This is an efficiency change, and means we now no longer have to run
"out $_TD; local $_TD = shift;", which was especially pointless in the case of
non-trigger functions where the passed value was always undef anyway.
A tiny open issue is whether we should get rid of the $prolog argument of
mkfunc, and the corresponding pushed value, which is now just a constant "false".
Magnus Hagander [Tue, 1 Feb 2011 12:19:18 +0000 (13:19 +0100)]
Undefine setlocale() macro on Win32
New versions of libintl redefine setlocale() to a macro
which causes problems when the backend and libintl are
linked against different versions of the runtime, which
is often the case in msvc builds.
Tom Lane [Tue, 1 Feb 2011 02:33:55 +0000 (21:33 -0500)]
Support LIKE and ILIKE index searches via contrib/pg_trgm indexes.
Unlike Btree-based LIKE optimization, this works for non-left-anchored
search patterns. The effectiveness of the search depends on how many
trigrams can be extracted from the pattern. (The worst case, with no
trigrams, degrades to a full-table scan, so this isn't a panacea. But
it can be very useful.)
Simon Riggs [Tue, 1 Feb 2011 00:20:53 +0000 (00:20 +0000)]
Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.
Bruce Momjian [Mon, 31 Jan 2011 20:21:51 +0000 (15:21 -0500)]
Update pg_upgrade docs to mention its use in a less risk-warning way,
and update the pg_upgrade docs to mention its reliance on no changes to
the storage format (the later based on Robert Haas's patch).