Don't throw a warning if vacuum sees PD_ALL_VISIBLE flag set on a page that
contains newly-inserted tuples that according to our OldestXmin are not
yet visible to everyone. The value returned by GetOldestXmin() is conservative,
and it can move backwards on repeated calls, so if we see that contradiction
between the PD_ALL_VISIBLE flag and status of tuples on the page, we have to
assume it's because an earlier vacuum calculated a higher OldestXmin value,
and all the tuples really are visible to everyone.
We have received several reports of this bug, with the "PD_ALL_VISIBLE flag
was incorrectly set in relation ..." warning appearing in logs. We were
finally able to hunt it down with David Gould's help to run extra diagnostics
in an environment where this happened frequently.
Also reword the warning, per Robert Haas' suggestion, to not imply that the
PD_ALL_VISIBLE flag is necessarily at fault, as it might also be a symptom
of corruption on a tuple header.
Backpatch to 8.4, where the PD_ALL_VISIBLE flag was introduced.
Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer
than doing it aggressively whenever the tail-XID pointer is advanced, because
this way we don't need to do it while holding SerializableXactHashLock.
This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an
obsolete comment spotted by Kevin Grittner.
This improves reporting, as the error string now includes the actual
Python exception. As a side effect, this no longer sets the errcode to
ERRCODE_DATA_EXCEPTION, which might be considered a feature, as it's
not documented and not clear why iterator errors should be treated
differently.
If recovery_target_timeline is set to 'latest' and standby mode is enabled,
periodically rescan the archive for new timelines, while waiting for new WAL
segments to arrive. This allows you to set up a standby server that follows
the TLI change if another standby server is promoted to master. Before this,
you had to restart the standby server to make it notice the new timeline.
This patch only scans the archive for TLI changes, it won't follow a TLI
change in streaming replication. That is much needed too, but it would be a
much bigger patch than I dare to sneak in this late in the release cycle.
There was discussion on improving the sanity checking of the WAL segments so
that the system would notice more reliably if the new timeline isn't an
ancestor of the current one, but that is not included in this patch.
Tom Lane [Mon, 7 Mar 2011 16:17:06 +0000 (11:17 -0500)]
Zero out vacuum_count and related counters in pgstat_recv_tabstat().
This fixes an oversight in commit 946045f04d11d246a834b917a2b8bc6e4f884a37
of 2010-08-21, as reported by Itagaki Takahiro. Also a couple of minor
cosmetic adjustments.
Tom Lane [Mon, 7 Mar 2011 02:15:48 +0000 (21:15 -0500)]
Suppress some "variable might be clobbered by longjmp" warnings.
Seen with an older gcc version. I'm not sure these represent any real
risk factor, but still a bit scary. Anyway we have lots of other
volatile-marked variables in this code, so a couple more won't hurt.
Simon Riggs [Sun, 6 Mar 2011 22:49:16 +0000 (22:49 +0000)]
Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
Tom Lane [Sun, 6 Mar 2011 17:10:50 +0000 (12:10 -0500)]
Fix incorrect access to pg_index.indcollation.
Since this field is after a variable-length field, it can't simply be
accessed via the C struct for pg_index. Fortunately, the relcache already
did the dirty work of pulling the information out to where it can be
accessed easily, so this is a one-line fix.
Tom Lane [Sat, 5 Mar 2011 20:13:15 +0000 (15:13 -0500)]
Make plpythonu language use plpython2 shared library directly.
The original scheme for this was to symlink plpython.$DLSUFFIX to
plpython2.$DLSUFFIX, but that doesn't work on Windows, and only
accidentally failed to fail because of the way that CREATE LANGUAGE created
or didn't create new C functions. My changes of yesterday exposed the
weakness of that approach. To fix, get rid of the symlink and make
pg_pltemplate show what's really going on.
Tom Lane [Sat, 5 Mar 2011 19:03:06 +0000 (14:03 -0500)]
Convert createlang/droplang to use CREATE/DROP EXTENSION.
In createlang this is a one-line change. In droplang there's a whole
lot of cruft that can be discarded since the extension mechanism now
manages removal of the language's support functions.
Also, add deprecation notices to these two programs' reference pages,
since per discussion we may toss them overboard altogether in a release
or two.
Tom Lane [Sat, 5 Mar 2011 02:51:14 +0000 (21:51 -0500)]
Create extension infrastructure for the core procedural languages.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them. However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.
catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.
Add support for regression testing these as extensions not bare
languages.
Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.
Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
Don't allow CREATE TABLE AS to create a column with invalid collation
It is possible that an expression ends up with a collatable type but
without a collation. CREATE TABLE AS could then create a table based
on that. But such a column cannot be dumped with valid SQL syntax, so
we disallow creating such a column.
Tom Lane [Fri, 4 Mar 2011 21:08:24 +0000 (16:08 -0500)]
Allow non-superusers to create (some) extensions.
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check. In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script. The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.
In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.
I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.
Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.
This is all per discussion of wrapping the standard procedural languages
into extensions. I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
Tom Lane [Fri, 4 Mar 2011 16:38:45 +0000 (11:38 -0500)]
In initialize_SSL, don't fail unnecessarily when home dir is unavailable.
Instead, just act as though the certificate file(s) are not present.
There is only one case where this need be a hard failure condition: when
sslmode is verify-ca or verify-full, not having a root cert file is an
error. Change the logic so that we complain only in that case, and
otherwise fall through cleanly. This is how it used to behave pre-9.0,
but my patch 4ed4b6c54e5fab24ab2624d80e26f7546edc88ad of 2010-05-26 broke
the case. Per report from Christian Kastner.
Tom Lane [Thu, 3 Mar 2011 20:55:47 +0000 (15:55 -0500)]
Further refine patch for commenting operator implementation functions.
Instead of manually maintaining the "implementation of XXX operator"
comments in pg_proc.h, delete all those entries and let initdb create
them via a join. To let initdb figure out which name to use when there
is a conflict, change the comments for deprecated operators to say they
are deprecated --- which seems like a good thing to do anyway.
Tom Lane [Thu, 3 Mar 2011 18:22:18 +0000 (13:22 -0500)]
Fix citext's upgrade-from-unpackaged script to set its collation correctly.
Although there remains some debate about how CREATE TYPE should represent
the collation property, this doesn't really affect what we need to do in
citext's script, so go ahead and fix that.
Tom Lane [Thu, 3 Mar 2011 18:03:34 +0000 (13:03 -0500)]
Run a portal's cleanup hook immediately when pushing it to DONE state.
This works around the problem noted by Yamamoto Takashi in bug #5906,
that there were code paths whereby we could reach AtCleanup_Portals
with a portal's cleanup hook still unexecuted. The changes I made
a few days ago were intended to prevent that from happening, and
I think that on balance it's still a good thing to avoid, so I don't
want to remove the Assert in AtCleanup_Portals. Hence do this instead.
Tom Lane [Thu, 3 Mar 2011 06:33:19 +0000 (01:33 -0500)]
Mark operator implementation functions as such in their comments.
Historically, we've not had separate comments for built-in pg_operator
entries, but relied on the comments for the underlying functions. The
trouble with this approach is that there isn't much of anything to suggest
to users that they'd be better off using the operators instead. So, move
all the relevant comments into pg_operator, and give each underlying
function a comment that just says "implementation of XXX operator".
There are only about half a dozen cases where it seems reasonable to use
the underlying function interchangeably with the operator; in these cases
I left the same comment in place on the function as on the operator.
While at it, establish a policy that every built-in function and operator
entry should have a comment: there are now queries in the opr_sanity
regression test that will complain if one doesn't. This only required
adding a dozen or two more entries than would have been there anyway.
I also spent some time trying to eliminate gratuitous inconsistencies in
the style of the comments, though it's hopeless to suppose that more won't
creep in soon enough.
Mapped to NetBSD, the closest existing match. (Even though DragonFly
BSD is derived from FreeBSD, the shared library version numbering
matches NetBSD, and the rest is mostly the same among all BSD
variants.)
Tom Lane [Wed, 2 Mar 2011 16:39:18 +0000 (11:39 -0500)]
Fix erroneous documentation of the syntax of CREATE CONSTRAINT TRIGGER.
The grammar requires a specific ordering of the clauses, but the
documentation showed a different order. This error was introduced in
commit b47953f9c69d48a9261bd643e3170017b93f6337, which merged the CREATE
CONSTRAINT TRIGGER documentation into the CREATE TRIGGER page. There is
no code bug AFAICS.
Tom Lane [Wed, 2 Mar 2011 16:17:03 +0000 (11:17 -0500)]
Correct mistaken claims about EXPLAIN ANALYZE's handling of triggers.
Time spent executing AFTER triggers is not included in the runtime of the
associated ModifyTable node; in my patch of yesterday I confused queuing of
these triggers with their actual execution. Spotted by Marko Tiikkaja.
Change pg_last_xlog_receive_location() not to move backwards. That makes
it a lot more useful for determining which standby is most up-to-date,
for example. There was long discussions on whether overwriting existing
existing WAL makes sense to begin with, and whether we should do some more
extensive variable renaming, but this change nevertheless seems quite
uncontroversial.
Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.
Change the way UPDATEs are handled. Instead of maintaining a chain of
tuple-level locks in shared memory, copy any existing locks on the old
tuple to the new tuple at UPDATE. Any existing page-level lock needs to
be duplicated too, as a lock on the new tuple. That was neglected
previously.
Store xmin on tuple-level predicate locks, to distinguish a lock on an old
already-recycled tuple from a new tuple at the same physical location.
Failure to distinguish them caused loops in the tuple-lock chains, as
reported by YAMAMOTO Takashi. Although we don't use the chain representation
of UPDATEs anymore, it seems like a good idea to store the xmin to avoid
some false positives if no other reason.
CheckSingleTargetForConflictsIn now correctly handles the case where a lock
that's being held is not reflected in the local lock table. That happens
if another backend acquires a lock on our behalf due to an UPDATE or a page
split.
PredicateLockPageCombine now retains locks for the page that is being
removed, rather than removing them. This prevents a potentially dangerous
false-positive inconsistency where the local lock table believes that a lock
is held, but it is actually not.
Tom Lane [Tue, 1 Mar 2011 16:32:13 +0000 (11:32 -0500)]
Include the target table in EXPLAIN output for ModifyTable nodes.
Per discussion, this seems important for plans involving writable CTEs,
since there can now be more than one ModifyTable node in the plan.
To retain the same formatting as for target tables of scan nodes, we
show only one target table, which will be the parent table in case of
an UPDATE or DELETE on an inheritance tree. Individual child tables
can be determined by inspecting the child plan trees if needed.
Robert Haas [Tue, 1 Mar 2011 16:32:23 +0000 (11:32 -0500)]
Avoid excessive Hot Standby feedback messages.
Without this patch, when wal_receiver_status_interval=0, indicating that no
status messages should be sent, Hot Standby feedback messages are instead sent
extremely frequently.
Tom Lane [Tue, 1 Mar 2011 04:27:18 +0000 (23:27 -0500)]
Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion. Fetching a whole
new snapshot now happens only between original queries. This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules. The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.
The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.
I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.
Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
Peter Eisentraut [Mon, 28 Feb 2011 16:41:10 +0000 (18:41 +0200)]
PL/Python custom SPI exceptions
This provides a separate exception class for each error code that the
backend defines, as well as the ability to get the SQLSTATE from the
exception object.
Peter Eisentraut [Sun, 27 Feb 2011 12:00:19 +0000 (14:00 +0200)]
Remove remaining expected file for Python 2.2
We don't have complete expected coverage for Python 2.2 anyway, so it
doesn't seem worth keeping this one around that no one appears to be
updating anyway. Visual inspection of the differences ought to be
good enough for those few who care about this obsolete Python version.
Tom Lane [Sun, 27 Feb 2011 18:43:29 +0000 (13:43 -0500)]
Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed. Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd. Some Asserts have been
added to these functions to help verify correct usage.
It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively. If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.
Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code. I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing. But it seems like good future-proofing.
Bruce Momjian [Sun, 27 Feb 2011 17:21:25 +0000 (12:21 -0500)]
Be less detailed about reporting shared memory failure by avoiding the
output of actual Postgres parameter _values_ related to shared memory,
and suggesting that these are only possible parameters to reduce.
Increase the default for wal_sender_delay from 200ms to 1s. Now that WAL
sender is immediately woken up by transaction commit, there's no need to
wake up so aggressively.
Tom Lane [Sat, 26 Feb 2011 04:53:34 +0000 (23:53 -0500)]
Fix order of shutdown processing when CTEs contain inter-references.
We need ExecutorEnd to run the ModifyTable nodes to completion in
reverse order of initialization, not forward order. Easily done
by constructing the list back-to-front.
Tom Lane [Fri, 25 Feb 2011 23:56:23 +0000 (18:56 -0500)]
Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order. Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values. And attempts to do conflicting updates will
have unpredictable results. We'll need to document all that.
This commit just fixes the code; documentation updates are waiting on
author.
Alvaro Herrera [Fri, 25 Feb 2011 22:04:25 +0000 (19:04 -0300)]
Fix pageinspect's heap_page_item to return infomasks as 32 bit values
HeapTupleHeader's t_infomask and t_infomask2 are defined as 16-bit
unsigned integers, so when the 16th bit was set, heap_page_item was
returning them as negative values, which was ugly.
The change to pageinspect--unpackaged--1.0.sql allows a module upgraded
from 9.0 to be cleanly updated from the previous definition.
Itagaki Takahiro [Thu, 24 Feb 2011 12:05:40 +0000 (21:05 +0900)]
More psql tab-completion for new commands.
- ALTER FOREIGN DATA WRAPPER with HANDLER
- ALTER TABLE VALIDATE CONSTRAINT
- ALTER TYPE ADD VALUE
- COPY with ENCODING and FORCE NOT NULL
- CREATE FOREIGN DATA WRAPPER with HANDLER
- CREATE TRIGGER ... INSTEAD OF
Tom Lane [Wed, 23 Feb 2011 00:23:23 +0000 (19:23 -0500)]
Add a relkind field to RangeTblEntry to avoid some syscache lookups.
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain. While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry. That might have some other uses later, anyway.
Per discussion.