Rethink the way walreceiver is linked into the backend. Instead than shoving
walreceiver as whole into a dynamically loaded module, split the
libpq-specific parts of it into dynamically loaded module and keep the rest
in the main backend binary.
Although Tom fixed the Windows compilation problems with the old walreceiver
module already, this is a cleaner division of labour and makes the code
more readable. There's also the prospect of adding new transport methods
as pluggable modules in the future, which this patch makes easier, though for
now the API between libpqwalreceiver and walreceiver process should be
considered private.
The libpq-specific module is now in src/backend/replication/libpqwalreceiver,
and the part linked with postgres binary is in
src/backend/replication/walreceiver.c.
Peter Eisentraut [Wed, 20 Jan 2010 05:47:09 +0000 (05:47 +0000)]
Before attempting to create a composite type, check whether a type of that
name already exists, so we'd get an error message about a "type" instead
of about a "relation", because the composite type code shares code with
relation creation.
Andrew Dunstan [Wed, 20 Jan 2010 01:08:21 +0000 (01:08 +0000)]
Add utility functions to PLPerl:
quote_literal, quote_nullable, quote_ident,
encode_bytea, decode_bytea, looks_like_number,
encode_array_literal, encode_array_constructor.
Split SPI.xs into two - SPI.xs now contains only SPI functions. Remainder
are in new Util.xs.
Some more code and documentation cleanup along the way, as well as
adding some CVS markers to files missing them.
Original patch from Tim Bunce, with a little editing from me.
Robert Haas [Wed, 20 Jan 2010 00:42:28 +0000 (00:42 +0000)]
Reformat documentation of libpq escaping functions.
Modify the "Escaping Strings for Inclusion in SQL Commands" section
to use a <variablelist> as the preceding and following sections do,
and merge the "Escaping Binary Strings for Inclusion in SQL Commands"
section into it.
This changes only the formatting of these sections, not the content.
It is intended to lay the groundwork for a follow-on patch to add
some new escaping functions, but it makes sense to commit this first,
for clarity.
Tom Lane [Tue, 19 Jan 2010 18:39:19 +0000 (18:39 +0000)]
When doing a parallel restore, we must guard against out-of-range dependency
dump IDs, because the array we're using is sized according to the highest
dump ID actually defined in the archive file. In a partial dump there could
be references to higher dump IDs that weren't dumped. Treat these the same
as references to in-range IDs that weren't dumped. (The whole thing is a
bit scary because the missing objects might have been part of dependency
chains, which we won't know about. Not much we can do though --- throwing
an error is probably overreaction.)
Also, reject parallel restore with pre-1.8 archive version (made by pre-8.0
pg_dump). In these old versions the dependency entries are OIDs, not dump
IDs, and we don't have enough information to interpret them.
Tom Lane [Tue, 19 Jan 2010 16:33:33 +0000 (16:33 +0000)]
Fix thinko in my recent change to put an explicit argisrow field in NullTest:
when the planner splits apart a ROW(...) IS NULL test, the argisrow values
of the component tests have to be determined from the component field types,
not copied from the original NullTest (in which argisrow is surely true).
Tom Lane [Mon, 18 Jan 2010 18:17:45 +0000 (18:17 +0000)]
Fix an oversight in convert_EXISTS_sublink_to_join: we can't convert an
EXISTS that contains a WITH clause. This would usually lead to a
"could not find CTE" error later in planning, because the WITH wouldn't
get processed at all. Noted while playing with an example from Ken Marshall.
Tom Lane [Mon, 18 Jan 2010 02:30:25 +0000 (02:30 +0000)]
Fix portalmem.c to avoid keeping a dangling pointer to a cached plan list
after it's released its reference count for the cached plan. There are
code paths that might try to examine the plan list before noticing that
the portal is already in aborted state. Report and diagnosis by Tatsuo
Ishii, though this isn't exactly his proposed patch.
Tom Lane [Sun, 17 Jan 2010 22:56:23 +0000 (22:56 +0000)]
Improve the handling of SET CONSTRAINTS commands by having them search
pg_constraint before searching pg_trigger. This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable. Per a gripe several months
ago from hubert depesz lubaczewski.
To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint. However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings. I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.
In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint. The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
Tom Lane [Sat, 16 Jan 2010 19:50:26 +0000 (19:50 +0000)]
Re-order configure tests to reflect the fact that the code generated for
posix_fadvise and other file-related functions can depend on _LARGEFILE_SOURCE
and/or _FILE_OFFSET_BITS. Per report from Robert Treat.
Back-patch to 8.4. This has been wrong all along, but we weren't really using
posix_fadvise in anger before, and AC_FUNC_FSEEKO seems to mask the issue well
enough for that function.
Tom Lane [Sat, 16 Jan 2010 17:39:55 +0000 (17:39 +0000)]
Fix unportable use of isxdigit() with char (rather than unsigned char)
argument, per warnings from buildfarm member pika. Also clean up code
formatting a trifle.
Simon Riggs [Sat, 16 Jan 2010 14:16:31 +0000 (14:16 +0000)]
Lock database while running drop database in Hot Standby to protect
against concurrent reconnection. Failure during testing showed issue
was possible, even though earlier analysis seemed to indicate it
would not be required. Use LockSharedObjectForSession() before
ResolveRecoveryConflictWithDatabase() and hold lock until end of
processing for that WAL record. Simple approach to avoid introducing
further bugs at this stage of development on an improbable issue.
Simon Riggs [Sat, 16 Jan 2010 10:05:59 +0000 (10:05 +0000)]
Teach standby conflict resolution to use SIGUSR1
Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
Tom Lane [Sat, 16 Jan 2010 05:52:29 +0000 (05:52 +0000)]
Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants of
the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified.
Per buildfarm results.
Tom Lane [Fri, 15 Jan 2010 22:36:35 +0000 (22:36 +0000)]
Do parse analysis of an EXPLAIN's contained statement during the normal
parse analysis phase, rather than at execution time. This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references. plancache.c gets a little bit
grottier, but the alternatives seem worse.
Move build of src/backend/replication/walreceiver/ later in the build
process, after src/interfaces, because it depends on libpq. Also add
missing lines for clean etc. targets
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.
Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.
Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.
Simon Riggs [Thu, 14 Jan 2010 11:08:02 +0000 (11:08 +0000)]
First part of refactoring of code for ResolveRecoveryConflict. Purposes
of this are to centralise the conflict code to allow further change,
as well as to allow passing through the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict so this is now required.
As originally suggested by Heikki, no longer optional.
Tom Lane [Thu, 14 Jan 2010 00:14:06 +0000 (00:14 +0000)]
Simplify validate_exec() by using access(2) to check file permissions,
rather than trying to implement the equivalent logic by hand. The motivation
for the original coding appears to have been to check with the effective uid's
permissions not the real uid's; but there is no longer any difference, because
we don't run the postmaster setuid (indeed, main.c enforces that they're the
same). Using access() means we will get it right in situations the original
coding failed to handle, such as ACL-based permissions. Besides it's a lot
shorter, cleaner, and more thread-safe. Per bug #5275 from James Bellinger.
Tom Lane [Wed, 13 Jan 2010 23:07:08 +0000 (23:07 +0000)]
When loading critical system indexes into the relcache, ensure we lock the
underlying catalog not only the index itself. Otherwise, if the cache
load process touches the catalog (which will happen for many though not
all of these indexes), we are locking index before parent table, which can
result in a deadlock against processes that are trying to lock them in the
normal order. Per today's failure on buildfarm member gothic_moth; it's
surprising the problem hadn't been identified before.
Back-patch to 8.2. Earlier releases didn't have the issue because they
didn't try to lock these indexes during load (instead assuming that they
couldn't change schema at all during multiuser operation).
Tom Lane [Wed, 13 Jan 2010 16:56:56 +0000 (16:56 +0000)]
Fix bug #5269: ResetPlanCache mustn't invalidate cached utility statements,
especially not ROLLBACK. ROLLBACK might need to be executed in an already
aborted transaction, when there is no safe way to revalidate the plan. But
in general there's no point in marking utility statements invalid, since
they have no plans in the normal sense of the word; so we might as well
work a bit harder here to avoid future revalidation cycles.
Michael Meskes [Wed, 13 Jan 2010 08:41:50 +0000 (08:41 +0000)]
Fix SQL3 type return value.
For non-SQL3 types ecpg used to return -Oid. This will break if there are
enough Oids to fill the namespace. Therefore we play it safe and return 0 if
there is no Oid->SQL3 tyoe mapping available.
Tom Lane [Wed, 13 Jan 2010 01:17:07 +0000 (01:17 +0000)]
Make fixed_paramref_hook behave properly when there are 'unused' slots
in the parameter array. Noted while experimenting with an example
from Pavel. This wouldn't come up in normal use, but it ought to honor
the specification that a parameter array can have unused slots.
Tom Lane [Tue, 12 Jan 2010 18:12:18 +0000 (18:12 +0000)]
Fix relcache reload mechanism to be more robust in the face of errors
occurring during a reload, such as query-cancel. Instead of zeroing out
an existing relcache entry and rebuilding it in place, build a new relcache
entry, then swap its contents with the old one, then free the new entry.
This avoids problems with code believing that a previously obtained pointer
to a cache entry must still reference a valid entry, as seen in recent
failures on buildfarm member jaguar. (jaguar is using CLOBBER_CACHE_ALWAYS
which raises the probability of failure substantially, but the problem
could occur in the field without that.) The previous design was okay
when it was made, but subtransactions and the ResourceOwner mechanism
make it unsafe now.
Also, make more use of the already existing rd_isvalid flag, so that we
remember that the entry requires rebuilding even if the first attempt fails.
Back-patch as far as 8.2. Prior versions have enough issues around relcache
reload anyway (due to inadequate locking) that fixing this one doesn't seem
worthwhile.
Bruce Momjian [Tue, 12 Jan 2010 02:42:52 +0000 (02:42 +0000)]
Please tablespace directories in their own subdirectory so pg_migrator
can upgrade clusters without renaming the tablespace directories. New
directory structure format is, e.g.:
Tom Lane [Mon, 11 Jan 2010 18:39:32 +0000 (18:39 +0000)]
Add some simple support and documentation for using process-specific oom_adj
settings to prevent the postmaster from being OOM-killed on Linux systems.
Tom Lane [Mon, 11 Jan 2010 15:31:04 +0000 (15:31 +0000)]
Improve ExecEvalVar's handling of whole-row variables in cases where the
rowtype contains dropped columns. Sometimes the input tuple will be formed
from a select targetlist in which dropped columns are filled with a NULL
of an arbitrary type (the planner typically uses INT4, since it can't tell
what type the dropped column really was). So we need to relax the rowtype
compatibility check to not insist on physical compatibility if the actual
column value is NULL.
In principle we might need to do this for functions returning composite
types, too (see tupledesc_match()). In practice there doesn't seem to be
a bug there, probably because the function will be using the same cached
rowtype descriptor as the caller. Fixing that code path would require
significant rearrangement, so I left it alone for now.
Tom Lane [Sun, 10 Jan 2010 17:56:50 +0000 (17:56 +0000)]
Improve plpgsql parsing to report "foo is not a known variable", rather than a
generic syntax error, when seeing "foo := something" and foo isn't recognized.
This buys back most of the helpfulness discarded in my previous patch by not
throwing errors when a qualified name appears to match a row variable but the
last component doesn't match any field of the row. It covers other cases
where our error messages left something to be desired, too.
Tom Lane [Sun, 10 Jan 2010 17:15:18 +0000 (17:15 +0000)]
Improve plpgsql's handling of record field references by forcing all potential
field references in SQL expressions to have RECFIELD datum-array entries at
parse time. If it turns out that the reference is actually to a SQL column,
the RECFIELD entry is useless, but it costs little. This allows us to get rid
of the previous use of FieldSelect applied to a whole-row Param for the record
variable; which was not only slower than a direct RECFIELD reference, but
failed for references to system columns of a trigger's NEW or OLD record.
Per report and fix suggestion from Dean Rasheed.
Magnus Hagander [Sun, 10 Jan 2010 15:54:11 +0000 (15:54 +0000)]
Update Windows installation notes.
pginstaller isn't used anymore, in favor of the one-click installers.
Make it clear that we support Windows 2000 and newer with the native
port, instead of first saying we support NT4 and then saying we don't.
Simon Riggs [Sun, 10 Jan 2010 15:44:28 +0000 (15:44 +0000)]
During Hot Standby, fix drop database when sessions idle.
Previously we only cancelled sessions that were in-transaction.
Simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would
not be worth the trouble to cover this rare case.
Problem report and fix by Andres Freund, edited somewhat by me
Robert Haas [Sun, 10 Jan 2010 04:26:36 +0000 (04:26 +0000)]
Remove partial, broken support for NULL pointers when fetching attributes.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working. Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().
The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
Tom Lane [Sat, 9 Jan 2010 20:46:19 +0000 (20:46 +0000)]
Make ExecEvalFieldSelect throw a more intelligible error if it's asked to
extract a system column, and remove a couple of lines that are useless
in light of the fact that we aren't ever going to support this case. There
isn't much point in trying to make this work because a tuple Datum does
not carry many of the system columns. Per experimentation with a case
reported by Dean Rasheed; we'll have to fix his problem somewhere else.
Simon Riggs [Sat, 9 Jan 2010 16:49:27 +0000 (16:49 +0000)]
During Hot Standby, set DatabasePath correctly during relcache init file
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.
Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().
Problem report by Robert Treat, analysis and fix by me.
Andrew Dunstan [Sat, 9 Jan 2010 15:25:41 +0000 (15:25 +0000)]
Provide regression testing for plperlu, and for plperl+plperlu interaction.
The latter are only run if the platform can run both interpreters in the
same backend.
Andrew Dunstan [Sat, 9 Jan 2010 02:40:50 +0000 (02:40 +0000)]
Tidy up and refactor plperl.c.
- Changed MULTIPLICITY check from runtime to compiletime.
No loads the large Config module.
- Changed plperl_init_interp() to return new interp
and not alter the global interp_state
- Moved plperl_safe_init() call into check_interp().
- Removed plperl_safe_init_done state variable
as interp_state now covers that role.
- Changed plperl_create_sub() to take a plperl_proc_desc argument.
- Simplified return value handling in plperl_create_sub.
- Changed perl.com link in the docs to perl.org and tweaked
wording to clarify that require, not use, is what's blocked.
- Moved perl code in large multi-line C string literal macros
out to plc_*.pl files.
- Added a test2macro.pl utility to convert the plc_*.pl files to
macros in a perlchunks.h file which is #included
- Simplifed plperl_safe_init() slightly
- Optimized pg_verifymbstr calls to avoid unneeded strlen()s.
Tom Lane [Fri, 8 Jan 2010 02:44:00 +0000 (02:44 +0000)]
Fix oversight in EvalPlanQualFetch: after failing to lock a tuple because
someone else has just updated it, we have to set priorXmax to that tuple's
xmax (ie, the XID of the other xact that updated it) before looping back to
examine the next tuple. Obviously, the next tuple in the update chain should
have that XID as its xmin, not the same xmin as the preceding tuple that we
had been trying to lock. The mismatch would cause the EvalPlanQual logic to
decide that the tuple chain ended in a deletion, when actually there was a
live tuple that should have been found.
I inserted this error when recently adding logic to EvalPlanQual to make it
lock tuples before returning them (as opposed to the old method in which the
lock would occur much later, causing a great deal of work to be wasted if we
only then discover someone else updated it). Sigh. Per today's report from
Takahiro Itagaki of inconsistent results during pgbench runs.
This uses the same infrastructure with EXPLAIN BUFFERS to support
{shared|local}_blks_{hit|read|written} andtemp_blks_{read|written}
columns in the pg_stat_statements view. The dumped file format
also updated.
Tom Lane [Thu, 7 Jan 2010 19:53:11 +0000 (19:53 +0000)]
Make bit/varbit substring() treat any negative length as meaning "all the rest
of the string". The previous coding treated only -1 that way, and would
produce an invalid result value for other negative values.
We ought to fix it so that 2-parameter bit substring() is a different C
function and the 3-parameter form throws error for negative length, but
that takes a pg_proc change which is impractical in the back branches;
and in any case somebody might be relying on -1 working this way.
So just do this as a back-patchable fix.
Tom Lane [Thu, 7 Jan 2010 16:29:58 +0000 (16:29 +0000)]
Fix (some of the) breakage introduced into query-cancel processing by HS.
It is absolutely not okay to throw an ereport(ERROR) in any random place in
the code just because DoingCommandRead is set; interrupting, say, OpenSSL
in the midst of its activities is guaranteed to result in heartache.
Instead of that, undo the original optimizations that threw away
QueryCancelPending anytime we were starting or finishing a command read, and
instead discard the cancel request within ProcessInterrupts if we find that
there is no HS reason for forcing a cancel and we are DoingCommandRead.
In passing, may I once again condemn the practice of changing the code
and not fixing the adjacent comment that you just turned into a lie?
Tom Lane [Thu, 7 Jan 2010 04:53:35 +0000 (04:53 +0000)]
Remove all the special-case code for INT64_IS_BUSTED, per decision that
we're not going to support that anymore.
I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live. It's a bit moot since that's all ifdef'd
out, of course.
Robert Haas [Thu, 7 Jan 2010 03:53:08 +0000 (03:53 +0000)]
Further fixes for per-tablespace options patch.
Add missing varlena header to TableSpaceOpts structure. And, per
Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext,
call it in the caller's memory context and copy the value over
afterwards, to reduce the chances of a session-lifetime memory leak.
Tom Lane [Thu, 7 Jan 2010 01:41:11 +0000 (01:41 +0000)]
Make configure check the version of Perl we're building with, and reject
versions < 5.8. Also, if there's no Perl, emit a warning informing the
user that he won't be able to build from a CVS pull. This is exactly the
same treatment we give Bison and Perl, and for the same reasons.
Tom Lane [Thu, 7 Jan 2010 00:25:05 +0000 (00:25 +0000)]
Alter the configure script to fail immediately if the C compiler does not
provide a working 64-bit integer datatype. As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle. Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence. Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.
Back-patch to 8.4 since that branch is also broken.
We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
Tom Lane [Wed, 6 Jan 2010 23:00:02 +0000 (23:00 +0000)]
Fix spccache.c to not suppose that a cache entry will live across database
access, per testing with CLOBBER_CACHE_ALWAYS. Minor other editorialization.