Joe Conway [Wed, 3 Feb 2010 23:01:23 +0000 (23:01 +0000)]
Check to ensure the number of primary key fields supplied does not
exceed the total number of non-dropped source table fields for
dblink_build_sql_*(). Addresses bug report from Rushabh Lathia.
Tom Lane [Tue, 2 Feb 2010 19:12:34 +0000 (19:12 +0000)]
CLUSTER specified the wrong namespace when renaming toast tables of temporary
relations (they don't live in pg_toast). This caused an Assert failure in
assert-enabled builds. So far as I can see, in a non-assert build it would
only have messed up the checks for conflicting names, so a failure would be
quite improbable but perhaps not impossible.
Tom Lane [Mon, 1 Feb 2010 02:45:35 +0000 (02:45 +0000)]
Change regexp engine's ccondissect/crevdissect routines to perform DFA
matching before recursing instead of after. The DFA match eliminates
unworkable midpoint choices a lot faster than the recursive check, in most
cases, so doing it first can speed things up; particularly in pathological
cases such as recently exhibited by Michael Glaesemann.
In addition, apply some cosmetic changes that were applied upstream (in the
Tcl project) at the same time, in order to sync with upstream version 1.15
of regexec.c.
Upstream apparently intends to backpatch this, so I will too. The
pathological behavior could be unpleasant if encountered in the field,
which seems to justify any risk of introducing new bugs.
Tom Lane, reviewed by Donal K. Fellows of Tcl project
Magnus Hagander [Sun, 31 Jan 2010 17:16:29 +0000 (17:16 +0000)]
Fix race condition in win32 signal handling.
There was a race condition where the receiving pipe could be closed by the
child thread if the main thread was pre-empted before it got a chance to
create a new one, and the dispatch thread ran to completion during that time.
One symptom of this is that rows in pg_listener could be dropped under
heavy load.
Analysis and original patch by Radu Ilie, with some small
modifications by Magnus Hagander.
Tom Lane [Sat, 30 Jan 2010 20:09:59 +0000 (20:09 +0000)]
Avoid performing encoding conversion on command tag strings during EndCommand.
Since all current and foreseeable future command tags will be pure ASCII,
there is no need to do conversion on them. This saves a few cycles and also
avoids polluting otherwise-pristine subtransaction memory contexts, which
is the cause of the backend memory leak exhibited in bug #5302. (Someday
we'll probably want to have a better method of determining whether
subtransaction contexts need to be kept around, but today is not that day.)
Backpatch to 8.0. The cycle-shaving aspect of this would work in 7.4
too, but without subtransactions the memory-leak aspect doesn't apply,
so it doesn't seem worth touching 7.4.
Tom Lane [Sat, 30 Jan 2010 18:59:58 +0000 (18:59 +0000)]
Fix memory leakage introduced into print_aligned_text by 8.4 changes
(failure to free col_lineptrs[] array elements) and exacerbated in the
current devel cycle (failure to free "wrap"). This resulted in moderate
bloat of psql over long script runs. Noted while testing bug #5302,
although what the reporter was complaining of was backend-side leakage.
Tom Lane [Mon, 25 Jan 2010 01:58:19 +0000 (01:58 +0000)]
Apply Tcl_Init() to the "hold" interpreter created by pltcl.
You might think this is unnecessary since that interpreter is never used
to run code --- but it turns out that's wrong. As of Tcl 8.5, the "clock"
command (alone among builtin Tcl commands) is partially implemented by
loaded-on-demand Tcl code, which means that it fails if there's not
unknown-command support, and also that it's impossible to run it directly
in a safe interpreter. The way they get around the latter is that
Tcl_CreateSlave() automatically sets up an alias command that forwards any
execution of "clock" in a safe slave interpreter to its parent interpreter.
Thus, when attempting to execute "clock" in trusted pltcl, the command
actually executes in the "hold" interpreter, where it will fail if
unknown-command support hasn't been introduced by sourcing the standard
init.tcl script, which is done by Tcl_Init(). (This is a pretty dubious
design decision on the Tcl boys' part, if you ask me ... but they didn't.)
Back-patch all the way. It's not clear that anyone would try to use ancient
versions of pltcl with a recent Tcl, but it's not clear they wouldn't, either.
Also add a regression test using "clock", in branches that have regression
test support for pltcl.
Tom Lane [Sun, 24 Jan 2010 21:49:31 +0000 (21:49 +0000)]
Fix assorted core dumps and Assert failures that could occur during
AbortTransaction or AbortSubTransaction, when trying to clean up after an
error that prevented (sub)transaction start from completing:
* access to TopTransactionResourceOwner that might not exist
* assert failure in AtEOXact_GUC, if AtStart_GUC not called yet
* assert failure or core dump in AfterTriggerEndSubXact, if
AfterTriggerBeginSubXact not called yet
Per testing by injecting elog(ERROR) at successive steps in StartTransaction
and StartSubTransaction. It's not clear whether all of these cases could
really occur in the field, but at least one of them is easily exposed by
simple stress testing, as per my accidental discovery yesterday.
Tom Lane [Sat, 23 Jan 2010 21:29:06 +0000 (21:29 +0000)]
Insert CHECK_FOR_INTERRUPTS calls into loops in dbsize.c, to ensure that
the various disk-size-reporting functions will respond to query cancel
reasonably promptly even in very large databases. Per report from
Kevin Grittner.
Tom Lane [Wed, 20 Jan 2010 23:12:15 +0000 (23:12 +0000)]
Well, the systemtap guys moved the goalposts again: with the latest version,
we *must* generate probes.o or the dtrace probes don't work. Revert our
workaround for their previous bug. Details at
https://bugzilla.redhat.com/show_bug.cgi?id=557266
Tom Lane [Tue, 19 Jan 2010 18:39:26 +0000 (18:39 +0000)]
When doing a parallel restore, we must guard against out-of-range dependency
dump IDs, because the array we're using is sized according to the highest
dump ID actually defined in the archive file. In a partial dump there could
be references to higher dump IDs that weren't dumped. Treat these the same
as references to in-range IDs that weren't dumped. (The whole thing is a
bit scary because the missing objects might have been part of dependency
chains, which we won't know about. Not much we can do though --- throwing
an error is probably overreaction.)
Also, reject parallel restore with pre-1.8 archive version (made by pre-8.0
pg_dump). In these old versions the dependency entries are OIDs, not dump
IDs, and we don't have enough information to interpret them.
Tom Lane [Mon, 18 Jan 2010 18:17:52 +0000 (18:17 +0000)]
Fix an oversight in convert_EXISTS_sublink_to_join: we can't convert an
EXISTS that contains a WITH clause. This would usually lead to a
"could not find CTE" error later in planning, because the WITH wouldn't
get processed at all. Noted while playing with an example from Ken Marshall.
Tom Lane [Mon, 18 Jan 2010 02:30:30 +0000 (02:30 +0000)]
Fix portalmem.c to avoid keeping a dangling pointer to a cached plan list
after it's released its reference count for the cached plan. There are
code paths that might try to examine the plan list before noticing that
the portal is already in aborted state. Report and diagnosis by Tatsuo
Ishii, though this isn't exactly his proposed patch.
Tom Lane [Sat, 16 Jan 2010 19:50:38 +0000 (19:50 +0000)]
Re-order configure tests to reflect the fact that the code generated for
posix_fadvise and other file-related functions can depend on _LARGEFILE_SOURCE
and/or _FILE_OFFSET_BITS. Per report from Robert Treat.
Back-patch to 8.4. This has been wrong all along, but we weren't really using
posix_fadvise in anger before, and AC_FUNC_FSEEKO seems to mask the issue well
enough for that function.
Tom Lane [Wed, 13 Jan 2010 23:07:15 +0000 (23:07 +0000)]
When loading critical system indexes into the relcache, ensure we lock the
underlying catalog not only the index itself. Otherwise, if the cache
load process touches the catalog (which will happen for many though not
all of these indexes), we are locking index before parent table, which can
result in a deadlock against processes that are trying to lock them in the
normal order. Per today's failure on buildfarm member gothic_moth; it's
surprising the problem hadn't been identified before.
Back-patch to 8.2. Earlier releases didn't have the issue because they
didn't try to lock these indexes during load (instead assuming that they
couldn't change schema at all during multiuser operation).
Tom Lane [Wed, 13 Jan 2010 16:57:03 +0000 (16:57 +0000)]
Fix bug #5269: ResetPlanCache mustn't invalidate cached utility statements,
especially not ROLLBACK. ROLLBACK might need to be executed in an already
aborted transaction, when there is no safe way to revalidate the plan. But
in general there's no point in marking utility statements invalid, since
they have no plans in the normal sense of the word; so we might as well
work a bit harder here to avoid future revalidation cycles.
Tom Lane [Tue, 12 Jan 2010 18:12:26 +0000 (18:12 +0000)]
Fix relcache reload mechanism to be more robust in the face of errors
occurring during a reload, such as query-cancel. Instead of zeroing out
an existing relcache entry and rebuilding it in place, build a new relcache
entry, then swap its contents with the old one, then free the new entry.
This avoids problems with code believing that a previously obtained pointer
to a cache entry must still reference a valid entry, as seen in recent
failures on buildfarm member jaguar. (jaguar is using CLOBBER_CACHE_ALWAYS
which raises the probability of failure substantially, but the problem
could occur in the field without that.) The previous design was okay
when it was made, but subtransactions and the ResourceOwner mechanism
make it unsafe now.
Also, make more use of the already existing rd_isvalid flag, so that we
remember that the entry requires rebuilding even if the first attempt fails.
Back-patch as far as 8.2. Prior versions have enough issues around relcache
reload anyway (due to inadequate locking) that fixing this one doesn't seem
worthwhile.
Tom Lane [Mon, 11 Jan 2010 15:31:12 +0000 (15:31 +0000)]
Improve ExecEvalVar's handling of whole-row variables in cases where the
rowtype contains dropped columns. Sometimes the input tuple will be formed
from a select targetlist in which dropped columns are filled with a NULL
of an arbitrary type (the planner typically uses INT4, since it can't tell
what type the dropped column really was). So we need to relax the rowtype
compatibility check to not insist on physical compatibility if the actual
column value is NULL.
In principle we might need to do this for functions returning composite
types, too (see tupledesc_match()). In practice there doesn't seem to be
a bug there, probably because the function will be using the same cached
rowtype descriptor as the caller. Fixing that code path would require
significant rearrangement, so I left it alone for now.
Magnus Hagander [Sun, 10 Jan 2010 15:54:14 +0000 (15:54 +0000)]
Update Windows installation notes.
pginstaller isn't used anymore, in favor of the one-click installers.
Make it clear that we support Windows 2000 and newer with the native
port, instead of first saying we support NT4 and then saying we don't.
Tom Lane [Thu, 7 Jan 2010 19:53:16 +0000 (19:53 +0000)]
Make bit/varbit substring() treat any negative length as meaning "all the rest
of the string". The previous coding treated only -1 that way, and would
produce an invalid result value for other negative values.
We ought to fix it so that 2-parameter bit substring() is a different C
function and the 3-parameter form throws error for negative length, but
that takes a pg_proc change which is impractical in the back branches;
and in any case somebody might be relying on -1 working this way.
So just do this as a back-patchable fix.
Tom Lane [Thu, 7 Jan 2010 00:25:18 +0000 (00:25 +0000)]
Alter the configure script to fail immediately if the C compiler does not
provide a working 64-bit integer datatype. As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle. Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence. Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.
Back-patch to 8.4 since that branch is also broken.
We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
Tom Lane [Tue, 5 Jan 2010 23:25:44 +0000 (23:25 +0000)]
Add support for doing FULL JOIN ON FALSE. While this is really a rather
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries. In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality
regression for this type of query.
Magnus Hagander [Fri, 1 Jan 2010 14:57:19 +0000 (14:57 +0000)]
Make the win32 putenv() override update *all* present versions of the
MSVCRxx runtime, not just the current + Visual Studio 6 (MSVCRT). Clearly
there can be an almost unlimited number of runtimes loaded at the same
time.
Tom Lane [Wed, 30 Dec 2009 03:45:53 +0000 (03:45 +0000)]
Set errno to zero before invoking SSL_read or SSL_write. It appears that
at least in some Windows versions, these functions are capable of returning
a failure indication without setting errno. That puts us into an infinite
loop if the previous value happened to be EINTR. Per report from Brendan
Hill.
Back-patch to 8.2. We could take it further back, but since this is only
known to be an issue on Windows and we don't support Windows before 8.2,
it does not seem worth the trouble.
Previous fix for temporary file management broke returning a set from
PL/pgSQL function within an exception handler. Make sure we use the right
resource owner when we create the tuplestore to hold returned tuples.
Simplify tuplestore API so that the caller doesn't need to be in the right
memory context when calling tuplestore_put* functions. tuplestore.c
automatically switches to the memory context used when the tuplestore was
created. Tuplesort was already modified like this earlier. This patch also
removes the now useless MemoryContextSwitch calls from callers.
Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like
the previous patch that broke this.
Tom Lane [Thu, 24 Dec 2009 17:52:11 +0000 (17:52 +0000)]
Fix wrong WAL info value generated when gistContinueInsert() performs an
index page split. This would result in index corruption, or even more likely
an error during WAL replay, if we were unlucky enough to crash during
end-of-recovery cleanup after having completed an incomplete GIST insertion.
Always pass catalog id to the options validator function specified in
CREATE FOREIGN DATA WRAPPER. Arguably it wasn't a bug because the
documentation said that it's passed the catalog ID or zero, but surely
we should provide it when it's known. And there isn't currently any
scenario where it's not known, and I can't imagine having one in the
future either, so better remove the "or zero" escape hatch and always
pass a valid catalog ID. Backpatch to 8.4.
Tom Lane [Wed, 16 Dec 2009 22:24:19 +0000 (22:24 +0000)]
Avoid a premature coercion failure in transformSetOperationTree() when
presented with an UNKNOWN-type Var, which can happen in cases where an
unknown literal appeared in a subquery. While many such cases will fail
later on anyway in the planner, there are some cases where the planner is
able to flatten the query and replace the Var by the constant before it has
to coerce the union column to the final type. I had added this check in 8.4
to provide earlier/better error detection, but it causes a regression for
some cases that worked OK before. Fix by not making the check if the input
node is UNKNOWN type and not a Const or Param. If it isn't going to work,
it will fail anyway at plan time, with the only real loss being inability to
provide an error cursor. Per gripe from Britt Piehler.
In passing, rename a couple of variables to remove confusion from an
inner scope masking the same variable names in an outer scope.
Tom Lane [Mon, 14 Dec 2009 02:16:04 +0000 (02:16 +0000)]
Fix a bug introduced when set-returning SQL functions were made inline-able:
we have to cope with the possibility that the declared result rowtype contains
dropped columns. This fails in 8.4, as per bug #5240.
While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement. However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION. In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
Tom Lane [Sat, 12 Dec 2009 19:24:44 +0000 (19:24 +0000)]
Fix integer-to-bit-string conversions to handle the first fractional byte
correctly when the output bit width is wider than the given integer by
something other than a multiple of 8 bits.
This has been wrong since I first wrote that code for 8.0 :-(. Kudos to
Roman Kononov for being the first to notice, though I didn't use his
patch. Per bug #5237.
Tom Lane [Wed, 9 Dec 2009 21:58:04 +0000 (21:58 +0000)]
Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function. It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user. However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.
The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue. GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation. Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement. (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)
Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.
Magnus Hagander [Wed, 9 Dec 2009 06:37:29 +0000 (06:37 +0000)]
Reject certificates with embedded NULLs in the commonName field. This stops
attacks where an attacker would put <attack>\0<propername> in the field and
trick the validation code that the certificate was for <attack>.
This is a very low risk attack since it reuqires the attacker to trick the
CA into issuing a certificate with an incorrect field, and the common
PostgreSQL deployments are with private CAs, and not external ones. Also,
default mode in 8.4 does not do any name validation, and is thus also not
vulnerable - but the higher security modes are.
Backpatch all the way. Even though versions 8.3.x and before didn't have
certificate name validation support, they still exposed this field for
the user to perform the validation in the application code, and there
is no way to detect this problem through that API.
Tom Lane [Wed, 9 Dec 2009 00:35:59 +0000 (00:35 +0000)]
Update time zone data files to tzdata release 2009s: DST law changes in
Antarctica, Argentina, Bangladesh, Fiji, Novokuznetsk, Pakistan, Palestine,
Samoa, Syria. Also historical corrections for Hong Kong.
Fix bug in temporary file management with subtransactions. A cursor opened
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).
At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
Tom Lane [Wed, 2 Dec 2009 17:41:07 +0000 (17:41 +0000)]
Ignore attempts to set "application_name" in the connection startup packet.
This avoids a useless connection retry and complaint in the postmaster log
when receiving a connection from 8.5 or later libpq.
Backpatch in all supported branches, but of course *not* HEAD.
Tom Lane [Sun, 29 Nov 2009 21:02:22 +0000 (21:02 +0000)]
Fix session-lifespan memory leak when a plperl function is redefined:
we have to tell Perl it can release its compiled copy of the function
text. Noted by Alexey Klyukin.
Back-patch to 8.2 --- the problem exists further back, but this patch
won't work without modification, and it's probably not worth the trouble.
Fix an old bug in multixact and two-phase commit. Prepared transactions can
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.
The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.
Fix back to 8.1, where both 2PC and multixact were introduced.
Tom Lane [Sat, 21 Nov 2009 05:44:12 +0000 (05:44 +0000)]
Refactor ecpg grammar so that it uses the core grammar's unreserved_keyword
list, minus a few specific words that have to be treated specially. This
replaces a hard-wired list of keywords that would have needed manual
maintenance, and was not getting it. The 8.4 coding was already missing
these words, causing ecpg to incorrectly treat them as reserved words:
CALLED, CATALOG, DEFINER, ENUM, FOLLOWING, INVOKER, OPTIONS, PARTITION,
PRECEDING, RANGE, SECURITY, SERVER, UNBOUNDED, WRAPPER. In HEAD we were
additionally missing COMMENTS, FUNCTIONS, SEQUENCES, TABLES.
Per gripe from Bosco Rama.
Tom Lane [Fri, 20 Nov 2009 20:54:20 +0000 (20:54 +0000)]
Fix display and dumping of UPDATE OR TRUNCATE triggers (a bizarre combination
maybe, but we should get it right). Bug noted while reviewing TRIGGER WHEN
patch. Already fixed in HEAD.
Tom Lane [Thu, 19 Nov 2009 02:45:40 +0000 (02:45 +0000)]
Fix memory leak in syslogger: logfile_rotate() would leak a copy of the
output filename if CSV logging was enabled and only one of the two possible
output files got rotated during a particular call (which would, in fact,
typically be the case during a size-based rotation). This would amount to
about MAXPGPATH (1KB) per rotation, and it's been there since the CSV
code was put in, so it's surprising that nobody noticed it before.
Per bug #5196 from Thomas Poindessous.
Tom Lane [Mon, 16 Nov 2009 18:04:47 +0000 (18:04 +0000)]
While doing the final setrefs.c pass over a plan tree, try to match up
non-Var sort/group expressions using ressortgroupref labels instead of
depending entirely on equal()-ity of the upper node's tlist expressions
to the lower node's. This avoids emitting the wrong outputs in cases
where there are textually identical volatile sort/group expressions,
as for example
select distinct random(),random() from generate_series(1,10);
Per report from Andrew Gierth.
Backpatch to 8.4. Arguably this is wrong all the way back, but the only known
case where there's an observable problem is when using hash aggregation to
implement DISTINCT, which is new as of 8.4. So for the moment I'll refrain
from backpatching further.
A better fix for the "ARRAY[...]::domain" problem. The previous patch worked,
but the transformed ArrayExpr claimed to have a return type of "domain",
even though the domain constraint was only checked by the enclosing
CoerceToDomain node. With this fix, the ArrayExpr is correctly labeled with
the base type of the domain. Per gripe by Tom Lane.
When you do "ARRAY[...]::domain", where domain is a domain over an array type,
we need to check domain constraints. We used to do it correctly, but 8.4
introduced a separate code path for the "ARRAY[]::arraytype" case to infer
the type of an empty ARRAY construct from the cast target, and forgot to take
domains into account.
Teodor Sigaev [Fri, 13 Nov 2009 11:17:22 +0000 (11:17 +0000)]
Fix multicolumn GIN's wrong results with fastupdate enabled.
User-defined consistent functions believes the check array
contains at least one true element which was not a true for
scanning pending list.
Tom Lane [Tue, 10 Nov 2009 23:12:21 +0000 (23:12 +0000)]
Do not build psql's flex module on its own, but instead include it in
mainloop.c. This ensures that postgres_fe.h is read before including
any system headers, which is necessary to avoid problems on some platforms
where we make nondefault selections of feature macros for stdio.h or
other headers. We have had this policy for flex modules in the backend
for many years, but for some reason it was not applied to psql.
Per trouble report from Alexandra Roy and diagnosis by Albe Laurenz.
Alvaro Herrera [Tue, 10 Nov 2009 18:00:30 +0000 (18:00 +0000)]
Fix longstanding problems in VACUUM caused by untimely interruptions
In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.
In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.
The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed. It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.
Tom Lane [Thu, 5 Nov 2009 04:38:35 +0000 (04:38 +0000)]
Allow binary-coercible cases in ri_HashCompareOp; there are some such cases
that are not handled by find_coercion_pathway, notably composite->RECORD.
Now that 8.4 supports composites as primary keys, it's worth dealing with
this case.
Disable triggering failover with a signal in pg_standby on Windows, because
Windows doesn't do signal processing like other platforms do. It never
really worked, but recent changes to the signal handling made it crash.
In PLy_output(), when the elog() call in the TRY branch throws an exception
(this can happen when a statement timeout kicks in, for example), the
PyErr_SetString() call in the CATCH branch can cause a segfault, because the
Py_XDECREF(so) call before it releases memory that is still used by the sv
variable that PyErr_SetString() uses as argument, because sv points into
memory owned by so.
Backpatched back to 8.0, where this code was introduced.
I also threw in a couple of volatile declarations for variables that are used
before and after the TRY. I don't think they caused the crash that I
observed, but they could become issues.
Tom Lane [Sun, 1 Nov 2009 22:31:02 +0000 (22:31 +0000)]
Dept of second thoughts: after studying index_getnext() a bit more I realize
that it can scribble on scan->xs_ctup.t_self while following HOT chains,
so we can't rely on that to stay valid between hashgettuple() calls.
Introduce a private variable in HashScanOpaque, instead.
Tom Lane [Sun, 1 Nov 2009 21:25:33 +0000 (21:25 +0000)]
Fix two serious bugs introduced into hash indexes by the 8.4 patch that made
hash indexes keep entries sorted by hash value. First, the original plans for
concurrency assumed that insertions would happen only at the end of a page,
which is no longer true; this could cause scans to transiently fail to find
index entries in the presence of concurrent insertions. We can compensate
by teaching scans to re-find their position after re-acquiring read locks.
Second, neither the bucket split nor the bucket compaction logic had been
fixed to preserve hashvalue ordering, so application of either of those
processes could lead to permanent corruption of an index, in the sense
that searches might fail to find entries that are present.
This patch fixes the split and compaction logic to preserve hashvalue
ordering, but it cannot do anything about pre-existing corruption. We will
need to recommend reindexing all hash indexes in the 8.4.2 release notes.
To buy back the performance loss hereby induced in split and compaction,
fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
operations. We might later want to do something with qsort'ing the
page contents rather than doing a binary search for each insertion,
but that seemed more invasive than I cared to risk in a back-patch.
Per bug #5157 from Jeff Janes and subsequent investigation.
Tom Lane [Sat, 31 Oct 2009 18:12:05 +0000 (18:12 +0000)]
Ensure the previous Perl interpreter selection is restored upon exit from
plperl_call_handler, in both the normal and error-exit paths. Per report
from Alexey Klyukin.
Tom Lane [Fri, 30 Oct 2009 20:58:51 +0000 (20:58 +0000)]
Make the overflow guards in ExecChooseHashTableSize be more protective.
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX,
which while not insane on its own terms did nothing to protect subsequent
code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size
estimates might well be planner error rather than reality, it seems best
to constrain the initial sizes to be not more than work_mem/sizeof(pointer),
thus ensuring the allocated arrays don't exceed work_mem. We will allow
nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches
calls, but we should still guard against integer overflow in those palloc
requests. Per bug #5145 from Bernt Marius Johnsen.
Although the given test case only seems to fail back to 8.2, previous
releases have variants of this issue, so patch all supported branches.
Tom Lane [Wed, 28 Oct 2009 18:10:00 +0000 (18:10 +0000)]
Fix \df to re-allow regexp special characters in the function name pattern.
This has always worked, up until somebody's thinko here:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00233.php
Per bug #5143 from Piotr Wolinski.
Tom Lane [Tue, 27 Oct 2009 20:14:33 +0000 (20:14 +0000)]
Fix AfterTriggerSaveEvent to use a test and elog, not just Assert, to check
that it's called within an AfterTriggerBeginQuery/AfterTriggerEndQuery pair.
The RI cascade triggers suppress that overhead on the assumption that they
are always run non-deferred, so it's possible to violate the condition if
someone mistakenly changes pg_trigger to mark such a trigger deferred.
We don't really care about supporting that, but throwing an error instead
of crashing seems desirable. Per report from Marcelo Costa.
Tom Lane [Tue, 27 Oct 2009 17:11:30 +0000 (17:11 +0000)]
Make FOR UPDATE/SHARE in the primary query not propagate into WITH queries;
for example in
WITH w AS (SELECT * FROM foo) SELECT * FROM w, bar ... FOR UPDATE
the FOR UPDATE will now affect bar but not foo. This is more useful and
consistent than the original 8.4 behavior, which tried to propagate FOR UPDATE
into the WITH query but always failed due to assorted implementation
restrictions. Even though we are in process of removing those restrictions,
it seems correct on philosophical grounds to not let the outer query's
FOR UPDATE affect the WITH query.
In passing, fix isLockedRel which frequently got things wrong in
nested-subquery cases: "FOR UPDATE OF foo" applies to an alias foo in the
current query level, not subqueries. This has been broken for a long time,
but it doesn't seem worth back-patching further than 8.4 because the actual
consequences are minimal. At worst the parser would sometimes get
RowShareLock on a relation when it should be AccessShareLock or vice versa.
That would only make a difference if someone were using ExclusiveLock
concurrently, which no standard operation does, and anyway FOR UPDATE
doesn't result in visible changes so it's not clear that the someone would
notice any problem. Between that and the fact that FOR UPDATE barely works
with subqueries at all in existing releases, I'm not excited about worrying
about it.
Tom Lane [Fri, 16 Oct 2009 22:08:42 +0000 (22:08 +0000)]
Rewrite pam_passwd_conv_proc to be more robust: avoid assuming that the
pam_message array contains exactly one PAM_PROMPT_ECHO_OFF message.
Instead, deal with however many messages there are, and don't throw error
for PAM_ERROR_MSG and PAM_TEXT_INFO messages. This logic is borrowed from
openssh 5.2p1, which hopefully has seen more real-world PAM usage than we
have. Per bug #5121 from Ryan Douglas, which turned out to be caused by
the conv_proc being called with zero messages. Apparently that is normal
behavior given the combination of Linux pam_krb5 with MS Active Directory
as the domain controller.
Patch all the way back, since this code has been essentially untouched
since 7.4. (Surprising we've not heard complaints before.)
Rename the new MAX_AUTH_TOKEN_LENGTH #define to PG_MAX_AUTH_MAX_TOKEN_LENGTH,
to make it more obvious that it's a PostgreSQL internal limit, not something
that comes from system header files.
Raise the maximum authentication token (Kerberos ticket) size in GSSAPI
and SSPI athentication methods. While the old 2000 byte limit was more than
enough for Unix Kerberos implementations, tickets issued by Windows Domain
Controllers can be much larger.
Tom Lane [Tue, 13 Oct 2009 14:33:21 +0000 (14:33 +0000)]
Fix ts_stat's failure on empty tsvector.
Also insert a couple of Asserts that check for stack overflow.
Bogus coding appears to be new in 8.4 --- older releases had a much
simpler algorithm here. Per bug #5111.
Alvaro Herrera [Wed, 7 Oct 2009 16:27:29 +0000 (16:27 +0000)]
Fix snapshot management, take two.
Partially revert the previous patch I installed and replace it with a more
general fix: any time a snapshot is pushed as Active, we need to ensure that it
will not be modified in the future. This means that if the same snapshot is
used as CurrentSnapshot, it needs to be copied separately. This affects
serializable transactions only, because CurrentSnapshot has already been copied
by RegisterSnapshot and so PushActiveSnapshot does not think it needs another
copy. However, CommandCounterIncrement would modify CurrentSnapshot, whereas
ActiveSnapshots must not have their command counters incremented.
I say "partially" because the regression test I added for the previous bug
has been kept.
(This restores 8.3 behavior, because before snapmgr.c existed, any snapshot set
as Active was copied.)
Per bug report from Stuart Bishop in
6bc73d4c0910042358k3d1adff3qa36f8df75198ecea@mail.gmail.com
Tom Lane [Tue, 6 Oct 2009 00:55:35 +0000 (00:55 +0000)]
Change CREATE TABLE so that column default expressions coming from different
inheritance parent tables are compared using equal(), instead of doing
strcmp() on the nodeToString representation. The old implementation was
always a tad cheesy, and it finally fails completely as of 8.4, now that the
node tree might contain syntax location information. equal() knows it's
supposed to ignore those fields, but strcmp() hardly can. Per recent
report from Scott Ribe.
Tom Lane [Sat, 3 Oct 2009 20:04:45 +0000 (20:04 +0000)]
Fix assorted memory leaks in pg_hba.conf parsing. Over a sufficiently
large number of SIGHUP cycles, these would have run the postmaster out
of memory. Noted while testing memory-leak scenario in postgresql.conf
configuration-change-printing patch.
Tom Lane [Fri, 2 Oct 2009 21:14:11 +0000 (21:14 +0000)]
Make sure that GIN fast-insert and regular code paths enforce the same
tuple size limit. Improve the error message for index-tuple-too-large
so that it includes the actual size, the limit, and the index name.
Sync with the btree occurrences of the same error.
Back-patch to 8.4 because it appears that the out-of-sync problem
is occurring in the field.
Tom Lane [Fri, 2 Oct 2009 18:13:10 +0000 (18:13 +0000)]
Fix erroneous handling of shared dependencies (ie dependencies on roles)
in CREATE OR REPLACE FUNCTION. The original code would update pg_shdepend
as if a new function was being created, even if it wasn't, with two bad
consequences: pg_shdepend might record the wrong owner for the function,
and any dependencies for roles mentioned in the function's ACL would be lost.
The fix is very easy: just don't touch pg_shdepend at all when doing a
function replacement.
Also update the CREATE FUNCTION reference page, which never explained
exactly what changes and doesn't change in a function replacement.
In passing, fix the CREATE VIEW reference page similarly; there's no
code bug there, but the docs didn't say what happens.
Alvaro Herrera [Fri, 2 Oct 2009 17:58:21 +0000 (17:58 +0000)]
Ensure that a cursor has an immutable snapshot throughout its lifespan.
The old coding was using a regular snapshot, referenced elsewhere, that was
subject to having its command counter updated. Fix by creating a private copy
of the snapshot exclusively for the cursor.
Backpatch to 8.4, which is when the bug was introduced during the snapshot
management rewrite.
Tom Lane [Tue, 29 Sep 2009 01:20:55 +0000 (01:20 +0000)]
Fix equivclass.c's not-quite-right strategy for handling X=X clauses.
The original coding correctly noted that these aren't just redundancies
(they're effectively X IS NOT NULL, assuming = is strict). However, they
got treated that way if X happened to be in a single-member EquivalenceClass
already, which could happen if there was an ORDER BY X clause, for instance.
The simplest and most reliable solution seems to be to not try to process
such clauses through the EquivalenceClass machinery; just throw them back
for traditional processing. The amount of work that'd be needed to be
smarter than that seems out of proportion to the benefit.
Per bug #5084 from Bernt Marius Johnsen, and analysis by Andrew Gierth.
Andrew Dunstan [Mon, 28 Sep 2009 17:30:56 +0000 (17:30 +0000)]
Convert a perl array to a postgres array when returned by Set Returning Functions as well as non SRFs. Backpatch to 8.1 where these facilities were introduced. with a little help from Abhijit Menon-Sen.
Tom Lane [Sat, 26 Sep 2009 18:24:55 +0000 (18:24 +0000)]
Fix RelationCacheInitializePhase2 (Phase3, in HEAD) to cope with the
possibility of shared-inval messages causing a relcache flush while it tries
to fill in missing data in preloaded relcache entries. There are actually
two distinct failure modes here:
1. The flush could delete the next-to-be-processed cache entry, causing
the subsequent hash_seq_search calls to go off into the weeds. This is
the problem reported by Michael Brown, and I believe it also accounts
for bug #5074. The simplest fix is to restart the hashtable scan after
we've read any new data from the catalogs. It appears that pre-8.4
branches have not suffered from this failure, because by chance there were
no other catalogs sharing the same hash chains with the catalogs that
RelationCacheInitializePhase2 had work to do for. However that's obviously
pretty fragile, and it seems possible that derivative versions with
additional system catalogs might be vulnerable, so I'm back-patching this
part of the fix anyway.
2. The flush could delete the *current* cache entry, in which case the
pointer to the newly-loaded data would end up being stored into an
already-deleted Relation struct. As long as it was still deleted, the only
consequence would be some leaked space in CacheMemoryContext. But it seems
possible that the Relation struct could already have been recycled, in
which case this represents a hard-to-reproduce clobber of cached data
structures, with unforeseeable consequences. The fix here is to pin the
entry while we work on it.
In passing, also change RelationCacheInitializePhase2 to Assert that
formrdesc() set up the relation's cached TupleDesc (rd_att) with the
correct type OID and hasoids values. This is more appropriate than
silently updating the values, because the original tupdesc might already
have been copied into the catcache. However this part of the patch is
not in HEAD because it fails due to some questionable recent changes in
formrdesc :-(. That will be cleaned up in a subsequent patch.
Tom Lane [Tue, 15 Sep 2009 20:31:35 +0000 (20:31 +0000)]
Fix two distinct errors in creation of GIN_INSERT_LISTPAGE xlog records.
In practice these mistakes were always masked when full_page_writes was on,
because XLogInsert would always choose to log the full page, and then
ginRedoInsertListPage wouldn't try to do anything. But with full_page_writes
off a WAL replay failure was certain.
The GIN_INSERT_LISTPAGE record type could probably be eliminated entirely
in favor of using XLOG_HEAP_NEWPAGE, but I refrained from doing that now
since it would have required a significantly more invasive patch.
In passing do a little bit of code cleanup, including making the accounting
for free space on GIN list pages more precise. (This wasn't a bug as the
errors were always in the conservative direction.)
Per report from Simon. Back-patch to 8.4 which contains the identical code.