Tom Lane [Wed, 17 Oct 2018 16:26:48 +0000 (12:26 -0400)]
Improve tzparse's handling of TZDEFRULES ("posixrules") zone data.
In the IANA timezone code, tzparse() always tries to load the zone
file named by TZDEFRULES ("posixrules"). Previously, we'd hacked
that logic to skip the load in the "lastditch" code path, which we use
only to initialize the default "GMT" zone during GUC initialization.
That's critical for a couple of reasons: since we do not support leap
seconds, we *must not* allow "GMT" to have leap seconds, and since this
case runs before the GUC subsystem is fully alive, we'd really rather
not take the risk of pg_open_tzfile throwing any errors.
However, that still left the code reading TZDEFRULES on every other
call, something we'd noticed to the extent of having added code to cache
the result so it was only done once per process not a lot of times.
Andres Freund complained about the static data space used up for the
cache; but as long as the logic was like this, there was no point in
trying to get rid of that space.
We can improve matters by looking a bit more closely at what the IANA
code actually needs the TZDEFRULES data for. One thing it does is
that if "posixrules" is a leap-second-aware zone, the leap-second
behavior will be absorbed into every POSIX-style zone specification.
However, that's a behavior we'd really prefer to do without, since
for our purposes the end effect is to render every POSIX-style zone
name unsupported. Otherwise, the TZDEFRULES data is used only if
the POSIX zone name specifies DST but doesn't include a transition
date rule (e.g., "EST5EDT" rather than "EST5EDT,M3.2.0,M11.1.0").
That is a minority case for our purposes --- in particular, it
never happens when tzload() invokes tzparse() to interpret a
transition date rule string found in a tzdata zone file.
Hence, if we legislate that we're going to ignore leap-second data
from "posixrules", we can postpone the TZDEFRULES load into the path
where we actually need to substitute for a missing date rule string.
That means it will never happen at all in common scenarios, making it
reasonable to dynamically allocate the cache space when it does happen.
Even when the data is already loaded, this saves some cycles in the
common code path since we avoid a memcpy of 23KB or so. And, IMO at
least, this is a less ugly hack on the IANA logic than what we had
before, since it's not messing with the lastditch-vs-regular code paths.
Back-patch to all supported branches, not so much because this is a
critical change as that I want to keep all our copies of the IANA
timezone code in sync.
Tom Lane [Tue, 16 Oct 2018 20:27:15 +0000 (16:27 -0400)]
Back off using -isysroot on Darwin.
Rethink the solution applied in commit 5e2217131 to get PL/Tcl to
build on macOS Mojave. I feared that adding -isysroot globally might
have undesirable consequences, and sure enough Jakob Egger reported
one: it complicates building extensions with a different Xcode version
than was used for the core server. (I find that a risky proposition
in general, but apparently it works most of the time, so we shouldn't
break it if we don't have to.)
We'd already adopted the solution for PL/Perl of inserting the sysroot
path directly into the -I switches used to find Perl's headers, and we
can do the same thing for PL/Tcl by changing the -iwithsysroot switch
that Apple's tclConfig.sh reports. This restricts the risks to PL/Perl
and PL/Tcl themselves and directly-dependent extensions, which is a lot
more pleasing in general than a global -isysroot switch.
Along the way, tighten the test to see if we need to inject the sysroot
path into $perl_includedir, as I'd speculated about upthread but not
gotten round to doing.
Tom Lane [Tue, 16 Oct 2018 17:56:58 +0000 (13:56 -0400)]
Avoid rare race condition in privileges.sql regression test.
We created a temp table, then switched to a new session, leaving
the old session to clean up its temp objects in background. If that
took long enough, the eventual attempt to drop the user that owns
the temp table could fail, as exhibited today by sidewinder.
Fix by dropping the temp table explicitly when we're done with it.
It's been like this for quite some time, so back-patch to all
supported branches.
localtime.c's "struct state" is a rather large object, ~23KB. We were
statically allocating one for gmtsub() to use to represent the GMT
timezone, even though that function is not at all heavily used and is
never reached in most backends. Let's malloc it on-demand, instead.
This does pose the question of how to handle a malloc failure, but
there's already a well-defined error report convention here, ie
set errno and return NULL.
We have but one caller of pg_gmtime in HEAD, and two in back branches,
neither of which were troubling to check for error. Make them do so.
The possible errors are sufficiently unlikely (out-of-range timestamp,
and now malloc failure) that I think elog() is adequate.
Back-patch to all supported branches to keep our copies of the IANA
timezone code in sync. This particular change is in a stanza that
already differs from upstream, so it's a wash for maintenance purposes
--- but only as long as we keep the branches the same.
Tom Lane [Mon, 15 Oct 2018 18:01:38 +0000 (14:01 -0400)]
Check for stack overrun in standard_ProcessUtility().
ProcessUtility can recurse, and indeed can be driven to infinite
recursion, so it ought to have a check_stack_depth() call. This
covers the reported bug (portal trying to execute itself) and a bunch
of other cases that could perhaps arise somewhere.
Per bug #15428 from Malthe Borch. Back-patch to all supported branches.
Michael Paquier [Sun, 14 Oct 2018 13:24:01 +0000 (22:24 +0900)]
Avoid duplicate XIDs at recovery when building initial snapshot
On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
periodic basis to allow recovery to build the initial state of
transactions for a hot standby. The set of transaction IDs is created
by scanning all the entries in ProcArray. However it happens that its
logic never counted on the fact that two-phase transactions finishing to
prepare can put ProcArray in a state where there are two entries with
the same transaction ID, one for the initial transaction which gets
cleared when prepare finishes, and a second, dummy, entry to track that
the transaction is still running after prepare finishes. This way
ensures a continuous presence of the transaction so as callers of for
example TransactionIdIsInProgress() are always able to see it as alive.
So, if a XLOG_RUNNING_XACTS takes a standby snapshot while a two-phase
transaction finishes to prepare, the record can finish with duplicated
XIDs, which is a state expected by design. If this record gets applied
on a standby to initial its recovery state, then it would simply fail,
so the odds of facing this failure are very low in practice. It would
be tempting to change the generation of XLOG_RUNNING_XACTS so as
duplicates are removed on the source, but this requires to hold on
ProcArrayLock for longer and this would impact all workloads,
particularly those using heavily two-phase transactions.
XLOG_RUNNING_XACTS is also actually used only to initialize the standby
state at recovery, so instead the solution is taken to discard
duplicates when applying the initial snapshot.
Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
Backpatch-through: 9.3
Tom Lane [Fri, 12 Oct 2018 23:33:57 +0000 (19:33 -0400)]
Remove abstime, reltime, tinterval tables from old regression databases.
In the back branches, drop these tables after the regression tests are
done with them. This fixes failures of cross-branch pg_upgrade testing
caused by these types having been removed in v12. We do lose the ability
to test dump/restore behavior with these types in the back branches, but
the actual loss of code coverage seems to be nil given that there's nothing
very special about these types.
Tom Lane [Fri, 12 Oct 2018 18:49:33 +0000 (14:49 -0400)]
Back-patch addition of the ALLOCSET_FOO_SIZES macros.
These macros were originally added in commit ea268cdc9, and back-patched
into 9.6 before 9.6.0. However, some extensions would like to use them
in older branches, and there seems no harm in providing them. So add
them to all supported branches. Per suggestions from Christoph Berg and
Andres Freund.
Tom Lane [Fri, 5 Oct 2018 20:01:30 +0000 (16:01 -0400)]
Allow btree comparison functions to return INT_MIN.
Historically we forbade datatype-specific comparison functions from
returning INT_MIN, so that it would be safe to invert the sort order
just by negating the comparison result. However, this was never
really safe for comparison functions that directly return the result
of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction
on those library functions. Buildfarm results show that at least on
recent Linux on s390x, memcmp() actually does return INT_MIN sometimes,
causing sort failures.
The agreed-on answer is to remove this restriction and fix relevant
call sites to not make such an assumption; code such as "res = -res"
should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed
in a few places that just directly negated the result of memcmp or
strcmp.
To help find places having this problem, I've also added a compile option
to nbtcompare.c that causes some of the commonly used comparators to
return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be
a good idea to have at least one buildfarm member running with
"-DSTRESS_SORT_INT_MIN". That's far from a complete test of course,
but it should help to prevent fresh introductions of such bugs.
This is a longstanding portability hazard, so back-patch to all supported
branches.
Tom Lane [Tue, 2 Oct 2018 16:41:28 +0000 (12:41 -0400)]
Set snprintf.c's maximum number of NL arguments to be 31.
Previously, we used the platform's NL_ARGMAX if any, otherwise 16.
The trouble with this is that the platform value is hugely variable,
ranging from the POSIX-minimum 9 to as much as 64K on recent FreeBSD.
Values of more than a dozen or two have no practical use and slow down
the initialization of the argtypes array. Worse, they cause snprintf.c
to consume far more stack space than was the design intention, possibly
resulting in stack-overflow crashes.
Standardize on 31, which is comfortably more than we need (it looks like
no existing translatable message has more than about 10 parameters).
I chose that, not 32, to make the array sizes powers of 2, for some
possible small gain in speed of the memset.
The lack of reported crashes suggests that the set of platforms we
use snprintf.c on (in released branches) may have no overlap with
the set where NL_ARGMAX has unreasonably large values. But that's
not entirely clear, so back-patch to all supported branches.
Tom Lane [Tue, 2 Oct 2018 15:54:13 +0000 (11:54 -0400)]
Fix corner-case failures in has_foo_privilege() family of functions.
The variants of these functions that take numeric inputs (OIDs or
column numbers) are supposed to return NULL rather than failing
on bad input; this rule reduces problems with snapshot skew when
queries apply the functions to all rows of a catalog.
has_column_privilege() had careless handling of the case where the
table OID didn't exist. You might get something like this:
select has_column_privilege(9999,'nosuchcol','select');
ERROR: column "nosuchcol" of relation "(null)" does not exist
or you might get a crash, depending on the platform's printf's response
to a null string pointer.
In addition, while applying the column-number variant to a dropped
column returned NULL as desired, applying the column-name variant
did not:
select has_column_privilege('mytable','........pg.dropped.2........','select');
ERROR: column "........pg.dropped.2........" of relation "mytable" does not exist
It seems better to make this case return NULL as well.
Also, the OID-accepting variants of has_foreign_data_wrapper_privilege,
has_server_privilege, and has_tablespace_privilege didn't follow the
principle of returning NULL for nonexistent OIDs. Superusers got TRUE,
everybody else got an error.
Per investigation of Jaime Casanova's report of a new crash in HEAD.
These behaviors have been like this for a long time, so back-patch to
all supported branches.
Patch by me; thanks to Stephen Frost for discussion and review
Michael Paquier [Tue, 2 Oct 2018 07:37:16 +0000 (16:37 +0900)]
Fix documentation of pgrowlocks using "lock_type" instead of "modes"
The example used in the documentation is outdated as well. This is an
oversight from 0ac5ad5, which bumped up pgrowlocks but forgot some bits
of the documentation.
Reported-by: Chris Wilson
Discussion: https://postgr.es/m/153838692816.2950.12001142346234155699@wrigleys.postgresql.org
Backpatch-through: 9.3
Tom Lane [Mon, 1 Oct 2018 15:42:12 +0000 (11:42 -0400)]
Lock relation used to generate fresh data for RMV.
Back-patch the 9.4-era commit 2636ecf78 into 9.3, as that fixes a case
where we open a relation while not holding any lock on it. It's
probably mostly safe anyway, since no other session could touch the
newly-created table; but I think CheckTableNotInUse could be fooled
if one tried.
Per testing with a patch that complains if we open a relation without
holding any lock on it. I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.
Tom Lane [Mon, 1 Oct 2018 15:39:14 +0000 (11:39 -0400)]
Fix ALTER COLUMN TYPE to not open a relation without any lock.
If the column being modified is referenced by a foreign key constraint
of another table, ALTER TABLE would open the other table (to re-parse
the constraint's definition) without having first obtained a lock on it.
This was evidently intentional, but that doesn't mean it's really safe.
It's especially not safe in 9.3, which pre-dates use of MVCC scans for
catalog reads, but even in current releases it doesn't seem like a good
idea.
We know we'll need AccessExclusiveLock shortly to drop the obsoleted
constraint, so just get that a little sooner to close the hole.
Per testing with a patch that complains if we open a relation without
holding any lock on it. I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.
Tom Lane [Sun, 30 Sep 2018 20:24:56 +0000 (16:24 -0400)]
Fix detection of the result type of strerror_r().
The method we've traditionally used, of redeclaring strerror_r() to
see if the compiler complains of inconsistent declarations, turns out
not to work reliably because some compilers only report a warning,
not an error. Amazingly, this has gone undetected for years, even
though it certainly breaks our detection of whether strerror_r
succeeded.
Let's instead test whether the compiler will take the result of
strerror_r() as a switch() argument. It's possible this won't
work universally either, but it's the best idea I could come up with
on the spur of the moment.
Back-patch of commit 751f532b9. Buildfarm results indicate that only
icc-on-Linux actually has an issue here; perhaps the lack of field
reports indicates that people don't build PG for production that way.
Peter Eisentraut [Fri, 15 Jun 2018 03:22:14 +0000 (23:22 -0400)]
Recurse to sequences on ownership change for all relkinds
When a table ownership is changed, we must apply that also to any owned
sequences. (Otherwise, it would result in a situation that cannot be
restored, because linked sequences must have the same owner as the
table.) But this was previously only applied to regular tables and
materialized views. But it should also apply to at least foreign
tables. This patch removes the relkind check altogether, because it
doesn't save very much and just introduces the possibility of similar
omissions.
Bug: #15238 Reported-by: Christoph Berg <christoph.berg@credativ.de>
Tom Lane [Tue, 25 Sep 2018 17:23:29 +0000 (13:23 -0400)]
Make some fixes to allow building Postgres on macOS 10.14 ("Mojave").
Apple's latest rearrangements of the system-supplied headers have broken
building of PL/Perl and PL/Tcl. The only practical way to fix PL/Tcl is to
start using the "-isysroot" compiler flag to point to SDK-supplied headers,
as Apple expects. We must also start distinguishing where to find Perl's
headers from where to find its shared library; but that seems like good
cleanup anyway.
Extensions that formerly did something like -I$(perl_archlibexp)/CORE
should now do -I$(perl_includedir)/CORE instead. perl_archlibexp
is still the place to look for libperl.so, though.
If for some reason you don't like the default -isysroot setting, you can
override that by setting PG_SYSROOT in configure's arguments. I don't
currently think people would need to do so, unless maybe for cross-version
build purposes.
In addition, teach configure where to find tclConfig.sh. Our traditional
method of searching $auto_path hasn't worked for the last couple of macOS
releases, and it now seems clear that Apple's not going to change that.
The workaround of manually specifying --with-tclconfig was annoying
already, but Mojave's made it a lot more so because the sysroot path now
has to be included as well. Let's just wire the knowledge into configure
instead. To avoid breaking builds against non-default Tcl installations
(e.g. MacPorts) wherein the $auto_path method probably still works,
arrange to try the additional case only after all else has failed.
Back-patch to all supported versions, since at least the buildfarm
cares about that. The changes are set up to not do anything on macOS
releases that are old enough to not have functional sysroot trees.
Tom Lane [Mon, 24 Sep 2018 15:30:51 +0000 (11:30 -0400)]
Fix over-allocation of space for array_out()'s result string.
array_out overestimated the space needed for its output, possibly by
a very substantial amount if the array is multi-dimensional, because
of wrong order of operations in the loop that counts the number of
curly-brace pairs needed. While the output string is normally
short-lived, this could still cause problems in extreme cases.
An additional minor error was that it counted one more delimiter than
is actually needed.
Repair those errors, add an Assert that the space is now correctly
calculated, and make some minor improvements in the comments.
I also failed to resist the temptation to get rid of an integer
modulus operation per array element; a simple comparison is sufficient.
This bug dates clear back to Berkeley days, so back-patch to all
supported versions.
Initialize random() in bootstrap/stand-alone postgres and in initdb.
This removes a difference between the standard IsUnderPostmaster
execution environment and that of --boot and --single. In a stand-alone
backend, "SELECT random()" always started at the same seed.
On a system capable of using posix shared memory, initdb could still
conclude "selecting dynamic shared memory implementation ... sysv".
Crashed --boot or --single postgres processes orphaned shared memory
objects having names that collided with the not-actually-random names
that initdb probed. The sysv fallback appeared after ten crashes of
--boot or --single postgres. Since --boot and --single are rare in
production use, systems used for PostgreSQL development are the
principal candidate to notice this symptom.
Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced
dynamic shared memory, but 9.3 does share the "SELECT random()" problem.
Tom Lane [Sun, 23 Sep 2018 20:05:46 +0000 (16:05 -0400)]
Fix failure in WHERE CURRENT OF after rewinding the referenced cursor.
In a case where we have multiple relation-scan nodes in a cursor plan,
such as a scan of an inheritance tree, it's possible to fetch from a
given scan node, then rewind the cursor and fetch some row from an
earlier scan node. In such a case, execCurrent.c mistakenly thought
that the later scan node was still active, because ExecReScan hadn't
done anything to make it look not-active. We'd get some sort of
failure in the case of a SeqScan node, because the node's scan tuple
slot would be pointing at a HeapTuple whose t_self gets reset to
invalid by heapam.c. But it seems possible that for other relation
scan node types we'd actually return a valid tuple TID to the caller,
resulting in updating or deleting a tuple that shouldn't have been
considered current. To fix, forcibly clear the ScanTupleSlot in
ExecScanReScan.
Another issue here, which seems only latent at the moment but could
easily become a live bug in future, is that rewinding a cursor does
not necessarily lead to *immediately* applying ExecReScan to every
scan-level node in the plan tree. Upper-level nodes will think that
they can postpone that call if their child node is already marked
with chgParam flags. I don't see a way for that to happen today in
a plan tree that's simple enough for execCurrent.c's search_plan_tree
to understand, but that's one heck of a fragile assumption. So, add
some logic in search_plan_tree to detect chgParam flags being set on
nodes that it descended to/through, and assume that that means we
should consider lower scan nodes to be logically reset even if their
ReScan call hasn't actually happened yet.
Per bug #15395 from Matvey Arye. This has been broken for a long time,
so back-patch to all supported branches.
Andres Freund [Fri, 21 Sep 2018 01:11:49 +0000 (18:11 -0700)]
Error out for clang on x86-32 without SSE2 support, no -fexcess-precision.
As clang currently doesn't support -fexcess-precision=standard,
compiling x86-32 code with SSE2 disabled, can lead to problems with
floating point overflow checks and the like.
This issue was noticed because clang, on at least some BSDs, defaults
to i386 compatibility, whereas it defaults to pentium4 on Linux. Our
forced usage of __builtin_isinf() lead to some overflow checks not
triggering when compiling for i386, e.g. when the result of the
calculation didn't overflow in 80bit registers, but did so in 64bit.
While we could just fall back to a non-builtin isinf, it seems likely
that the use of 80bit registers leads to other problems (which is why
we force the flag for GCC already). Therefore error out when
detecting clang in that situation.
Reported-By: Victor Wagner Analyzed-By: Andrew Gierth and Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20180905005130.ewk4xcs5dgyzcy45@alap3.anarazel.de
Backpatch: 9.3-, all supported versions are affected
Tom Lane [Sat, 15 Sep 2018 17:42:34 +0000 (13:42 -0400)]
Fix failure with initplans used conditionally during EvalPlanQual rechecks.
The EvalPlanQual machinery assumes that any initplans (that is,
uncorrelated sub-selects) used during an EPQ recheck would have already
been evaluated during the main query; this is implicit in the fact that
execPlan pointers are not copied into the EPQ estate's es_param_exec_vals.
But it's possible for that assumption to fail, if the initplan is only
reached conditionally. For example, a sub-select inside a CASE expression
could be reached during a recheck when it had not been previously, if the
CASE test depends on a column that was just updated.
This bug is old, appearing to date back to my rewrite of EvalPlanQual in
commit 9f2ee8f28, but was not detected until Kyle Samson reported a case.
To fix, force all not-yet-evaluated initplans used within the EPQ plan
subtree to be evaluated at the start of the recheck, before entering the
EPQ environment. This could be inefficient, if such an initplan is
expensive and goes unused again during the recheck --- but that's piling
one layer of improbability atop another. It doesn't seem worth adding
more complexity to prevent that, at least not in the back branches.
It was convenient to use the new-in-v11 ExecEvalParamExecParams function
to implement this, but I didn't like either its name or the specifics of
its API, so revise that.
Back-patch all the way. Rather than rewrite the patch to avoid depending
on bms_next_member() in the oldest branches, I chose to back-patch that
function into 9.4 and 9.3. (This isn't the first time back-patches have
needed that, and it exhausted my patience.) I also chose to back-patch
some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3,
so that the 9.x versions of eval-plan-qual.spec are all the same.
Andrew Gierth diagnosed the problem and contributed the added test cases,
though the actual code changes are by me.
Andrew Gierth [Wed, 12 Sep 2018 18:31:06 +0000 (19:31 +0100)]
Repair bug in regexp split performance improvements.
Commit c8ea87e4b introduced a temporary conversion buffer for
substrings extracted during regexp splits. Unfortunately the code that
sized it was failing to ignore the effects of ignored degenerate
regexp matches, so for regexp_split_* calls it could under-size the
buffer in such cases.
Fix, and add some regression test cases (though those will only catch
the bug if run in a multibyte encoding).
Backpatch to 9.3 as the faulty code was.
Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the
report (via IRC) and assistance in analysis. Patch by me.
Tom Lane [Tue, 1 May 2018 16:02:41 +0000 (12:02 -0400)]
On all Windows platforms, not just Cygwin, use _timezone and _tzname.
Back-patch commit 868628e4f into the 9.5 branch, so that we can support
building that branch with Visual Studio 2015. This patch itself could
go further back, but other VS2015 patches such as 0fb54de9a and c8e81afc6
were only back-patched to 9.5, so there seems little point in handling
this one differently.
Andrew Dunstan [Fri, 29 Apr 2016 11:59:47 +0000 (07:59 -0400)]
Support building with Visual Studio 2015
Adjust the way we detect the locale. As a result the minumum Windows
version supported by VS2015 and later is Windows Vista. Add some tweaks
to remove new compiler warnings. Remove documentation references to the
now obsolete msysGit.
Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.
Tom Lane [Sat, 8 Sep 2018 00:09:57 +0000 (20:09 -0400)]
Save/restore SPI's global variables in SPI_connect() and SPI_finish().
This patch removes two sources of interference between nominally
independent functions when one SPI-using function calls another,
perhaps without knowing that it does so.
Chapman Flack pointed out that xml.c's query_to_xml_internal() expects
SPI_tuptable and SPI_processed to stay valid across datatype output
function calls; but it's possible that such a call could involve
re-entrant use of SPI. It seems likely that there are similar hazards
elsewhere, if not in the core code then in third-party SPI users.
Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which
would typically make for a crash in such a situation. Restoring them
to the values they had at SPI_connect() seems like a considerably more
useful behavior, and it still meets the design goal of not leaving any
dangling pointers to tuple tables of the function being exited.
Also, cause SPI_connect() to reset these variables to zeroes/nulls after
saving them. This prevents interference in the opposite direction: it's
possible that a SPI-using function that's only ever been tested standalone
contains assumptions that these variables start out as zeroes. That was
the case as long as you were the outermost SPI user, but not so much for
an inner user. Now it's consistent.
Report and fix suggestion by Chapman Flack, actual patch by me.
Back-patch to all supported branches.
Tom Lane [Fri, 7 Sep 2018 22:13:29 +0000 (18:13 -0400)]
Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.
It's somewhat surprising that we got away with this before. (Actually,
since nobody tests this routinely AFAIK, it might've been broken for
awhile. But it's definitely broken in the wake of commit f868a8143.)
It seems sufficient to limit the forced recursion to a small number
of levels.
Back-patch to all supported branches, like the preceding patch.
Tom Lane [Fri, 7 Sep 2018 22:04:38 +0000 (18:04 -0400)]
Fix longstanding recursion hazard in sinval message processing.
LockRelationOid and sibling routines supposed that, if our session already
holds the lock they were asked to acquire, they could skip calling
AcceptInvalidationMessages on the grounds that we must have already read
any remote sinval messages issued against the relation being locked.
This is normally true, but there's a critical special case where it's not:
processing inside AcceptInvalidationMessages might attempt to access system
relations, resulting in a recursive call to acquire a relation lock.
Hence, if the outer call had acquired that same system catalog lock, we'd
fall through, despite the possibility that there's an as-yet-unread sinval
message for that system catalog. This could, for example, result in
failure to access a system catalog or index that had just been processed
by VACUUM FULL. This is the explanation for buildfarm failures we've been
seeing intermittently for the past three months. The bug is far older
than that, but commits a54e1f158 et al added a new recursion case within
AcceptInvalidationMessages that is apparently easier to hit than any
previous case.
To fix this, we must not skip calling AcceptInvalidationMessages until
we have *finished* a call to it since acquiring a relation lock, not
merely acquired the lock. (There's already adequate logic inside
AcceptInvalidationMessages to deal with being called recursively.)
Fortunately, we can implement that at trivial cost, by adding a flag
to LOCALLOCK hashtable entries that tracks whether we know we have
completed such a call.
There is an API hazard added by this patch for external callers of
LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
into thinking the lock wasn't already held. This should be a fail-soft
condition, though, unless something very bizarre is being done in
response to the test.
Also, I added an additional output argument to LockAcquireExtended,
assuming that that probably isn't called by any outside code given
the very limited usefulness of its additional functionality.
Tom Lane [Thu, 6 Sep 2018 14:49:45 +0000 (10:49 -0400)]
Make contrib/unaccent's unaccent() function work when not in search path.
Since the fixes for CVE-2018-1058, we've advised people to schema-qualify
function references in order to fix failures in code that executes under
a minimal search_path setting. However, that's insufficient to make the
single-argument form of unaccent() work, because it looks up the "unaccent"
text search dictionary using the search path.
The most expedient answer seems to be to remove the search_path dependency
by making it look in the same schema that the unaccent() function itself
is declared in. This will definitely work for the normal usage of this
function with the unaccent dictionary provided by the extension.
It's barely possible that there are people who were relying on the
search-path-dependent behavior to select other dictionaries with the same
name; but if there are any such people at all, they can still get that
behavior by writing unaccent('unaccent', ...), or possibly
unaccent('unaccent'::text::regdictionary, ...) if the lookup has to be
postponed to runtime.
Per complaint from Gunnlaugur Thor Briem. Back-patch to all supported
branches.
Bruce Momjian [Wed, 5 Sep 2018 02:34:07 +0000 (22:34 -0400)]
docs: improve AT TIME ZONE description
The previous description was unclear. Also add a third example, change
use of time zone acronyms to more verbose descriptions, and add a
mention that using 'time' with AT TIME ZONE uses the current time zone
rules.
Tom Lane [Sat, 1 Sep 2018 20:02:47 +0000 (16:02 -0400)]
Doc: fix oversights in "Client/Server Character Set Conversions" table.
This table claimed that JOHAB could be used as a server encoding, which
was true originally but hasn't been true since 8.3. It also lacked
entries for EUC_JIS_2004 and SHIFT_JIS_2004.
JOHAB problem noted by Lars Kanis, the others by me.
Tom Lane [Sat, 1 Sep 2018 19:27:14 +0000 (15:27 -0400)]
Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd. However, that policy's
been ignored in an increasing number of places. We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway. But this is not
something to rely on. Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.
To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.
I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster. I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.
Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.
Patch by me; thanks to Michael Paquier for review.
Michael Paquier [Fri, 31 Aug 2018 18:06:09 +0000 (11:06 -0700)]
Ensure correct minimum consistent point on standbys
Startup process has improved its calculation of incorrect minimum
consistent point in 8d68ee6, which ensures that all WAL available gets
replayed when doing crash recovery, and has introduced an incorrect
calculation of the minimum recovery point for non-startup processes,
which can cause incorrect page references on a standby when for example
the background writer flushed a couple of pages on-disk but was not
updating the control file to let a subsequent crash recovery replay to
where it should have.
The only case where this has been reported to be a problem is when a
standby needs to calculate the latest removed xid when replaying a btree
deletion record, so one would need connections on a standby that happen
just after recovery has thought it reached a consistent point. Using a
background worker which is started after the consistent point is reached
would be the easiest way to get into problems if it connects to a
database. Having clients which attempt to connect periodically could
also be a problem, but the odds of seeing this problem are much lower.
The fix used is pretty simple, as the idea is to give access to the
minimum recovery point written in the control file to non-startup
processes so as they use a reference, while the startup process still
initializes its own references of the minimum consistent point so as the
original problem with incorrect page references happening post-promotion
with a crash do not show up.
Reported-by: Alexander Kukushkin Diagnosed-by: Alexander Kukushkin
Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin
Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org
Backpatch-through: 9.3
Enforce cube dimension limit in all cube construction functions
contrib/cube has a limit to 100 dimensions for cube datatype. However, it's
not enforced everywhere, and one can actually construct cube with more than
100 dimensions having then trouble with dump/restore. This commit add checks
for dimensions limit in all functions responsible for cube construction.
Backpatch to all supported versions.
Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87va7uybt4.fsf%40news-spur.riddles.org.uk
Author: Andrey Borodin with small additions by me
Review: Tom Lane
Backpatch-through: 9.3
Split contrib/cube platform-depended checks into separate test
We're currently maintaining two outputs for cube regression test. But that
appears to be unsuitable, because these outputs are different in out few checks
involving scientific notation. So, split checks involving scientific notation
into separate test, making contrib/cube easier to maintain. Backpatch to all
supported versions in order to make further backpatching easier.
Discussion: https://postgr.es/m/CAPpHfdvJgWjxHsJTtT%2Bo1tz3OR8EFHcLQjhp-d3%2BUcmJLh-fQA%40mail.gmail.com
Author: Alexander Korotkov
Backpatch-through: 9.3
Tom Lane [Fri, 31 Aug 2018 16:26:20 +0000 (12:26 -0400)]
Make checksum_impl.h safe to compile with -fstrict-aliasing.
In general, Postgres requires -fno-strict-aliasing with compilers that
implement C99 strict aliasing rules. There's little hope of getting
rid of that overall. But it seems like it would be a good idea if
storage/checksum_impl.h in particular didn't depend on it, because
that header is explicitly intended to be included by external programs.
We don't have a lot of control over the compiler switches that an
external program might use, as shown by Michael Banck's report of
failure in a privately-modified version of pg_verify_checksums.
Hence, switch to using a union in place of willy-nilly pointer casting
inside this file. I think this makes the code a bit more readable
anyway.
checksum_impl.h hasn't changed since it was introduced in 9.3,
so back-patch all the way.
Andrew Gierth [Tue, 28 Aug 2018 08:52:25 +0000 (09:52 +0100)]
Avoid quadratic slowdown in regexp match/split functions.
regexp_matches, regexp_split_to_table and regexp_split_to_array all
work by compiling a list of match positions as character offsets (NOT
byte positions) in the source string.
Formerly, they then used text_substr to extract the matched text; but
in a multi-byte encoding, that counts the characters in the string,
and the characters needed to reach the starting byte position, on
every call. Accordingly, the performance degraded as the product of
the input string length and the number of match positions, such that
splitting a string of a few hundred kbytes could take many minutes.
Repair by keeping the wide-character copy of the input string
available (only in the case where encoding_max_length is not 1) after
performing the match operation, and extracting substrings from that
instead. This reduces the complexity to being linear in the number of
result bytes, discounting the actual regexp match itself (which is not
affected by this patch).
In passing, remove cleanup using retail pfree() which was obsoleted by
commit ff428cded (Feb 2008) which made cleanup of SRF multi-call
contexts automatic. Also increase (to ~134 million) the maximum number
of matches and provide an error message when it is reached.
Backpatch all the way because this has been wrong forever.
Tom Lane [Sun, 26 Aug 2018 18:21:55 +0000 (14:21 -0400)]
Make syslogger more robust against failures in opening CSV log files.
The previous coding figured it'd be good enough to postpone opening
the first CSV log file until we got a message we needed to write there.
This is unsafe, though, because if the open fails we end up in infinite
recursion trying to report the failure. Instead make the CSV log file
management code look as nearly as possible like the longstanding logic
for the stderr log file. In particular, open it immediately at postmaster
startup (if enabled), or when we get a SIGHUP in which we find that
log_destination has been changed to enable CSV logging.
It seems OK to fail if a postmaster-start-time open attempt fails, as
we've long done for the stderr log file. But we can't die if we fail
to open a CSV log file during SIGHUP, so we're still left with a problem.
In that case, write any output meant for the CSV log file to the stderr
log file. (This will also cover race-condition cases in which backends
send CSV log data before or after we have the CSV log file open.)
This patch also fixes an ancient oversight that, if CSV logging was
turned off during a SIGHUP, we never actually closed the last CSV
log file.
In passing, remember to reset whereToSendOutput = DestNone during syslogger
start, since (unlike all other postmaster children) it's forked before the
postmaster has done that. This made for a platform-dependent difference
in error reporting behavior between the syslogger and other children:
except on Windows, it'd report problems to the original postmaster stderr
as well as the normal error log file(s). It's barely possible that that
was intentional at some point; but it doesn't seem likely to be desirable
in production, and the platform dependency definitely isn't desirable.
Per report from Alexander Kukushkin. It's been like this for a long time,
so back-patch to all supported branches.
Andrew Gierth [Thu, 23 Aug 2018 15:35:33 +0000 (16:35 +0100)]
Reduce an unnecessary O(N^3) loop in lexer.
The lexer's handling of operators contained an O(N^3) hazard when
dealing with long strings of + or - characters; it seems hard to
prevent this case from being O(N^2), but the additional N multiplier
was not needed.
Backpatch all the way since this has been there since 7.x, and it
presents at least a mild hazard in that trying to do Bind, PREPARE or
EXPLAIN on a hostile query could take excessive time (without
honouring cancels or timeouts) even if the query was never executed.
Michael Paquier [Tue, 21 Aug 2018 06:18:35 +0000 (15:18 +0900)]
Fix set of NLS translation issues
While monitoring the code, it has been noticed that GSSAPI
authentication missed two translations.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
Tom Lane [Fri, 17 Aug 2018 21:12:21 +0000 (17:12 -0400)]
Ensure schema qualification in pg_restore DISABLE/ENABLE TRIGGER commands.
Previously, this code blindly followed the common coding pattern of
passing PQserverVersion(AH->connection) as the server-version parameter
of fmtQualifiedId. That works as long as we have a connection; but in
pg_restore with text output, we don't. Instead we got a zero from
PQserverVersion, which fmtQualifiedId interpreted as "server is too old to
have schemas", and so the name went unqualified. That still accidentally
managed to work in many cases, which is probably why this ancient bug went
undetected for so long. It only became obvious in the wake of the changes
to force dump/restore to execute with restricted search_path.
In HEAD/v11, let's deal with this by ripping out fmtQualifiedId's server-
version behavioral dependency, and just making it schema-qualify all the
time. We no longer support pg_dump from servers old enough to need the
ability to omit schema name, let alone restoring to them. (Also, the few
callers outside pg_dump already didn't work with pre-schema servers.)
In older branches, that's not an acceptable solution, so instead just
tweak the DISABLE/ENABLE TRIGGER logic to ensure it will schema-qualify
its output regardless of server version.
Per bug #15338 from Oleg somebody. Back-patch to all supported branches.
Andrew Gierth [Fri, 17 Aug 2018 14:04:26 +0000 (15:04 +0100)]
Set scan direction appropriately for SubPlans (bug #15336)
When executing a SubPlan in an expression, the EState's direction
field was left alone, resulting in an attempt to execute the subplan
backwards if it was encountered during a backwards scan of a cursor.
Also, though much less likely, it was possible to reach the execution
of an InitPlan while in backwards-scan state.
Repair by saving/restoring estate->es_direction and forcing forward
scan mode in the relevant places.
Backpatch all the way, since this has been broken since 8.3 (prior to
commit c7ff7663e, SubPlans had their own EStates rather than sharing
the parent plan's, so there was no confusion over scan direction).
Per bug #15336 reported by Vladimir Baranoff; analysis and patch by
me, review by Tom Lane.
Bruce Momjian [Fri, 17 Aug 2018 14:25:48 +0000 (10:25 -0400)]
pg_upgrade: issue helpful error message for use on standbys
Commit 777e6ddf1723306bd2bf8fe6f804863f459b0323 checked for a shut down
message from a standby and allowed it to continue. This patch reports a
helpful error message in these cases, suggesting to use rsync as
documented.
Tom Lane [Wed, 15 Aug 2018 21:25:24 +0000 (17:25 -0400)]
Make snprintf.c follow the C99 standard for snprintf's result value.
C99 says that the result should be the number of bytes that would have
been emitted given a large enough buffer, not the number we actually
were able to put in the buffer. It's time to make our substitute
implementation comply with that. Not doing so results in inefficiency
in buffer-enlargement cases, and also poses a portability hazard for
third-party code that might expect C99-compliant snprintf behavior
within Postgres.
In passing, remove useless tests for str == NULL; neither C99 nor
predecessor standards ever allowed that except when count == 0,
so I see no reason to expend cycles on making that a non-crash case
for this implementation. Also, don't waste a byte in pg_vfprintf's
local I/O buffer; this might have performance benefits by allowing
aligned writes during flushbuffer calls.
Back-patch of commit 805889d7d. There was some concern about this
possibly breaking code that assumes pre-C99 behavior, but there is
much more risk (and reality, in our own code) of code that assumes
C99 behavior and hence fails to detect buffer overrun without this.
Tom Lane [Wed, 15 Aug 2018 20:29:32 +0000 (16:29 -0400)]
Clean up assorted misuses of snprintf()'s result value.
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly. The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize". Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.
(Note that this only makes these places correct if snprintf() delivers
C99-compliant results. But at least now these places are consistent
with all the other places where we assume that.)
Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands. There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.
Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf. In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.
These errors are all very old, so back-patch as appropriate. I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.
Bruce Momjian [Tue, 14 Aug 2018 21:19:02 +0000 (17:19 -0400)]
pg_upgrade: fix shutdown check for standby servers
Commit 244142d32afd02e7408a2ef1f249b00393983822 only tested for the
pg_controldata output for primary servers, but standby servers have
different "Database cluster state" output, so check for that too.
Diagnosed-by: Michael Paquier
Discussion: https://postgr.es/m/20180810164240.GM13638@paquier.xyz
Bruce Momjian [Thu, 9 Aug 2018 14:13:15 +0000 (10:13 -0400)]
docs: Only first instance of a PREPARE parameter sets data type
If the first reference to $1 is "($1 = col) or ($1 is null)", the data
type can be determined, but not for "($1 is null) or ($1 = col)". This
change documents this.
Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd
party libraries might have installed unsafe atexit() callbacks. After
receiving SIGQUIT, we really just want to exit as quickly as possible, so
we don't really want to run the atexit() callbacks anyway.
The original report by Jimmy Yih was a self-deadlock in startup_die().
However, this patch doesn't address that scenario; the signal handling
while waiting for the startup packet is more complicated. But at least this
alleviates similar problems in the SIGQUIT handlers, like that reported
by Asim R P later in the same thread.
Tom Lane [Tue, 7 Aug 2018 20:32:50 +0000 (16:32 -0400)]
Don't record FDW user mappings as members of extensions.
CreateUserMapping has a recordDependencyOnCurrentExtension call that's
been there since extensions were introduced (very possibly my fault).
However, there's no support anywhere else for user mappings as members
of extensions, nor are they listed as a possible member object type in
the documentation. Nor does it really seem like a good idea for user
mappings to belong to extensions when roles don't. Hence, remove the
bogus call.
(As we saw in bug #15310, the lack of any pg_dump support for this case
ensures that any such membership record would silently disappear during
pg_upgrade. So there's probably no need for us to do anything else
about cleaning up after this mistake.)
Tom Lane [Tue, 7 Aug 2018 20:00:44 +0000 (16:00 -0400)]
Fix incorrect initialization of BackendActivityBuffer.
Since commit c8e8b5a6e, this has been zeroed out using the wrong length.
In practice the length would always be too small, leading to not zeroing
the whole buffer rather than clobbering additional memory; and that's
pretty harmless, both because shmem would likely start out as zeroes
and because we'd reinitialize any given entry before use. Still,
it's bogus, so fix it.
Tom Lane [Tue, 7 Aug 2018 19:43:49 +0000 (15:43 -0400)]
Fix pg_upgrade to handle event triggers in extensions correctly.
pg_dump with --binary-upgrade must emit ALTER EXTENSION ADD commands
for all objects that are members of extensions. It forgot to do so for
event triggers, as per bug #15310 from Nick Barnes. Back-patch to 9.3
where event triggers were introduced.
Tom Lane [Tue, 7 Aug 2018 17:13:42 +0000 (13:13 -0400)]
Ensure pg_dump_sort.c sorts null vs non-null namespace consistently.
The original coding here (which is, I believe, my fault) supposed that
it didn't need to concern itself with the possibility that one object
of a given type-priority has a namespace while another doesn't. But
that's not reliably true anymore, if it ever was; and if it does happen
then it's possible that DOTypeNameCompare returns self-inconsistent
comparison results. That leads to unspecified behavior in qsort()
and a resultant weird output order from pg_dump.
This should end up being only a cosmetic problem, because any ordering
constraints that actually matter should be enforced by the later
dependency-based sort. Still, it's a bug, so back-patch.
Report and fix by Jacob Champion, though I editorialized on his
patch to the extent of making NULL sort after non-NULL, for consistency
with our usual sorting definitions.
Tom Lane [Mon, 6 Aug 2018 14:53:35 +0000 (10:53 -0400)]
Fix failure to reset libpq's state fully between connection attempts.
The logic in PQconnectPoll() did not take care to ensure that all of
a PGconn's internal state variables were reset before trying a new
connection attempt. If we got far enough in the connection sequence
to have changed any of these variables, and then decided to try a new
server address or server name, the new connection might be completed
with some state that really only applied to the failed connection.
While this has assorted bad consequences, the only one that is clearly
a security issue is that password_needed didn't get reset, so that
if the first server asked for a password and the second didn't,
PQconnectionUsedPassword() would return an incorrect result. This
could be leveraged by unprivileged users of dblink or postgres_fdw
to allow them to use server-side login credentials that they should
not be able to use.
Other notable problems include the possibility of forcing a v2-protocol
connection to a server capable of supporting v3, or overriding
"sslmode=prefer" to cause a non-encrypted connection to a server that
would have accepted an encrypted one. Those are certainly bugs but
it's harder to paint them as security problems in themselves. However,
forcing a v2-protocol connection could result in libpq having a wrong
idea of the server's standard_conforming_strings setting, which opens
the door to SQL-injection attacks. The extent to which that's actually
a problem, given the prerequisite that the attacker needs control of
the client's connection parameters, is unclear.
These problems have existed for a long time, but became more easily
exploitable in v10, both because it introduced easy ways to force libpq
to abandon a connection attempt at a late stage and then try another one
(rather than just giving up), and because it provided an easy way to
specify multiple target hosts.
Fix by rearranging PQconnectPoll's state machine to provide centralized
places to reset state properly when moving to a new target host or when
dropping and retrying a connection to the same host.
Tom Lane, reviewed by Noah Misch. Our thanks to Andrew Krasichkov
for finding and reporting the problem.
Michael Paquier [Sat, 4 Aug 2018 20:32:54 +0000 (05:32 +0900)]
Reset properly errno before calling write()
6cb3372 enforces errno to ENOSPC when less bytes than what is expected
have been written when it is unset, though it forgot to properly reset
errno before doing a system call to write(), causing errno to
potentially come from a previous system call.
Reported-by: Tom Lane
Author: Michael Paquier Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us
Peter Geoghegan [Fri, 3 Aug 2018 21:44:26 +0000 (14:44 -0700)]
Add table relcache invalidation to index builds.
It's necessary to make sure that owning tables have a relcache
invalidation prior to advancing the command counter to make
newly-entered catalog tuples for the index visible. inval.c must be
able to maintain the consistency of the local caches in the event of
transaction abort. There is usually only a problem when CREATE INDEX
transactions abort, since there is a generic invalidation once we reach
index_update_stats().
This bug is of long standing. Problems were made much more likely by
the addition of parallel CREATE INDEX (commit 9da0cc35284), but it is
strongly suspected that similar problems can be triggered without
involving plan_create_index_workers(). (plan_create_index_workers()
triggers a relcache build or rebuild, which previously only happened in
rare edge cases.)
Author: Peter Geoghegan Reported-By: Luca Ferrari Diagnosed-By: Andres Freund Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAKoxK+5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf=HEQ@mail.gmail.com
Backpatch: 9.3-
Tom Lane [Tue, 31 Jul 2018 17:00:08 +0000 (13:00 -0400)]
Further fixes for quoted-list GUC values in pg_dump and ruleutils.c.
Commits 742869946 et al turn out to be a couple bricks shy of a load.
We were dumping the stored values of GUC_LIST_QUOTE variables as they
appear in proconfig or setconfig catalog columns. However, although that
quoting rule looks a lot like SQL-identifier double quotes, there are two
critical differences: empty strings ("") are legal, and depending on which
variable you're considering, values longer than NAMEDATALEN might be valid
too. So the current technique fails altogether on empty-string list
entries (as reported by Steven Winfield in bug #15248) and it also risks
truncating file pathnames during dump/reload of GUC values that are lists
of pathnames.
To fix, split the stored value without any downcasing or truncation,
and then emit each element as a SQL string literal.
This is a tad annoying, because we now have three copies of the
comma-separated-string splitting logic in varlena.c as well as a fourth
one in dumputils.c. (Not to mention the randomly-different-from-those
splitting logic in libpq...) I looked at unifying these, but it would
be rather a mess unless we're willing to tweak the API definitions of
SplitIdentifierString, SplitDirectoriesString, or both. That might be
worth doing in future; but it seems pretty unsafe for a back-patched
bug fix, so for now accept the duplication.
Back-patch to all supported branches, as the previous fix was.
Affected test queries have been testing the wrong thing since their
introduction in commit 4c1383efd132e4f532213c8a8cc63a455f55e344.
Back-patch to 9.3 (all supported versions).
Document security implications of qualified names.
Commit 5770172cb0c9df9e6ce27c507b449557e5b45124 documented secure schema
usage, and that advice suffices for using unqualified names securely.
Document, in typeconv-func primarily, the additional issues that arise
with qualified names. Back-patch to 9.3 (all supported versions).
Bruce Momjian [Sat, 28 Jul 2018 19:01:55 +0000 (15:01 -0400)]
pg_upgrade: check for clean server shutdowns
Previously pg_upgrade checked for the pid file and started/stopped the
server to force a clean shutdown. However, "pg_ctl -m immediate"
removes the pid file but doesn't do a clean shutdown, so check
pg_controldata for a clean shutdown too.
Diagnosed-by: Vimalraj A
Discussion: https://postgr.es/m/CAFKBAK5e4Q-oTUuPPJ56EU_d2Rzodq6GWKS3ncAk3xo7hAsOZg@mail.gmail.com
Tom Lane [Sat, 21 Jul 2018 19:40:52 +0000 (15:40 -0400)]
Further portability hacking in pg_upgrade's test script.
I blew the dust off a Bourne shell (file date 1996, yea verily) and
tried to run test.sh with it. It mostly worked, but I found that the
temp-directory creation code introduced by commit be76a6d39 was not
compatible, for a couple of reasons: this shell thinks "set -e" should
force an exit if a command within backticks fails, and it also thinks code
within braces should be executed by a sub-shell, meaning that variable
settings don't propagate back up to the parent shell. In view of Victor
Wagner's report that Solaris is still using pre-POSIX shells, seems like
we oughta make this case work. It's not like the code is any less
idiomatic this way; the prior coding technique appeared nowhere else.
(There is a remaining bash-ism here, which is that $RANDOM doesn't do
what the code hopes in non-bash shells. But the use of $$ elsewhere in
that path should be enough to ensure uniqueness and some amount of
randomness, so I think it's okay as-is.)
Back-patch to all supported branches, as the previous commit was.
Tom Lane [Fri, 13 Jul 2018 22:45:30 +0000 (18:45 -0400)]
Fix crash in contrib/ltree's lca() function for empty input array.
lca_inner() wasn't prepared for the possibility of getting no inputs.
Fix that, and make some cosmetic improvements to the code while at it.
Also, I thought the documentation of this function as returning the
"longest common prefix" of the paths was entirely misleading; it really
returns a path one shorter than the longest common prefix, for the typical
definition of "prefix". Don't use that term in the docs, and adjust the
examples to clarify what really happens.
This has been broken since its beginning, so back-patch to all supported
branches.
Per report from Hailong Li. Thanks to Pierre Ducroquet for diagnosing
and for the initial patch, though I whacked it around some and added
test cases.
Tom Lane [Fri, 13 Jul 2018 15:52:50 +0000 (11:52 -0400)]
Fix inadequate buffer locking in FSM and VM page re-initialization.
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking. There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.
This does not add any locking overhead in the normal code path where the
page is OK. It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.
Problem noted by R P Asim. It's been like this for a long time, so
back-patch to all supported branches.
Tom Lane [Thu, 12 Jul 2018 16:28:43 +0000 (12:28 -0400)]
Doc: minor improvement in pl/pgsql FETCH/MOVE documentation.
Explain that you can use any integer expression for the "count" in
pl/pgsql's versions of FETCH/MOVE, unlike the SQL versions which only
allow a constant.
Remove the duplicate version of this para under MOVE. I don't see
a good reason to maintain two identical paras when we just said that
MOVE works exactly like FETCH.
Tom Lane [Mon, 9 Jul 2018 23:26:19 +0000 (19:26 -0400)]
Avoid emitting a bogus WAL record when recycling an all-zero btree page.
Commit fafa374f2 caused _bt_getbuf() to possibly emit a WAL record for
a page that it was about to recycle. However, it failed to distinguish
all-zero pages from dead pages, which is important because only the
latter have valid btpo.xact values, or indeed any special space at all.
Recycling an all-zero page with XLogStandbyInfoActive() enabled therefore
led to an Assert failure, or to emission of a WAL record containing a
bogus cutoff XID, which might lead to unnecessary query cancellations
on hot standby servers.
Per reports from Antonin Houska and 自己. Amit Kapila was first to
propose this fix, and Robert Haas, myself, and Kyotaro Horiguchi
reviewed it at various times.
This is an old bug, so back-patch to all supported branches.
Tom Lane [Mon, 9 Jul 2018 21:23:32 +0000 (17:23 -0400)]
Prevent accidental linking of system-supplied copies of libpq.so etc.
Back-patch commit dddfc4cb2, which broke LDFLAGS and related Makefile
variables into two parts, one for within-build-tree library references and
one for external libraries, to ensure that the order of -L flags has all
of the former before all of the latter. This turns out to fix a problem
recently noted on buildfarm member peripatus, that we attempted to
incorporate code from libpgport.a into a shared library. That will fail on
platforms that are sticky about putting non-PIC code into shared libraries.
(It's quite surprising we hadn't seen such failures before, since the code
in question has been like that for a long time.)
I think that peripatus' problem could have been fixed with just a subset
of this patch; but since the previous issue of accidentally linking to the
wrong copy of a Postgres shlib seems likely to bite people in the field,
let's just back-patch the whole change. Now that commit dddfc4cb2 has
survived some beta testing, I'm less afraid to back-patch it than I was
at the time.
This also fixes undesired inclusion of "-DFRONTEND" in pg_config's CPPFLAGS
output (in 9.6 and up) and undesired inclusion of "-L../../src/common" in
its LDFLAGS output (in all supported branches).
Back-patch to v10 and older branches; this is already in v11.
Michael Paquier [Thu, 5 Jul 2018 01:48:03 +0000 (10:48 +0900)]
Prevent references to invalid relation pages after fresh promotion
If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position older than the first end-of-recovery checkpoint while
all the WAL available should be replayed. This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed. Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.
Pavan Deolasee has reported and diagnosed the failure in the first
place, and the base fix idea to rely on the local copy of
minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me. The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to make it faster and more reliable with sleep phases.
Backpatch down to all supported versions where the bug appears, aka 9.3
which is where the end-of-recovery checkpoint is not run by the startup
process anymore. The test gets easily supported down to 10, still it
has been tested on all branches.
Improve the performance of relation deletes during recovery.
When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.
To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.
This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.
Author: Fujii Masao Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
When these programs call pg_catalog.set_config, they need to check for
PGRES_TUPLES_OK instead of PGRES_COMMAND_OK. Fix for 5770172cb0c9df9e6ce27c507b449557e5b45124.
Michael Paquier [Fri, 29 Jun 2018 13:18:24 +0000 (22:18 +0900)]
Replace search.cpan.org with metacpan.org
search.cpan.org has been EOL'd, with metacpan.org being the official
replacement to which URLs now redirect. Update links to match the new
URL. Also update links to CPAN to use https as it will redirect from
http.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/B74C0219-6BA9-46E1-A524-5B9E8CD3BDB3@yesql.se
Fujii Masao [Tue, 26 Jun 2018 15:45:21 +0000 (00:45 +0900)]
Fix documentation bug related to backup history file.
The backup history file has been no longer necessary for recovery
since the version 9.0. It's now basically just for informational purpose.
But previously the documentations still described that a recovery
requests the backup history file to proceed. The commit fixes this
documentation bug.
Thomas Munro [Mon, 18 Jun 2018 06:33:53 +0000 (18:33 +1200)]
Add PGTYPESchar_free() to avoid cross-module problems on Windows.
On Windows, it is sometimes important for corresponding malloc() and
free() calls to be made from the same DLL, since some build options can
result in multiple allocators being active at the same time. For that
reason we already provided PQfreemem(). This commit adds a similar
function for freeing string results allocated by the pgtypes library.
Thomas Munro [Tue, 26 Jun 2018 06:23:17 +0000 (18:23 +1200)]
Move RecoveryLockList into a hash table.
Standbys frequently need to release all locks held by a given xid.
Instead of searching one big list linearly, let's create one list
per xid and put them in a hash table, so we can find what we need
in O(1) time.
Earlier analysis and a prototype were done by David Rowley, though
this isn't his patch.
Back-patch all the way.
Author: Thomas Munro Diagnosed-by: David Rowley, Andres Freund Reviewed-by: Andres Freund, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAEepm%3D1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA%40mail.gmail.com
Discussion: https://postgr.es/m/CAKJS1f9vJ841HY%3DwonnLVbfkTWGYWdPN72VMxnArcGCjF3SywA%40mail.gmail.com
Michael Paquier [Mon, 25 Jun 2018 01:30:59 +0000 (10:30 +0900)]
Address set of issues with errno handling
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set. Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.
This patch uses the brute-force approach of correcting all those code
paths. Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.
Author: Michael Paquier Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
Tom Lane [Sat, 16 Jun 2018 19:34:07 +0000 (15:34 -0400)]
Use -Wno-format-truncation and -Wno-stringop-truncation, if available.
gcc 8 has started emitting some warnings that are largely useless for
our purposes, particularly since they complain about code following
the project-standard coding convention that path names are assumed
to be shorter than MAXPGPATH. Even if we make the effort to remove
that assumption in some future release, the changes wouldn't get
back-patched. Hence, just suppress these warnings, on compilers that
have these switches.
Tom Lane [Sat, 16 Jun 2018 18:58:11 +0000 (14:58 -0400)]
Avoid unnecessary use of strncpy in a couple of places in ecpg.
Use of strncpy with a length limit based on the source, rather than
the destination, is non-idiomatic and draws warnings from gcc 8.
Replace with memcpy, which does exactly the same thing in these cases,
but with less chance for confusion.
Tom Lane [Sat, 16 Jun 2018 18:45:47 +0000 (14:45 -0400)]
Use snprintf not sprintf in pg_waldump's timestamptz_to_str.
This could only cause an issue if strftime returned a ridiculously
long timezone name, which seems unlikely; and it wouldn't qualify
as a security problem even then, since pg_waldump (nee pg_xlogdump)
is a debug tool not part of the server. But gcc 8 has started issuing
warnings about it, so let's use snprintf and be safe.
Andres Freund [Tue, 12 Jun 2018 18:13:22 +0000 (11:13 -0700)]
Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.
This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point. After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
ERROR: found xmin ... from before relfrozenxid ...
and
ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.
The two bugs are:
1) Invalidations for nailed relations were ignored, based on the
theory that the relcache entry for such tables doesn't
change. Which is largely true, except for fields like relfrozenxid
etc. This means that changes to relations vacuumed in other
sessions weren't picked up by already existing sessions. Luckily
autovacuum doesn't have particularly longrunning sessions.
2) For shared *and* nailed relations, the shared relcache init file
was never invalidated while running. That means that for such
tables (e.g. pg_authid, pg_database) it's not just already existing
sessions that are affected, but even new connections are as well.
That explains why the reports usually were about pg_authid et. al.
To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad. The fix for 2) is simpler,
simply always remove both the shared and local init files.
Alvaro Herrera [Wed, 6 Jun 2018 18:46:53 +0000 (14:46 -0400)]
Fix function code in error report
This bug causes a lseek() failure to be reported as a "could not open"
failure in the error message, muddling bug reports. I introduced this
copy-and-pasteo in commit 78e122010422.
Noticed while reviewing code for bug report #15221, from lily liang. In
version 10 the affected function is only used by multixact.c and
commit_ts, and only in corner-case circumstances, neither of which are
involved in the reported bug (a pg_subtrans failure.)
Tom Lane [Fri, 25 May 2018 18:31:07 +0000 (14:31 -0400)]
Fix misidentification of SQL statement type in plpgsql's exec_stmt_execsql.
To distinguish SQL statements that are INSERT/UPDATE/DELETE from other
ones, exec_stmt_execsql looked at the post-rewrite form of the statement
rather than the original. This is problematic because it did that only
during first execution of the statement (in a session), but the correct
answer could change later due to addition or removal of DO INSTEAD rules
during the session. That could lead to an Assert failure, as reported
by Tushar Ahuja and Robert Haas. In non-assert builds, there's a hazard
that we would fail to enforce STRICT behavior when we'd be expected to.
That would happen if an initially present DO INSTEAD, that replaced the
original statement with one of a different type, were removed; after that
the statement should act "normally", including strictness enforcement, but
it didn't. (The converse case of enforcing strictness when we shouldn't
doesn't seem to be a hazard, as addition of a DO INSTEAD that changes the
statement type would always lead to acting as though the statement returned
zero rows, so that the strictness error could not fire.)
To fix, inspect the original form of the statement not the post-rewrite
form, making it valid to assume the answer can't change intra-session.
This should lead to the same answer in every case except when there is a
DO INSTEAD that changes the statement type; we will now set mod_stmt=true
anyway, while we would not have done so before. That breaks the Assert
in the SPI_OK_REWRITTEN code path, which expected the latter behavior.
It might be all right to assert mod_stmt rather than !mod_stmt there,
but I'm not entirely convinced that that'd always hold, so just remove
the assertion altogether.
This has been broken for a long time, so back-patch to all supported
branches.