Tom Lane [Wed, 17 Aug 2016 00:33:01 +0000 (20:33 -0400)]
Improve parsetree representation of special functions such as CURRENT_DATE.
We implement a dozen or so parameterless functions that the SQL standard
defines special syntax for. Up to now, that was done by converting them
into more or less ad-hoc constructs such as "'now'::text::date". That's
messy for multiple reasons: it exposes what should be implementation
details to users, and performance is worse than it needs to be in several
cases. To improve matters, invent a new expression node type
SQLValueFunction that can represent any of these parameterless functions.
Bump catversion because this changes stored parsetrees for rules.
Tom Lane [Tue, 16 Aug 2016 19:58:30 +0000 (15:58 -0400)]
Fix assorted places in psql to print version numbers >= 10 in new style.
This is somewhat cosmetic, since as long as you know what you are looking
at, "10.0" is a serviceable substitute for "10". But there is a potential
for confusion between version numbers with minor numbers and those without
--- we don't want people asking "why is psql saying 10.0 when my server is
10.2". Therefore, back-patch as far as practical, which turns out to be
9.3. I could have redone the patch to use fprintf(stderr) in place of
psql_error(), but it seems more work than is warranted for branches that
will be EOL or nearly so by the time v10 comes out.
Although only psql seems to contain any code that needs this, I chose
to put the support function into fe_utils, since it seems likely we'll
need it in other client programs in future. (In 9.3-9.5, use dumputils.c,
the predecessor of fe_utils/string_utils.c.)
In HEAD, also fix the backend code that whines about loadable-library
version mismatch. I don't see much need to back-patch that.
Tom Lane [Tue, 16 Aug 2016 17:58:44 +0000 (13:58 -0400)]
Automate the maintenance of SO_MINOR_VERSION for our shared libraries.
Up to now we've manually adjusted these numbers in several different
Makefiles at the start of each development cycle. While that's not
much work, it's easily forgotten, so let's get rid of it by setting
the SO_MINOR_VERSION values directly from $(MAJORVERSION).
In the case of libpq, this dev cycle's value of SO_MINOR_VERSION happens
to be "10" anyway, so this switch is transparent. For ecpg's shared
libraries, this will result in skipping one or two minor version numbers
between v9.6 and v10, which seems like no big problem; and it was a bit
inconsistent that they didn't have equal minor version numbers anyway.
Robert Haas [Tue, 16 Aug 2016 17:23:32 +0000 (13:23 -0400)]
Fix possible crash due to incorrect allocation context.
Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 aimed to reduce
leakage from tqueue.c, which is good. Unfortunately, by changing the
memory context in which all of gather_readnext() executes, it also
changed the context in which ExecShutdownGatherWorkers executes, which
is not good, because that function eventually causes a call to
ExecParallelRetrieveInstrumentation, which proceeds to allocate
planstate->worker_instrument in a short-lived context, causing a
crash.
Rushabh Lathia, reviewed by Amit Kapila and by me.
Tom Lane [Tue, 16 Aug 2016 16:49:30 +0000 (12:49 -0400)]
Remove separate version numbering for ecpg preprocessor.
Once upon a time, it made sense for the ecpg preprocessor to have its
own version number, because it used a manually-maintained grammar that
wasn't always in sync with the core grammar. But those days are
thankfully long gone, leaving only a maintenance nuisance behind.
Let's use the PG v10 version numbering changeover as an excuse to get
rid of the ecpg version number and just have ecpg identify itself by
PG_VERSION. From the user's standpoint, ecpg will go from "4.12" in
the 9.6 branch to "10" in the 10 branch, so there's no failure of
monotonicity.
Robert Haas [Mon, 15 Aug 2016 22:09:55 +0000 (18:09 -0400)]
Once again allow LWLocks to be used within DSM segments.
Prior to commit 7882c3b0b95640e361f1533fe0f2d02e4e5d8610, it was
possible to use LWLocks within DSM segments, but that commit broke
this use case by switching from a doubly linked list to a circular
linked list. Switch back, using a new bit of general infrastructure
for maintaining lists of PGPROCs.
Tom Lane [Mon, 15 Aug 2016 17:49:49 +0000 (13:49 -0400)]
Stamp HEAD as 10devel.
This is a good bit more complicated than the average new-version stamping
commit, because it includes various adjustments in pursuit of changing
from three-part to two-part version numbers. It's likely some further
work will be needed around that change; but this is enough to get through
the regression tests, at least in Unix builds.
Tom Lane [Mon, 15 Aug 2016 15:32:09 +0000 (11:32 -0400)]
Simplify the process of perltidy'ing our Perl files.
Wrap the perltidy invocation into a shell script to reduce the risk of
copy-and-paste errors. Include removal of *.bak files in the script,
so they don't accidentally get committed. Improve the directions in
the README file.
Tom Lane [Sun, 14 Aug 2016 19:06:01 +0000 (15:06 -0400)]
Remove bogus dependencies on NUMERIC_MAX_PRECISION.
NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision
and scale you can write in a numeric typmod. It might once have had
something to do with the allowed range of a typmod-less numeric value,
but at least since 9.1 we've allowed, and documented that we allowed,
any value that would physically fit in the numeric storage format;
which is something over 100000 decimal digits, not 1000.
Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit
on the allowed range of the exponent in scientific-format input. That was
especially silly in view of the fact that you can enter larger numbers as
long as you don't use 'e' to do it. Just constrain the value enough to
avoid localized overflow, and let make_result be the final arbiter of what
is too large. Likewise adjust ecpg's equivalent of this code.
Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the
number of base-NBASE digits it would accept. That created a dump/restore
hazard for binary COPY without doing anything useful; the wire-format
limit on number of digits (65535) is about as tight as we would want.
In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what
the numeric range limit is. That code doesn't exist in the back branches.
Per gripe from Aravind Kumar. Back-patch to all supported branches,
since they all contain the documentation claim about allowed range of
NUMERIC (cf commit cabf5d84b).
Tom Lane [Sun, 14 Aug 2016 02:24:48 +0000 (22:24 -0400)]
Fix assorted bugs in contrib/bloom.
In blinsert(), cope with the possibility that a page we pull from the
notFullPage list is marked BLOOM_DELETED. This could happen if VACUUM
recently marked it deleted but hasn't (yet) updated the metapage.
We can re-use such a page safely, but we *must* reinitialize it so that
it's no longer marked deleted.
Fix blvacuum() so that it updates the notFullPage list even if it's
going to update it to empty. The previous "optimization" of skipping
the update seems pretty dubious, since it means that the next blinsert()
will uselessly visit whatever pages we left in the list.
Uniformly treat PageIsNew pages the same as deleted pages. This should
allow proper recovery if a crash occurs just after relation extension.
Properly use vacuum_delay_point, not assorted ad-hoc CHECK_FOR_INTERRUPTS
calls, in the blvacuum() main loop.
Fix broken tuple-counting logic: blvacuum.c counted the number of live
index tuples over again in each scan, leading to VACUUM VERBOSE reporting
some multiple of the actual number of surviving index tuples after any
vacuum that removed any tuples (since they'd be counted in blvacuum, maybe
more than once, and then again in blvacuumcleanup, without ever zeroing the
counter). It's sufficient to count them in blvacuumcleanup.
stats->estimated_count is a boolean, not a counter, and we don't want
to set it true, so don't add tuple counts to it.
Add a couple of Asserts that we don't overrun available space on a bloom
page. I don't think there's any bug there today, but the way the
FreeBlockNumberArray size calculation is set up is scarily fragile, and
BloomPageGetFreeSpace isn't much better. The Asserts should help catch
any future mistakes.
Per investigation of a report from Jeff Janes. I think the first item
above may explain his report; the other changes were things I noticed
while casting about for an explanation.
Tom Lane [Sat, 13 Aug 2016 22:31:14 +0000 (18:31 -0400)]
Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit 65c5fcd35). The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.
As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column. Also provide the ability for an index
AM to override the behavior.
In passing, document pg_am.amtype, overlooked in commit 473b93287.
Tom Lane [Fri, 12 Aug 2016 22:45:18 +0000 (18:45 -0400)]
Doc: clarify that DROP ... CASCADE is recursive.
Apparently that's not obvious to everybody, so let's belabor the point.
In passing, document that DROP POLICY has CASCADE/RESTRICT options (which
it does, per gram.y) but they do nothing (I assume, anyway). Also update
some long-obsolete commentary in gram.y.
Tom Lane [Fri, 12 Aug 2016 16:13:04 +0000 (12:13 -0400)]
Fix inappropriate printing of never-measured times in EXPLAIN.
EXPLAIN (ANALYZE, TIMING OFF) would print an elapsed time of zero for
a trigger function, because no measurement has been taken but it printed
the field anyway. This isn't what EXPLAIN does elsewhere, so suppress it.
In the same vein, EXPLAIN (ANALYZE, BUFFERS) with non-text output format
would print buffer I/O timing numbers even when no measurement has been
taken because track_io_timing is off. That seems not per policy, either,
so change it.
Back-patch to 9.2 where these features were introduced.
Tom Lane [Thu, 11 Aug 2016 15:22:25 +0000 (11:22 -0400)]
Fix busted Assert for CREATE MATVIEW ... WITH NO DATA.
Commit 874fe3aea changed the command tag returned for CREATE MATVIEW/CREATE
TABLE AS ... WITH NO DATA, but missed that there was code in spi.c that
expected the command tag to always be "SELECT". Fortunately, the
consequence was only an Assert failure, so this oversight should have no
impact in production builds.
Since this code path was evidently un-exercised, add a regression test.
Per report from Shivam Saxena. Back-patch to 9.3, like the previous commit.
Fix several one-byte buffer over-reads in to_number
Several places in NUM_numpart_from_char(), which is called from the SQL
function to_number(text, text), could accidentally read one byte past
the end of the input buffer (which comes from the input text datum and
is not null-terminated).
1. One leading space character would be skipped, but there was no check
that the input was at least one byte long. This does not happen in
practice, but for defensiveness, add a check anyway.
2. Commit 4a3a1e2cf apparently accidentally doubled that code that skips
one space character (so that two spaces might be skipped), but there
was no overflow check before skipping the second byte. Fix by
removing that duplicate code.
3. A logic error would allow a one-byte over-read when looking for a
trailing sign (S) placeholder.
In each case, the extra byte cannot be read out directly, but looking at
it might cause a crash.
The third item was discovered by Piotr Stefaniak, the first two were
found and analyzed by Tom Lane and Peter Eisentraut.
Tom Lane [Mon, 8 Aug 2016 14:33:46 +0000 (10:33 -0400)]
Fix two errors with nested CASE/WHEN constructs.
ExecEvalCase() tried to save a cycle or two by passing
&econtext->caseValue_isNull as the isNull argument to its sub-evaluation of
the CASE value expression. If that subexpression itself contained a CASE,
then *isNull was an alias for econtext->caseValue_isNull within the
recursive call of ExecEvalCase(), leading to confusion about whether the
inner call's caseValue was null or not. In the worst case this could lead
to a core dump due to dereferencing a null pointer. Fix by not assigning
to the global variable until control comes back from the subexpression.
Also, avoid using the passed-in isNull pointer transiently for evaluation
of WHEN expressions. (Either one of these changes would have been
sufficient to fix the known misbehavior, but it's clear now that each of
these choices was in itself dangerous coding practice and best avoided.
There do not seem to be any similar hazards elsewhere in execQual.c.)
Also, it was possible for inlining of a SQL function that implements the
equality operator used for a CASE comparison to result in one CASE
expression's CaseTestExpr node being inserted inside another CASE
expression. This would certainly result in wrong answers since the
improperly nested CaseTestExpr would be caused to return the inner CASE's
comparison value not the outer's. If the CASE values were of different
data types, a crash might result; moreover such situations could be abused
to allow disclosure of portions of server memory. To fix, teach
inline_function to check for "bare" CaseTestExpr nodes in the arguments of
a function to be inlined, and avoid inlining if there are any.
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Obstruct shell, SQL, and conninfo injection via database and role names.
Due to simplistic quoting and confusion of database names with conninfo
strings, roles with the CREATEDB or CREATEROLE option could escalate to
superuser privileges when a superuser next ran certain maintenance
commands. The new coding rule for PQconnectdbParams() calls, documented
at conninfo_array_parse(), is to pass expand_dbname=true and wrap
literal database names in a trivial connection string. Escape
zero-length values in appendConnStrVal(). Back-patch to 9.1 (all
supported versions).
Nathan Bossart, Michael Paquier, and Noah Misch. Reviewed by Peter
Eisentraut. Reported by Nathan Bossart.
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Promote pg_dumpall shell/connstr quoting functions to src/fe_utils.
Rename these newly-extern functions with terms more typical of their new
neighbors. No functional changes; a subsequent commit will use them in
more places. Back-patch to 9.1 (all supported versions). Back branches
lack src/fe_utils, so instead rename the functions in place; the
subsequent commit will copy them into the other programs using them.
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Fix Windows shell argument quoting.
The incorrect quoting may have permitted arbitrary command execution.
At a minimum, it gave broader control over the command line to actors
supposed to have control over a single argument. Back-patch to 9.1 (all
supported versions).
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Reject, in pg_dumpall, names containing CR or LF.
These characters prematurely terminate Windows shell command processing,
causing the shell to execute a prefix of the intended command. The
chief alternative to rejecting these characters was to bypass the
Windows shell with CreateProcess(), but the ability to use such names
has little value. Back-patch to 9.1 (all supported versions).
This change formally revokes support for these characters in database
names and roles names. Don't document this; the error message is
self-explanatory, and too few users would benefit. A future major
release may forbid creation of databases and roles so named. For now,
check only at known weak points in pg_dumpall. Future commits will,
without notice, reject affected names from other frontend programs.
Also extend the restriction to pg_dumpall --dbname=CONNSTR arguments and
--file arguments. Unlike the effects on role name arguments and
database names, this does not reflect a broad policy change. A
migration to CreateProcess() could lift these two restrictions.
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Field conninfo strings throughout src/bin/scripts.
These programs nominally accepted conninfo strings, but they would
proceed to use the original dbname parameter as though it were an
unadorned database name. This caused "reindexdb dbname=foo" to issue an
SQL command that always failed, and other programs printed a conninfo
string in error messages that purported to print a database name. Fix
both problems by using PQdb() to retrieve actual database names.
Continue to print the full conninfo string when reporting a connection
failure. It is informative there, and if the database name is the sole
problem, the server-side error message will include the name. Beyond
those user-visible fixes, this allows a subsequent commit to synthesize
and use conninfo strings without that implementation detail leaking into
messages. As a side effect, the "vacuuming database" message now
appears after, not before, the connection attempt. Back-patch to 9.1
(all supported versions).
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Introduce a psql "\connect -reuse-previous=on|off" option.
The decision to reuse values of parameters from a previous connection
has been based on whether the new target is a conninfo string. Add this
means of overriding that default. This feature arose as one component
of a fix for security vulnerabilities in pg_dump, pg_dumpall, and
pg_upgrade, so back-patch to 9.1 (all supported versions). In 9.3 and
later, comment paragraphs that required update had already-incorrect
claims about behavior when no connection is open; fix those problems.
Noah Misch [Mon, 8 Aug 2016 14:07:46 +0000 (10:07 -0400)]
Sort out paired double quotes in \connect, \password and \crosstabview.
In arguments, these meta-commands wrongly treated each pair as closing
the double quoted string. Make the behavior match the documentation.
This is a compatibility break, but I more expect to find software with
untested reliance on the documented behavior than software reliant on
today's behavior. Back-patch to 9.1 (all supported versions).
Although the standard has routines.result_cast_character_set_name, given
the naming of the surrounding columns, we concluded that this must have
been a mistake and that result_cast_char_set_name was intended, so
change the implementation. The documentation was already using the new
name.
found by Clément Prévost <prevostclement@gmail.com>
Tom Lane [Sun, 7 Aug 2016 22:52:02 +0000 (18:52 -0400)]
Fix misestimation of n_distinct for a nearly-unique column with many nulls.
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct. But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are. This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh. We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index. Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.
Back-patch to all supported branches. Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.
Patch by me, with documentation wording suggested by Dean Rasheed
Tom Lane [Sun, 7 Aug 2016 21:46:08 +0000 (17:46 -0400)]
Fix TOAST access failure in RETURNING queries.
Discussion of commit 3e2f3c2e4 exposed a problem that is of longer
standing: since we don't detoast data while sticking it into a portal's
holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we
release the query's snapshot as soon as we're done loading the holdStore,
later readout of the holdStore can do TOAST fetches against data that can
no longer be seen by any of the session's live snapshots. This means that
a concurrent VACUUM could remove the TOAST data before we can fetch it.
Commit 3e2f3c2e4 exposed the problem by showing that sometimes we had *no*
live snapshots while fetching TOAST data, but we'd be at risk anyway.
I believe this code was all right when written, because our management of a
session's exposed xmin was such that the TOAST references were safe until
end of transaction. But that's no longer true now that we can advance or
clear our PGXACT.xmin intra-transaction.
To fix, copy the query's snapshot during FillPortalStore() and save it in
the Portal; release it only when the portal is dropped. This essentially
implements a policy that we must hold a relevant snapshot whenever we
access potentially-toasted data. We had already come to that conclusion
in other places, cf commits 08e261cbc94ce9a7 and ec543db77b6b72f2.
I'd have liked to add a regression test case for this, but I didn't see
a way to make one that's not unreasonably bloated; it seems to require
returning a toasted value to the client, and those will be big.
In passing, improve PortalRunUtility() so that it positively verifies
that its ending PopActiveSnapshot() call will pop the expected snapshot,
removing a rather shaky assumption about which utility commands might
do their own PopActiveSnapshot(). There's no known bug here, but now
that we're actively referencing the snapshot it's almost free to make
this code a bit more bulletproof.
We might want to consider back-patching something like this into older
branches, but it would be prudent to let it prove itself more in HEAD
beforehand.
Tom Lane [Sun, 7 Aug 2016 18:36:02 +0000 (14:36 -0400)]
Avoid crashing in GetOldestSnapshot() if there are no known snapshots.
The sole caller expects NULL to be returned in such a case, so make
it so and document it.
Per reports from Andreas Seltenreich and Regina Obe. This doesn't
really fix their problem, as now their RETURNING queries will say
"ERROR: no known snapshots", but in any case this function should
not dump core in a reasonably-foreseeable situation.
Tom Lane [Sun, 7 Aug 2016 17:15:55 +0000 (13:15 -0400)]
Don't propagate a null subtransaction snapshot up to parent transaction.
This oversight could cause logical decoding to fail to decode an outer
transaction containing changes, if a subtransaction had an XID but no
actual changes. Per bug #14279 from Marko Tiikkaja. Patch by Marko
based on analysis by Andrew Gierth.
Tom Lane [Sat, 6 Aug 2016 18:28:37 +0000 (14:28 -0400)]
In B-tree page deletion, clean up properly after page deletion failure.
In _bt_unlink_halfdead_page(), we might fail to find an immediate left
sibling of the target page, perhaps because of corruption of the page
sibling links. The code intends to cope with this by just abandoning
the deletion attempt; but what actually happens is that it fails outright
due to releasing the same buffer lock twice. (And error recovery masks
a second problem, which is possible leakage of a pin on another page.)
Seems to have been introduced by careless refactoring in commit efada2b8e.
Since there are multiple cases to consider, let's make releasing the buffer
lock in the failure case the responsibility of _bt_unlink_halfdead_page()
not its caller.
Also, avoid fetching the leaf page's left-link again after we've dropped
lock on the page. This is probably harmless, but it's not exactly good
coding practice.
Per report from Kyotaro Horiguchi. Back-patch to 9.4 where the faulty code
was introduced.
Tom Lane [Fri, 5 Aug 2016 22:58:12 +0000 (18:58 -0400)]
Teach libpq to decode server version correctly from future servers.
Beginning with the next development cycle, PG servers will report two-part
not three-part version numbers. Fix libpq so that it will compute the
correct numeric representation of such server versions for reporting by
PQserverVersion(). It's desirable to get this into the field and
back-patched ASAP, so that older clients are more likely to understand the
new server version numbering by the time any such servers are in the wild.
(The results with an old client would probably not be catastrophic anyway
for a released server; for example "10.1" would be interpreted as 100100
which would be wrong in detail but would not likely cause an old client to
misbehave badly. But "10devel" or "10beta1" would result in sversion==0
which at best would result in disabling all use of modern features.)
Extracted from a patch by Peter Eisentraut; comments added by me
Tom Lane [Fri, 5 Aug 2016 19:14:08 +0000 (15:14 -0400)]
Fix ts_delete(tsvector, text[]) to cope with duplicate array entries.
Such cases either failed an Assert, or produced a corrupt tsvector in
non-Assert builds, as reported by Andreas Seltenreich. The reason is
that tsvector_delete_by_indices() just assumed that its input array had
no duplicates. Fix by explicitly de-duping.
In passing, improve some comments, and fix a number of tests for null
values to use ERRCODE_NULL_VALUE_NOT_ALLOWED not
ERRCODE_INVALID_PARAMETER_VALUE.
Tom Lane [Fri, 5 Aug 2016 16:58:17 +0000 (12:58 -0400)]
Update time zone data files to tzdata release 2016f.
DST law changes in Kemerovo and Novosibirsk. Historical corrections for
Azerbaijan, Belarus, and Morocco. Asia/Novokuznetsk and Asia/Novosibirsk
now use numeric time zone abbreviations instead of invented ones. Zones
for Antarctic bases and other locations that have been uninhabited for
portions of the time span known to the tzdata database now report "-00"
rather than "zzz" as the zone abbreviation for those time spans.
Also, I decided to remove some of the timezone/data/ files that we don't
use. At one time that subdirectory was a complete copy of what IANA
distributes in the tzdata tarballs, but that hasn't been true for a long
time. There seems no good reason to keep shipping those specific files
but not others; they're just bloating our tarballs.
Robert Haas [Fri, 5 Aug 2016 15:57:00 +0000 (11:57 -0400)]
Change InitToastSnapshot to a macro.
tqual.h is included in some front-end compiles, and a static inline
breaks on buildfarm member castoroides. Since the macro is never
referenced, it should dodge that problem, although this doesn't
seem like the cleanest way of hiding things from front-end compiles.
Andres Freund [Fri, 5 Aug 2016 03:07:16 +0000 (20:07 -0700)]
Fix hard to hit race condition in heapam's tuple locking code.
As mentioned in its commit message, eca0f1db left open a race condition,
where a page could be marked all-visible, after the code checked
PageIsAllVisible() to pin the VM, but before the page is locked. Plug
that hole.
Reviewed-By: Robert Haas, Andres Freund
Author: Amit Kapila
Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
Backpatch: -
Tom Lane [Thu, 4 Aug 2016 20:06:14 +0000 (16:06 -0400)]
Fix bogus coding in WaitForBackgroundWorkerShutdown().
Some conditions resulted in "return" directly out of a PG_TRY block,
which left the exception stack dangling, and to add insult to injury
failed to restore the state of set_latch_on_sigusr1.
This is a bug only in 9.5; in HEAD it was accidentally fixed by commit db0f6cad4, which removed the surrounding PG_TRY block. However, I (tgl)
chose to apply the patch to HEAD as well, because the old coding was
gratuitously different from WaitForBackgroundWorkerStartup(), and there
would indeed have been no bug if it were done like that to start with.
Robert Haas [Wed, 3 Aug 2016 20:41:43 +0000 (16:41 -0400)]
Prevent "snapshot too old" from trying to return pruned TOAST tuples.
Previously, we tested for MVCC snapshots to see whether they were too
old, but not TOAST snapshots, which can lead to complaints about missing
TOAST chunks if those chunks are subject to early pruning. Ideally,
the threshold lsn and timestamp for a TOAST snapshot would be that of
the corresponding MVCC snapshot, but since we have no way of deciding
which MVCC snapshot was used to fetch the TOAST pointer, use the oldest
active or registered snapshot instead.
Reported by Andres Freund, who also sketched out what the fix should
look like. Patch by me, reviewed by Amit Kapila.
Tom Lane [Wed, 3 Aug 2016 20:37:03 +0000 (16:37 -0400)]
Make INSERT-from-multiple-VALUES-rows handle targetlist indirection better.
Previously, if an INSERT with multiple rows of VALUES had indirection
(array subscripting or field selection) in its target-columns list, the
parser handled that by applying transformAssignedExpr() to each element
of each VALUES row independently. This led to having ArrayRef assignment
nodes or FieldStore nodes in each row of the VALUES RTE. That works for
simple cases, but in bug #14265 Nuri Boardman points out that it fails
if there are multiple assignments to elements/fields of the same target
column. For such cases to work, rewriteTargetListIU() has to nest the
ArrayRefs or FieldStores together to produce a single expression to be
assigned to the column. But it failed to find them in the top-level
targetlist and issued an error about "multiple assignments to same column".
We could possibly fix this by teaching the rewriter to apply
rewriteTargetListIU to each VALUES row separately, but that would be messy
(it would change the output rowtype of the VALUES RTE, for example) and
inefficient. Instead, let's fix the parser so that the VALUES RTE outputs
are just the user-specified values, cast to the right type if necessary,
and then the ArrayRefs or FieldStores are applied in the top-level
targetlist to Vars representing the RTE's outputs. This is the same
parsetree representation already used for similar cases with INSERT/SELECT
syntax, so it allows simplifications in ruleutils.c, which no longer needs
to treat INSERT-from-multiple-VALUES as its own special case.
This implementation works by applying transformAssignedExpr to the VALUES
entries as before, and then stripping off any ArrayRefs or FieldStores it
adds. With lots of VALUES rows it would be noticeably more efficient to
not add those nodes in the first place. But that's just an optimization
not a bug fix, and there doesn't seem to be any good way to do it without
significant refactoring. (A non-invasive answer would be to apply
transformAssignedExpr + stripping to just the first VALUES row, and then
just forcibly cast remaining rows to the same data types exposed in the
first row. But this way would lead to different, not-INSERT-specific
errors being reported in casting failure cases, so it doesn't seem very
nice.) So leave that for later; this patch at least isn't making the
per-row parsing work worse, and it does make the finished parsetree
smaller, saving rewriter and planner work.
Catversion bump because stored rules containing such INSERTs would need
to change. Because of that, no back-patch, even though this is a very
long-standing bug.
Tom Lane [Wed, 3 Aug 2016 18:48:05 +0000 (14:48 -0400)]
Do not let PostmasterContext survive into background workers.
We don't want postmaster child processes to contain a copy of the
postmaster's PostmasterContext. That would be a waste of memory at least,
and at worst a security issue, since there are copies of the semi-sensitive
pg_hba and pg_ident data in there. All other child process types delete
the PostmasterContext after forking, but the original coding of the
background worker patch (commit da07a1e85) did not do so. It appears
that the only reason for that was to avoid copying the bgworker's
MyBgworkerEntry out of that context; but the couple of additional
statements needed to do so are hardly good justification for it. Hence,
copy that data and then clear the context as other child processes do.
Because this patch changes the memory context in which a bgworker function
gains control, back-patching it would be a bit risky, so we won't fix this
in back branches. The "security" complaint is pretty thin anyway for
generic bgworkers; only with the introduction of parallel query is there
any question of running untrusted code in a bgworker process.
Alvaro Herrera [Wed, 3 Aug 2016 17:21:23 +0000 (13:21 -0400)]
Fix assorted problems in recovery tests
In test 001_stream_rep we're using pg_stat_replication.write_location to
determine catch-up status, but we care about xlog having been applied
not just received, so change that to apply_location.
In test 003_recovery_targets, we query the database for a recovery
target specification and later for the xlog position supposedly
corresponding to that recovery specification. If for whatever reason
more WAL is written between the two queries, the recovery specification
is earlier than the xlog position used by the query in the test harness,
so we wait forever, leading to test failures. Deal with this by using a
single query to extract both items. In 2a0f89cd717 we tried to deal
with it by giving them more tests to run, but in hindsight that was
obviously doomed to failure (no revert of that, though).
Tom Lane [Tue, 2 Aug 2016 22:39:14 +0000 (18:39 -0400)]
Remove duplicate InitPostmasterChild() call while starting a bgworker.
This is apparently harmless on Windows, but on Unix it results in an
assertion failure. We'd not noticed because this code doesn't get
used on Unix unless you build with -DEXEC_BACKEND. Bug was evidently
introduced by sloppy refactoring in commit 31c453165.
Tom Lane [Tue, 2 Aug 2016 20:39:16 +0000 (16:39 -0400)]
Block interrupts during HandleParallelMessages().
As noted by Alvaro, there are CHECK_FOR_INTERRUPTS() calls in the shm_mq.c
functions called by HandleParallelMessages(). I believe they're all
unreachable since we always pass nowait = true, but it doesn't seem like
a great idea to assume that no such call will ever be reachable from
HandleParallelMessages(). If that did happen, there would be a risk of a
recursive call to HandleParallelMessages(), which it does not appear to be
designed for --- for example, there's nothing that would prevent
out-of-order processing of received messages. And certainly such cases
cannot easily be tested. So let's prevent it by holding off interrupts for
the duration of the function. Back-patch to 9.5 which contains identical
code.
Tom Lane [Tue, 2 Aug 2016 16:48:51 +0000 (12:48 -0400)]
Fix pg_dump's handling of public schema with both -c and -C options.
Since -c plus -C requests dropping and recreating the target database
as a whole, not dropping individual objects in it, we should assume that
the public schema already exists and need not be created. The previous
coding considered only the state of the -c option, so it would emit
"CREATE SCHEMA public" anyway, leading to an unexpected error in restore.
Back-patch to 9.2. Older versions did not accept -c with -C so the
issue doesn't arise there. (The logic being patched here dates to 8.0,
cf commit 2193121fa, so it's not really wrong that it didn't consider
the case at the time.)
Note that versions before 9.6 will still attempt to emit REVOKE/GRANT
on the public schema; but that happens without -c/-C too, and doesn't
seem to be the focus of this complaint. I considered extending this
stanza to also skip the public schema's ACL, but that would be a
misfeature, as it'd break cases where users intentionally changed that
ACL. The real fix for this aspect is Stephen Frost's work to not dump
built-in ACLs, and that's not going to get back-ported.
Per bugs #13804 and #14271. Solution found by David Johnston and later
rediscovered by me.
Tom Lane [Mon, 1 Aug 2016 20:12:01 +0000 (16:12 -0400)]
Minor cleanup for access/transam/parallel.c.
ParallelMessagePending *must* be marked volatile, because it's set
by a signal handler. On the other hand, it's pointless for
HandleParallelMessageInterrupt to save/restore errno; that must be,
and is, done at the outer level of the SIGUSR1 signal handler.
Calling CHECK_FOR_INTERRUPTS() inside HandleParallelMessages, which itself
is called from CHECK_FOR_INTERRUPTS(), seems both useless and hazardous.
The comment claiming that this is needed to handle the error queue going
away is certainly misguided, in any case.
Improve a couple of error message texts, and use
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to report loss of parallel worker
connection, since that's what's used in e.g. tqueue.c. (Maybe it would be
worth inventing a dedicated ERRCODE for this type of failure? But I do not
think ERRCODE_INTERNAL_ERROR is appropriate.)
Tom Lane [Mon, 1 Aug 2016 19:13:53 +0000 (15:13 -0400)]
Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch.
This coding pattern creates a race condition, because if an interesting
interrupt happens after we've checked InterruptPending but before we reset
our latch, the latch-setting done by the signal handler would get lost,
and then we might block at WaitLatch in the next iteration without ever
noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS
before WaitLatch or after ResetLatch, but not between them.
Aside from fixing the bugs, add some explanatory comments to latch.h
to perhaps forestall the next person from making the same mistake.
In HEAD, also replace gather_readnext's direct call of
HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean
or useful for this one caller to bypass ProcessInterrupts and go straight
to HandleParallelMessages; not least because that fails to consider the
InterruptPending flag, resulting in useless work both here
(if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call
(if it is).
This thinko seems to have been introduced in the initial coding of
storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all
the subsequent parallel-query support logic. Back-patch relevant hunks
to 9.4 to extirpate the error everywhere.
Fujii Masao [Mon, 1 Aug 2016 17:43:17 +0000 (02:43 +0900)]
Remove unused arguments from pg_replication_origin_xact_reset function.
The document specifies that pg_replication_origin_xact_reset function
doesn't have any argument variables. But previously it was actually
defined so as to have two argument variables, though they were not
used at all. That is, the pg_proc entry for that function was incorrect.
This patch fixes the pg_proc entry and removes those two arguments
from the function definition.
No back-patch because this change needs a catalog version bump
although the issue exists in 9.5 as well. Instead, a note about those
unused argument variables will be added to 9.5 document later.
Catalog version bumped due to the change of pg_proc.
Fujii Masao [Mon, 1 Aug 2016 08:36:14 +0000 (17:36 +0900)]
Fix pg_basebackup so that it accepts 0 as a valid compression level.
The help message for pg_basebackup specifies that the numbers 0 through 9
are accepted as valid values of -Z option. But, previously -Z 0 was rejected
as an invalid compression level.
Per discussion, it's better to make pg_basebackup treat 0 as valid
compression level meaning no compression, like pg_dump.
Back-patch to all supported versions.
Reported-By: Jeff Janes Reviewed-By: Amit Kapila
Discussion: CAMkU=1x+GwjSayc57v6w87ij6iRGFWt=hVfM0B64b1_bPVKRqg@mail.gmail.com
Tom Lane [Sun, 31 Jul 2016 22:32:34 +0000 (18:32 -0400)]
Doc: remove claim that hash index creation depends on effective_cache_size.
This text was added by commit ff213239c, and not long thereafter obsoleted
by commit 4adc2f72a (which made the test depend on NBuffers instead); but
nobody noticed the need for an update. Commit 9563d5b5e adds some further
dependency on maintenance_work_mem, but the existing verbiage seems to
cover that with about as much precision as we really want here. Let's
just take it all out rather than leaving ourselves open to more errors of
omission in future. (That solution makes this change back-patchable, too.)
Tom Lane [Sun, 31 Jul 2016 20:05:12 +0000 (16:05 -0400)]
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
Stephen Frost [Sun, 31 Jul 2016 14:57:15 +0000 (10:57 -0400)]
Correctly handle owned sequences with extensions
With the refactoring of pg_dump to handle components, getOwnedSeqs needs
to be a bit more intelligent regarding which components to dump when.
Specifically, we can't simply use the owning table's components as the
set of components to dump as the table might only be including certain
components while all components of the sequence should be dumped, for
example, when the table is a member of an extension while the sequence
is not.
Handle this by combining the set of components to be dumped for the
sequence explicitly and those to be dumped for the table when setting
the components to be dumped for the sequence.
Also add a number of regression tests around this to, hopefully, catch
any future changes which break the expected behavior.
Discovered by: Philippe BEAUDOIN
Reviewed by: Michael Paquier