Alvaro Herrera [Mon, 30 May 2016 18:47:22 +0000 (14:47 -0400)]
Fix PageAddItem BRIN bug
BRIN was relying on the ability to remove a tuple from an index page,
then putting another tuple in the same line pointer. But PageAddItem
refuses to add a tuple beyond the first free item past the last used
item, and in particular, it rejects an attempt to add an item to an
empty page anywhere other than the first line pointer. PageAddItem
issues a WARNING and indicates to the caller that it failed, which in
turn causes the BRIN calling code to issue a PANIC, so the whole
sequence looks like this:
WARNING: specified item offset is too large
PANIC: failed to add BRIN tuple
To fix, create a new function PageAddItemExtended which is like
PageAddItem except that the two boolean arguments become a flags bitmap;
the "overwrite" and "is_heap" boolean flags in PageAddItem become
PAI_OVERWITE and PAI_IS_HEAP flags in the new function, and a new flag
PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN.
PageAddItem() retains its original signature, for compatibility with
third-party modules (other callers in core code are not modified,
either).
Also, in the belt-and-suspenders spirit, I added a new sanity check in
brinGetTupleForHeapBlock to raise an error if an TID found in the revmap
is not marked as live by the page header. This causes it to react with
"ERROR: corrupted BRIN index" to the bug at hand, rather than a hard
crash.
Backpatch to 9.5.
Bug reported by Andreas Seltenreich as detected by his handy sqlsmith
fuzzer.
Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
Tom Lane [Sun, 29 May 2016 17:18:48 +0000 (13:18 -0400)]
Fix missing abort checks in pg_backup_directory.c.
Parallel restore from directory format failed to respond to control-C
in a timely manner, because there were no checkAborting() calls in the
code path that reads data from a file and sends it to the backend.
If any worker was in the midst of restoring data for a large table,
you'd just have to wait.
This fix doesn't do anything for the problem of aborting a long-running
server-side command, but at least it fixes things for data transfers.
Back-patch to 9.3 where parallel restore was introduced.
This was effectively dead code, since the places that tested it could not
be reached after we entered the on-exit-cleanup routine that would set it.
It seems to have been a leftover from a design in which error abort would
try to send fresh commands to the workers --- a design which could never
have worked reliably, of course. Since the flag is not cross-platform, it
complicates reasoning about the code's behavior, which we could do without.
Although this is effectively just cosmetic, back-patch anyway, because
there are some actual bugs in the vicinity of this behavior.
Tom Lane [Sat, 28 May 2016 18:02:11 +0000 (14:02 -0400)]
Lots of comment-fixing, and minor cosmetic cleanup, in pg_dump/parallel.c.
The commentary in this file was in extremely sad shape. The author(s)
had clearly never heard of the project convention that a function header
comment should provide an API spec of some sort for that function. Much
of it was flat out wrong, too --- maybe it was accurate when written, but
if so it had not been updated to track subsequent code revisions. Rewrite
and rearrange to try to bring it up to speed, and annotate some of the
places where more work is needed. (I've refrained from actually fixing
anything of substance ... yet.)
Also, rename a couple of functions for more clarity as to what they do,
do some very minor code rearrangement, remove some pointless Asserts,
fix an incorrect Assert in readMessageFromPipe, and add a missing socket
close in one error exit from pgpipe(). The last would be a bug if we
tried to continue after pgpipe() failure, but since we don't, it's just
cosmetic at present.
Although this is only cosmetic, back-patch to 9.3 where parallel.c was
added. It's sufficiently invasive that it'll pose a hazard for future
back-patching if we don't.
Tom Lane [Fri, 27 May 2016 16:02:09 +0000 (12:02 -0400)]
Clean up thread management in parallel pg_dump for Windows.
Since we start the worker threads with _beginthreadex(), we should use
_endthreadex() to terminate them. We got this right in the normal-exit
code path, but not so much during an error exit from a worker.
In addition, be sure to apply CloseHandle to the thread handle after
each thread exits.
It's not clear that these oversights cause any user-visible problems,
since the pg_dump run is about to terminate anyway. Still, it's clearly
better to follow Microsoft's API specifications than ignore them.
Also a few cosmetic cleanups in WaitForTerminatingWorkers(), including
being a bit less random about where to cast between uintptr_t and HANDLE,
and being sure to clear the worker identity field for each dead worker
(not that false matches should be possible later, but let's be careful).
Original observation and patch by Armin Schöffmann, cosmetic improvements
by Michael Paquier and me. (Armin's patch also included closing sockets
in ShutdownWorkersHard(), but that's been dealt with already in commit df8d2d8c4.) Back-patch to 9.3 where parallel pg_dump was introduced.
Tom Lane [Fri, 27 May 2016 15:03:18 +0000 (11:03 -0400)]
Fix DROP ACCESS METHOD IF EXISTS.
The IF EXISTS option was documented, and implemented in the grammar, but
it didn't actually work for lack of support in does_not_exist_skipping().
Per bug #14160.
Tom Lane [Fri, 27 May 2016 14:40:20 +0000 (10:40 -0400)]
Be more predictable about reporting "lock timeout" vs "statement timeout".
If both timeout indicators are set when we arrive at ProcessInterrupts,
we've historically just reported "lock timeout". However, some buildfarm
members have been observed to fail isolationtester's timeouts test by
reporting "lock timeout" when the statement timeout was expected to fire
first. The cause seems to be that the process is allowed to sleep longer
than expected (probably due to heavy machine load) so that the lock
timeout happens before we reach the point of reporting the error, and
then this arbitrary tiebreak rule does the wrong thing. We can improve
matters by comparing the scheduled timeout times to decide which error
to report.
I had originally proposed greatly reducing the 1-second window between
the two timeouts in the test cases. On reflection that is a bad idea,
at least for the case where the lock timeout is expected to fire first,
because that would assume that it takes negligible time to get from
statement start to the beginning of the lock wait. Thus, this patch
doesn't completely remove the risk of test failures on slow machines.
Empirically, however, the case this handles is the one we are seeing
in the buildfarm. The explanation may be that the other case requires
the scheduler to take the CPU away from a busy process, whereas the
case fixed here only requires the scheduler to not give the CPU back
right away to a process that has been woken from a multi-second sleep
(and, perhaps, has been swapped out meanwhile).
Back-patch to 9.3 where the isolationtester timeouts test was added.
Magnus Hagander [Thu, 26 May 2016 20:14:23 +0000 (22:14 +0200)]
Make pg_dump error cleanly with -j against hot standby
Getting a synchronized snapshot is not supported on a hot standby node,
and is by default taken when using -j with multiple sessions. Trying to
do so still failed, but with a server error that would also go in the
log. Instead, proprely detect this case and give a better error message.
Tom Lane [Thu, 26 May 2016 18:52:24 +0000 (14:52 -0400)]
Disable physical tlist if any Var would need multiple sortgroupref labels.
As part of upper planner pathification (commit 3fc6e2d7f5b652b4) I redid
createplan.c's approach to the physical-tlist optimization, in which scan
nodes are allowed to return exactly the underlying table's columns so as
to save doing a projection step at runtime. The logic was intentionally
more aggressive than before about applying the optimization, which is
generally a good thing, but Andres Freund found a case in which it got
too aggressive. Namely, if any column is referenced more than once in
the parent plan node's sorting or grouping column list, we can't optimize
because then that column would need to have more than one ressortgroupref
label, and we only have space for one.
Add logic to detect this situation in use_physical_tlist(), and also add
some error checking in apply_pathtarget_labeling_to_tlist(), which this
example proves was being overly cavalier about whether what it was doing
made any sense.
The added test case exposes the problem only because we do not eliminate
duplicate grouping keys. That might be something to fix someday, but it
doesn't seem like appropriate post-beta work.
Tom Lane [Thu, 26 May 2016 15:51:04 +0000 (11:51 -0400)]
Make pg_dump behave more sanely when built without HAVE_LIBZ.
For some reason the code to emit a warning and switch to uncompressed
output was placed down in the guts of pg_backup_archiver.c. This is
definitely too late in the case of parallel operation (and I rather
wonder if it wasn't too late for other purposes as well). Put it in
pg_dump.c's option-processing logic, which seems a much saner place.
Also, the default behavior with custom or directory output format was
to emit the warning telling you the output would be uncompressed. This
seems unhelpful, so silence that case.
Back-patch to 9.3 where parallel dump was introduced.
Tom Lane [Thu, 26 May 2016 14:50:30 +0000 (10:50 -0400)]
In Windows pg_dump, ensure idle workers will shut down during error exit.
The Windows coding of ShutdownWorkersHard() thought that setting termEvent
was sufficient to make workers exit after an error. But that only helps
if a worker is busy and passes through checkAborting(). An idle worker
will just sit, resulting in pg_dump failing to exit until the user gives up
and hits control-C. We should close the write end of the command pipe
so that idle workers will see socket EOF and exit, as the Unix coding was
already doing.
Back-patch to 9.3 where parallel pg_dump was introduced.
Tom Lane [Wed, 25 May 2016 23:11:00 +0000 (19:11 -0400)]
Remove option to write USING before opclass name in CREATE INDEX.
Dating back to commit f10b63923, our grammar has allowed "USING" to
optionally appear before an opclass name in CREATE INDEX (and, lately,
some related places such as ON CONFLICT specifications). Nikolay Shaplov
noticed that this syntax existed but wasn't documented, and proposed
documenting it. But what seems like a better idea is to remove the
production, thereby making the code match the docs not vice versa.
This isn't our usual modus operandi for such cases, but there are a
couple of good reasons to proceed this way:
* So far as I can find, this syntax has never been documented anywhere.
It isn't relied on by any of our own code or test cases, and there seems
little reason to suppose that it's been used in the wild either.
* Documenting it would mean that there would be two separate uses of
USING in the CREATE INDEX syntax, the other being "USING access_method".
That can lead to nothing but confusion.
So, let's just remove it. On the off chance that somebody somewhere
is using it, this isn't something to back-patch, but we can fix it
in HEAD.
Tom Lane [Wed, 25 May 2016 21:48:15 +0000 (17:48 -0400)]
Ensure that backends see up-to-date statistics for shared catalogs.
Ever since we split the statistics collector's reports into per-database
files (commit 187492b6c2e8cafc), backends have been seeing stale statistics
for shared catalogs. This is because the inquiry message only prompts the
collector to write the per-database file for the requesting backend's own
database. Stats for shared catalogs are in a separate file for "DB 0",
which didn't get updated.
In normal operation this was partially masked by the fact that the
autovacuum launcher would send an inquiry message at least once per
autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would
never be more than a minute out of date. However the problem becomes very
obvious with autovacuum disabled, as reported by Peter Eisentraut.
To fix, redefine the semantics of inquiry messages so that both the
specified DB and DB 0 will be dumped. (This might seem a bit inefficient,
but we have no good way to know whether a backend's transaction will look
at shared-catalog stats, so we have to read both groups of stats whenever
we request stats. Sending two inquiry messages would definitely not be
better.)
Tom Lane [Wed, 25 May 2016 16:39:57 +0000 (12:39 -0400)]
Fix broken error handling in parallel pg_dump/pg_restore.
In the original design for parallel dump, worker processes reported errors
by sending them up to the master process, which would print the messages.
This is unworkably fragile for a couple of reasons: it risks deadlock if a
worker sends an error at an unexpected time, and if the master has already
died for some reason, the user will never get to see the error at all.
Revert that idea and go back to just always printing messages to stderr.
This approach means that if all the workers fail for similar reasons (eg,
bad password or server shutdown), the user will see N copies of that
message, not only one as before. While that's slightly annoying, it's
certainly better than not seeing any message; not to mention that we
shouldn't assume that only the first failure is interesting.
An additional problem in the same area was that the master failed to
disable SIGPIPE (at least until much too late), which meant that sending a
command to an already-dead worker would cause the master to crash silently.
That was bad enough in itself but was made worse by the total reliance on
the master to print errors: even if the worker had reported an error, you
would probably not see it, depending on timing. Instead disable SIGPIPE
right after we've forked the workers, before attempting to send them
anything.
Additionally, the master relies on seeing socket EOF to realize that a
worker has exited prematurely --- but on Windows, there would be no EOF
since the socket is attached to the process that includes both the master
and worker threads, so it remains open. Make archive_close_connection()
close the worker end of the sockets so that this acts more like the Unix
case. It's not perfect, because if a worker thread exits without going
through exit_nicely() the closures won't happen; but that's not really
supposed to happen.
This has been wrong all along, so back-patch to 9.3 where parallel dump
was introduced.
Tom Lane [Wed, 25 May 2016 01:04:23 +0000 (21:04 -0400)]
Fix contrib/bloom to work for unlogged indexes.
blbuildempty did not do even approximately the right thing: it tried
to add a metapage to the relation's regular data fork, which already
has one at that point. It should look like the ambuildempty methods
for all the standard index types, ie, initialize a metapage image in
some transient storage and then write it directly to the init fork.
To support that, refactor BloomInitMetapage into two functions.
In passing, fix BloomInitMetapage so it doesn't leave the rd_options
field of the index's relcache entry pointing at transient storage.
I'm not sure this had any visible consequence, since nothing much
else is likely to look at a bloom index's rd_options, but it's
certainly poor practice.
Stephen Frost [Wed, 25 May 2016 00:10:16 +0000 (20:10 -0400)]
Qualify table usage in dumpTable() and use regclass
All of the other tables used in the query in dumpTable(), which is
collecting column-level ACLs, are qualified, so we should be qualifying
the pg_init_privs, the related sub-select against pg_class and the
other queries added by the pg_dump catalog ACLs work.
Also, use ::regclass (or ::pg_catalog.regclass, where appropriate)
instead of using a poorly constructed query to get the OID for various
catalog tables.
Issues identified by Noah and Alvaro, patch by me.
Tom Lane [Tue, 24 May 2016 19:47:51 +0000 (15:47 -0400)]
Fetch XIDs atomically during vac_truncate_clog().
Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
in-place, it's unsafe to assume that successive reads of those values will
give consistent results. Fetch each one just once to ensure sane behavior
in the minimum calculation. Noted while reviewing Alexander Korotkov's
patch in the same area.
Tom Lane [Tue, 24 May 2016 19:20:12 +0000 (15:20 -0400)]
Avoid consuming an XID during vac_truncate_clog().
vac_truncate_clog() uses its own transaction ID as the comparison point in
a sanity check that no database's datfrozenxid has already wrapped around
"into the future". That was probably fine when written, but in a lazy
vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
causes an XID to be assigned when otherwise one would not be. Most of the
time that's not a big problem ... but if we are hard up against the
wraparound limit, consuming XIDs during antiwraparound vacuums is a very
bad thing.
Instead, use ReadNewTransactionId(), which not only avoids this problem
but is in itself a better comparison point to test whether wraparound
has already occurred.
Report and patch by Alexander Korotkov. Back-patch to all versions.
Alvaro Herrera [Tue, 24 May 2016 18:55:34 +0000 (14:55 -0400)]
Fix range check for effective_io_concurrency
Commit 1aba62ec moved the range check of that option form guc.c into
bufmgr.c, but introduced a bug by changing a >= 0.0 to > 0.0, which made
the value 0 no longer accepted. Put it back.
Tom Lane [Tue, 24 May 2016 17:30:40 +0000 (13:30 -0400)]
In examples of Oracle PL/SQL code, use varchar2 not varchar.
Oracle recommends using VARCHAR2 not VARCHAR, allegedly because they might
someday change VARCHAR to be spec-compliant about distinguishing null from
empty string. (I'm not holding my breath, though.) Our examples of PL/SQL
code were using VARCHAR, which while not wrong is missing the pedagogical
opportunity to talk about converting Oracle type names to Postgres. So
switch the examples to use VARCHAR2, and add some text about what to do
with common Oracle type names like VARCHAR2 and NUMBER. (There is probably
more to be said here, but those are the ones I'm sure about offhand.)
Per suggestion from rapg12@gmail.com.
Tom Lane [Mon, 23 May 2016 23:41:11 +0000 (19:41 -0400)]
Fix BTREE_BUILD_STATS build.
Commit 65c5fcd353a859da9e61bfb2b92a99f12937de3b broke this by removing a
header include directive that is conditionally required. Add that back
to nbtree.c, with annotation to keep pgrminclude from re-breaking it.
Tom Lane [Mon, 23 May 2016 23:08:26 +0000 (19:08 -0400)]
Add support for more extensive testing of raw_expression_tree_walker().
If RAW_EXPRESSION_COVERAGE_TEST is defined, do a no-op tree walk over
every basic DML statement submitted to parse analysis. If we'd had this
in place earlier, bug #14153 would have been caught by buildfarm testing.
The difficulty is that raw_expression_tree_walker() is only used in
limited cases involving CTEs (particularly recursive ones), so it's
very easy for an oversight in it to not be noticed during testing of a
seemingly-unrelated feature.
The type of error we can expect to catch with this is complete omission
of a node type from raw_expression_tree_walker(), and perhaps also
recursion into a field that doesn't contain a node tree, though that
would be an unlikely mistake. It won't catch failure to add new fields
that need to be recursed into, unfortunately.
I'll go enable this on one or two of my own buildfarm animals once
bug #14153 is dealt with.
Tom Lane [Mon, 23 May 2016 18:16:40 +0000 (14:16 -0400)]
Fix latent crash in do_text_output_multiline().
do_text_output_multiline() would fail (typically with a null pointer
dereference crash) if its input string did not end with a newline. Such
cases do not arise in our current sources; but it certainly could happen
in future, or in extension code's usage of the function, so we should fix
it. To fix, replace "eol += len" with "eol = text + len".
While at it, make two cosmetic improvements: mark the input string const,
and rename the argument from "text" to "txt" to dodge pgindent strangeness
(since "text" is a typedef name).
Even though this problem is only latent at present, it seems like a good
idea to back-patch the fix, since it's a very simple/safe patch and it's
not out of the realm of possibility that we might in future back-patch
something that expects sane behavior from do_text_output_multiline().
Tom Lane [Sat, 21 May 2016 16:55:31 +0000 (12:55 -0400)]
Improve docs about using ORDER BY to control aggregate input order.
David Johnston pointed out that the original text here had been obsoleted
by SQL:2008, which allowed ORDER BY in subqueries. We could weaken the
text to describe ORDER-BY-in-subqueries as an optional SQL feature that's
possibly unportable; but then the exact same statements would apply to
the alternative it's being compared to (ORDER-BY-in-aggregate-calls).
So really that would be pretty useless; let's just take out the sentence
entirely. Instead, point out the hazard that any extra processing in the
upper query might cause the subquery output order to be destroyed.
Tom Lane [Thu, 19 May 2016 18:40:02 +0000 (14:40 -0400)]
Pin the built-in index access methods.
This was overlooked in commit 473b93287, which introduced DROP ACCESS
METHOD. Although that command is restricted to superusers, we don't want
even superusers dropping the built-in methods; "DROP ACCESS METHOD btree"
in particular is unrecoverable from. Pin these objects in the same way
that other initdb-created objects are pinned.
I chose to bump catversion for this fix. That's not absolutely necessary
perhaps, but it will ensure that no 9.6 production systems are missing
the pin entries.
Tom Lane [Tue, 17 May 2016 21:01:18 +0000 (17:01 -0400)]
Avoid possible crash in contrib/bloom's blendscan().
It's possible to begin and end an indexscan without ever calling
amrescan. contrib/bloom, unlike every other index AM, allocated
its "scan->opaque" storage at amrescan time, and thus would crash
in amendscan if amrescan hadn't been called. We could fix this
by putting in a null-pointer check in blendscan, but I see no very
good reason why contrib/bloom should march to its own drummer in
this respect. Let's move that initialization to blbeginscan
instead. Per report from Jeff Janes.
Robert Haas [Mon, 16 May 2016 15:28:28 +0000 (11:28 -0400)]
postgres_fdw: Fix the fix for crash when pushing down multiple joins.
Commit 3151f16e1874db82ed85a005dac15368903ca9fb was intended to be
a commit of a patch from Ashutosh Bapat, but instead I mistakenly
committed an earlier version from Michael Paquier (because both
patches were submitted with the same filename, and I confused them).
Michael's patch fixes the crash but doesn't actually implement the
correct test.
Repair the incorrect logic, and also expand the comments considerably
so that this is all more clear.
Robert Haas [Mon, 16 May 2016 15:19:10 +0000 (11:19 -0400)]
Fix multiple problems in postgres_fdw query cancellation logic.
First, even if we cancel a query, we still have to roll back the
containing transaction; otherwise, the session will be left in a
failed transaction state.
Second, we need to support canceling queries whe aborting a
subtransaction as well as when aborting a toplevel transaction.
Tom Lane [Fri, 13 May 2016 00:04:12 +0000 (20:04 -0400)]
Ensure plan stability in contrib/btree_gist regression test.
Buildfarm member skink failed with symptoms suggesting that an
auto-analyze had happened and changed the plan displayed for a
test query. Although this is evidently of low probability,
regression tests that sometimes fail are no fun, so add commands
to force a bitmap scan to be chosen.
Alvaro Herrera [Thu, 12 May 2016 19:02:49 +0000 (16:02 -0300)]
Fix bogus comments
Some comments mentioned XLogReplayBuffer, but there's no such function:
that was an interim name for a function that got renamed to
XLogReadBufferForRedo, before commit 2c03216d831160 was pushed.
Tom Lane [Wed, 11 May 2016 21:06:53 +0000 (17:06 -0400)]
Fix infer_arbiter_indexes() to not barf on system columns.
While it could be argued that rejecting system column mentions in the
ON CONFLICT list is an unsupported feature, falling over altogether
just because the table has a unique index on OID is indubitably a bug.
As far as I can tell, fixing infer_arbiter_indexes() is sufficient to
make ON CONFLICT (oid) actually work, though making a regression test
for that case is problematic because of the impossibility of setting
the OID counter to a known value.
Tom Lane [Wed, 11 May 2016 20:20:03 +0000 (16:20 -0400)]
Fix assorted missing infrastructure for ON CONFLICT.
subquery_planner() failed to apply expression preprocessing to the
arbiterElems and arbiterWhere fields of an OnConflictExpr. No doubt the
theory was that this wasn't necessary because we don't actually try to
execute those expressions; but that's wrong, because it results in failure
to match to index expressions or index predicates that are changed at all
by preprocessing. Per bug #14132 from Reynold Smith.
Also add pullup_replace_vars processing for onConflictWhere. Perhaps
it's impossible to have a subquery reference there, but I'm not exactly
convinced; and even if true today it's a failure waiting to happen.
Also add some comments to other places where one or another field of
OnConflictExpr is intentionally ignored, with explanation as to why it's
okay to do so.
Also, catalog/dependency.c failed to record any dependency on the named
constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to
be dropped while rules exist that depend on it, and allowing pg_dump to
dump such a rule before the constraint it refers to. The normal execution
path managed to error out reasonably for a dangling constraint reference,
but ruleutils.c dumped core; so in addition to fixing the omission, add
a protective check in ruleutils.c, since we can't retroactively add a
dependency in existing databases.
Alvaro Herrera [Tue, 10 May 2016 19:23:54 +0000 (16:23 -0300)]
Fix autovacuum for shared relations
The table-skipping logic in autovacuum would fail to consider that
multiple workers could be processing the same shared catalog in
different databases. This normally wouldn't be a problem: firstly
because autovacuum workers not for wraparound would simply ignore tables
in which they cannot acquire lock, and secondly because most of the time
these tables are small enough that even if multiple for-wraparound
workers are stuck in the same catalog, they would be over pretty
quickly. But in cases where the catalogs are severely bloated it could
become a problem.
Backpatch all the way back, because the problem has been there since the
beginning.
Tom Lane [Sun, 8 May 2016 20:36:19 +0000 (16:36 -0400)]
Docs: create some user-facing documentation about index-only scans.
We didn't have any real user documentation about how index-only scans
work or how to design indexes to exploit them. Remedy that.
Per gripe from David Johnston.
Tom Lane [Sat, 7 May 2016 20:36:50 +0000 (16:36 -0400)]
In new pg_dump TAP tests, remove trailing "$" from regexps using /m.
It emerges that some Perl versions before 5.8.9 have a bug with regexps
that use the /m flag and contain "$". This is the reason why jacana
is still failing on HEAD, and I was able to duplicate the failure on
prairiedog's host. There's no real need for "$" in these patterns,
since they are already matching through the statement-terminating
semicolons (or matching an explicit \n in some cases). So just
remove it.
Note: the reason jacana hasn't actually reported any failures in the
last little while is that the way the pg_dump TAP tests are set up, any
failure of this sort results in echoing the entire pg_dump dump output
to stderr. Since there were about a hundred such failures, that resulted
in a 30MB log file which choked the buildfarm upload script. There is
room for improvement here :-(.
Tom Lane [Sat, 7 May 2016 17:16:50 +0000 (13:16 -0400)]
Docs: improve warnings about nextval() not producing gapless sequences.
In the documentation for nextval(), point out explicitly that INSERT ...
ON CONFLICT will call nextval() if needed for the insertion case, whether
or not it ends up following the ON CONFLICT path. This seems to be a
matter of some confusion, cf bug #14126, so let's be clear about it.
Also mention the issue in the CREATE SEQUENCE reference page, since that
is another place where people might expect such things to be covered.
Minor wording improvements nearby, as well.
Back-patch to 9.5 where ON CONFLICT was introduced.
Tom Lane [Sat, 7 May 2016 02:05:51 +0000 (22:05 -0400)]
Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.
This patch essentially reverts commit 4c6780fd17aa43ed, in favor of a much
simpler solution for the case where the new cluster would choose to create
a TOAST table but the old cluster doesn't have one: just don't create a
TOAST table.
The existing code failed in at least two different ways if the situation
arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the
lock sanity check in create_toast_table failed; (2) pg_upgrade did not
provide a pg_type OID for the new toast table, so that the crosscheck in
TypeCreate failed. While both these problems were introduced by later
patches, they show that the hack being used to cause TOAST table creation
is overwhelmingly fragile (and untested). I also note that before the
TypeCreate crosscheck was added, the code would have resulted in assigning
an indeterminate pg_type OID to the toast table, possibly causing a later
OID conflict in that catalog; so that it didn't really work even when
committed.
If we simply don't create a TOAST table, there will only be a problem if
the code tries to store a tuple that's wider than a page, and field
compression isn't sufficient to get it under a page. Given that the TOAST
creation threshold is intended to be about a quarter of a page, it's very
hard to believe that cross-version differences in the do-we-need-a-toast-
table heuristic could result in an observable problem. So let's just
follow the old version's conclusion about whether a TOAST table is needed.
(If we ever do change needs_toast_table() so much that this conclusion
doesn't apply, we can devise a solution at that time, and hopefully do
it in a less klugy way than 4c6780fd17aa43ed did.)
Stephen Frost [Sat, 7 May 2016 01:24:31 +0000 (21:24 -0400)]
Disable BLOB test in pg_dump TAP tests
Buildfarm member jacana appears to have an issue with running this
test. It's not entirely clear to me why, but rather than try to
fight with it, just disable it for now.
None of the other tests try to write out from psql directly as
this test does, so it seems likely that the rest of the tests will
be fine (as they have been on numerous other systems).
Kevin Grittner [Sat, 7 May 2016 01:05:29 +0000 (20:05 -0500)]
Mitigate "snapshot too old" performance regression on NUMA
Limit maintenance of time to xid mapping to once per minute. At
least in the tested case this brings performance within 5% of when
the feature is off, compared to several times slower without this
patch.
While there, fix comments and whitespace.
Ants Aasma, with cosmetic adjustments suggested by Andres Freund
Reviewed by Kevin Grittner and Andres Freund
Tom Lane [Fri, 6 May 2016 21:42:44 +0000 (17:42 -0400)]
Docs: minor copy-editing for GSSAPI/SSPI authentication docs.
Describe compat_realm = 0 as "disabled" not "enabled", per discussion
with Christian Ullrich. I failed to resist the temptation to do some
other minor copy-editing in the same area.
Stephen Frost [Fri, 6 May 2016 20:39:56 +0000 (16:39 -0400)]
Add test_pg_dump to @contrib_excludes
The test_pg_dump extension doesn't have a C component, so we need
to exclude it from the MSVC build system trying to figure out how
to build it.
Also add a "MODULES" line to the Makefile, as test_extensions has.
Might not be necessary, but seems good to keep things consistent.
Lastly, remove the 'installcheck' line from test_pg_dump, as that
was causing redefinition errors, at least on my box. This also
makes test_pg_dump consistent with how commit_ts is set up.
Stephen Frost [Fri, 6 May 2016 19:26:57 +0000 (15:26 -0400)]
Remove MODULES_big from test_pg_dump
The Makefile for test_pg_dump shouldn't have a MODULES_big line
because there's no actual compiled bit for that extension. Hopefully
this will fix the Windows buildfarm members which were complaining.
In passing, also add the 'prove_installcheck' bit to the pg_dump and
test_pg_dump Makefiles, to get the buildfarm members to actually run
those tests.
Robert Haas [Fri, 6 May 2016 19:00:55 +0000 (15:00 -0400)]
Minimal fix for crash bug in quals_match_foreign_key.
Discussion is still underway as to whether to revert the entire patch
that added this function, but that discussion may not conclude before
beta1. So, in the meantime, let's do at least this much.
Robert Haas [Fri, 6 May 2016 18:43:34 +0000 (14:43 -0400)]
Limit maximum parallel degree to 1024.
This new limit affects both the max_parallel_degree GUC and the
parallel_degree reloption. There may some day be a use case for using
more than 1024 CPUs for a single query, but that's surely not the case
right now. Not only do not very many people have that many CPUs, but
the code hasn't been tested at that kind of scale and is very unlikely
to perform well, or even work at all, without a lot more work. The
issue addressed by commit 06bd458cb812623c3f1fdd55216c4c08b06a8447 is
probably just one problem of many.
The idea of a more reasonable limit here was suggested by Tom Lane;
the value of 1024 was suggested by Amit Kapila.
Tom Lane [Fri, 6 May 2016 18:23:45 +0000 (14:23 -0400)]
Improve pg_upgrade's report about failure to match up old and new tables.
Ordinarily, pg_upgrade shouldn't have any difficulty in matching up all
the relations it sees in the old and new databases. If it does, however,
it just goes belly-up with a pretty unhelpful error message. That seemed
fine as long as we expected the case never to occur in the wild, but
Alvaro reported that it had been seen in a database whose pg_largeobject
table had somehow acquired a TOAST table. That doesn't quite seem like
a case that pg_upgrade actually needs to handle, but it would be good if
the report were more diagnosable. Hence, extend the logic to print out
as much information as we can about the mismatch(es) before we quit.
In passing, improve the readability of get_rel_infos()'s data collection
query, which had suffered seriously from lets-not-bother-to-update-comments
syndrome, and generally was unnecessarily disrespectful to readers.
It could be argued that this is a bug fix, but given that we have so few
reports, I don't feel a need to back-patch; at least not before this has
baked awhile in HEAD.
Robert Haas [Fri, 6 May 2016 18:23:47 +0000 (14:23 -0400)]
Use mul_size when multiplying by the number of parallel workers.
That way, if the result overflows size_t, you'll get an error instead
of undefined behavior, which seems like a plus. This also has the
effect of casting the number of workers from int to Size, which is
better because it's harder to overflow int than size_t.
Dilip Kumar reported this issue and provided a patch upon which this
patch is based, but his version did use mul_size.
Stephen Frost [Fri, 6 May 2016 18:06:50 +0000 (14:06 -0400)]
Remove various special checks around default roles
Default roles really should be like regular roles, for the most part.
This removes a number of checks that were trying to make default roles
extra special by not allowing them to be used as regular roles.
We still prevent users from creating roles in the "pg_" namespace or
from altering roles which exist in that namespace via ALTER ROLE, as
we can't preserve such changes, but otherwise the roles are very much
like regular roles.
Stephen Frost [Fri, 6 May 2016 18:06:50 +0000 (14:06 -0400)]
Add TAP tests for pg_dump
This TAP test suite will create a new cluster, populate it based on
the 'create_sql' values in the '%tests' hash, run all of the runs
defined in the '%pgdump_runs' hash, and then for each test in the
'%tests' hash, compare each run's output the the regular expression
defined for the test under the 'like' and 'unlike' functions, as
appropriate.
While this test suite covers a fair bit of ground (67% of pg_dump.c
and quite a bit of the other files in src/bin/pg_dump), there is
still quite a bit which remains to be added to provide better code
coverage. Still, this is quite a bit better than we had, and has
found a few bugs already (note that the CREATE TRANSFORM test is
commented out, as it is currently failing).
Idea for using the TAP system from Tom, though all of the code is mine.
Stephen Frost [Fri, 6 May 2016 18:06:50 +0000 (14:06 -0400)]
Only issue LOCK TABLE commands when necessary
Reviewing the cases where we need to LOCK a given table during a dump,
it was pointed out by Tom that we really don't need to LOCK a table if
we are only looking to dump the ACL for it, or certain other
components. After reviewing the queries run for all of the component
pieces, a list of components were determined to not require LOCK'ing
of the table.
This implements a check to avoid LOCK'ing those tables.
Initial complaint from Rushabh Lathia, discussed with Robert and Tom,
the patch is mine.
Stephen Frost [Fri, 6 May 2016 18:06:50 +0000 (14:06 -0400)]
pg_dump performance and other fixes
Do not try to dump objects which do not have ACLs when only ACLs are
being requested. This results in a significant performance improvement
as we can avoid querying for further information on these objects when
we don't need to.
When limiting the components to dump for an extension, consider what
components have been requested. Initially, we incorrectly hard-coded
the components of the extension objects to dump, which would mean that
we wouldn't dump some components even with they were asked for and in
other cases we would dump components which weren't requested.
Correct defaultACLs to use 'dump_contains' instead of 'dump'. The
defaultACL is considered a member of the namespace and should be
dumped based on the same set of components that the other objects in
the schema are, not based on what we're dumping for the namespace
itself (which might not include ACLs, if the namespace has just the
default or initial ACL).
Use DUMP_COMPONENT_ACL for from-initdb objects, to allow users to
change their ACLs, should they wish to. This just extends what we
are doing for the pg_catalog namespace to objects which are not
members of namespaces.
Due to column ACLs being treated a bit differently from other ACLs
(they are actually reset to NULL when all privileges are revoked),
adjust the query which gathers column-level ACLs to consider all of
the ACL-relevant columns.
Stephen Frost [Fri, 6 May 2016 18:06:50 +0000 (14:06 -0400)]
Correct pg_dump WHERE clause for functions/aggregates
The query to grab the function/aggregate information is now joining
to pg_init_privs, so we can simplify (and correct) the WHERE clause
used to determine if a given function's ACL has changed from the
initial ACL on the function.
Tom Lane [Fri, 6 May 2016 16:09:20 +0000 (12:09 -0400)]
Fix possible read past end of string in to_timestamp().
to_timestamp() handles the TH/th format codes by advancing over two input
characters, whatever those are. It failed to notice whether there were
two characters available to be skipped, making it possible to advance
the pointer past the end of the input string and keep on parsing.
A similar risk existed in the handling of "Y,YYY" format: it would advance
over three characters after the "," whether or not three characters were
available.
In principle this might be exploitable to disclose contents of server
memory. But the security team concluded that it would be very hard to use
that way, because the parsing loop would stop upon hitting any zero byte,
and TH/th format codes can't be consecutive --- they have to follow some
other format code, which would have to match whatever data is there.
So it seems impractical to examine memory very much beyond the end of the
input string via this bug; and the input string will always be in local
memory not in disk buffers, making it unlikely that anything very
interesting is close to it in a predictable way. So this doesn't quite
rise to the level of needing a CVE.
Tom Lane [Fri, 6 May 2016 15:01:05 +0000 (11:01 -0400)]
Improve handling of numeric-valued variables in pgbench.
The previous coding always stored variable values as strings, doing
conversion on-the-fly when a numeric value was needed or a number was to be
assigned. This was a bit inefficient and risked loss of precision for
floating-point values. The precision aspect had been hacked around by
printing doubles in "%.18e" format, which is ugly and has machine-dependent
results. Instead, arrange to preserve an assigned numeric value in the
original binary numeric format, converting to string only when and if
needed. When we do need to convert a double to string, convert in "%g"
format with DBL_DIG precision, which is the standard way to do it and
produces the least surprising results in most cases.
The implementation supports storing both a string value and a numeric
value for any one variable, with lazy conversion between them. I also
arranged for lazy re-sorting of the variable array when new variables are
added. That was mainly to allow a clean refactoring of putVariable()
into two levels of subroutine, but it may allow us to save a few sorts.
Kevin Grittner [Fri, 6 May 2016 12:47:12 +0000 (07:47 -0500)]
Fix hash index vs "snapshot too old" problemms
Hash indexes are not WAL-logged, and so do not maintain the LSN of
index pages. Since the "snapshot too old" feature counts on
detecting error conditions using the LSN of a table and all indexes
on it, this makes it impossible to safely do early vacuuming on any
table with a hash index, so add this to the tests for whether the
xid used to vacuum a table can be adjusted based on
old_snapshot_threshold.
While at it, add a paragraph to the docs for old_snapshot_threshold
which specifically mentions this and other aspects of the feature
which may otherwise surprise users.
Problem reported and patch reviewed by Amit Kapila
Dean Rasheed [Fri, 6 May 2016 11:48:27 +0000 (12:48 +0100)]
Fix psql's \ev and \sv commands so that they handle view reloptions.
Commit 8eb6407aaeb6cbd972839e356b436bb698f51cff added support for
editing and showing view definitions, but neglected to account for
view options such as security_barrier and WITH CHECK OPTION which are
not returned by pg_get_viewdef() and so need special handling.
Author: Dean Rasheed Reviewed-by: Peter Eisentraut
Discussion: http://www.postgresql.org/message-id/CAEZATCWZjCgKRyM-agE0p8ax15j9uyQoF=qew7D2xB6cF76T8A@mail.gmail.com
Dean Rasheed [Fri, 6 May 2016 11:45:36 +0000 (12:45 +0100)]
Move and rename fmtReloptionsArray().
Move fmtReloptionsArray() from pg_dump.c to string_utils.c so that it
is available to other frontend code. In particular psql's \ev and \sv
commands need it to handle view reloptions. Also rename the function
to appendReloptionsArray(), which is a more accurate description of
what it does.
Author: Dean Rasheed Reviewed-by: Peter Eisentraut
Discussion: http://www.postgresql.org/message-id/CAEZATCWZjCgKRyM-agE0p8ax15j9uyQoF=qew7D2xB6cF76T8A@mail.gmail.com
Tom Lane [Fri, 6 May 2016 02:37:30 +0000 (22:37 -0400)]
Further 9.6 release note improvements.
Call out the major enhancements in this release as identified by
pgsql-advocacy discussion, and rearrange some of the entries to
make those items more prominent. Other minor improvements per
advice from Vitaly Burovoy, Masahiko Sawada, Peter Geoghegan,
and Andres Freund.