Tom Lane [Wed, 6 Dec 2017 03:40:05 +0000 (22:40 -0500)]
Adjust regression test cases added by commit ab7271677.
I suppose it is a copy-and-paste error that this test doesn't actually
test the "Parallel Append with both partial and non-partial subplans"
case (EXPLAIN alone surely doesn't qualify as a test of executor
behavior). Fix that.
Also, add cosmetic aliases to make it possible to tell apart these
otherwise-identical test cases in log_statement output.
Remove the designation that Flex is a GNU package. Even though Bison is
a GNU package, leave out the designation to not make the sentence
unnecessarily complicated.
Robert Haas [Tue, 5 Dec 2017 22:28:39 +0000 (17:28 -0500)]
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Robert Haas [Tue, 5 Dec 2017 19:35:33 +0000 (14:35 -0500)]
Fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, the old code wouldn't reset the shared state between executions,
potentially resulting in dramatically inflated instrumentation data
for nodes beneath it. (The per-worker instrumentation ended up OK,
I think, but the overall totals were inflated.)
Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila,
reviewed and tweaked a bit by me.
Andres Freund [Tue, 5 Dec 2017 18:55:56 +0000 (10:55 -0800)]
Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate.
If a hash join appears in a parallel query, there may be no hash table
available for explain.c to inspect even though a hash table may have
been built in other processes. This could happen either because
parallel_leader_participation was set to off or because the leader
happened to hit the end of the outer relation immediately (even though
the complete relation is not empty) and decided not to build the hash
table.
Commit bf11e7ee introduced a way for workers to exchange
instrumentation via the DSM segment for Sort nodes even though they
are not parallel-aware. This commit does the same for Hash nodes, so
that explain.c has a way to find instrumentation data from an
arbitrary participant that actually built the hash table.
Author: Thomas Munro Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
Robert Haas [Tue, 5 Dec 2017 16:19:45 +0000 (11:19 -0500)]
postgres_fdw: Judge password use by run-as user, not session user.
This is a backward incompatibility which should be noted in the
release notes for PostgreSQL 11.
For security reasons, we require that a postgres_fdw foreign table use
password authentication when accessing a remote server, so that an
unprivileged user cannot usurp the server's credentials. Superusers
are exempt from this requirement, because we assume they are entitled
to usurp the server's credentials or, at least, can find some other
way to do it.
But what should happen when the foreign table is accessed by a view
owned by a user different from the session user? Is it the view owner
that must be a superuser in order to avoid the requirement of using a
password, or the session user? Historically it was the latter, but
this requirement makes it the former instead. This allows superusers
to delegate to other users the right to select from a foreign table
that doesn't use password authentication by creating a view over the
foreign table and handing out rights to the view. It is also more
consistent with the idea that access to a view should use the view
owner's privileges rather than the session user's privileges.
The upshot of this change is that a superuser selecting from a view
created by a non-superuser may now get an error complaining that no
password was used, while a non-superuser selecting from a view
created by a superuser will no longer receive such an error.
No documentation changes are present in this patch because the
wording of the documentation already suggests that it works this
way. We should perhaps adjust the documentation in the back-branches,
but that's a task for another patch.
Originally proposed by Jeff Janes, but with different semantics;
adjusted to work like this by me per discussion.
Tom Lane [Tue, 5 Dec 2017 01:52:48 +0000 (20:52 -0500)]
Treat directory open failures as hard errors in ResetUnloggedRelations().
Previously, this code just reported such problems at LOG level and kept
going. The problem with this approach is that transient failures (e.g.,
ENFILE) could prevent us from resetting unlogged relations to empty,
yet allow recovery to appear to complete successfully. That seems like
a data corruption hazard large enough to treat such problems as reasons
to fail startup.
For the same reason, treat unlink failures for unlogged files as hard
errors not just LOG messages. It's a little odd that we did it like that
when file-level errors in other steps (copy_file, fsync_fname) are ERRORs.
The sole case that I left alone is that ENOENT failure on a tablespace
(not database) directory is not an error, though it will now be logged
rather than just silently ignored. This is to cover the scenario where
a previous DROP TABLESPACE removed the tablespace directory but failed
before removing the pg_tblspc symlink. I'm not sure that that's very
likely in practice, but that seems like the only real excuse for the
old behavior here, so let's allow for it. (As coded, this will also
allow ENOENT on $PGDATA/base/. But since we'll fail soon enough if
that's gone, I don't think we need to complicate this code by
distinguishing that from a true tablespace case.)
Tom Lane [Mon, 4 Dec 2017 23:37:54 +0000 (18:37 -0500)]
Simplify do_pg_start_backup's API by opening pg_tblspc internally.
do_pg_start_backup() expects its callers to pass in an open DIR pointer
for the pg_tblspc directory, but there's no apparent advantage in that.
It complicates the callers without adding any flexibility, and there's no
robustness advantage, since we surely have to be prepared for errors during
the scan of pg_tblspc anyway. In fact, by holding an extra kernel resource
during operations like the preliminary checkpoint, we might be making
things a fraction more failure-prone not less. Hence, remove that argument
and open the directory just for the duration of the actual scan.
Tom Lane [Mon, 4 Dec 2017 22:59:35 +0000 (17:59 -0500)]
Improve error handling in RemovePgTempFiles().
Modify this function and its subsidiaries so that syscall failures are
reported via ereport(LOG), rather than silently ignored as before.
We don't want to throw a hard ERROR, as that would prevent database
startup, and getting rid of leftover temporary files is not important
enough for that. On the other hand, not reporting trouble at all
seems like an odd choice not in line with current project norms,
especially since any failure here is quite unexpected.
On the same reasoning, adjust these functions' AllocateDir/ReadDir calls
so that failure to scan a directory results in LOG not ERROR. I also
removed the previous practice of silently ignoring ENOENT failures during
directory opens --- there are some corner cases where that could happen
given a previous database crash, but that seems like a bad excuse for
ignoring a condition that isn't expected in most cases. A LOG message
during postmaster start seems OK in such situations, and better than
no output at all.
In passing, make RemovePgTempRelationFiles' test for "is the file name
all digits" look more like the way it's done elsewhere.
Tom Lane [Mon, 4 Dec 2017 22:02:52 +0000 (17:02 -0500)]
Clean up assorted messiness around AllocateDir() usage.
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures. It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode. And it eliminates a lot of just plain redundant error-handling
code.
In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern
dir = AllocateDir(path);
while ((de = ReadDirExtended(dir, path, LOG)) != NULL)
if they'd like to treat directory-open failures as mere LOG conditions
rather than errors. Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.
Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine. Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above. (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.) In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.
Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported. There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.
Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming. (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire. If so, this function was broken for its
own purposes as well as breaking callers.)
And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.
Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless. Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is
essentially cosmetic and need not get back-patched.
Michael Paquier, with a bit of additional work by me
Tom Lane [Mon, 4 Dec 2017 16:51:43 +0000 (11:51 -0500)]
Support boolean columns in functional-dependency statistics.
There's no good reason that the multicolumn stats stuff shouldn't work on
booleans. But it looked only for "Var = pseudoconstant" clauses, and it
will seldom find those for boolean Vars, since earlier phases of planning
will fold "boolvar = true" or "boolvar = false" to just "boolvar" or
"NOT boolvar" respectively. Improve dependencies_clauselist_selectivity()
to recognize such clauses as equivalent to equality restrictions.
This fixes a failure of the extended stats mechanism to apply in a case
reported by Vitaliy Garnashevich. It's not a complete solution to his
problem because the bitmap-scan costing code isn't consulting extended
stats where it should, but that's surely an independent issue.
In passing, improve some comments, get rid of a NumRelids() test that's
redundant with the preceding bms_membership() test, and fix
dependencies_clauselist_selectivity() so that estimatedclauses actually
is a pure output argument as stated by its API contract.
Robert Haas [Mon, 4 Dec 2017 15:33:09 +0000 (10:33 -0500)]
Remove memory leak protection from Gather and Gather Merge nodes.
Before commit 6b65a7fe62e129d5c2b85cd74d6a91d8f7564608, tqueue.c could
perform tuple remapping and thus leak memory, which is why commit af33039317ddc4a0e38a02e2255c2bf453115fd2 made TupleQueueReaderNext
run in a short-lived context. Now, however, tqueue.c has been reduced
to a shadow of its former self, and there shouldn't be any chance of
leaks any more. Accordingly, remove some tuple copying and memory
context manipulation to speed up processing.
Patch by me, reviewed by Amit Kapila. Some testing by Rafia Sabih.
Tom Lane [Sun, 3 Dec 2017 16:25:17 +0000 (11:25 -0500)]
Fix uninitialized-variable compiler warning induced by commit e4128ee76.
I'm a little bit astonished that anyone's compiler would have failed to
complain about this. The compiler surely does not know that is_procedure
means the function return value will be ignored.
Andres Freund [Sat, 2 Dec 2017 00:30:56 +0000 (16:30 -0800)]
Add infrastructure for sharing temporary files between backends.
SharedFileSet allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment. This
includes changes to fd.c and buffile.c to support the new kind of
temporary file.
This will be used by an upcoming patch adding support for parallel
hash joins.
Author: Thomas Munro Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas, Rushabh Lathia
Discussion:
https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
https://postgr.es/m/CAH2-WznJ_UgLux=_jTgCQ4yFz0iBntudsNKa1we3kN1BAG=88w@mail.gmail.com
Robert Haas [Fri, 1 Dec 2017 18:47:00 +0000 (13:47 -0500)]
postgres_fdw: Fix test that didn't test what it claimed.
Antonin Houska reported that the planner does consider pushing
postgres_fdw_abs() to the remote side, which happens because we make
it shippable earlier in the test case file.
Jeevan Chalke provided this patch, which changes the join
condition to use random(), which is not shippable, instead.
Antonin reviewed the patch.
pg_basebackup: Fix progress messages when writing to a file
The progress messages print out \r to keep overwriting the same line on
the screen. But this does not yield useful results when writing the
output to a file. So in that case, print out \n instead.
Author: Martín Marqués <martin@2ndquadrant.com> Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Robert Haas [Thu, 30 Nov 2017 21:22:38 +0000 (16:22 -0500)]
Remove extra word from comment.
David Rowley, who also was the primary author of the patch that
added this function; the attribution in my previous commit, 84940644de931f331433b35e3a391822671f8c9c, was incorrect due to
sloppiness on my part.
Peter Eisentraut [Thu, 30 Nov 2017 13:46:13 +0000 (08:46 -0500)]
SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar. This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.
This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL). There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql. Support for defining
procedures is available in all the languages supplied by the core
distribution.
While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Robert Haas [Thu, 30 Nov 2017 14:50:10 +0000 (09:50 -0500)]
Make create_unique_path manage memory like mark_dummy_rel.
Put the unique path in the same context as the owning RelOptInfo, rather
than the toplevel planner context. This is how this function worked
originally, but commit f41803bb39bc2949db200116a609fd242d0ec221
changed it without explanation. mark_dummy_rel adopted the older (or
newer?) technique in commit eca75a12a27d28b972fc269c1c8813cd8eb15441,
which also featured a much better explanation of why it is correct.
So, switch back to that technique here, with the same explanation
given there.
Although this fixes a possible memory leak when GEQO is in use, the
leak is minor and probably nobody cares, so no back-patch.
Noah Misch [Thu, 30 Nov 2017 08:57:22 +0000 (00:57 -0800)]
Fix non-GNU makefiles for AIX make.
Invoking the Makefile without an explicit target was building every
possible target instead of just the "all" target. Back-patch to 9.3
(all supported versions).
Tom Lane [Thu, 30 Nov 2017 03:00:29 +0000 (22:00 -0500)]
Fix neqjoinsel's behavior for semi/anti join cases.
Previously, this function estimated the selectivity as 1 minus eqjoinsel()
for the negator equality operator, regardless of join type (I think there
was an expectation that eqjoinsel would handle the join type). But
actually this is completely wrong for semijoin cases: the fraction of the
LHS that has a non-matching row is not one minus the fraction of the LHS
that has a matching row. In reality a semijoin with <> will nearly always
succeed: it can only fail when the RHS is empty, or it contains a single
distinct value that is equal to the particular LHS value, or the LHS value
is null. The only one of those things we should have much confidence in
estimating is the fraction of LHS values that are null, so let's just take
the selectivity as 1 minus outer nullfrac.
Per coding convention, antijoin should be estimated the same as semijoin.
Arguably this is a bug fix, but in view of the lack of field complaints
and the risk of destabilizing plans, no back-patch.
Andres Freund [Thu, 30 Nov 2017 01:07:16 +0000 (17:07 -0800)]
Add a barrier primitive for synchronizing backends.
Provide support for dynamic or static parties of processes to wait for
all processes to reach point in the code before continuing.
This is similar to the mechanism of the same name in POSIX threads and
MPI, though has explicit phasing and dynamic party support like the
Java core library's Phaser.
This will be used by an upcoming patch adding support for parallel
hash joins.
Author: Thomas Munro Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=2_y7oi01OjA_wLvYcWMc9_d=LaoxrY3eiROCZkB_qakA@mail.gmail.com
Andres Freund [Thu, 30 Nov 2017 00:06:50 +0000 (16:06 -0800)]
Add some regression tests that exercise hash join code.
Although hash joins are already tested by many queries, these tests
systematically cover the four different states we can reach as part of
the strategy for respecting work_mem.
Robert Haas [Wed, 29 Nov 2017 22:06:14 +0000 (17:06 -0500)]
New C function: bms_add_range
This will be used by pending patches to improve partition pruning.
Amit Langote and Kyotaro Horiguchi, per a suggestion from David
Rowley. Review and testing of the larger patch set of which this is a
part by Ashutosh Bapat, David Rowley, Dilip Kumar, Jesper Pedersen,
Rajkumar Raghuwanshi, Beena Emerson, Amul Sul, and Kyotaro Horiguchi.
Robert Haas [Wed, 29 Nov 2017 20:19:07 +0000 (15:19 -0500)]
Add extensive tests for partition pruning.
Currently, partition pruning happens via constraint exclusion, but
there are pending places to replace that with a different and
hopefully faster mechanism. To be sure that we don't change behavior
without realizing it, add extensive test coverage.
Note that not all of these behaviors are optimal; in some cases,
partitions are not pruned even though it would be safe to do so.
These tests therefore serve to memorialize the current state rather
than the ideal state. Patches that improve things can update the test
results as appropriate.
Amit Langote, adjusted by me. Review and testing of the larger patch
set of which this is a part by Ashutosh Bapat, David Rowley, Dilip
Kumar, Jesper Pedersen, Rajkumar Raghuwanshi, Beena Emerson, Amul Sul,
and Kyotaro Horiguchi.
Robert Haas [Tue, 28 Nov 2017 17:05:52 +0000 (12:05 -0500)]
Fix ReinitializeParallelDSM to tolerate finding no error queues.
Commit d4663350646ca0c069a36d906155a0f7e3372eb7 changed things so
that shm_toc_lookup would fail with an error rather than silently
returning NULL in the hope that such failures would be reported
in a useful way rather than via a system crash. However, it
overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE
in ReinitializeParallelDSM is expected to fail when no DSM segment
was created in the first place; in that case, we end up with a
backend-private memory segment that still contains an entry for
PARALLEL_KEY_FIXED but no others. Consequently a benign failure
to initialize parallelism can escalate into an elog(ERROR);
repair.
Robert Haas [Tue, 28 Nov 2017 16:39:16 +0000 (11:39 -0500)]
Teach bitmap heap scan to cope with absence of a DSA.
If we have a plan that uses parallelism but are unable to execute it
using parallelism, for example due to a lack of available DSM
segments, then the EState's es_query_dsa will be NULL. Parallel
bitmap heap scan needs to fall back to a non-parallel scan in such
cases.
Peter Eisentraut [Tue, 28 Nov 2017 16:28:05 +0000 (11:28 -0500)]
PL/Python: Fix potential NULL pointer dereference
After d0aa965c0a0ac2ff7906ae1b1dad50a7952efa56, one error path in
PLy_spi_execute_fetch_result() could result in the variable "result"
being dereferenced after being set to NULL. To fix that, just clear the
resources right there and return early.
Also add another SPI_freetuptable() call so that that is cleared in all
error paths.
discovered by John Naylor <jcnaylor@gmail.com> via scan-build
Robert Haas [Tue, 28 Nov 2017 15:51:01 +0000 (10:51 -0500)]
Add null test to partition constraint for default range partitions.
Non-default range partitions have a constraint which include null
tests, and both default and non-default list partitions also have a
constraint which includes null tests, but for some reason this was
missed for default range partitions. This could cause the partition
constraint to evaluate to false for rows that were (correctly) routed
to that partition by insert tuple routing, which could in turn
cause constraint exclusion to prune the default partition in cases
where it should not.
Tom Lane [Tue, 28 Nov 2017 01:56:46 +0000 (20:56 -0500)]
Mark some more functions as pg_attribute_noreturn().
Doing this suppresses Coverity warnings and might allow improved
code in some cases. The prospects of that are not so bright as
to warrant back-patching, though.
Tom Lane [Tue, 28 Nov 2017 00:22:08 +0000 (19:22 -0500)]
Fix assorted syscache lookup sloppiness in partition-related code.
heap_drop_with_catalog and ATExecDetachPartition neglected to check for
SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian.
Such failures are pretty unlikely, since we should already have some sort
of lock on the rel at these points, but it's neither a good idea nor
per project style to omit a check for failure.
Also, StorePartitionKey contained a syscache lookup that it never did
anything with, including never releasing the result. Presumably the
reason why we don't see refcount-leak complaints is that the lookup
always fails; but in any case it's pretty useless, so remove it.
All of these errors were evidently introduced by the relation
partitioning feature. Back-patch to v10 where that came in.
Tom Lane [Mon, 27 Nov 2017 22:53:56 +0000 (17:53 -0500)]
Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE.
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table. That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables. However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping. This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.
To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table. We can conveniently call it from
preprocess_targetlist. Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist. (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)
Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't. (That's
really an independent bug, but it manifests in much the same cases.)
Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.
Back-patch to 9.5 where the problem was introduced.
Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat
Simon Riggs [Mon, 27 Nov 2017 09:34:05 +0000 (09:34 +0000)]
Pad XLogReaderState's per-buffer data_bufsz more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog backup block seen so far. We now MAXALIGN the palloc size.
The original coding could result in many repeated palloc cycles, in the
worst case where we see a series ofgradually larger xlog records. We
ameliorate that similarly to 8735978e7aebfbc499843630131c18d1f7346c79
by imposing a minimum buffer size of BLCKSZ.
Tom Lane [Sun, 26 Nov 2017 20:17:24 +0000 (15:17 -0500)]
Pad XLogReaderState's main_data buffer more aggressively.
Originally, we palloc'd this buffer just barely big enough to hold the
largest xlog record seen so far. It turns out that that can result in
valgrind complaints, because some compilers will emit code that assumes
it can safely fetch padding bytes at the end of a struct, and those
padding bytes were unallocated so far as aset.c was concerned. We can
fix that by MAXALIGN'ing the palloc request size, ensuring that it is big
enough to include any possible padding that might've been omitted from
the on-disk record.
An additional objection to the original coding is that it could result in
many repeated palloc cycles, in the worst case where we see a series of
gradually larger xlog records. We can ameliorate that cheaply by
imposing a minimum buffer size that's large enough for most xlog records.
BLCKSZ/2 was chosen after a bit of discussion.
In passing, remove an obsolete comment in struct xl_heap_new_cid that the
combocid field is free due to alignment considerations. Perhaps that was
true at some point, but it's not now.
Joe Conway [Sun, 26 Nov 2017 17:49:40 +0000 (09:49 -0800)]
Make has_sequence_privilege support WITH GRANT OPTION
The various has_*_privilege() functions all support an optional
WITH GRANT OPTION added to the supported privilege types to test
whether the privilege is held with grant option. That is, all except
has_sequence_privilege() variations. Fix that.
Tom Lane [Sat, 25 Nov 2017 20:30:11 +0000 (15:30 -0500)]
Replace raw timezone source data with IANA's new compact format.
Traditionally IANA has distributed their timezone data in pure source
form, replete with extensive historical comments. As of release 2017c,
they've added a compact single-file format that omits comments and
abbreviates command keywords. This form is way shorter than the pure
source, even before considering its allegedly better compressibility.
Hence, let's distribute the data in that form rather than pure source.
I'm pushing this now, rather than at the next timezone database update,
so that it's easy to confirm that this data file produces compiled zic
output that's identical to what we were getting before.
Tom Lane [Sat, 25 Nov 2017 19:42:10 +0000 (14:42 -0500)]
Avoid formally-undefined use of memcpy() in hstoreUniquePairs().
hstoreUniquePairs() often called memcpy with equal source and destination
pointers. Although this is almost surely harmless in practice, it's
undefined according to the letter of the C standard. Some versions of
valgrind will complain about it, and some versions of libc as well
(cf. commit ad520ec4a). Tweak the code to avoid doing that.
Noted by Tomas Vondra. Back-patch to all supported versions because
of the hazard of libc assertions.
Tom Lane [Sat, 25 Nov 2017 19:15:48 +0000 (14:15 -0500)]
Repair failure with SubPlans in multi-row VALUES lists.
When nodeValuesscan.c was written, it was impossible to have a SubPlan in
VALUES --- any sub-SELECT there would have to be uncorrelated and thereby
would produce an InitPlan instead. We therefore took a shortcut in the
logic that throws away a ValuesScan's per-row expression evaluation data
structures. This was broken by the introduction of LATERAL however; a
sub-SELECT containing a lateral reference produces a correlated SubPlan.
The cleanest fix for this would be to give up the optimization of
discarding the expression eval state. But that still seems pretty
unappetizing for long VALUES lists. It seems to work to just prevent
the subexpressions from hooking into the ValuesScan node's subPlan
list, so let's do that and see how well it works. (If this breaks,
due to additional connections between the subexpressions and the outer
query structures, we might consider compromises like throwing away data
only for VALUES rows not containing SubPlans.)
Per bug #14924 from Christian Duta. Back-patch to 9.3 where LATERAL
was introduced.
Tom Lane [Sat, 25 Nov 2017 18:19:43 +0000 (13:19 -0500)]
Update buffile.h/.c comments for removal of non-temp option.
Commit 11e264517 removed BufFile's isTemp flag, thereby eliminating
the possibility of resurrecting BufFileCreate(). But it left that
function in place, as well as a bunch of comments describing how things
worked for the non-temp-file case. At best, that's now a source of
confusion. So remove the long-since-commented-out function and change
relevant comments.
I (tgl) wanted to rename BufFileCreateTemp() to BufFileCreate(), but
that seems not to be the consensus position, so leave it as-is.
In passing, fix commit f0828b2fc's failure to update BufFileSeek's
comment to match the change of its argument type from long to off_t.
(I think that might actually have been intentional at the time, but
now that 64-bit off_t is nearly universal, it looks anachronistic.)
Tom Lane [Sat, 25 Nov 2017 16:48:09 +0000 (11:48 -0500)]
Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when
we can prove distinctness from a DISTINCT clause. In that case the
de-duplication will surely happen after SRF expansion, so the proof
still works. Continue to punt in the case where we'd try to prove
distinctness from GROUP BY (or, in the future, source relations).
To do that, we'd have to determine whether the SRFs were in the
grouping columns or elsewhere in the tlist, and it still doesn't
seem worth the trouble. But this trivial change allows us to
recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces
unique-ified output, which seems worth having.
Also, fix estimate_num_groups() to consider the possibility of SRFs in
the grouping columns. Its failure to do so was masked before v10 because
grouping_planner() scaled up plan rowcount estimates by the estimated SRF
multiplier after performing grouping. That doesn't happen anymore, which
is more correct, but it means we need an adjustment in the estimate for
the number of groups. Failure to do this leads to an underestimate for
the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)"
compared to what 9.6 and earlier estimated, thus breaking plan choices
in some cases.
Per report from Dmitry Shalashov. Back-patch to v10 to avoid degraded
plan choices compared to previous releases.
Robert Haas [Sat, 25 Nov 2017 15:49:17 +0000 (10:49 -0500)]
Avoid projecting tuples unnecessarily in Gather and Gather Merge.
It's most often the case that the target list for the Gather (Merge)
node matches the target list supplied by the underlying plan node;
when this is so, we can avoid the overhead of projecting.
Tom Lane [Sat, 25 Nov 2017 00:28:19 +0000 (19:28 -0500)]
Improve valgrind logic in aset.c, and fix multiple issues in generation.c.
Revise aset.c so that all the "private" fields of chunk headers are
marked NOACCESS when outside the module, improving on the previous
coding which protected only requested_size. Fix a couple of corner
case bugs, such as failing to re-protect the header during a failure
exit from AllocSetRealloc, and wrong padding-size calculation for an
oversize allocation request.
Apply the same design to generation.c, and also fix several bugs therein
that I found by dint of hacking the code to use generation.c as the
standard allocator and then running the core regression tests with it.
Notably, we have to track the actual size of each block, else the
wipe_mem call in GenerationReset clears the wrong amount of memory for
an oversize-chunk block; and GenerationCheck needs a way of identifying
freed chunks that isn't fooled by palloc(0). I chose to fix the latter
by resetting the context pointer to NULL in a freed chunk, roughly like
what happens in a freed aset.c chunk.
Tom Lane [Fri, 24 Nov 2017 20:50:22 +0000 (15:50 -0500)]
Mostly-cosmetic improvements in memory chunk header alignment coding.
Add commentary about what we're doing and why. Apply the method used for
padding in GenerationChunk to AllocChunkData, replacing the rather ad-hoc
solution used in commit 7e3aa03b4. Reorder fields in GenerationChunk so
that the padding calculation will work even if sizeof(size_t) is different
from sizeof(void *) --- likely that will never happen, but we don't need
the assumption if we do it like this. Improve static assertions about
alignment.
In passing, fix a couple of oversights in the "large chunk" path in
GenerationAlloc().
Dean Rasheed [Fri, 24 Nov 2017 14:14:40 +0000 (14:14 +0000)]
RLS comment fixes.
The comments in get_policies_for_relation() say that CREATE POLICY
does not support defining restrictive policies. This is no longer
true, starting from PG10.
Dean Rasheed [Fri, 24 Nov 2017 12:01:18 +0000 (12:01 +0000)]
Doc: add a summary table to the CREATE POLICY docs.
This table summarizes which RLS policy expressions apply to each
command type, and whether they apply to the old or new tuples (or
both), which saves reading through a lot of text.
Rod Taylor, hacked on by me. Reviewed by Fabien Coelho.
Tom Lane [Fri, 24 Nov 2017 05:29:20 +0000 (00:29 -0500)]
Fix unstable regression test added by commits 59b71c6fe et al.
The query didn't really have a preferred index, leading to platform-
specific choices of which one to use. Adjust it to make sure tenk1_hundred
is always chosen.
Noah Misch [Fri, 24 Nov 2017 04:22:04 +0000 (20:22 -0800)]
Support linking with MinGW-built Perl.
This is necessary for ActivePerl 5.18 onwards and for Strawberry Perl.
It is not sufficient for 32-bit builds with newer Visual Studio; these
fail with error LINK2026. Back-patch to 9.3 (all supported versions).
Andres Freund [Fri, 24 Nov 2017 01:13:09 +0000 (17:13 -0800)]
Fix handling of NULLs returned by aggregate combine functions.
When strict aggregate combine functions, used in multi-stage/parallel
aggregation, returned NULL, we didn't check for that, invoking the
combine function with NULL the next round, despite it being strict.
The equivalent code invoking normal transition functions has a check
for that situation, which did not get copied in a7de3dc5c346. Fix the
bug by adding the equivalent check.
Based on a quick look I could not find any strict combine functions in
core actually returning NULL, and it doesn't seem very likely external
users have done so. So this isn't likely to have caused issues in
practice.
Reported-By: Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de
Backpatch: 9.6, where parallel aggregation was introduced
Peter Eisentraut [Thu, 23 Nov 2017 14:39:47 +0000 (09:39 -0500)]
Convert documentation to DocBook XML
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.
The source files are still named *.sgml, but they are actually XML files
now. Renaming could be considered later.
In the build system, the intermediate step to convert from SGML to XML
is removed. Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.
The documentation toolchain instructions are updated and are much
simpler now.
Fujii Masao [Thu, 23 Nov 2017 07:46:42 +0000 (16:46 +0900)]
doc: mention wal_receiver_status_interval as GUC affecting logical rep worker.
wal_receiver_timeout, wal_receiver_status_interval and
wal_retrieve_retry_interval configuration parameters affect the logical rep
worker, but previously only wal_receiver_status_interval was not mentioned
as such parameter in the doc.
Noah Misch [Thu, 23 Nov 2017 04:18:15 +0000 (20:18 -0800)]
Build src/test/isolation during "make" and "make install".
This hack closes a race condition in "make -j check-world" and "make -j
installcheck-world". Back-patch to v10, before which these parallel
invocations had worse problems.
Simon Riggs [Wed, 22 Nov 2017 18:45:07 +0000 (05:45 +1100)]
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).
Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.
Tom Lane [Tue, 21 Nov 2017 22:30:48 +0000 (17:30 -0500)]
pgbench: fix stats reporting when some transactions are skipped.
pgbench can skip some transactions when both -R and -L options are used.
Previously, this resulted in slightly silly statistics both in progress
reports and final output, because the skipped transactions were counted
as executed for TPS and related stats. Discount skipped xacts in TPS
numbers, and also when figuring the percentage of xacts exceeding the
latency limit.
Also, don't print per-script skipped-transaction counts when there is
only one script. That's redundant with the overall count, and it's
inconsistent with the fact that we don't print other per-script stats
when there's only one script. Clean up some unnecessary interactions
between what should be independent options that were due to that
decision.
While at it, avoid division-by-zero in cases where no transactions were
executed. While on modern platforms this would generally result in
printing "NaN" rather than a crash, that isn't spelled consistently
across platforms and it would confuse many people. Skip the relevant
output entirely when practical, else print zeroes.
Fabien Coelho, reviewed by Steve Singer, additional hacking by me
Robert Haas [Tue, 21 Nov 2017 18:56:24 +0000 (13:56 -0500)]
Provide for forward compatibility with future minor protocol versions.
Previously, any attempt to request a 3.x protocol version other than
3.0 would lead to a hard connection failure, which made the minor
protocol version really no different from the major protocol version
and precluded gentle protocol version breaks. Instead, when the
client requests a 3.x protocol version where x is greater than 0, send
the new NegotiateProtocolVersion message to convey that we support
only 3.0. This makes it possible to introduce new minor protocol
versions without requiring a connection retry when the server is
older.
In addition, if the startup packet includes name/value pairs where
the name starts with "_pq_.", assume that those are protocol options,
not GUCs. Include those we don't support (i.e. all of them, at
present) in the NegotiateProtocolVersion message so that the client
knows they were not understood. This makes it possible for the
client to request previously-unsupported features without bumping
the protocol version at all; the client can tell from the server's
response whether the option was understood.
It will take some time before servers that support these new
facilities become common in the wild; to speed things up and make
things easier for a future 3.1 protocol version, back-patch to all
supported releases.
Robert Haas [Tue, 21 Nov 2017 18:06:32 +0000 (13:06 -0500)]
Fix multiple problems with satisfies_hash_partition.
Fix the function header comment to describe the actual behavior.
Check that table OID, modulus, and remainder arguments are not NULL
before accessing them. Check that the modulus and remainder are
sensible. If the table OID doesn't exist, return NULL instead of
emitting an internal error, similar to what we do elsewhere. Check
that the actual argument types match, or at least are binary coercible
to, the expected argument types. Correctly handle invocation of this
function using the VARIADIC syntax. Add regression tests.
Robert Haas and Amul Sul, per a report by Andreas Seltenreich and
subsequent followup investigation.
Tom Lane [Tue, 21 Nov 2017 01:25:18 +0000 (20:25 -0500)]
Support index-only scans in contrib/cube and contrib/seg GiST indexes.
To do this, we only have to remove the compress and decompress support
functions, which have never done anything more than detoasting.
In the wake of commit d3a4f89d8, this results in automatically enabling
index-only scans, since the core code will now know that the stored
representation is the same as the original data (up to detoasting).
The only exciting part of this is that ALTER OPERATOR FAMILY lacks
a way to drop a support function that was declared as being part of
an opclass rather than being loose in the family. For the moment,
we'll hack our way to a solution with a manual update of the pg_depend
entry type, which is what distinguishes the two cases. Perhaps
someday it'll be worth providing a cleaner way to do that, but for
now it seems like a very niche problem.
Note that the underlying C functions remain, to support use of the shared
libraries with older versions of the modules' SQL declarations. Someday
we may be able to remove them, but not soon.
Tom Lane [Mon, 20 Nov 2017 22:57:46 +0000 (17:57 -0500)]
Add support for Motorola 88K to s_lock.h.
Apparently there are still people out there who care about this old
architecture. They probably care about dusty versions of Postgres
too, so back-patch to all supported branches.
David Carlier (from a patch being carried by OpenBSD packagers)
Robert Haas [Mon, 20 Nov 2017 17:34:06 +0000 (12:34 -0500)]
Tweak use of ExecContextForcesOids by Gather (Merge).
Specifically, pass the outer plan's PlanState instead of our own
PlanState. At present, ExecContextForcesOids doesn't actually care
which PlanState we pass; it just looks through to the underlying
EState to find the result relation or top-level eflags. However, in
the future it might care. If that happens, and if our goal is to get
a tuple descriptor that matches that of the outer plan, then I think
what we care about is whether the outer plan's context forces OIDs,
rather than whether our own context forces OIDs, just as we use the
outer node's target list rather than our own.
Robert Haas [Mon, 20 Nov 2017 17:00:33 +0000 (12:00 -0500)]
Pass eflags down to parallel workers.
Currently, there are no known consequences of this oversight, so no
back-patch. Several of the EXEC_FLAG_* constants aren't usable in
parallel mode anyway, and potential problems related to the presence
or absence of OIDs (see EXEC_FLAG_WITH_OIDS, EXEC_FLAG_WITHOUT_OIDS)
seem at present to be masked by the unconditional projection step
performed by Gather and Gather Merge. In general, however, it seems
important that all participants agree on the values of these flags,
which modify executor behavior globally, and a pending patch to skip
projection in Gather (Merge) would be outright broken in certain cases
without this fix.
Patch by me, based on investigation of a test case provided by Amit
Kapila. This patch was also reviewed by Amit Kapila.
Tom Lane [Sat, 18 Nov 2017 21:46:29 +0000 (16:46 -0500)]
Fix compiler warning in rangetypes_spgist.c.
On gcc 7.2.0, comparing pointer to (Datum) 0 produces a warning.
Treat it as a simple pointer to avoid that; this is more consistent
with comparable code elsewhere, anyway.