Tomas Vondra [Sun, 29 Jul 2018 21:08:00 +0000 (23:08 +0200)]
Mark variable used only in assertion with PG_USED_FOR_ASSERTS_ONLY
Perpendicular lines always intersect, so the line_interpt_line() return
value in line_closept_point() was used only in an assertion, triggering
compiler warnings in non-assert builds.
Tomas Vondra [Sun, 29 Jul 2018 18:35:18 +0000 (20:35 +0200)]
Restore handling of -0 in the C field of lines in line_construct().
Commit a7dc63d904 inadvertedly removed this bit originally introduced
by 43fe90f66a, causing regression test failures on some platforms,
due to producing {1,-1,-0} instead of {1,-1,0}.
Affected test queries have been testing the wrong thing since their
introduction in commit 4c1383efd132e4f532213c8a8cc63a455f55e344.
Back-patch to 9.3 (all supported versions).
Michael Paquier [Sun, 29 Jul 2018 14:50:35 +0000 (23:50 +0900)]
Make error message of pageinspect more consistent for raw page inputs
There is a copy-paste error from bt_page_items() which got into
bt_page_items_bytea(). A second message in get_raw_page_internal() was
inconsistent with all the other sub-modules.
Michael Paquier [Sun, 29 Jul 2018 13:00:42 +0000 (22:00 +0900)]
Fix two oversights from 9ebe0572 which refactored cluster_rel
The recheck option became a no-op as ClusterOption failed to set proper
values for each element. There was a second code path where local
options got overwritten.
Document security implications of qualified names.
Commit 5770172cb0c9df9e6ce27c507b449557e5b45124 documented secure schema
usage, and that advice suffices for using unqualified names securely.
Document, in typeconv-func primarily, the additional issues that arise
with qualified names. Back-patch to 9.3 (all supported versions).
Tomas Vondra [Sun, 29 Jul 2018 01:30:48 +0000 (03:30 +0200)]
Provide separate header file for built-in float types
Some data types under adt/ have separate header files, but most simple
ones do not, and their public functions are defined in builtins.h. As
the patches improving geometric types will require making additional
functions public, this seems like a good opportunity to create a header
for floats types.
Commit 1acf757255 made _cmp functions public to solve NaN issues locally
for GiST indexes. This patch reworks it in favour of a more widely
applicable API. The API uses inline functions, as they are easier to
use compared to macros, and avoid double-evaluation hazards.
Tomas Vondra [Sun, 29 Jul 2018 00:02:48 +0000 (02:02 +0200)]
Refactor geometric functions and operators
The primary goal of this patch is to eliminate duplicate code and share
code between different geometric data types more often, to prepare the
ground for additional patches. Until now the code reuse was limited,
probably because the simpler types (line and point) were implemented
after the more complex ones.
The changes are quite extensive and can be summarised as:
* Eliminate SQL-level function calls.
* Re-use more functions to implement others.
* Unify internal function names and signatures.
* Remove private functions from geo_decls.h.
* Replace should-not-happen checks with assertions.
* Add comments describe for various functions.
* Remove some unreachable code.
* Define delimiter symbols of line datatype like the other ones.
* Remove the GEODEBUG macro and printf() calls.
* Unify code style of a few oddly formatted lines.
While the goal was to cause minimal user-visible changes, it was not
possible to keep the original behavior in all cases - for example when
handling NaN values, or when reusing code makes the functions return
consistent results.
Author: Emre Hasegeli Reviewed-by: Kyotaro Horiguchi, me
Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com
Michael Paquier [Sat, 28 Jul 2018 22:53:11 +0000 (07:53 +0900)]
Add verbosity to pg_basebackup for sync
This is useful to know when the data copy has been finished. The
current situation can be confusing for users as the last message is
"waiting for background process to finish streaming", so it looks like
this is taking time but the final sync is instead.
Author: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1ypeoMJ=tFBG8vP13sxEtXd4Pm_x1SqsJdW_RvzpcvN=A@mail.gmail.com
Bruce Momjian [Sat, 28 Jul 2018 19:01:55 +0000 (15:01 -0400)]
pg_upgrade: check for clean server shutdowns
Previously pg_upgrade checked for the pid file and started/stopped the
server to force a clean shutdown. However, "pg_ctl -m immediate"
removes the pid file but doesn't do a clean shutdown, so check
pg_controldata for a clean shutdown too.
Diagnosed-by: Vimalraj A
Discussion: https://postgr.es/m/CAFKBAK5e4Q-oTUuPPJ56EU_d2Rzodq6GWKS3ncAk3xo7hAsOZg@mail.gmail.com
Reduce path length for locking leaf B-tree pages during insertion
In our B-tree implementation appropriate leaf page for new tuple
insertion is acquired using _bt_search() function. This function always
returns leaf page locked in shared mode. In order to obtain exclusive
lock, caller have to relock the page.
This commit makes _bt_search() function lock leaf page immediately in
exclusive mode when needed. That removes unnecessary relock and, in
turn reduces lock contention for B-tree leaf pages. Our experiments
on multi-core systems showed acceleration up to 4.5 times in corner
case.
Discussion: https://postgr.es/m/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
Author: Alexander Korotkov Reviewed-by: Yoshikazu Imai, Simon Riggs, Peter Geoghegan
Robert Haas [Fri, 27 Jul 2018 13:34:57 +0000 (09:34 -0400)]
Use key and partdesc from PartitionDispatch where possible.
Instead of repeatedly fishing the data out of the relcache entry,
let's use the version that we cached in the PartitionDispatch. We
could alternatively rip out the PartitionDispatch fields altogether,
but it doesn't make much sense to have them and not use them; before
this patch, partdesc was set but altogether unused. Amit Langote and
I both thought using them was a litle better than removing them, so
this patch takes that approach.
Amit Kapila [Fri, 27 Jul 2018 05:23:00 +0000 (10:53 +0530)]
Fix the buffer release order for parallel index scans.
During parallel index scans, if the current page to be read is deleted, we
skip it and try to get the next page for a scan without releasing the buffer
lock on the current page. To get the next page, sometimes it needs to wait
for another process to complete its scan and advance it to the next page.
Now, it is quite possible that the master backend has errored out before
advancing the scan and issued a termination signal for all workers. The
workers failed to notice the termination request during wait because the
interrupts are held due to buffer lock on the previous page. This lead to
all workers being stuck.
The fix is to release the buffer lock on current page before trying to get
the next page. We are already doing same in backward scans, but missed
it for forward scans.
Reported-by: Victor Yegorov
Bug: 15290 Diagnosed-by: Thomas Munro and Amit Kapila
Author: Amit Kapila Reviewed-by: Thomas Munro Tested-By: Thomas Munro and Victor Yegorov
Backpatch-through: 10 where parallel index scans were introduced
Discussion: https://postgr.es/m/153228422922.1395.1746424054206154747@wrigleys.postgresql.org
Michael Paquier [Fri, 27 Jul 2018 04:40:24 +0000 (13:40 +0900)]
Fix handling of pgbench's hash when no argument is provided
Depending on the platform used, this can cause a crash in the worst
case, or an unhelpful error message, so fail gracefully.
Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1807262302550.29874@lancre
Backpatch: 11-, where hash() has been added in pgbench.
Tom Lane [Thu, 26 Jul 2018 22:18:37 +0000 (18:18 -0400)]
Provide plpgsql tests for cases involving record field changes.
We suppressed one of these test cases in commit feb1cc559 because
it was failing to produce the expected results on CLOBBER_CACHE_ALWAYS
buildfarm members. But now we need another test with similar behavior,
so let's set up a test file that is expected to vary between regular and
CLOBBER_CACHE_ALWAYS cases, and provide variant expected files.
Someday we should fix plpgsql's failure for change-of-field-type, and
then the discrepancy will go away and we can fold these tests back
into plpgsql_record.sql. But today is not that day.
Tom Lane [Thu, 26 Jul 2018 20:08:45 +0000 (16:08 -0400)]
Avoid crash in eval_const_expressions if a Param's type changes.
Since commit 6719b238e it's been possible for the values of plpgsql
record field variables to be exposed to the planner as Params.
(Before that, plpgsql never supplied values for such variables during
planning, so that the problematic code wasn't reached.) Other places
that touch potentially-type-mutable Params either cope gracefully or
do runtime-test-and-ereport checks that the type is what they expect.
But eval_const_expressions() just had an Assert, meaning that it either
failed the assertion or risked crashes due to using an incompatible
value.
In this case, rather than throwing an ereport immediately, we can just
not perform a const-substitution in case of a mismatch. This seems
important for the same reason that the Param fetch was speculative:
we might not actually reach this part of the expression at runtime.
Test case will follow in a separate commit.
Patch by me, pursuant to bug report from Andrew Gierth.
Back-patch to v11 where the previous commit appeared.
Andres Freund [Wed, 25 Jul 2018 23:31:49 +0000 (16:31 -0700)]
LLVMJIT: Release JIT context after running ExprContext shutdown callbacks.
Due to inlining it previously was possible that an ExprContext's
shutdown callback pointed to a JITed function. As the JIT context
previously was shut down before the shutdown callbacks were called,
that could lead to segfaults. Fix the ordering.
Reported-By: Dmitry Dolgov
Author: Andres Freund
Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com
Backpatch: 11-, where JIT compilation was added
Tomas Vondra [Tue, 24 Jul 2018 23:09:03 +0000 (01:09 +0200)]
Add strict_multi_assignment and too_many_rows plpgsql checks
Until now shadowed_variables was the only plpgsql check supported by
plpgsql.extra_warnings and plpgsql.extra_errors. This patch introduces
two new checks - strict_multi_assignment and too_many_rows. Unlike
shadowed_variables, these new checks are enforced at run-time.
strict_multi_assignment checks that commands allowing multi-assignment
(for example SELECT INTO) have the same number of sources and targets.
too_many_rows checks that queries with an INTO clause return one row
exactly.
These checks are aimed at cases that are technically valid and allowed,
but are often a sign of a bug. Therefore those checks are expected to
be enabled primarily in development and testing environments.
Author: Pavel Stehule Reviewed-by: Stephen Frost, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA2kKRDKpUNwLY0GeG1OqOp+tLS2yQA1V41gzuSz-hCng@mail.gmail.com
Thomas Munro [Tue, 24 Jul 2018 22:57:52 +0000 (10:57 +1200)]
Pad semaphores to avoid false sharing.
In a USE_UNNAMED_SEMAPHORES build, the default on Linux and FreeBSD
since commit ecb0d20a, we have an array of sem_t objects. This
turned out to reduce performance compared to the previous default
USE_SYSV_SEMAPHORES on an 8 socket system. Testing showed that the
lost performance could be regained by padding the array elements so
that they have their own cache lines. This matches what we do for
similar hot arrays (see LWLockPadded, WALInsertLockPadded).
Back-patch to 10, where unnamed semaphores were adopted as the default
semaphore interface on those operating systems.
Author: Thomas Munro Reviewed-by: Andres Freund Reported-by: Mithun Cy Tested-by: Mithun Cy, Tom Lane, Thomas Munro
Discussion: https://postgr.es/m/CAD__OugYDM3O%2BdyZnnZSbJprSfsGFJcQ1R%3De59T3hcLmDug4_w%40mail.gmail.com
Andres Freund [Tue, 24 Jul 2018 17:42:59 +0000 (10:42 -0700)]
doc: Fix reference to "decoder" to instead be the correct "output plugin".
Author: Jonathan Katz
Discussion: https://postgr.es/m/DD02DD86-5989-4BFD-8712-468541F68383@postgresql.org
Backpatch: 9.4-, where logical decoding was added
Michael Paquier [Tue, 24 Jul 2018 02:37:32 +0000 (11:37 +0900)]
Refactor cluster_rel() to handle more options
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM. As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.
This only reworks the API and its callers, without providing anything
user-facing. Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions. This could be used as well as a base to extend the
grammar of CLUSTER later on.
Author: Michael Paquier Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
Michael Paquier [Tue, 24 Jul 2018 01:32:56 +0000 (10:32 +0900)]
Fix calculation for WAL segment recycling and removal
Commit 4b0d28de06 has removed the prior checkpoint and related
facilities but has left WAL recycling based on the LSN of the prior
checkpoint, which causes incorrect calculations for WAL removal and
recycling for max_wal_size and min_wal_size. This commit changes things
so as the base calculation point is the last checkpoint generated.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch: 11-, where the prior checkpoint has been removed.
Andres Freund [Mon, 23 Jul 2018 04:13:20 +0000 (21:13 -0700)]
LLVMJIT: Adapt to API changes in gdb and perf support.
During the work of upstreaming my previous patches for gdb and perf
support the API changed. Adapt. Normally this wouldn't necessarily be
something to backpatch, but the previous API wasn't upstream, and at
least the gdb support is quite useful for debugging.
Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
Michael Paquier [Mon, 23 Jul 2018 00:37:36 +0000 (09:37 +0900)]
Add proper errcodes to new error messages for read() failures
Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable
failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the
most sense here.
While on the way, fix one errcode_for_file_access missing in origin.c
since the code has been created, and remove one assignment of errno to 0
before calling read(), as this was around to fit with what was present
before 811b6e36 where errno would not be set when not enough bytes are
read. I have noticed the first one, and Tom has pinged me about the
second one.
Author: Michael Paquier Reported-by: Tom Lane
Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us
Michael Paquier [Mon, 23 Jul 2018 00:19:12 +0000 (09:19 +0900)]
Make more consistent some error messages for file-related operations
Some error messages which report something about a file operation use
as well context which is already provided within the path being worked
on, making things rather duplicated. This creates more work for
translators, and does not actually bring clarity.
More could be done, however in a lot of cases the context used is
actually useful, still that patch gets down things with a good cut.
Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/20180718044711.GA8565@paquier.xyz
Andres Freund [Sun, 22 Jul 2018 23:47:00 +0000 (16:47 -0700)]
Fix JITed EEOP_AGG_INIT_TRANS, which missed some state.
The JIT compiled implementation missed maintaining
AggState->{current_set,curaggcontext}. That could lead to trouble
because the transition value could be allocated in the wrong context.
Reported-By: Rushabh Lathia Diagnosed-By: Dmitry Dolgov
Author: Dmitry Dolgov, with minor changes by me
Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com
Backpatch: 11-, where JIT compilation support was added
Andres Freund [Sun, 22 Jul 2018 21:58:01 +0000 (14:58 -0700)]
Hand code string to integer conversion for performance.
As benchmarks show, using libc's string-to-integer conversion is
pretty slow. At least part of the reason for that is that strtol[l]
have to be more generic than what largely is required inside pg.
This patch considerably speeds up int2/int4 input (int8 already was
already using hand-rolled code).
Most of the existing pg_atoi callers have been converted. But as one
requires pg_atoi's custom delimiter functionality, and as it seems
likely that there's external pg_atoi users, it seems sensible to just
keep pg_atoi around.
Author: Andres Freund Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de
Andres Freund [Sun, 22 Jul 2018 21:58:01 +0000 (14:58 -0700)]
Deduplicate "invalid input syntax" messages for various types.
Previously a lot of the error messages referenced the type in the
error message itself. That requires that the message is translated
separately for each type.
Note that currently a few smallint cases continue to reference the
integer, rather than smallint, type. A later patch will create a
separate routine for 16bit input.
Author: Andres Freund
Discussion: https://postgr.es/m/20180707200158.wpqkd7rjr4jxq5g7@alap3.anarazel.de
Tom Lane [Sat, 21 Jul 2018 19:40:51 +0000 (15:40 -0400)]
Further portability hacking in pg_upgrade's test script.
I blew the dust off a Bourne shell (file date 1996, yea verily) and
tried to run test.sh with it. It mostly worked, but I found that the
temp-directory creation code introduced by commit be76a6d39 was not
compatible, for a couple of reasons: this shell thinks "set -e" should
force an exit if a command within backticks fails, and it also thinks code
within braces should be executed by a sub-shell, meaning that variable
settings don't propagate back up to the parent shell. In view of Victor
Wagner's report that Solaris is still using pre-POSIX shells, seems like
we oughta make this case work. It's not like the code is any less
idiomatic this way; the prior coding technique appeared nowhere else.
(There is a remaining bash-ism here, which is that $RANDOM doesn't do
what the code hopes in non-bash shells. But the use of $$ elsewhere in
that path should be enough to ensure uniqueness and some amount of
randomness, so I think it's okay as-is.)
Back-patch to all supported branches, as the previous commit was.
Tom Lane [Sat, 21 Jul 2018 16:05:25 +0000 (12:05 -0400)]
Be more paranoid about quoting in pg_upgrade's test script.
Double-quote $PGDATA in "find" commands introduced by commit da9b580d8,
in case that path contains spaces or other special characters.
Adjust a few other places so that quoting is done more consistently.
None of the others are actual bugs AFAICS, but it's confusing to readers
if the same thing is done differently in different places.
Tom Lane [Fri, 20 Jul 2018 17:59:27 +0000 (13:59 -0400)]
Avoid unportable shell syntax in pg_upgrade's test script.
Most of test.sh uses traditional backtick syntax for command substitution,
but commit da9b580d8 introduced two uses of $(...) syntax, which is not
recognized by very old shells. Bring those into line with the rest.
Dean Rasheed [Fri, 20 Jul 2018 07:55:44 +0000 (08:55 +0100)]
Guard against rare RAND_bytes() failures in pg_strong_random().
When built using OpenSSL, pg_strong_random() uses RAND_bytes() to
generate the random number. On very rare occasions that can fail, if
its PRNG has not been seeded with enough data. Additionally, once it
does fail, all subsequent calls will also fail until more seed data is
added. Since this is required during backend startup, this can result
in all new backends failing to start until a postmaster restart.
Guard against that by checking the state of OpenSSL's PRNG using
RAND_status(), and if necessary (very rarely), seeding it using
RAND_poll().
Back-patch to v10, where pg_strong_random() was introduced.
Michael Paquier [Thu, 19 Jul 2018 22:43:41 +0000 (07:43 +0900)]
Add toast tables to most system catalogs
It has been project policy to create toast tables only for those catalogs
that might reasonably need one. Since this judgment call can change over
time, just create one for every catalog, as this can be useful when
creating rather-long entries in catalogs, with recent examples being in
the shape of policy expressions or customly-formatted SCRAM verifiers.
To prevent circular dependencies and to avoid adding complexity to VACUUM
FULL logic, exclude pg_class, pg_attribute, and pg_index. Also, to
prevent pg_upgrade from seeing a non-empty new cluster, exclude
pg_largeobject and pg_largeobject_metadata from the set as large object
data is handled as user data. Those relations have no reason to use a
toast table anyway.
Author: Joe Conway, John Naylor Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/84ddff04-f122-784b-b6c5-3536804495f8@joeconway.com
Tom Lane [Thu, 19 Jul 2018 19:41:46 +0000 (15:41 -0400)]
Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior. Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.
There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.
Tom Lane [Thu, 19 Jul 2018 18:53:41 +0000 (14:53 -0400)]
Improve psql's \d command to show whether index columns are key columns.
This is essential information when looking at an index that has
"included" columns. Per discussion, follow the style used in \dC
and some other places: column header is "Key?" and values are "yes"
or "no" (all translatable).
While at it, revise describeOneTableDetails to be a bit more maintainable:
avoid hard-wired column numbers and multiple repetitions of what needs
to be identical test logic. This also results in the emitted catalog
query corresponding more closely to what we print, which should be a
benefit to users of ECHO_HIDDEN mode, and perhaps a bit faster too
(the old logic sometimes asked for values it would not print, even
ones that are fairly expensive to get).
Tom Lane [Thu, 19 Jul 2018 17:48:05 +0000 (13:48 -0400)]
Fix pg_get_indexdef()'s behavior for included index columns.
The multi-argument form of pg_get_indexdef() failed to print anything when
asked to print a single index column that is an included column rather than
a key column. This seems an unintentional result of someone having tried
to take a short-cut and use the attrsOnly flag for two different purposes.
To fix, split said flag into two flags, attrsOnly which suppresses
non-attribute info, and keysOnly which suppresses included columns.
Add a test case using psql's \d command, which relies on that function.
(It's mighty tempting at this point to replace pg_get_indexdef_worker's
mess of boolean flag arguments with a single bitmask-of-flags argument,
which would allow making the call sites much more self-documenting.
But I refrained for the moment.)
Rewrite comments in replication slot advance implementation
The code added by 9c7d06d60680 was a bit obscure; clarify that by
rewriting the comments. Lack of clarity has already caused bugs, so
it's a worthy goal.
Fix handling of empty uncompressed posting list pages in GIN
PostgreSQL 9.4 introduces posting list compression in GIN. This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on-the-fly. Underlying code appears to always
expect at least one item on uncompressed posting list page. But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees. This commit fixes that.
I was confused by what "intended to be parallel serially" meant, until
Robert Haas and David G. Johnston explained it. Rephrase the comment to
make it more clear, using David's suggested wording.
Expand run-time partition pruning to work with MergeAppend
This expands the support for the run-time partition pruning which was added
for Append in 499be013de to also allow unneeded subnodes of a MergeAppend
to be removed.
Author: David Rowley
Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com
Michael Paquier [Thu, 19 Jul 2018 00:54:39 +0000 (09:54 +0900)]
Fix print of Path nodes when using OPTIMIZER_DEBUG
GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5)
have gone missing. The order of the Path nodes was inconsistent with
what is listed in nodes.h, so make the order consistent at the same time
to ease future checks and additions.
Author: Sawada Masahiko Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com
Tom Lane [Wed, 18 Jul 2018 21:39:27 +0000 (17:39 -0400)]
Remove race-prone hot_standby_feedback test cases in 001_stream_rep.pl.
This script supposed that if it turned hot_standby_feedback on and then
shut down the standby server, at least one feedback message would be
guaranteed to be sent before the standby stops. But there is no such
guarantee, if the standby's walreceiver process is slow enough --- and
we've seen multiple failures in the buildfarm showing that that does
happen in practice. While we could rearrange the walreceiver logic to
make it less likely, it seems probably impossible to create a really
bulletproof guarantee of that sort; and if we tried, we might create
situations where the walreceiver wouldn't react in a timely manner to
shutdown commands. It seems better instead to remove the script's
assumption that feedback will occur before shutdown.
But once we do that, these last few tests seem quite redundant with
the earlier tests in the script. So let's just drop them altogether
and save some buildfarm cycles.
Tom Lane [Wed, 18 Jul 2018 18:43:03 +0000 (14:43 -0400)]
Drop the rule against included index columns duplicating key columns.
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here. Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.
(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)
Hence, let's just drop the prohibition.
In passing, do some copy-editing on the documentation for the
included-column feature.
Yugo Nagata; documentation and test corrections by me
Tom Lane [Wed, 18 Jul 2018 16:15:16 +0000 (12:15 -0400)]
Use a ResourceOwner to track buffer pins in all cases.
Historically, we've allowed auxiliary processes to take buffer pins without
tracking them in a ResourceOwner. However, that creates problems for error
recovery. In particular, we've seen multiple reports of assertion crashes
in the startup process when it gets an error while holding a buffer pin,
as for example if it gets ENOSPC during a write. In a non-assert build,
the process would simply exit without releasing the pin at all. We've
gotten away with that so far just because a failure exit of the startup
process translates to a database crash anyhow; but any similar behavior
in other aux processes could result in stuck pins and subsequent problems
in vacuum.
To improve this, institute a policy that we must *always* have a resowner
backing any attempt to pin a buffer, which we can enforce just by removing
the previous special-case code in resowner.c. Add infrastructure to make
it easy to create a process-lifespan AuxProcessResourceOwner and clear
out its contents at appropriate times. Replace existing ad-hoc resowner
management in bgwriter.c and other aux processes with that. (Thus, while
the startup process gains a resowner where it had none at all before, some
other aux process types are replacing an ad-hoc resowner with this code.)
Also use the AuxProcessResourceOwner to manage buffer pins taken during
StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap
process or a standalone backend rather than a true auxiliary process.
In passing, remove some other ad-hoc resource owner creations that had
gotten cargo-culted into various other places. As far as I can tell
that was all unnecessary, and if it had been necessary it was incomplete,
due to lacking any provision for clearing those resowners later.
(Also worth noting in this connection is that a process that hasn't called
InitBufferPoolBackend has no business accessing buffers; so there's more
to do than just add the resowner if we want to touch buffers in processes
not covered by this patch.)
Although this fixes a very old bug, no back-patch, because there's no
evidence of any significant problem in non-assert builds.
Patch by me, pursuant to a report from Justin Pryzby. Thanks to
Robert Haas and Kyotaro Horiguchi for reviews.
Michael Paquier [Tue, 17 Jul 2018 23:01:23 +0000 (08:01 +0900)]
Rework error messages around file handling
Some error messages related to file handling are using the code path
context to define their state. For example, 2PC-related errors are
referring to "two-phase status files", or "relation mapping file" is
used for catalog-to-filenode mapping, however those prove to be
difficult to translate, and are not more helpful than just referring to
the path of the file being worked on. So simplify all those error
messages by just referring to files with their path used. In some
cases, like the manipulation of WAL segments, the context is actually
helpful so those are kept.
Calls to the system function read() have also been rather inconsistent
with their error handling sometimes not reporting the number of bytes
read, and some other code paths trying to use an errno which has not
been set. The in-core functions are using a more consistent pattern
with this patch, which checks for both errno if set or if an
inconsistent read is happening.
So as to care about pluralization when reading an unexpected number of
byte(s), "could not read: read %d of %zu" is used as error message, with
%d field being the output result of read() and %zu the expected size.
This simplifies the work of translators with less variations of the same
message.
Author: Michael Paquier Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz
doc: move PARTITION OF stanza to just below PARTITION BY
It's more logical this way, since the new ordering matches the way the
tables are created; but in any case, the previous location of PARTITION OF
did not appear carefully chosen anyway (since it didn't match the
location in which it appears in the synopsys either, which is what we
normally do.)
In the PARTITION BY stanza, add a link to the partitioning section in
the DDL chapter, too.
Suggested-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwY4Ld7ecxL_KAmaxwt0FUu5VcPPN2L4dh+3BeYbrdBa5g@mail.gmail.com
Fix ALTER TABLE...SET STATS error message for included columns
The existing error message was complaining that the column is not an
expression, which is not correct. Introduce a suitable wording
variation and a test.
The original code was unable to prune partitions that could not possibly
contain NULL values, when the query specified less than all columns in a
multicolumn partition key. Reorder the if-tests so that it is, and add
more commentary and regression tests.
Robert Haas [Mon, 16 Jul 2018 21:33:22 +0000 (17:33 -0400)]
Add subtransaction handling for table synchronization workers.
Since the old logic was completely unaware of subtransactions, a
change made in a subsequently-aborted subtransaction would still cause
workers to be stopped at toplevel transaction commit. Fix that by
managing a stack of worker lists rather than just one.
Peter Eisentraut [Mon, 16 Jul 2018 08:44:06 +0000 (10:44 +0200)]
doc: Update redirecting links
Update links that resulted in redirects. Most are changes from http to
https, but there are also some other minor edits. (There are still some
redirects where the target URL looks less elegant than the one we
currently have. I have left those as is.)
Tom Lane [Sat, 14 Jul 2018 15:59:12 +0000 (11:59 -0400)]
Fix hashjoin costing mistake introduced with inner_unique optimization.
In final_cost_hashjoin(), commit 9c7f5229a allowed inner_unique cases
to follow a code path previously used only for SEMI/ANTI joins; but it
neglected to fix an if-test within that path that assumed SEMI and ANTI
were the only possible cases. This resulted in a wrong value for
hashjointuples, and an ensuing bad cost estimate, for inner_unique normal
joins. Fortunately, for inner_unique normal joins we can assume the number
of joined tuples is the same as for a SEMI join; so there's no need for
more code, we just have to invert the test to check for ANTI not SEMI.
It turns out that in two contrib tests in which commit 9c7f5229a
changed the plan expected for a query, the change was actually wrong
and induced by this estimation error, not by any real improvement.
Hence this patch also reverts those changes.
Per report from RK Korlapati. Backpatch to v10 where the error was
introduced.
Tom Lane [Fri, 13 Jul 2018 22:45:30 +0000 (18:45 -0400)]
Fix crash in contrib/ltree's lca() function for empty input array.
lca_inner() wasn't prepared for the possibility of getting no inputs.
Fix that, and make some cosmetic improvements to the code while at it.
Also, I thought the documentation of this function as returning the
"longest common prefix" of the paths was entirely misleading; it really
returns a path one shorter than the longest common prefix, for the typical
definition of "prefix". Don't use that term in the docs, and adjust the
examples to clarify what really happens.
This has been broken since its beginning, so back-patch to all supported
branches.
Per report from Hailong Li. Thanks to Pierre Ducroquet for diagnosing
and for the initial patch, though I whacked it around some and added
test cases.
Peter Eisentraut [Fri, 13 Jul 2018 19:23:41 +0000 (21:23 +0200)]
Update documentation editor setup instructions
Now that the documentation sources are in XML rather than SGML, some of
the documentation about the editor, or more specifically Emacs, setup
needs updating. The updated instructions recommend using nxml-mode,
which works mostly out of the box, with some small tweaks in
emacs.samples and .dir-locals.el.
Also remove some obsolete stuff in .dir-locals.el. I did, however,
leave the sgml-mode settings in there so that someone using Emacs
without emacs.samples gets those settings when editing a *.sgml file.
Tom Lane [Fri, 13 Jul 2018 18:16:47 +0000 (14:16 -0400)]
Fix crash in json{b}_populate_recordset() and json{b}_to_recordset().
As of commit 37a795a60, populate_recordset_worker() tried to pass back
(as rsi.setDesc) a tupdesc that it also had cached in its fn_extra.
But the core executor would free the passed-back tupdesc, risking a
crash if the function were called again in the same query. The safest
and least invasive way to fix that is to make an extra tupdesc copy
to pass back.
While at it, I failed to resist the temptation to get rid of unnecessary
get_fn_expr_argtype() calls here and in populate_record_worker().
Per report from Dmitry Dolgov; thanks to Michael Paquier and
Andrew Gierth for investigation and discussion.
The patch that ended up as commit 3de241dba86f ("Foreign keys on
partitioned tables") lacked pg_dump tests, so the pg_dump code that was
there to support it inadvertently stopped working when in later
development I modified the backend code not to emit pg_trigger rows for
the partitioned table itself.
Bug analysis and code fix is by Michaël. I (Álvaro) added the test.
Improve performance of tuple conversion map generation
Previously convert_tuples_by_name_map naively performed a search of each
outdesc column starting at the first column in indesc and searched each
indesc column until a match was found. When partitioned tables had many
columns this could result in slow generation of the tuple conversion maps.
For INSERT and UPDATE statements that touched few rows, this could mean a
very large overhead indeed.
We can do a bit better with this loop. It's quite likely that the columns
in partitioned tables and their partitions are in the same order, so it
makes sense to start searching for each column outer column at the inner
column position 1 after where the previous match was found (per idea from
Alexander Kuzmenkov). This makes the best case search O(N) instead of
O(N^2). The worst case is still O(N^2), but it seems unlikely that would
happen.
Likewise, in the planner, make_inh_translation_list's search for the
matching column could often end up falling back on an O(N^2) type search.
This commit also improves that by first checking the column that follows
the previous match, instead of the column with the same attnum. If we
fail to match here we fallback on the syscache's hashtable lookup.
Author: David Rowley Reviewed-by: Alexander Kuzmenkov
Discussion: https://www.postgresql.org/message-id/CAKJS1f9-wijVgMdRp6_qDMEQDJJ%2BA_n%3DxzZuTmLx5Fz6cwf%2B8A%40mail.gmail.com
Tom Lane [Fri, 13 Jul 2018 15:52:50 +0000 (11:52 -0400)]
Fix inadequate buffer locking in FSM and VM page re-initialization.
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking. There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.
This does not add any locking overhead in the normal code path where the
page is OK. It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.
Problem noted by R P Asim. It's been like this for a long time, so
back-patch to all supported branches.
Prohibit transaction commands in security definer procedures
Starting and aborting transactions in security definer procedures
doesn't work. StartTransaction() insists that the security context
stack is empty, so this would currently cause a crash, and
AbortTransaction() resets it. This could be made to work by
reorganizing the code, but right now we just prohibit it.
Reported-by: amul sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com
Peter Eisentraut [Fri, 13 Jul 2018 08:01:04 +0000 (10:01 +0200)]
Remove obsolete documentation build tools for Windows
The scripts and instructions have been nonfunctional at least since
PostgreSQL 10 (commit 510074f9f0131a04322d6a3d2a51c87e6db243f9) and
nobody has stepped up to fix them. So right now just remove them until
someone wants to resurrect them.
Thomas Munro [Fri, 13 Jul 2018 04:12:03 +0000 (16:12 +1200)]
Accept invalidation messages in InitializeSessionUserId().
If the authentication method modified the system catalogs through a
separate database connection (say, to create a new role on the fly),
make sure syscache sees the changes before we try to find the user.
Author: Thomas Munro Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3_h0_cgmz5PMyab4xk_OFrg6G5VCN%3DnF4chFXM9iFOqA%40mail.gmail.com
Michael Paquier [Fri, 13 Jul 2018 00:32:12 +0000 (09:32 +0900)]
Fix argument of pg_create_logical_replication_slot for slot name
All attributes and arguments using a slot name map to the data type
"name", but this function has been using "text". This is cosmetic, as
even if text is used then the slot name would be truncated to 64
characters anyway and stored as such. The documentation already said
so and the function already assumed that the argument was of this type
when fetching its value.
Michael Paquier [Thu, 12 Jul 2018 21:43:20 +0000 (06:43 +0900)]
Clean up temporary WAL segments after an instance crash
Temporary WAL segments are created in pg_wal and named as xlogtemp.pid
before being renamed to the real deal when creating a new segment. If
an instance crashes after the temporary segment is created and before
the rename is done, then the server would finish with unremovable data.
After an instance crash, scan pg_wal and remove any such segments. With
repetitive unlucky crashes this would contribute to disk bloat and
presents risks of ENOSPC especially with max_wal_size close to the
maximum allowed.
Author: Michael Paquier Reviewed-by: Yugo Nagata, Heikki Linnakangas
Discussion: https://postgr.es/m/20180514054955.GF1528@paquier.xyz
Peter Eisentraut [Thu, 12 Jul 2018 18:22:17 +0000 (20:22 +0200)]
Reset shmem_exit_inprogress after shmem_exit()
In ad9a274778d2d88c46b90309212b92ee7fdf9afe, shmem_exit_inprogress was
introduced. But we need to reset it after shmem_exit(), because unlike
the similar proc_exit(), shmem_exit() can also be called for cleanup
when the process will not exit.
Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Tom Lane [Thu, 12 Jul 2018 16:28:43 +0000 (12:28 -0400)]
Doc: minor improvement in pl/pgsql FETCH/MOVE documentation.
Explain that you can use any integer expression for the "count" in
pl/pgsql's versions of FETCH/MOVE, unlike the SQL versions which only
allow a constant.
Remove the duplicate version of this para under MOVE. I don't see
a good reason to maintain two identical paras when we just said that
MOVE works exactly like FETCH.
Fix FK checks of TRUNCATE involving partitioned tables
When truncating a table that is referenced by foreign keys in
partitioned tables, the check to ensure the referencing table are also
truncated spuriously failed. This is because it was relying on
relhastriggers as a proxy for the table having FKs, and that's wrong for
partitioned tables. Fix it to consider such tables separately. There
may be a better way ... but this code is pretty inefficient already.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql
Tom Lane [Thu, 12 Jul 2018 15:10:24 +0000 (11:10 -0400)]
Doc: update documentation for requirement of ORDER BY in GROUPS mode.
Commit ff4f88916 adjusted the code to enforce the SQL spec's requirement
that a window using GROUPS mode must have an ORDER BY clause. But I missed
that the documentation explicitly said you didn't have to have one.
Also minor wordsmithing in the window-function section of select.sgml.
Per Masahiko Sawada, though I didn't use his patch.
Amit Kapila [Thu, 12 Jul 2018 07:21:39 +0000 (12:51 +0530)]
Allow using the updated tuple while moving it to a different partition.
An update that causes the tuple to be moved to a different partition was
missing out on re-constructing the to-be-updated tuple, based on the latest
tuple in the update chain. Instead, it's simply deleting the latest tuple
and inserting a new tuple in the new partition based on the old tuple.
Commit 2f17844104 didn't consider this case, so some of the updates were
getting lost.
In passing, change the argument order for output parameter in ExecDelete
and add some commentary about it.
Reported-by: Pavan Deolasee
Author: Amit Khandekar, with minor changes by me Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
Michael Paquier [Thu, 12 Jul 2018 05:28:28 +0000 (14:28 +0900)]
Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED
When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for
something to happen, and issue an error. SKIP LOCKED means to not wait
for something to happen but to move on without issuing an error. The
internal option of autovacuum and autoanalyze mentioned above, used only
when wraparound is not involved was named NOWAIT, but behaves like SKIP
LOCKED which is confusing.
Michael Paquier [Thu, 12 Jul 2018 04:50:17 +0000 (13:50 +0900)]
Add assertion in expand_vacuum_rel() for non-autovacuum path
The code path where the assertion is added helps to check that
autovacuum always includes a relation OID when doing a vacuum on it.
Extracted from a larger patch set to add support for SKIP LOCKED with
manual VACUUM commands.
Michael Paquier [Thu, 12 Jul 2018 01:19:35 +0000 (10:19 +0900)]
Make logical WAL sender report streaming state appropriately
WAL senders sending logically-decoded data fail to properly report in
"streaming" state when starting up, hence as long as one extra record is
not replayed, such WAL senders would remain in a "catchup" state, which
is inconsistent with the physical cousin.
This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server. The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.
Backpatch down to 9.4 where this code has been introduced.
Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
Tom Lane [Wed, 11 Jul 2018 22:47:31 +0000 (18:47 -0400)]
Mark built-in btree comparison functions as leakproof where it's safe.
Generally, if the comparison operators for a datatype or pair of datatypes
are leakproof, the corresponding btree comparison support function can be
considered so as well. But we had not originally worried about marking
support functions as leakproof, reasoning that they'd not likely be used in
queries so the marking wouldn't matter. It turns out there's at least one
place where it does matter: calc_arraycontsel() finds the target datatype's
default btree comparison function and tries to use that to estimate
selectivity, but it will be blocked in some cases if the function isn't
leakproof. This leads to unnecessarily poor selectivity estimates and bad
plans, as seen in bug #15251.
Hence, run around and apply proleakproof markings where the corresponding
btree comparison operators are leakproof. (I did eyeball each function
to verify that it wasn't doing anything surprising, too.)
This isn't a full solution to bug #15251, and it's not back-patchable
because of the need for a catversion bump. A more useful response probably
is to consider whether we can check permissions on the parent table instead
of the child. However, this change will help in some cases where that
won't, and it's easy enough to do in HEAD, so let's do so.
Tom Lane [Wed, 11 Jul 2018 19:25:28 +0000 (15:25 -0400)]
Fix create_scan_plan's handling of sortgrouprefs for physical tlists.
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error. (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)
The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other cases or will be such in future. This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.
Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.
Tom Lane [Wed, 11 Jul 2018 16:07:20 +0000 (12:07 -0400)]
Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode.
nodeWindowAgg.c failed to cope with the possibility that no ordering
columns are defined in the window frame for GROUPS mode or RANGE OFFSET
mode, leading to assertion failures or odd errors, as reported by Masahiko
Sawada and Lukas Eder. In RANGE OFFSET mode, an ordering column is really
required, so add an Assert about that. In GROUPS mode, the code would
work, except that the node initialization code wasn't in sync with the
execution code about when to set up tuplestore read pointers and spare
slots. Fix the latter for consistency's sake (even though I think the
changes described below make the out-of-sync cases unreachable for now).
Per SQL spec, a single ordering column is required for RANGE OFFSET mode,
and at least one ordering column is required for GROUPS mode. The parser
enforced the former but not the latter; add a check for that.
We were able to reach the no-ordering-column cases even with fully spec
compliant queries, though, because the planner would drop partitioning
and ordering columns from the generated plan if they were redundant with
earlier columns according to the redundant-pathkey logic, for instance
"PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual.
While in principle that's an optimization that could save some pointless
comparisons at runtime, it seems unlikely to be meaningful in the real
world. I think this behavior was not so much an intentional optimization
as a side-effect of an ancient decision to construct the plan node's
ordering-column info by reverse-engineering the PathKeys of the input
path. If we give up redundant-column removal then it takes very little
code to generate the plan node info directly from the WindowClause,
ensuring that we have the expected number of ordering columns in all
cases. (If anyone does complain about this, the planner could perhaps
be taught to remove redundant columns only when it's safe to do so,
ie *not* in RANGE OFFSET mode. But I doubt anyone ever will.)
With these changes, the WindowAggPath.winpathkeys field is not used for
anything anymore, so remove it.
The test cases added here are not actually very interesting given the
removal of the redundant-column-removal logic, but they would represent
important corner cases if anyone ever tries to put that back.
Tom Lane and Masahiko Sawada. Back-patch to v11 where RANGE OFFSET
and GROUPS modes were added.