Tom Lane [Mon, 6 Aug 2018 14:53:35 +0000 (10:53 -0400)]
Fix failure to reset libpq's state fully between connection attempts.
The logic in PQconnectPoll() did not take care to ensure that all of
a PGconn's internal state variables were reset before trying a new
connection attempt. If we got far enough in the connection sequence
to have changed any of these variables, and then decided to try a new
server address or server name, the new connection might be completed
with some state that really only applied to the failed connection.
While this has assorted bad consequences, the only one that is clearly
a security issue is that password_needed didn't get reset, so that
if the first server asked for a password and the second didn't,
PQconnectionUsedPassword() would return an incorrect result. This
could be leveraged by unprivileged users of dblink or postgres_fdw
to allow them to use server-side login credentials that they should
not be able to use.
Other notable problems include the possibility of forcing a v2-protocol
connection to a server capable of supporting v3, or overriding
"sslmode=prefer" to cause a non-encrypted connection to a server that
would have accepted an encrypted one. Those are certainly bugs but
it's harder to paint them as security problems in themselves. However,
forcing a v2-protocol connection could result in libpq having a wrong
idea of the server's standard_conforming_strings setting, which opens
the door to SQL-injection attacks. The extent to which that's actually
a problem, given the prerequisite that the attacker needs control of
the client's connection parameters, is unclear.
These problems have existed for a long time, but became more easily
exploitable in v10, both because it introduced easy ways to force libpq
to abandon a connection attempt at a late stage and then try another one
(rather than just giving up), and because it provided an easy way to
specify multiple target hosts.
Fix by rearranging PQconnectPoll's state machine to provide centralized
places to reset state properly when moving to a new target host or when
dropping and retrying a connection to the same host.
Tom Lane, reviewed by Noah Misch. Our thanks to Andrew Krasichkov
for finding and reporting the problem.
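For illustration, a minimal sketch of the centralized-reset idea, using
hypothetical field and function names rather than libpq's actual ones:
every retry path funnels through one reset routine, so no per-attempt
state can leak into the next attempt.

    #include <stdbool.h>

    /* Hypothetical stand-in for the per-attempt fields inside a PGconn. */
    typedef struct ConnSketch
    {
        int  whichhost;        /* index of the target host being tried */
        bool password_needed;  /* did this attempt ask for a password? */
        bool ssl_in_use;       /* did this attempt negotiate SSL? */
        int  protocol_version; /* wire protocol chosen for this attempt */
    } ConnSketch;

    /* Reset everything that applies only to a single connection attempt. */
    static void
    reset_connection_state(ConnSketch *conn)
    {
        conn->password_needed = false;
        conn->ssl_in_use = false;
        conn->protocol_version = 3;   /* start from v3 again, not v2 */
    }

    /* Moving to a new target host implies a fresh attempt as well. */
    static void
    try_next_host(ConnSketch *conn)
    {
        conn->whichhost++;
        reset_connection_state(conn);
    }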
There are some problems with the tls-unique channel binding type. It's not
supported by all SSL libraries, and strictly speaking it's not defined for
TLS 1.3 at all, even though at least in OpenSSL, the functions used for it
still seem to work with TLS 1.3 connections. And since we had no
mechanism to negotiate what channel binding type to use, there would be
awkward interoperability issues if a server only supported some channel
binding types. tls-server-end-point seems feasible to support with any SSL
library, so let's just stick to that.
This removes the scram_channel_binding libpq option altogether, since there
is now only one supported channel binding type.
This also removes all the channel binding tests from the SSL test suite.
They were really just testing the scram_channel_binding option, which
is now gone. Channel binding is used if both client and server support it,
so it is exercised by the existing tests. It would be good to have some
tests specifically for channel binding, to make sure it really is used,
covering the different combinations of a client and a server that do or
don't support it. The current set of settings we have makes it hard to
write such tests, but I did test those things manually, by disabling
HAVE_BE_TLS_GET_CERTIFICATE_HASH and/or
HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH.
I also removed the SCRAM_CHANNEL_BINDING_TLS_END_POINT constant. This is a
matter of taste, but IMO it's more readable to just use the
"tls-server-end-point" string.
Refactor the checks on whether the SSL library supports the functions
needed for tls-server-end-point channel binding. Now the server won't
advertise, and the client won't choose, the SCRAM-SHA-256-PLUS variant, if
compiled with an OpenSSL version too old to support it.
In passing, add some sanity checks to verify that the chosen SASL
mechanism, SCRAM-SHA-256 or SCRAM-SHA-256-PLUS, matches whether the SCRAM
exchange used channel binding or not. For example, catch the case where
the client selects the non-channel-binding variant SCRAM-SHA-256 but then
uses channel binding in the SCRAM messages anyway. It's harmless from a
security point of view, I believe, and I'm not sure if there are other
conditions that would cause the connection to fail, but it seems better
to be strict about these things and check explicitly.
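A minimal sketch of such a consistency check, with hypothetical names
(the real checks live in the server's SCRAM code): the mechanism the
client selected must agree with whether channel binding was actually used.

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical helper: true if the selected mechanism is consistent
     * with the channel-binding usage observed in the SCRAM exchange. */
    static bool
    scram_mechanism_is_consistent(const char *mechanism,
                                  bool used_channel_binding)
    {
        if (strcmp(mechanism, "SCRAM-SHA-256-PLUS") == 0)
            return used_channel_binding;    /* -PLUS must bind the channel */
        if (strcmp(mechanism, "SCRAM-SHA-256") == 0)
            return !used_channel_binding;   /* plain variant must not */
        return false;                       /* unknown mechanism */
    }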
Tom Lane [Sun, 5 Aug 2018 03:49:53 +0000 (23:49 -0400)]
Update version 11 release notes.
Remove description of commit 1944cdc98, which has now been back-patched
so it's not relevant to v11 any longer. Add descriptions of other
recent commits that seemed worth mentioning.
I marked the update as stopping at 2018-07-30, because it's unclear
whether d06eebce5 will be allowed to stay in v11, and I didn't feel like
putting effort into writing a description of it yet. If it does stay,
I think it will deserve mention in the Source Code section.
Tom Lane [Sat, 4 Aug 2018 23:38:58 +0000 (19:38 -0400)]
Fix INSERT ON CONFLICT UPDATE through a view that isn't just SELECT *.
When expanding an updatable view that is an INSERT's target, the rewriter
failed to rewrite Vars in the ON CONFLICT UPDATE clause. This accidentally
worked if the view was just "SELECT * FROM ...", as the transformation
would be a no-op in that case. With more complicated view targetlists,
this omission would often lead to "attribute ... has the wrong type" errors
or even crashes, as reported by Mario De Frutos Dieguez.
Fix by adding code to rewriteTargetView to fix up the data structure
correctly. The easiest way to update the exclRelTlist list is to rebuild
it from scratch looking at the new target relation, so factor the code
for that out of transformOnConflictClause to make it sharable.
In passing, avoid duplicate permissions checks against the EXCLUDED
pseudo-relation, and prevent useless view expansion of that relation's
dummy RTE. The latter is only known to happen (after this patch) in cases
where the query would fail later due to not having any INSTEAD OF triggers
for the view. But by exactly that token, it would create an unintended
and very poorly tested state of the query data structure, so it seems like
a good idea to prevent it from happening at all.
This has been broken since ON CONFLICT was introduced, so back-patch
to 9.5.
Dean Rasheed, based on an earlier patch by Amit Langote;
comment-kibitzing and back-patching by me
Michael Paquier [Sat, 4 Aug 2018 20:31:56 +0000 (05:31 +0900)]
Properly reset errno before calling write()
Commit 6cb3372 forces errno to ENOSPC when fewer bytes than expected have
been written and errno is unset, but it forgot to properly reset errno
before issuing the write() system call, so errno could carry over a value
from a previous system call.
Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us
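The pattern at issue, sketched as a hypothetical wrapper: clear errno
before the call, and only substitute ENOSPC on a short write if errno is
still unset afterwards.

    #include <errno.h>
    #include <unistd.h>

    /* Hypothetical wrapper showing the required errno discipline. */
    static ssize_t
    write_with_enospc_fallback(int fd, const void *buf, size_t expected)
    {
        ssize_t written;

        errno = 0;                  /* the reset this commit adds */
        written = write(fd, buf, expected);
        if (written >= 0 && (size_t) written != expected && errno == 0)
            errno = ENOSPC;         /* short write with errno unset */
        return written;
    }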
Noah Misch [Sat, 4 Aug 2018 03:53:25 +0000 (20:53 -0700)]
Make "kerberos" test suite independent of "localhost" name resolution.
This suite malfunctioned if the canonical name of "localhost" was
something other than "localhost", such as "localhost.localdomain". Use
hostaddr=127.0.0.1 and a fictitious host=, so the resolver's answers for
"localhost" don't affect the outcome. Back-patch to v11, which
introduced this test suite.
Peter Geoghegan [Fri, 3 Aug 2018 21:45:02 +0000 (14:45 -0700)]
Add table relcache invalidation to index builds.
It's necessary to make sure that owning tables have a relcache
invalidation prior to advancing the command counter to make
newly-entered catalog tuples for the index visible. inval.c must be
able to maintain the consistency of the local caches in the event of
transaction abort. There is usually only a problem when CREATE INDEX
transactions abort, since there is a generic invalidation once we reach
index_update_stats().
This bug is of long standing. Problems were made much more likely by
the addition of parallel CREATE INDEX (commit 9da0cc35284), but it is
strongly suspected that similar problems can be triggered without
involving plan_create_index_workers(). (plan_create_index_workers()
triggers a relcache build or rebuild, which previously only happened in
rare edge cases.)
Author: Peter Geoghegan
Reported-By: Luca Ferrari
Diagnosed-By: Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAKoxK+5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf=HEQ@mail.gmail.com
Backpatch: 9.3-
Tom Lane [Fri, 3 Aug 2018 16:20:47 +0000 (12:20 -0400)]
Remove no-longer-appropriate special case in psql's \conninfo code.
\conninfo prints the results of PQhost() and some other libpq functions.
It used to override the PQhost() result with the hostaddr parameter if
that'd been given, but that's unhelpful when multiple hosts were listed
in the connection string. Furthermore, it seems unnecessary in the wake
of commit 1944cdc98, since PQhost does any useful substitution itself.
So let's just remove the extra code and print PQhost()'s result without
any editorialization.
Tom Lane [Fri, 3 Aug 2018 16:12:10 +0000 (12:12 -0400)]
Change libpq's internal uses of PQhost() to inspect host field directly.
Commit 1944cdc98 changed PQhost() to return the hostaddr value when that
is specified and host isn't. This is a good idea in general, but
fe-auth.c and related files contain PQhost() calls for which it isn't.
Specifically, when we compare SSL certificates or other server identity
information to the host field, we do not want to use hostaddr instead;
that's not what's documented, that's not what happened pre-v10, and
it doesn't seem like a good idea.
Instead, we can just look at connhost[].host directly. This does what
we want in v10 and up; in particular, if neither host nor hostaddr
were given, the host field will be replaced with the default host name.
That seems useful, and it's likely the reason that these places were
coded to call PQhost() originally (since pre-v10, the stored field was
not replaced with the default).
Amit Kapila [Fri, 3 Aug 2018 05:46:25 +0000 (11:16 +0530)]
Fix buffer usage stats for parallel nodes.
Buffer usage stats are accounted only for the execution phase of a node.
For Gather and Gather Merge nodes, such stats are accumulated when the
workers are shut down, which happens after the node's execution, so we
failed to account for them for these nodes. Fix this by treating the
nodes as still running while we shut them down.
We can also miss the accounting for a Limit node when a Gather or Gather
Merge is beneath it, because the Limit node can finish execution before
shutting down those nodes. So allow a Limit node to shut down the
resources before it completes execution.
In passing, fix the Gather node code to allow workers to shut down as
soon as we find that all the tuples from the workers have been retrieved.
The original code used to do that, but it was accidentally removed by
commit 01edb5c7fc.
Reported-by: Adrien Nayrat
Author: Amit Kapila and Robert Haas
Reviewed-by: Robert Haas and Andres Freund
Backpatch-through: 9.6 where this code was introduced
Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
Amit Kapila [Fri, 3 Aug 2018 03:59:45 +0000 (09:29 +0530)]
Match the buffer usage tracking for leader and worker backends.
In the leader backend, we don't track buffer usage for the ExecutorStart
phase, whereas in worker backends we track it for that phase as well.
This leads to different buffer usage stats for parallel and non-parallel
queries. Change the code so that worker backends also start tracking
buffer usage only after ExecutorStart.
Author: Amit Kapila and Robert Haas
Reviewed-by: Robert Haas and Andres Freund
Backpatch-through: 9.6 where this code was introduced
Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
Tom Lane [Wed, 1 Aug 2018 23:42:46 +0000 (19:42 -0400)]
Fix run-time partition pruning for appends with multiple source rels.
The previous coding here supposed that if run-time partitioning applied to
a particular Append/MergeAppend plan, then all child plans of that node
must be members of a single partitioning hierarchy. This is totally wrong,
since an Append could be formed from a UNION ALL: we could have multiple
hierarchies sharing the same Append, or child plans that aren't part of any
hierarchy.
To fix, restructure the related plan-time and execution-time data
structures so that we can have a separate list or array for each
partitioning hierarchy. Also track subplans that are not part of any
hierarchy, and make sure they don't get pruned.
Per reports from Phil Florent and others. Back-patch to v11, since
the bug originated there.
David Rowley, with a lot of cosmetic adjustments by me; thanks also
to Amit Langote for review.
Alvaro Herrera [Wed, 1 Aug 2018 21:39:07 +0000 (17:39 -0400)]
Fix logical replication slot initialization
This was broken in commit 9c7d06d60680, which inadvertently gave the
wrong value to fast_forward in one StartupDecodingContext call. Fix by
flipping the value. Add a test for the obvious error, namely trying to
initialize a replication slot with a nonexistent output plugin.
While at it, move the CreateDecodingContext call earlier, so that any
errors are reported before sending the CopyBoth message.
Author: Dave Cramer <davecramer@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CADK3HHLVkeRe1v4P02-5hj55H3_yJg3AEtpXyEY5T3wuzO2jSg@mail.gmail.com
Alvaro Herrera [Wed, 1 Aug 2018 19:06:47 +0000 (15:06 -0400)]
Fix per-tuple memory leak in partition tuple routing
Some operations were being done in a longer-lived memory context,
causing intra-query leaks. It's not noticeable unless you're doing a
large COPY, but if you are, it eats enough memory to cause a problem.
Tom Lane [Wed, 1 Aug 2018 16:30:36 +0000 (12:30 -0400)]
Fix libpq's code for searching .pgpass; rationalize empty-list-item cases.
Before v10, we always searched ~/.pgpass using the host parameter,
and nothing else, to match to the "hostname" field of ~/.pgpass.
(However, null host or host matching DEFAULT_PGSOCKET_DIR was replaced by
"localhost".) In v10, this got broken by commit 274bb2b38, repaired by
commit bdac9836d, and broken again by commit 7b02ba62e; in the code
actually shipped, we'd search with hostaddr if both that and host were
specified --- though oddly, *not* if only hostaddr were specified.
Since this is directly contrary to the documentation, and not
backwards-compatible, it's clearly a bug.
However, the change wasn't totally without justification, even though it
wasn't done quite right, because the pre-v10 behavior has arguably been
buggy since we added hostaddr. If hostaddr is specified and host isn't,
the pre-v10 code will search ~/.pgpass for "localhost", and ship that
password off to a server that most likely isn't local at all. That's
unhelpful at best, and could be a security breach at worst.
Therefore, rather than just revert to that old behavior, let's define
the behavior as "search with host if provided, else with hostaddr if
provided, else search for localhost". (As before, a host name matching
DEFAULT_PGSOCKET_DIR is replaced by localhost.) This matches the
behavior of the actual connection code, so that we don't pick up an
inappropriate password; and it allows useful searches to happen when
only hostaddr is given.
While we're messing around here, ensure that empty elements within a
host or hostaddr list select the same behavior as a totally-empty
field would; for instance "host=a,,b" is equivalent to "host=a,/tmp,b"
if DEFAULT_PGSOCKET_DIR is /tmp. Things worked that way in some cases
already, but not consistently so, which contributed to the confusion
about what key ~/.pgpass would get searched with.
Update documentation accordingly, and also clarify some nearby text.
Back-patch to v10 where the host/hostaddr list functionality was
introduced.
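A sketch of the search-key rule defined above, as a hypothetical helper
(DEFAULT_PGSOCKET_DIR is a build-time setting, commonly /tmp): host wins
if provided, else hostaddr, else "localhost", with a host equal to the
default socket directory mapped to "localhost".

    #include <stddef.h>
    #include <string.h>

    #define DEFAULT_PGSOCKET_DIR "/tmp"   /* example build-time value */

    /* Hypothetical helper returning the ~/.pgpass "hostname" search key. */
    static const char *
    pgpass_search_key(const char *host, const char *hostaddr)
    {
        if (host != NULL && host[0] != '\0')
            return strcmp(host, DEFAULT_PGSOCKET_DIR) == 0
                   ? "localhost" : host;
        if (hostaddr != NULL && hostaddr[0] != '\0')
            return hostaddr;
        return "localhost";
    }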
Tom Lane [Tue, 31 Jul 2018 17:00:08 +0000 (13:00 -0400)]
Further fixes for quoted-list GUC values in pg_dump and ruleutils.c.
Commits 742869946 et al turn out to be a couple bricks shy of a load.
We were dumping the stored values of GUC_LIST_QUOTE variables as they
appear in proconfig or setconfig catalog columns. However, although that
quoting rule looks a lot like SQL-identifier double quotes, there are two
critical differences: empty strings ("") are legal, and depending on which
variable you're considering, values longer than NAMEDATALEN might be valid
too. So the current technique fails altogether on empty-string list
entries (as reported by Steven Winfield in bug #15248) and it also risks
truncating file pathnames during dump/reload of GUC values that are lists
of pathnames.
To fix, split the stored value without any downcasing or truncation,
and then emit each element as a SQL string literal.
This is a tad annoying, because we now have three copies of the
comma-separated-string splitting logic in varlena.c as well as a fourth
one in dumputils.c. (Not to mention the randomly-different-from-those
splitting logic in libpq...) I looked at unifying these, but it would
be rather a mess unless we're willing to tweak the API definitions of
SplitIdentifierString, SplitDirectoriesString, or both. That might be
worth doing in future; but it seems pretty unsafe for a back-patched
bug fix, so for now accept the duplication.
Back-patch to all supported branches, as the previous fix was.
Change bms_add_range to be a no-op for empty ranges
In commit 84940644de93, bms_add_range was added with an API to fail with
an error if an empty range was specified. This seems arbitrary and
unhelpful, so turn that case into a no-op instead. Callers that require
further verification on the arguments or result can apply them by
themselves.
This fixes the bug that partition pruning throws an API error for a case
involving the default partition of a default partition, as in the
included test case.
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/16590.1532622503@sss.pgh.pa.us
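A simplified sketch of the behavior change (the real bms_add_range in
nodes/bitmapset.c works on a multi-word bitmap; this single-word stand-in
just shows the new no-op rule):

    /* Simplified single-word stand-in for a Bitmapset; members 0..63. */
    typedef unsigned long long BitmapWord;

    static BitmapWord
    add_range_sketch(BitmapWord set, int lower, int upper)
    {
        if (lower > upper)      /* empty range: previously an error, */
            return set;         /* now simply a no-op */
        for (int i = lower; i <= upper; i++)
            set |= 1ULL << i;
        return set;
    }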
Tom Lane [Mon, 30 Jul 2018 22:04:39 +0000 (18:04 -0400)]
Ensure we build generated headers at the start of some more cases.
"make installcheck" and some related cases, when invoked from the toplevel
directory, start out by doing "make all" in src/test/regress. Since that's
one make recursion level down, the submake-generated-headers target will
do nothing, causing us to fail to create/update generated headers before
building pg_regress. This is, I believe, a new failure mode induced by
commit 3b8f6e75f, so let's fix it. To do so, we have to invoke
submake-generated-headers at the top level.
Set ActiveSnapshot when logically replaying inserts
Input functions for the inserted tuples may require a snapshot, when
they are replayed by native logical replication. An example is a domain
with a constraint using a SQL-language function, which prior to this
commit failed to apply on the subscriber side.
Tom Lane [Mon, 30 Jul 2018 16:35:49 +0000 (12:35 -0400)]
Fix pg_dump's failure to dump REPLICA IDENTITY for constraint indexes.
pg_dump knew about printing ALTER TABLE ... REPLICA IDENTITY USING INDEX
for indexes declared as indexes, but it failed to print that for indexes
declared as unique or primary-key constraints. Per report from Achilleas
Mantzios.
This has been broken since the feature was introduced, AFAICS.
Back-patch to 9.4.
Tom Lane [Mon, 30 Jul 2018 15:54:41 +0000 (11:54 -0400)]
Doc: fix oversimplified example for CREATE POLICY.
As written, this policy constrained only the post-image, not the pre-image,
of rows, meaning that users could delete other users' rows or take
ownership of such rows, contrary to what the docs claimed would happen.
We need two separate policies to achieve the documented effect.
While at it, try to explain what's happening a bit more fully.
Per report from Олег Самойлов. Back-patch to 9.5 where this was added.
Thanks to Stephen Frost for off-list discussion.
Affected test queries have been testing the wrong thing since their
introduction in commit 4c1383efd132e4f532213c8a8cc63a455f55e344.
Back-patch to 9.3 (all supported versions).
Document security implications of qualified names.
Commit 5770172cb0c9df9e6ce27c507b449557e5b45124 documented secure schema
usage, and that advice suffices for using unqualified names securely.
Document, in typeconv-func primarily, the additional issues that arise
with qualified names. Back-patch to 9.3 (all supported versions).
Bruce Momjian [Sat, 28 Jul 2018 19:01:55 +0000 (15:01 -0400)]
pg_upgrade: check for clean server shutdowns
Previously pg_upgrade checked for the pid file and started/stopped the
server to force a clean shutdown. However, "pg_ctl -m immediate"
removes the pid file but doesn't do a clean shutdown, so check
pg_controldata for a clean shutdown too.
Diagnosed-by: Vimalraj A
Discussion: https://postgr.es/m/CAFKBAK5e4Q-oTUuPPJ56EU_d2Rzodq6GWKS3ncAk3xo7hAsOZg@mail.gmail.com
Amit Kapila [Fri, 27 Jul 2018 05:26:07 +0000 (10:56 +0530)]
Fix the buffer release order for parallel index scans.
During parallel index scans, if the current page to be read is deleted, we
skip it and try to get the next page for a scan without releasing the buffer
lock on the current page. To get the next page, sometimes it needs to wait
for another process to complete its scan and advance it to the next page.
Now, it is quite possible that the master backend has errored out before
advancing the scan and issued a termination signal to all workers. The
workers then failed to notice the termination request while waiting,
because interrupts are held off due to the buffer lock on the previous
page. This led to all workers being stuck.
The fix is to release the buffer lock on the current page before trying
to get the next page. We already do the same in backward scans, but
missed it for forward scans.
Reported-by: Victor Yegorov
Bug: 15290
Diagnosed-by: Thomas Munro and Amit Kapila
Author: Amit Kapila
Reviewed-by: Thomas Munro
Tested-By: Thomas Munro and Victor Yegorov
Backpatch-through: 10 where parallel index scans were introduced
Discussion: https://postgr.es/m/153228422922.1395.1746424054206154747@wrigleys.postgresql.org
Michael Paquier [Fri, 27 Jul 2018 04:43:36 +0000 (13:43 +0900)]
Fix handling of pgbench's hash when no argument is provided
Depending on the platform used, this can cause a crash in the worst
case, or an unhelpful error message, so fail gracefully.
Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1807262302550.29874@lancre
Backpatch: 11-, where hash() was added to pgbench.
Tom Lane [Thu, 26 Jul 2018 22:18:37 +0000 (18:18 -0400)]
Provide plpgsql tests for cases involving record field changes.
We suppressed one of these test cases in commit feb1cc559 because
it was failing to produce the expected results on CLOBBER_CACHE_ALWAYS
buildfarm members. But now we need another test with similar behavior,
so let's set up a test file that is expected to vary between regular and
CLOBBER_CACHE_ALWAYS cases, and provide variant expected files.
Someday we should fix plpgsql's failure for change-of-field-type, and
then the discrepancy will go away and we can fold these tests back
into plpgsql_record.sql. But today is not that day.
Tom Lane [Thu, 26 Jul 2018 20:08:45 +0000 (16:08 -0400)]
Avoid crash in eval_const_expressions if a Param's type changes.
Since commit 6719b238e it's been possible for the values of plpgsql
record field variables to be exposed to the planner as Params.
(Before that, plpgsql never supplied values for such variables during
planning, so that the problematic code wasn't reached.) Other places
that touch potentially-type-mutable Params either cope gracefully or
do runtime-test-and-ereport checks that the type is what they expect.
But eval_const_expressions() just had an Assert, meaning that it either
failed the assertion or risked crashes due to using an incompatible
value.
In this case, rather than throwing an ereport immediately, we can just
not perform a const-substitution in case of a mismatch. This seems
important for the same reason that the Param fetch was speculative:
we might not actually reach this part of the expression at runtime.
Test case will follow in a separate commit.
Patch by me, pursuant to bug report from Andrew Gierth.
Back-patch to v11 where the previous commit appeared.
Andres Freund [Wed, 25 Jul 2018 23:31:49 +0000 (16:31 -0700)]
LLVMJIT: Release JIT context after running ExprContext shutdown callbacks.
Due to inlining it previously was possible that an ExprContext's
shutdown callback pointed to a JITed function. As the JIT context
previously was shut down before the shutdown callbacks were called,
that could lead to segfaults. Fix the ordering.
Reported-By: Dmitry Dolgov
Author: Andres Freund
Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com
Backpatch: 11-, where JIT compilation was added
Thomas Munro [Tue, 24 Jul 2018 22:58:44 +0000 (10:58 +1200)]
Pad semaphores to avoid false sharing.
In a USE_UNNAMED_POSIX_SEMAPHORES build, the default on Linux and FreeBSD
since commit ecb0d20a, we have an array of sem_t objects. This
turned out to reduce performance compared to the previous default
USE_SYSV_SEMAPHORES on an 8 socket system. Testing showed that the
lost performance could be regained by padding the array elements so
that they have their own cache lines. This matches what we do for
similar hot arrays (see LWLockPadded, WALInsertLockPadded).
Back-patch to 10, where unnamed semaphores were adopted as the default
semaphore interface on those operating systems.
Author: Thomas Munro
Reviewed-by: Andres Freund
Reported-by: Mithun Cy
Tested-by: Mithun Cy, Tom Lane, Thomas Munro
Discussion: https://postgr.es/m/CAD__OugYDM3O%2BdyZnnZSbJprSfsGFJcQ1R%3De59T3hcLmDug4_w%40mail.gmail.com
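The padding idiom in question, sketched after LWLockPadded (the union
name here is illustrative; PG_CACHE_LINE_SIZE is 128 in PostgreSQL's
pg_config_manual.h): each union member fills a whole cache line, so
neighboring semaphores never share one.

    #include <semaphore.h>

    #define PG_CACHE_LINE_SIZE 128  /* PostgreSQL's conservative estimate */

    /* Each array element occupies a full cache line. */
    typedef union PaddedSem
    {
        sem_t pgsem;
        char  pad[PG_CACHE_LINE_SIZE];
    } PaddedSem;

    /* An array of these keeps each sem_t on its own cache line. */
    static PaddedSem sharedSemas[128];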
Andres Freund [Tue, 24 Jul 2018 17:51:06 +0000 (10:51 -0700)]
doc: Fix reference to "decoder" to instead be the correct "output plugin".
Author: Jonathan Katz
Discussion: https://postgr.es/m/DD02DD86-5989-4BFD-8712-468541F68383@postgresql.org
Backpatch: 9.4-, where logical decoding was added
Michael Paquier [Tue, 24 Jul 2018 01:33:07 +0000 (10:33 +0900)]
Fix calculation for WAL segment recycling and removal
Commit 4b0d28de06 has removed the prior checkpoint and related
facilities but has left WAL recycling based on the LSN of the prior
checkpoint, which causes incorrect calculations for WAL removal and
recycling for max_wal_size and min_wal_size. This commit changes things
so that the base calculation point is the last checkpoint generated.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch: 11-, where the prior checkpoint has been removed.
Andres Freund [Mon, 23 Jul 2018 04:13:20 +0000 (21:13 -0700)]
LLVMJIT: Adapt to API changes in gdb and perf support.
During the work of upstreaming my previous patches for gdb and perf
support, the API changed. Adapt. Normally this wouldn't necessarily be
something to backpatch, but the previous API wasn't upstream, and at
least the gdb support is quite useful for debugging.
Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
Andres Freund [Sun, 22 Jul 2018 23:47:00 +0000 (16:47 -0700)]
Fix JITed EEOP_AGG_INIT_TRANS, which missed some state.
The JIT compiled implementation missed maintaining
AggState->{current_set,curaggcontext}. That could lead to trouble
because the transition value could be allocated in the wrong context.
Reported-By: Rushabh Lathia
Diagnosed-By: Dmitry Dolgov
Author: Dmitry Dolgov, with minor changes by me
Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com
Backpatch: 11-, where JIT compilation support was added
Tom Lane [Sat, 21 Jul 2018 19:40:51 +0000 (15:40 -0400)]
Further portability hacking in pg_upgrade's test script.
I blew the dust off a Bourne shell (file date 1996, yea verily) and
tried to run test.sh with it. It mostly worked, but I found that the
temp-directory creation code introduced by commit be76a6d39 was not
compatible, for a couple of reasons: this shell thinks "set -e" should
force an exit if a command within backticks fails, and it also thinks code
within braces should be executed by a sub-shell, meaning that variable
settings don't propagate back up to the parent shell. In view of Victor
Wagner's report that Solaris is still using pre-POSIX shells, seems like
we oughta make this case work. It's not like the code is any less
idiomatic this way; the prior coding technique appeared nowhere else.
(There is a remaining bash-ism here, which is that $RANDOM doesn't do
what the code hopes in non-bash shells. But the use of $$ elsewhere in
that path should be enough to ensure uniqueness and some amount of
randomness, so I think it's okay as-is.)
Back-patch to all supported branches, as the previous commit was.
Tom Lane [Sat, 21 Jul 2018 16:05:25 +0000 (12:05 -0400)]
Be more paranoid about quoting in pg_upgrade's test script.
Double-quote $PGDATA in "find" commands introduced by commit da9b580d8,
in case that path contains spaces or other special characters.
Adjust a few other places so that quoting is done more consistently.
None of the others are actual bugs AFAICS, but it's confusing to readers
if the same thing is done differently in different places.
Tom Lane [Fri, 20 Jul 2018 17:59:27 +0000 (13:59 -0400)]
Avoid unportable shell syntax in pg_upgrade's test script.
Most of test.sh uses traditional backtick syntax for command substitution,
but commit da9b580d8 introduced two uses of $(...) syntax, which is not
recognized by very old shells. Bring those into line with the rest.
Dean Rasheed [Fri, 20 Jul 2018 07:57:08 +0000 (08:57 +0100)]
Guard against rare RAND_bytes() failures in pg_strong_random().
When built using OpenSSL, pg_strong_random() uses RAND_bytes() to
generate the random number. On very rare occasions that can fail, if
its PRNG has not been seeded with enough data. Additionally, once it
does fail, all subsequent calls will also fail until more seed data is
added. Since this is required during backend startup, this can result
in all new backends failing to start until a postmaster restart.
Guard against that by checking the state of OpenSSL's PRNG using
RAND_status(), and if necessary (very rarely), seeding it using
RAND_poll().
Back-patch to v10, where pg_strong_random() was introduced.
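The guard in question, sketched as a hypothetical wrapper around the
OpenSSL calls: check RAND_status() first, and try RAND_poll() to reseed
before giving up.

    #include <stdbool.h>
    #include <openssl/rand.h>

    /* Hypothetical wrapper showing the reseed-then-generate pattern. */
    static bool
    strong_random_sketch(unsigned char *buf, int len)
    {
        if (RAND_status() == 0)     /* PRNG not sufficiently seeded? */
        {
            if (RAND_poll() != 1)   /* try to gather more seed data */
                return false;
        }
        return RAND_bytes(buf, len) == 1;
    }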
Fix handling of empty uncompressed posting list pages in GIN
PostgreSQL 9.4 introduced posting list compression in GIN. This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on the fly. The underlying code appears to always
expect at least one item on an uncompressed posting list page. But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees. This commit fixes that.
Tom Lane [Thu, 19 Jul 2018 19:41:46 +0000 (15:41 -0400)]
Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior. Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.
There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.
Tom Lane [Thu, 19 Jul 2018 18:53:41 +0000 (14:53 -0400)]
Improve psql's \d command to show whether index columns are key columns.
This is essential information when looking at an index that has
"included" columns. Per discussion, follow the style used in \dC
and some other places: column header is "Key?" and values are "yes"
or "no" (all translatable).
While at it, revise describeOneTableDetails to be a bit more maintainable:
avoid hard-wired column numbers and multiple repetitions of what needs
to be identical test logic. This also results in the emitted catalog
query corresponding more closely to what we print, which should be a
benefit to users of ECHO_HIDDEN mode, and perhaps a bit faster too
(the old logic sometimes asked for values it would not print, even
ones that are fairly expensive to get).
Tom Lane [Thu, 19 Jul 2018 17:48:05 +0000 (13:48 -0400)]
Fix pg_get_indexdef()'s behavior for included index columns.
The multi-argument form of pg_get_indexdef() failed to print anything when
asked to print a single index column that is an included column rather than
a key column. This seems an unintentional result of someone having tried
to take a short-cut and use the attrsOnly flag for two different purposes.
To fix, split said flag into two flags, attrsOnly which suppresses
non-attribute info, and keysOnly which suppresses included columns.
Add a test case using psql's \d command, which relies on that function.
(It's mighty tempting at this point to replace pg_get_indexdef_worker's
mess of boolean flag arguments with a single bitmask-of-flags argument,
which would allow making the call sites much more self-documenting.
But I refrained for the moment.)
Rewrite comments in replication slot advance implementation
The code added by 9c7d06d60680 was a bit obscure; clarify that by
rewriting the comments. Lack of clarity has already caused bugs, so
it's a worthy goal.
I was confused by what "intended to be parallel serially" meant, until
Robert Haas and David G. Johnston explained it. Rephrase the comment to
make it more clear, using David's suggested wording.
Michael Paquier [Thu, 19 Jul 2018 00:55:02 +0000 (09:55 +0900)]
Fix print of Path nodes when using OPTIMIZER_DEBUG
GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5)
were missing from the output. The order of the Path nodes was
inconsistent with
what is listed in nodes.h, so make the order consistent at the same time
to ease future checks and additions.
Author: Sawada Masahiko
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com
Tom Lane [Wed, 18 Jul 2018 21:39:27 +0000 (17:39 -0400)]
Remove race-prone hot_standby_feedback test cases in 001_stream_rep.pl.
This script supposed that if it turned hot_standby_feedback on and then
shut down the standby server, at least one feedback message would be
guaranteed to be sent before the standby stops. But there is no such
guarantee, if the standby's walreceiver process is slow enough --- and
we've seen multiple failures in the buildfarm showing that that does
happen in practice. While we could rearrange the walreceiver logic to
make it less likely, it seems probably impossible to create a really
bulletproof guarantee of that sort; and if we tried, we might create
situations where the walreceiver wouldn't react in a timely manner to
shutdown commands. It seems better instead to remove the script's
assumption that feedback will occur before shutdown.
But once we do that, these last few tests seem quite redundant with
the earlier tests in the script. So let's just drop them altogether
and save some buildfarm cycles.
Tom Lane [Wed, 18 Jul 2018 18:43:03 +0000 (14:43 -0400)]
Drop the rule against included index columns duplicating key columns.
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here. Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.
(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)
Hence, let's just drop the prohibition.
In passing, do some copy-editing on the documentation for the
included-column feature.
Yugo Nagata; documentation and test corrections by me
doc: move PARTITION OF stanza to just below PARTITION BY
It's more logical this way, since the new ordering matches the way the
tables are created; but in any case, the previous location of PARTITION OF
did not appear carefully chosen (it didn't match the location in which it
appears in the synopsis either, which is what we normally do).
In the PARTITION BY stanza, add a link to the partitioning section in
the DDL chapter, too.
Suggested-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwY4Ld7ecxL_KAmaxwt0FUu5VcPPN2L4dh+3BeYbrdBa5g@mail.gmail.com
Fix ALTER TABLE...SET STATS error message for included columns
The existing error message was complaining that the column is not an
expression, which is not correct. Introduce a suitable wording
variation and a test.
The original code was unable to prune partitions that could not possibly
contain NULL values when the query specified fewer than all of the
columns in a multicolumn partition key. Reorder the if-tests so that it
can, and add more commentary and regression tests.
Robert Haas [Mon, 16 Jul 2018 21:33:22 +0000 (17:33 -0400)]
Add subtransaction handling for table synchronization workers.
Since the old logic was completely unaware of subtransactions, a
change made in a subsequently-aborted subtransaction would still cause
workers to be stopped at toplevel transaction commit. Fix that by
managing a stack of worker lists rather than just one.
Peter Eisentraut [Mon, 16 Jul 2018 08:44:06 +0000 (10:44 +0200)]
doc: Update redirecting links
Update links that resulted in redirects. Most are changes from http to
https, but there are also some other minor edits. (There are still some
redirects where the target URL looks less elegant than the one we
currently have. I have left those as is.)
Tom Lane [Sat, 14 Jul 2018 15:59:12 +0000 (11:59 -0400)]
Fix hashjoin costing mistake introduced with inner_unique optimization.
In final_cost_hashjoin(), commit 9c7f5229a allowed inner_unique cases
to follow a code path previously used only for SEMI/ANTI joins; but it
neglected to fix an if-test within that path that assumed SEMI and ANTI
were the only possible cases. This resulted in a wrong value for
hashjointuples, and an ensuing bad cost estimate, for inner_unique normal
joins. Fortunately, for inner_unique normal joins we can assume the number
of joined tuples is the same as for a SEMI join; so there's no need for
more code, we just have to invert the test to check for ANTI not SEMI.
It turns out that in two contrib tests in which commit 9c7f5229a
changed the plan expected for a query, the change was actually wrong
and induced by this estimation error, not by any real improvement.
Hence this patch also reverts those changes.
Per report from RK Korlapati. Backpatch to v10 where the error was
introduced.
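A simplified sketch of the corrected test (names are illustrative; the
real logic is in final_cost_hashjoin): on the shared path, check
explicitly for ANTI, since "not SEMI" no longer implies ANTI once
inner_unique joins travel it too.

    /* Illustrative join kinds for the shared costing path. */
    typedef enum { KIND_SEMI, KIND_ANTI, KIND_INNER_UNIQUE } JoinKindSketch;

    static double
    hashjoin_tuples_sketch(JoinKindSketch kind,
                           double outer_rows, double matched_rows)
    {
        if (kind == KIND_ANTI)                  /* test ANTI, not SEMI */
            return outer_rows - matched_rows;   /* unmatched outer rows */
        /* SEMI and inner_unique both emit one row per matched outer row */
        return matched_rows;
    }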
Tom Lane [Fri, 13 Jul 2018 22:45:30 +0000 (18:45 -0400)]
Fix crash in contrib/ltree's lca() function for empty input array.
lca_inner() wasn't prepared for the possibility of getting no inputs.
Fix that, and make some cosmetic improvements to the code while at it.
Also, I thought the documentation of this function as returning the
"longest common prefix" of the paths was entirely misleading; it really
returns a path one shorter than the longest common prefix, for the typical
definition of "prefix". Don't use that term in the docs, and adjust the
examples to clarify what really happens.
This has been broken since its beginning, so back-patch to all supported
branches.
Per report from Hailong Li. Thanks to Pierre Ducroquet for diagnosing
and for the initial patch, though I whacked it around some and added
test cases.
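The shape of the missing guard, in a hypothetical longest-common-prefix
helper: zero inputs must be handled explicitly, with the caller mapping
the "no answer" result to SQL NULL, as lca() now does.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: length of the longest common prefix of nitems
     * strings, or -1 when there are no inputs at all. */
    static long
    common_prefix_len(const char *const items[], size_t nitems)
    {
        if (nitems == 0)
            return -1;              /* the previously unhandled case */

        size_t len = strlen(items[0]);
        for (size_t i = 1; i < nitems; i++)
        {
            size_t j = 0;
            while (j < len && items[i][j] != '\0' &&
                   items[i][j] == items[0][j])
                j++;
            len = j;
        }
        return (long) len;
    }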
Peter Eisentraut [Fri, 13 Jul 2018 19:23:41 +0000 (21:23 +0200)]
Update documentation editor setup instructions
Now that the documentation sources are in XML rather than SGML, some of
the documentation about the editor, or more specifically Emacs, setup
needs updating. The updated instructions recommend using nxml-mode,
which works mostly out of the box, with some small tweaks in
emacs.samples and .dir-locals.el.
Also remove some obsolete stuff in .dir-locals.el. I did, however,
leave the sgml-mode settings in there so that someone using Emacs
without emacs.samples gets those settings when editing a *.sgml file.
Tom Lane [Fri, 13 Jul 2018 18:16:47 +0000 (14:16 -0400)]
Fix crash in json{b}_populate_recordset() and json{b}_to_recordset().
As of commit 37a795a60, populate_recordset_worker() tried to pass back
(as rsi.setDesc) a tupdesc that it also had cached in its fn_extra.
But the core executor would free the passed-back tupdesc, risking a
crash if the function were called again in the same query. The safest
and least invasive way to fix that is to make an extra tupdesc copy
to pass back.
While at it, I failed to resist the temptation to get rid of unnecessary
get_fn_expr_argtype() calls here and in populate_record_worker().
Per report from Dmitry Dolgov; thanks to Michael Paquier and
Andrew Gierth for investigation and discussion.
The patch that ended up as commit 3de241dba86f ("Foreign keys on
partitioned tables") lacked pg_dump tests, so the pg_dump code that was
there to support it inadvertently stopped working when in later
development I modified the backend code not to emit pg_trigger rows for
the partitioned table itself.
Bug analysis and code fix are by Michaël. I (Álvaro) added the test.
Tom Lane [Fri, 13 Jul 2018 15:52:50 +0000 (11:52 -0400)]
Fix inadequate buffer locking in FSM and VM page re-initialization.
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking. There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.
This does not add any locking overhead in the normal code path where the
page is OK. It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.
Problem noted by R P Asim. It's been like this for a long time, so
back-patch to all supported branches.
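The shape of the fix in backend terms (a simplified sketch assuming the
usual bufmgr/bufpage APIs; the real code is in freespace.c and
visibilitymap.c): reinitialize only while holding the buffer lock, and
re-check the condition after acquiring it.

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Simplified sketch: initialize a zeroed/corrupt page under the lock. */
    static void
    init_page_if_new(Relation rel, ForkNumber fork, BlockNumber blkno)
    {
        Buffer buf = ReadBufferExtended(rel, fork, blkno,
                                        RBM_ZERO_ON_ERROR, NULL);
        Page   page = BufferGetPage(buf);

        if (PageIsNew(page))
        {
            LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
            if (PageIsNew(page))    /* re-check under the lock */
                PageInit(page, BLCKSZ, 0);
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        }
        ReleaseBuffer(buf);
    }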
Prohibit transaction commands in security definer procedures
Starting and aborting transactions in security definer procedures
doesn't work. StartTransaction() insists that the security context
stack is empty, so this would currently cause a crash, and
AbortTransaction() resets it. This could be made to work by
reorganizing the code, but right now we just prohibit it.
Reported-by: amul sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com
Peter Eisentraut [Thu, 12 Jul 2018 18:22:17 +0000 (20:22 +0200)]
Reset shmem_exit_inprogress after shmem_exit()
In ad9a274778d2d88c46b90309212b92ee7fdf9afe, shmem_exit_inprogress was
introduced. But we need to reset it after shmem_exit(), because unlike
the similar proc_exit(), shmem_exit() can also be called for cleanup
when the process will not exit.
Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
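The shape of the fix, sketched with a local flag (the real flag and
callback machinery live in ipc.c): because the process may keep running
after shmem_exit(), the flag must be cleared on the way out.

    #include <stdbool.h>

    static bool shmem_exit_inprogress = false;

    /* Sketch: run shared-memory cleanup callbacks, then reset the flag. */
    static void
    shmem_exit_sketch(int code)
    {
        (void) code;                    /* would select callback behavior */
        shmem_exit_inprogress = true;
        /* ... invoke registered on_shmem_exit callbacks here ... */
        shmem_exit_inprogress = false;  /* the reset this commit adds */
    }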
Tom Lane [Thu, 12 Jul 2018 16:28:43 +0000 (12:28 -0400)]
Doc: minor improvement in pl/pgsql FETCH/MOVE documentation.
Explain that you can use any integer expression for the "count" in
pl/pgsql's versions of FETCH/MOVE, unlike the SQL versions which only
allow a constant.
Remove the duplicate version of this para under MOVE. I don't see
a good reason to maintain two identical paras when we just said that
MOVE works exactly like FETCH.
Fix FK checks of TRUNCATE involving partitioned tables
When truncating a table that is referenced by foreign keys in
partitioned tables, the check to ensure that the referencing tables are
also truncated spuriously failed. This is because it was relying on
relhastriggers as a proxy for the table having FKs, and that's wrong for
partitioned tables. Fix it to consider such tables separately. There
may be a better way ... but this code is pretty inefficient already.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql
Tom Lane [Thu, 12 Jul 2018 15:10:24 +0000 (11:10 -0400)]
Doc: update documentation for requirement of ORDER BY in GROUPS mode.
Commit ff4f88916 adjusted the code to enforce the SQL spec's requirement
that a window using GROUPS mode must have an ORDER BY clause. But I missed
that the documentation explicitly said you didn't have to have one.
Also minor wordsmithing in the window-function section of select.sgml.
Per Masahiko Sawada, though I didn't use his patch.
Amit Kapila [Thu, 12 Jul 2018 06:47:27 +0000 (12:17 +0530)]
Allow using the updated tuple while moving it to a different partition.
An update that causes the tuple to be moved to a different partition was
missing out on re-constructing the to-be-updated tuple, based on the latest
tuple in the update chain. Instead, it's simply deleting the latest tuple
and inserting a new tuple in the new partition based on the old tuple.
Commit 2f17844104 didn't consider this case, so some of the updates were
getting lost.
In passing, change the argument order for the output parameter in ExecDelete
and add some commentary about it.
Reported-by: Pavan Deolasee
Author: Amit Khandekar, with minor changes by me
Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
Michael Paquier [Thu, 12 Jul 2018 01:19:51 +0000 (10:19 +0900)]
Make logical WAL sender report streaming state appropriately
WAL senders sending logically-decoded data fail to properly report being
in the "streaming" state when starting up; hence, until one extra record
is replayed, such WAL senders would remain in the "catchup" state, which
is inconsistent with their physical cousins.
This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server. The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.
Backpatch down to 9.4 where this code has been introduced.
Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
Tom Lane [Wed, 11 Jul 2018 19:25:28 +0000 (15:25 -0400)]
Fix create_scan_plan's handling of sortgrouprefs for physical tlists.
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error. (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)
The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other such cases, or will be in the future. This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.
Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.
Tom Lane [Wed, 11 Jul 2018 16:07:21 +0000 (12:07 -0400)]
Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode.
nodeWindowAgg.c failed to cope with the possibility that no ordering
columns are defined in the window frame for GROUPS mode or RANGE OFFSET
mode, leading to assertion failures or odd errors, as reported by Masahiko
Sawada and Lukas Eder. In RANGE OFFSET mode, an ordering column is really
required, so add an Assert about that. In GROUPS mode, the code would
work, except that the node initialization code wasn't in sync with the
execution code about when to set up tuplestore read pointers and spare
slots. Fix the latter for consistency's sake (even though I think the
changes described below make the out-of-sync cases unreachable for now).
Per SQL spec, a single ordering column is required for RANGE OFFSET mode,
and at least one ordering column is required for GROUPS mode. The parser
enforced the former but not the latter; add a check for that.
We were able to reach the no-ordering-column cases even with fully spec
compliant queries, though, because the planner would drop partitioning
and ordering columns from the generated plan if they were redundant with
earlier columns according to the redundant-pathkey logic, for instance
"PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual.
While in principle that's an optimization that could save some pointless
comparisons at runtime, it seems unlikely to be meaningful in the real
world. I think this behavior was not so much an intentional optimization
as a side-effect of an ancient decision to construct the plan node's
ordering-column info by reverse-engineering the PathKeys of the input
path. If we give up redundant-column removal then it takes very little
code to generate the plan node info directly from the WindowClause,
ensuring that we have the expected number of ordering columns in all
cases. (If anyone does complain about this, the planner could perhaps
be taught to remove redundant columns only when it's safe to do so,
ie *not* in RANGE OFFSET mode. But I doubt anyone ever will.)
With these changes, the WindowAggPath.winpathkeys field is not used for
anything anymore, so remove it.
The test cases added here are not actually very interesting given the
removal of the redundant-column-removal logic, but they would represent
important corner cases if anyone ever tries to put that back.
Tom Lane and Masahiko Sawada. Back-patch to v11 where RANGE OFFSET
and GROUPS modes were added.
Michael Paquier [Tue, 10 Jul 2018 23:57:18 +0000 (08:57 +0900)]
Block replication slot advance for those not yet reserving WAL
Such replication slots are physical slots freshly created without WAL
being reserved, which is the default behavior; they have not yet been
used as resources to retain WAL. This prevents advancing a slot to a
position older than any available WAL, which could falsify calculations
for WAL segment recycling.
This also cleans up the code a bit, as ReplicationSlotRelease() would be
called on ERROR, and improves error messages.
Reported-by: Kyotaro Horiguchi
Author: Michael Paquier
Reviewed-by: Andres Freund, Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180626071305.GH31353@paquier.xyz
We fail to handle polymorphic types properly when they are used as
partition keys: we were unnecessarily adding a RelabelType node on top,
which confuses code examining the nodes. In particular, this makes
predtest.c-based partition pruning fail to work, and causes ruleutils.c
to emit expressions that are uglier than needed. Fix it by not adding
the RelabelType when it is not needed.
In master/11 the new pruning code is separate so it doesn't suffer from
this problem, since we already fixed it (in essentially the same way) in e5dcbb88a15d, which also added a few tests; back-patch those tests to
pg10 also. But since UPDATE/DELETE still uses predtest.c in pg11, this
change improves partitioning for those cases too. Add tests for this.
The ruleutils.c behavior change is relevant in pg11/master too.
Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/54745d13-7ed4-54ac-97d8-ea1eec95ae25@lab.ntt.co.jp
Tom Lane [Mon, 9 Jul 2018 23:26:19 +0000 (19:26 -0400)]
Avoid emitting a bogus WAL record when recycling an all-zero btree page.
Commit fafa374f2 caused _bt_getbuf() to possibly emit a WAL record for
a page that it was about to recycle. However, it failed to distinguish
all-zero pages from dead pages, which is important because only the
latter have valid btpo.xact values, or indeed any special space at all.
Recycling an all-zero page with XLogStandbyInfoActive() enabled therefore
led to an Assert failure, or to emission of a WAL record containing a
bogus cutoff XID, which might lead to unnecessary query cancellations
on hot standby servers.
Per reports from Antonin Houska and 自己. Amit Kapila was first to
propose this fix, and Robert Haas, myself, and Kyotaro Horiguchi
reviewed it at various times.
This is an old bug, so back-patch to all supported branches.
Tom Lane [Mon, 9 Jul 2018 18:28:04 +0000 (14:28 -0400)]
Fix yet more problems with incorrectly-constructed zero-length arrays.
Commit 716ea626a attempted to fix the problem of building 1-D zero-size
arrays once and for all. But it turns out that contrib/intarray has some
code that doesn't use construct_array() but just builds arrays by hand,
so it didn't get the memo. This appears to affect all of subarray(),
intset_subtract(), inner_int_union(), inner_int_inter(), and
intarray_concat_arrays().
Back-patch into v11. In the past we've not back-patched this type of
change, but since v11 is still in beta it seems all right to include
this fix in it. Besides it's more consistent to make the fix in v11
where 716ea626a appeared.
Report and patch by Alexey Kryuchkov, some cosmetic adjustments by me
Michael Paquier [Mon, 9 Jul 2018 01:25:40 +0000 (10:25 +0900)]
Rework order of end-of-recovery actions to delay timeline history write
A critical failure in some of the end-of-recovery actions before the
end-of-recovery record is written can cause PostgreSQL to react
inconsistently with the rest of the cluster in the event of a crash
before the final record is written. Two such failures are, for example,
an error while processing two-phase state files or while operating on
recovery.conf. With this commit, the failures are still considered
FATAL, but the write of the timeline history file is delayed as much as
possible so that the window between the moment the file is written and
the moment the end-of-recovery record is generated is minimized. This
way, in the
event of a crash or a failure, the new timeline decided at promotion
will not seem taken by other nodes in the cluster. It is not really
possible to reduce this window to zero, hence one could still see
failures if a crash happens between the history file write and the
end-of-recovery record, so any future code should be careful when
adding new end-of-recovery actions. The original report from Magnus
Hagander mentioned a renamed recovery.conf as the original
end-of-recovery failure: it caused a timeline to be seen as taken, but
the subsequent processing of the now-missing recovery.conf caused the
startup process to stop with FATAL, which at the follow-up startup left
the system inconsistent because of on-disk changes that had already
happened.
Processing of two-phase state files still needs some work as corrupted
entries are simply ignored now. This is left as a future item and this
commit fixes the original complaint.
Reported-by: Magnus Hagander
Author: Heikki Linnakangas
Reviewed-by: Alexander Korotkov, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
Add separate error message for procedure does not exist
While we probably don't want to split up all error messages into
function and procedure variants, this one is a very prominent one, so
it's helpful to be more specific here.