Tom Lane [Sun, 23 Sep 2018 20:05:45 +0000 (16:05 -0400)]
Fix failure in WHERE CURRENT OF after rewinding the referenced cursor.
In a case where we have multiple relation-scan nodes in a cursor plan,
such as a scan of an inheritance tree, it's possible to fetch from a
given scan node, then rewind the cursor and fetch some row from an
earlier scan node. In such a case, execCurrent.c mistakenly thought
that the later scan node was still active, because ExecReScan hadn't
done anything to make it look not-active. We'd get some sort of
failure in the case of a SeqScan node, because the node's scan tuple
slot would be pointing at a HeapTuple whose t_self gets reset to
invalid by heapam.c. But it seems possible that for other relation
scan node types we'd actually return a valid tuple TID to the caller,
resulting in updating or deleting a tuple that shouldn't have been
considered current. To fix, forcibly clear the ScanTupleSlot in
ExecScanReScan.
Another issue here, which seems only latent at the moment but could
easily become a live bug in future, is that rewinding a cursor does
not necessarily lead to *immediately* applying ExecReScan to every
scan-level node in the plan tree. Upper-level nodes will think that
they can postpone that call if their child node is already marked
with chgParam flags. I don't see a way for that to happen today in
a plan tree that's simple enough for execCurrent.c's search_plan_tree
to understand, but that's one heck of a fragile assumption. So, add
some logic in search_plan_tree to detect chgParam flags being set on
nodes that it descended to/through, and assume that that means we
should consider lower scan nodes to be logically reset even if their
ReScan call hasn't actually happened yet.
Per bug #15395 from Matvey Arye. This has been broken for a long time,
so back-patch to all supported branches.
Replace CAS loop with single TAS in ProcArrayGroupClearXid()
Single pg_atomic_exchange_u32() is expected to be faster than loop of
pg_atomic_compare_exchange_u32(). Also, it would be consistent with
clog group update code.
Discussion: https://postgr.es/m/CAPpHfdtxLsC-bqfxFcHswZ91OxXcZVNDBBVfg9tAWU0jvn1tQA%40mail.gmail.com Reviewed-by: Amit Kapila
Michael Paquier [Sat, 22 Sep 2018 06:23:59 +0000 (15:23 +0900)]
Make GUC wal_sender_timeout user-settable
Being able to use a value that can be changed on a connection basis is
useful with clusters distributed geographically, and makes failure
detection more flexible. A note is added in the documentation about the
use of "options" in primary_conninfo, which can be hard to grasp for
newcomers with the need of two single quotes when listing a set of
parameters.
Tom Lane [Sat, 22 Sep 2018 00:50:18 +0000 (20:50 -0400)]
Get rid of explicit argument-count markings in tab-complete.c.
This replaces the "TailMatchesN" macros with just "TailMatches",
and likewise "HeadMatchesN" becomes "HeadMatches" and "MatchesN"
becomes "Matches". The various COMPLETE_WITH_LISTn macros are
reduced to COMPLETE_WITH, and the single-item COMPLETE_WITH_CONST
also gets folded into that. This eliminates a lot of minor
annoyance in writing tab-completion rules. Usefully, the compiled
code also gets a bit smaller (10% or so, on my machine).
The implementation depends on variadic macros, so we couldn't have
done this before we required C99.
Andres Freund and Thomas Munro; some cosmetic cleanup by me.
Bruce Momjian [Sat, 22 Sep 2018 00:28:55 +0000 (20:28 -0400)]
doc: JIT is enabled by default in PG 12
JIT was disabled by default in a PG 11 in a separate commit that will
normally not appear in the PG 12 git logs. Therefore, create a PG 12
document and mention the fact that JIT is enabled by default in this
release. (A similar change in parallelism was missed in a prior
release.)
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20180922000554.qukbhhlagpnopvko@alap3.anarazel.de
Tom Lane [Fri, 21 Sep 2018 19:58:37 +0000 (15:58 -0400)]
Fix bogus tab-completion rule for CREATE PUBLICATION.
You can't use "FOR TABLE" as a single Matches argument, because readline
will consider that input to be two words not one. It's necessary to make
the pattern contain two arguments.
The case accidentally worked anyway because the words_after_create
test fired ... but only for the first such table name.
Noted by Edmund Horner, though this isn't exactly his proposed fix.
Backpatch to v10 where the faulty code came in.
Tom Lane [Fri, 21 Sep 2018 19:22:26 +0000 (15:22 -0400)]
Improve tab completion for ANALYZE, EXPLAIN, and VACUUM.
Previously, we made no attempt to provide tab completion in these
statements' optional parenthesized options lists. This patch teaches
psql to do so.
To prevent the option completions from being offered after we've already
seen a complete parenthesized option list, it's necessary to improve
word_matches() so that it allows a wildcard '*' in the middle of an
alternative, not only at the end as formerly. That requires only a
little more code than before, and it allows us to test for "incomplete
parenthesized options" with a test like
else if (HeadMatches2("EXPLAIN", "(*") &&
!HeadMatches2("EXPLAIN", "(*)"))
In addition, add some logic to offer column names in the context of
"ANALYZE tablename ( ...", and likewise for VACUUM. This isn't real
complete; it won't offer column names again after a comma. But it's
better than before, and it doesn't take much code.
Justin Pryzby, reviewed at various times by Álvaro Herrera, Arthur
Zakirov, and Edmund Horner; some additional fixups by me
Tom Lane [Fri, 21 Sep 2018 16:41:00 +0000 (12:41 -0400)]
Rationalize Query_for_list_of_[relations] query names in tab-complete.c.
The previous convention was to use names based on the set of relkinds being
selected for, which was not at all helpful for maintenance, especially
since people had been quite inconsistent about whether to change the names
when they changed the relkinds being selected for. Instead, use names
based on the functionality we need the relation to have, following the
model established by Query_for_list_of_updatables.
While at it, sort the list of Query constants a bit better; it had the
distinct air of code-assembled-by-dartboard before.
Thomas Munro [Fri, 21 Sep 2018 12:40:13 +0000 (00:40 +1200)]
Use size_t consistently in dsa.{ch}.
Takeshi Ideriha complained that there is a mixture of Size and size_t
in dsa.c and corresponding header. Let's use size_t. Back-patch to 10
where dsa.c landed, to make future back-patching easy.
Andres Freund [Thu, 13 Sep 2018 21:18:43 +0000 (14:18 -0700)]
Error out for clang on x86-32 without SSE2 support, no -fexcess-precision.
As clang currently doesn't support -fexcess-precision=standard,
compiling x86-32 code with SSE2 disabled, can lead to problems with
floating point overflow checks and the like.
This issue was noticed because clang, on at least some BSDs, defaults
to i386 compatibility, whereas it defaults to pentium4 on Linux. Our
forced usage of __builtin_isinf() lead to some overflow checks not
triggering when compiling for i386, e.g. when the result of the
calculation didn't overflow in 80bit registers, but did so in 64bit.
While we could just fall back to a non-builtin isinf, it seems likely
that the use of 80bit registers leads to other problems (which is why
we force the flag for GCC already). Therefore error out when
detecting clang in that situation.
Reported-By: Victor Wagner Analyzed-By: Andrew Gierth and Andres Freund
Author: Andres Freund
Discussion: https://postgr.es/m/20180905005130.ewk4xcs5dgyzcy45@alap3.anarazel.de
Backpatch: 9.3-, all supported versions are affected
Tom Lane [Thu, 20 Sep 2018 21:21:14 +0000 (17:21 -0400)]
Fix psql's tab completion for TABLE.
This should offer the same relation types that SELECT ... FROM would.
You can't select from an index for instance, so offering it here is
unhelpful. Noted while testing ilmari's recent patch.
Tom Lane [Thu, 20 Sep 2018 21:16:06 +0000 (17:16 -0400)]
Fix psql's tab completion for ALTER DATABASE ... SET TABLESPACE.
We have the infrastructure to offer a list of tablespace names, but
it wasn't being used here; instead you got "FROM", "CURRENT", and "TO"
which aren't actually legal in this syntax.
Dagfinn Ilmari Mannsåker, reviewed by Arthur Zakirov
Tom Lane [Thu, 20 Sep 2018 20:06:18 +0000 (16:06 -0400)]
Add missing pg_description strings for pg_type entries.
I noticed that all non-composite, non-array entries in pg_type.dat
had descr strings, except for "json" and the pseudo-types. The
lack for json seems certainly an oversight, and there's surely
little reason to not have entries for the pseudo-types either.
So add some.
"make reformat-dat-files" turned up some formatting issues in
pg_amop.dat, too, so fix those in passing.
No catversion bump since the backend doesn't care too much what is
in pg_description.
Tom Lane [Thu, 20 Sep 2018 19:14:46 +0000 (15:14 -0400)]
Teach genbki.pl to auto-generate pg_type entries for array types.
This eliminates some more tedium in adding new catalog entries,
specifically the need to set up an array type when adding a new
built-in data type. Now it's sufficient to assign an OID for the
array type and write it in an "array_type_oid" metadata field.
You don't have to fill the base type's typarray link explicitly, either.
No catversion bump since the contents of pg_type aren't changed.
(Well, their order might be different, but that doesn't matter.)
John Naylor, reviewed and whacked around a bit by
Dagfinn Ilmari Mannsåker, and some more by me.
Fix handling of format string text characters in to_timestamp()/to_date()
cf984672 introduced improvement of handling of spaces and separators in
to_timestamp()/to_date() functions. In particular, now we're skipping spaces
both before and after fields. That may cause format string text character to
consume part of field in the situations, when it didn't happen before cf984672.
This commit cause format string text character consume input string characters
only when since previous field (or string beginning) number of skipped input
string characters is not greater than number of corresponding format string
characters (that is we didn't skip any extra characters in input string).
Thomas Munro [Thu, 20 Sep 2018 03:52:39 +0000 (15:52 +1200)]
Fix segment_bins corruption in dsa.c.
If a segment has been freed by dsa.c because it is entirely empty, other
backends must make sure to unmap it before following links to new
segments that might happen to have the same index number, or they could
finish up looking at a defunct segment and then corrupt the segment_bins
lists. The correct protocol requires checking freed_segment_counter
after acquiring the area lock and before resolving any index number to a
segment. Add the missing checks and an assertion.
Back-patch to 10, where dsa.c first arrived.
Author: Thomas Munro Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/CAEepm%3D0thg%2Bja5zGVa7jBy-uqyHrTqTm8HGhEOtMmigGrAqTbw%40mail.gmail.com
Thomas Munro [Thu, 20 Sep 2018 02:02:40 +0000 (14:02 +1200)]
Defer restoration of libraries in parallel workers.
Several users of extensions complained of crashes in parallel workers
that turned out to be due to syscache access from their _PG_init()
functions. Reorder the initialization of parallel workers so that
libraries are restored after the caches are initialized, and inside a
transaction.
This was reported in bug #15350 and elsewhere. We don't consider it
to be a bug: extensions shouldn't do that, because then they can't be
used in shared_preload_libraries. However, it's a fairly obscure
hazard and these extensions worked in practice before parallel query
came along. So let's make it work. Later commits might add a warning
message and eventually an error.
Back-patch to 9.6, where parallel query landed.
Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Kieran McCusker, Jimmy
Discussion: https://postgr.es/m/153512195228.1489.8545997741965926448%40wrigleys.postgresql.org
Michael Paquier [Wed, 19 Sep 2018 23:54:37 +0000 (08:54 +0900)]
Enforce translation mode for Windows frontends to text with open/fopen
Allowing frontends to use concurrent-safe open() and fopen() via 0ba06e0
has the side-effect of switching the default translation mode from text
to binary, so the switch can cause breakages for frontend tools when the
caller of those new versions specifies neither binary and text. This
commit makes sure to maintain strict compatibility with past versions,
so as no frontends should see a difference when upgrading.
Author: Laurenz Albe Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180917140202.GF31460@paquier.xyz
Tom Lane [Wed, 19 Sep 2018 16:43:51 +0000 (12:43 -0400)]
Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock.
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock
that checked the return value of LockAcquireExtended. That created a
bug, because it's still passing reportMemoryError = false to
LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned
if we're out of shared memory for the lock table.
In such a situation, the startup process would believe it had acquired an
exclusive lock even though it hadn't, with potentially dire consequences.
To fix, just drop the use of reportMemoryError = false, which allows us
to simplify the call into a plain LockAcquire(). It's unclear that the
locktable-full situation arises often enough that it's worth having a
better recovery method than crash-and-restart. (I strongly suspect that
the only reason the code path existed at all was that it was relatively
simple to do in the pre-37c54863c implementation. But now it's not.)
LockAcquireExtended's reportMemoryError parameter is now dead code and
could be removed. I refrained from doing so, however, because there
was some interest in resurrecting the behavior if we do get reports of
locktable-full failures in the field. Also, it seems unwise to remove
the parameter concurrently with shipping commit f868a8143, which added a
parameter; if there are any third-party callers of LockAcquireExtended,
we want them to get a wrong-number-of-parameters compile error rather
than a possibly-silent misinterpretation of its last parameter.
Add support for nearest-neighbor (KNN) searches to SP-GiST
Currently, KNN searches were supported only by GiST. SP-GiST also capable to
support them. This commit implements that support. SP-GiST scan stack is
replaced with queue, which serves as stack if no ordering is specified. KNN
support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops
and poly_ops (catversion is bumped). Some common parts between GiST and SP-GiST
KNNs are extracted into separate functions.
Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov
Review: Andrey Borodin, Alexander Korotkov
Tom Lane [Tue, 18 Sep 2018 21:11:54 +0000 (17:11 -0400)]
Add a debugging option to stress-test outfuncs.c and readfuncs.c.
In the normal course of operation, query trees will be serialized only if
they are stored as views or rules; and plan trees will be serialized only
if they get passed to parallel-query workers. This leaves an awful lot of
opportunity for bugs/oversights to not get detected, as indeed we've just
been reminded of the hard way.
To improve matters, this patch adds a new compile option
WRITE_READ_PARSE_PLAN_TREES, which is modeled on the longstanding option
COPY_PARSE_PLAN_TREES; but instead of passing all parse and plan trees
through copyObject, it passes them through nodeToString + stringToNode.
Enabling this option in a buildfarm animal or two will catch problems
at least for cases that are exercised by the regression tests.
A small problem with this idea is that readfuncs.c historically has
discarded location fields, on the reasonable grounds that parse
locations in a retrieved view are not relevant to the current query.
But doing that in WRITE_READ_PARSE_PLAN_TREES breaks pg_stat_statements,
and it could cause problems for future improvements that might try to
report error locations at runtime. To fix that, provide a variant
behavior in readfuncs.c that makes it restore location fields when
told to.
In passing, const-ify the string arguments of stringToNode and its
subsidiary functions, just because it annoyed me that they weren't
const already.
Tom Lane [Tue, 18 Sep 2018 19:08:28 +0000 (15:08 -0400)]
Fix some minor issues exposed by outfuncs/readfuncs testing.
A test patch to pass parse and plan trees through outfuncs + readfuncs
exposed several issues that need to be fixed to get clean matches:
Query.withCheckOptions failed to get copied; it's intentionally ignored
by outfuncs/readfuncs on the grounds that it'd always be NIL anyway in
stored rules. This seems less than future-proof, and it's not even
saving very much, so just undo the decision and treat the field like
all others.
Several places that convert a view RTE into a subquery RTE, or similar
manipulations, failed to clear out fields that were specific to the
original RTE type and should be zero in a subquery RTE. Since readfuncs.c
will leave such fields as zero, equalfuncs.c thinks the nodes are different
leading to a reported mismatch. It seems like a good idea to clear out the
no-longer-needed fields, even though in principle nothing should look at
them; the node ought to be indistinguishable from how it would look if
we'd built a new node instead of scribbling on the old one.
BuildOnConflictExcludedTargetlist randomly set the resname of some
TargetEntries to "" not NULL. outfuncs/readfuncs don't distinguish those
cases, and so the string will read back in as NULL ... but equalfuncs.c
does distinguish. Perhaps we ought to try to make things more consistent
in this area --- but it's just useless extra code space for
BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for
now by making it do that.
catversion bumped because the change in handling of Query.withCheckOptions
affects stored rules.
Tom Lane [Tue, 18 Sep 2018 17:02:27 +0000 (13:02 -0400)]
Fix some probably-minor oversights in readfuncs.c.
The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and
colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't
restore them. This doesn't cause obvious failures, because the only things
that look at those fields are expandRTE() and get_rte_attribute_type(),
which are mostly used during parse analysis, before anything would've
passed the parsetree through outfuncs/readfuncs. But expandRTE() is used
in build_physical_tlist(), which means that that function will return a
wrong answer for a TABLEFUNC RTE that came from a view. Very accidentally,
this doesn't cause serious problems, because what it will return is NIL
which callers will interpret as "couldn't build a physical tlist because
of dropped columns". So you still get a plan that works, though it's
marginally less efficient than it could be. There are also some other
expandRTE() calls associated with transformation of whole-row Vars in
the planner. I have been unable to exhibit misbehavior from that, and
it may be unreachable in any case that anyone would care about ... but
I'm not entirely convinced, so this seems like something we should back-
patch a fix for. Fortunately, we can fix it without forcing a change
of stored rules and a catversion bump, because we can just copy these
lists from the subsidiary TableFunc object.
readfuncs.c was also missing support for NamedTuplestoreScan plan nodes.
This accidentally fails to break parallel query because a query using
a named tuplestore would never be considered parallel-safe anyway.
However, project policy since parallel query came in is that all plan
node types should have outfuncs/readfuncs support, so this is clearly
an oversight that should be repaired.
Noted while fooling around with a patch to test outfuncs/readfuncs more
thoroughly. That exposed some other issues too, but these are the only
ones that seem worth back-patching.
Back-patch to v10 where both of these features came in.
Thomas Munro [Tue, 18 Sep 2018 10:56:36 +0000 (22:56 +1200)]
Allow DSM allocation to be interrupted.
Chris Travers reported that the startup process can repeatedly try to
cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
to loop forever. Teach the retry loop to give up if an interrupt is
pending. Don't actually check for interrupts in that loop though,
because a non-local exit would skip some clean-up code in the caller.
Back-patch to 9.4 where DSM was added (and posix_fallocate() was later
back-patched).
Author: Chris Travers Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin Tested-by: Oleksii Kliukin
Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com
Michael Paquier [Tue, 18 Sep 2018 03:00:18 +0000 (12:00 +0900)]
Refactor routines for subscription and publication lookups
Those routines gain a missing_ok argument, allowing a caller to get a
NULL result instead of an error if set to true. This is part of a
larger refactoring effort for objectaddress.c where trying to check for
non-existing objects does not result in cache lookup failures.
Author: Michael Paquier Reviewed-by: Aleksander Alekseev, Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
Tom Lane [Mon, 17 Sep 2018 17:16:32 +0000 (13:16 -0400)]
Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)).
The original coding for XMLTABLE thought it could represent a default
namespace by a T_String Value node with a null string pointer. That's
not okay, though; in particular outfuncs.c/readfuncs.c are not on board
with such a representation, meaning you'll get a null pointer crash
if you try to store a view or rule containing this construct.
To fix, change the parsetree representation so that we have a NULL
list element, instead of a bogus Value node.
This isn't really a functional limitation since default XML namespaces
aren't yet implemented in the executor; you'd just get "DEFAULT
namespace is not supported" anyway. But crashes are not nice, so
back-patch to v10 where this syntax was added. Ordinarily we'd consider
a parsetree representation change to be un-backpatchable; but since
existing releases would crash on the way to storing such constructs,
there can't be any existing views/rules to be incompatible with.
Tom Lane [Mon, 17 Sep 2018 16:43:07 +0000 (12:43 -0400)]
Remove dead code from pop_next_work_item().
The pref_non_data heuristic has been dead code for nearly ten years,
and as far as I can tell was dead code even when it was first committed.
I'm tired of silencing Coverity complaints about it, so get rid of it.
If anyone is ever interested in pursuing the concept, they can get the
code out of our git history.
Tom Lane [Mon, 17 Sep 2018 16:11:43 +0000 (12:11 -0400)]
Fix pgbench lexer's "continuation" rule to cope with Windows newlines.
Our general practice in frontend code is to accept input with either
Unix-style newlines (\n) or DOS-style (\r\n). pgbench was mostly down
with that, but its rule for line continuations (backslash-newline) was
not. This had been masked on Windows buildfarm machines before commit 0ba06e0bf by use of Windows text mode to read files. We could have fixed
it by forcing text mode again, but it's better to fix the parsing code
so that Windows-style text files on Unix systems don't cause problems.
Back-patch to v10 where pgbench grew line continuations.
Andrew Gierth [Sun, 16 Sep 2018 17:46:45 +0000 (18:46 +0100)]
Fix out-of-tree build for transform modules.
Neither plperl nor plpython installed sufficient header files to
permit transform modules to be built out-of-tree using PGXS. Fix that
by installing all plperl and plpython header files (other than those
with special purposes such as generated data tables), and also install
plpython's special .mk file for mangling regression tests.
(This commit does not fix the windows install, which does not
currently install _any_ plperl or plpython headers.)
Also fix the existing transform modules for hstore and ltree so that
their cross-module #include directives work as anticipated by commit df163230b9 et seq. This allows them to serve as working examples of
how to reference other modules when doing separate out-of-tree builds.
Tom Lane [Sun, 16 Sep 2018 17:02:47 +0000 (13:02 -0400)]
Add outfuncs.c support for RawStmt nodes.
I noticed while poking at a report from Andrey Lepikhov that the
recent addition of RawStmt nodes at the top of raw parse trees
makes it impossible to print any raw parse trees whatsoever,
because outfuncs.c doesn't know RawStmt and hence fails to descend
into it.
While we generally lack outfuncs.c support for utility statements,
there is reasonably complete support for what you can find in a
raw SELECT statement. It was not my intention to make that all
dead code ... so let's add support for RawStmt.
Tom Lane [Sat, 15 Sep 2018 21:24:35 +0000 (17:24 -0400)]
In v11, disable JIT by default (it's still enabled by default in HEAD).
Per discussion, JIT isn't quite mature enough to ship enabled-by-default.
I failed to resist the temptation to do a bunch of copy-editing on the
related documentation. Also, clean up some inconsistencies in which
section of config.sgml the JIT GUCs are documented in vs. what guc.c
and postgresql.config.sample had.
Tom Lane [Sat, 15 Sep 2018 17:42:33 +0000 (13:42 -0400)]
Fix failure with initplans used conditionally during EvalPlanQual rechecks.
The EvalPlanQual machinery assumes that any initplans (that is,
uncorrelated sub-selects) used during an EPQ recheck would have already
been evaluated during the main query; this is implicit in the fact that
execPlan pointers are not copied into the EPQ estate's es_param_exec_vals.
But it's possible for that assumption to fail, if the initplan is only
reached conditionally. For example, a sub-select inside a CASE expression
could be reached during a recheck when it had not been previously, if the
CASE test depends on a column that was just updated.
This bug is old, appearing to date back to my rewrite of EvalPlanQual in
commit 9f2ee8f28, but was not detected until Kyle Samson reported a case.
To fix, force all not-yet-evaluated initplans used within the EPQ plan
subtree to be evaluated at the start of the recheck, before entering the
EPQ environment. This could be inefficient, if such an initplan is
expensive and goes unused again during the recheck --- but that's piling
one layer of improbability atop another. It doesn't seem worth adding
more complexity to prevent that, at least not in the back branches.
It was convenient to use the new-in-v11 ExecEvalParamExecParams function
to implement this, but I didn't like either its name or the specifics of
its API, so revise that.
Back-patch all the way. Rather than rewrite the patch to avoid depending
on bms_next_member() in the oldest branches, I chose to back-patch that
function into 9.4 and 9.3. (This isn't the first time back-patches have
needed that, and it exhausted my patience.) I also chose to back-patch
some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3,
so that the 9.x versions of eval-plan-qual.spec are all the same.
Andrew Gierth diagnosed the problem and contributed the added test cases,
though the actual code changes are by me.
Tom Lane [Fri, 14 Sep 2018 21:31:51 +0000 (17:31 -0400)]
Improve parallel scheduling logic in pg_dump/pg_restore.
Previously, the way this worked was that a parallel pg_dump would
re-order the TABLE_DATA items in the dump's TOC into decreasing size
order, and separately re-order (some of) the INDEX items into decreasing
size order. Then pg_dump would dump the items in that order. Later,
parallel pg_restore just followed the TOC order. This method had lots
of deficiencies:
* TOC ordering randomly differed between parallel and non-parallel
dumps, and was hard to predict in the former case, causing problems
for building stable pg_dump test cases.
* Parallel restore only followed a well-chosen order if the dump had
been done in parallel; in particular, this never happened for restore
from custom-format dumps.
* The best order for restore isn't necessarily the same as for dump,
and it's not really static either because of locking considerations.
* TABLE_DATA and INDEX items aren't the only things that might take a lot
of work during restore. Scheduling was particularly stupid for the BLOBS
item, which might require lots of work during dump as well as restore,
but was left to the end in either case.
This patch removes the logic that changed the TOC order, fixing the
test instability problem. Instead, we sort the parallelizable items
just before processing them during a parallel dump. Independently
of that, parallel restore prioritizes the ready-to-execute tasks
based on the size of the underlying table. In the case of dependent
tasks such as index, constraint, or foreign key creation, the largest
relevant table is used as the metric for estimating the task length.
(This is pretty crude, but it should be enough to avoid the case we
want to avoid, which is ending the run with just a few large tasks
such that we can't make use of all N workers.)
Patch by me, responding to a complaint from Peter Eisentraut,
who also reviewed the patch.
Fix ALTER/TYPE on columns referenced by FKs in partitioned tables
When ALTER TABLE ... SET DATA TYPE affects a column referenced by
constraints and indexes, it drop those constraints and indexes and
recreates them afterwards, so that the definitions match the new data
type. The original code did this by dropping one object at a time
(commit 077db40fa1f3 of May 2004), which worked fine because the
dependencies between the objects were pretty straightforward, and
ordering the objects in a specific way was enough to make this work.
However, when there are foreign key constraints in partitioned tables,
the dependencies are no longer so straightforward, and we were getting
errors when attempted:
ERROR: cache lookup failed for constraint 16398
This can be fixed by doing all the drops in one pass instead, using
performMultipleDeletions (introduced by df18c51f2955 of Aug 2006). With
this change we can also remove the code to carefully order the list of
objects to be deleted.
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com
Andrew Gierth [Fri, 14 Sep 2018 16:35:42 +0000 (17:35 +0100)]
Order active window clauses for greater reuse of Sort nodes.
By sorting the active window list lexicographically by the sort clause
list but putting longer clauses before shorter prefixes, we generate
more chances to elide Sort nodes when building the path.
Author: Daniel Gustafsson (with some editorialization by me) Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
Amit Kapila [Fri, 14 Sep 2018 04:06:30 +0000 (09:36 +0530)]
Don't allow LIMIT/OFFSET clause within sub-selects to be pushed to workers.
Allowing sub-select containing LIMIT/OFFSET in workers can lead to
inconsistent results at the top-level as there is no guarantee that the
row order will be fully deterministic. The fix is to prohibit pushing
LIMIT/OFFSET within sub-selects to workers.
Reported-by: Andrew Fletcher
Bug: 15324
Author: Amit Kapila Reviewed-by: Dilip Kumar
Backpatch-through: 9.6
Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
Michael Paquier [Fri, 14 Sep 2018 01:04:14 +0000 (10:04 +0900)]
Allow concurrent-safe open() and fopen() in frontend code for Windows
PostgreSQL uses a custom wrapper for open() and fopen() which is
concurrent-safe, allowing multiple processes to open and work on the
same file. This has a couple of advantages:
- pg_test_fsync does not handle O_DSYNC correctly otherwise, leading to
false claims that disks are unsafe.
- TAP tests can run into race conditions when a postmaster and pg_ctl
open postmaster.pid, fixing some random failures in the buildfam.
pg_upgrade is one frontend tool using workarounds to bypass file locking
issues with the log files it generates, however the interactions with
pg_ctl are proving to be tedious to get rid of, so this is left for
later.
Michael Paquier [Thu, 13 Sep 2018 22:35:39 +0000 (07:35 +0900)]
Improve autovacuum logging for aggressive and anti-wraparound runs
A log message was being generated when log_min_duration is reached for
autovacuum on a given relation to indicate if it was an aggressive run,
and missed the point of mentioning if it is doing an anti-wrapround
run. The log message generated is improved so as one, both or no extra
details are added depending on the option set.
Author: Sergei Kornilov Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net
Andres Freund [Thu, 13 Sep 2018 17:41:39 +0000 (10:41 -0700)]
Detect LLVM 7 without specifying binaries explicitly.
Before this commit LLVM 7 was supported, but only if one explicitly
provided LLVM_CONFIG= and CLANG= paths. As LLVM 7 is the first
version that includes our upstreamed debugging and profiling features,
and as debian is planning to default to 7 due to wider architecture
support, it seems good to support auto-detecting that version.
Author: Christoph Berg
Discussion: https://postgr.es/m/20180912124517.GD24584@msg.df7cb.de
Backpatch: 11, where LLVM was introduced
Tom Lane [Thu, 13 Sep 2018 16:36:21 +0000 (12:36 -0400)]
Attempt to identify system timezone by reading /etc/localtime symlink.
On many modern platforms, /etc/localtime is a symlink to a file within the
IANA database. Reading the symlink lets us find out the name of the system
timezone directly, without going through the brute-force search embodied in
scan_available_timezones(). This shortens the runtime of initdb by some
tens of ms, which is helpful for the buildfarm, and it also allows us to
reliably select the same zone name the system was actually configured for,
rather than possibly choosing one of IANA's many zone aliases. (For
example, in a system configured for "Asia/Tokyo", the brute-force search
would not choose that name but its alias "Japan", on the grounds of the
latter string being shorter. More surprisingly, "Navajo" is preferred
to either "America/Denver" or "US/Mountain", as seen in an old complaint
from Josh Berkus.)
If /etc/localtime doesn't exist, or isn't a symlink, or we can't make
sense of its contents, or the contents match a zone we know but that
zone doesn't match the observed behavior of localtime(), fall back to
the brute-force search.
Also, tweak initdb so that it prints the zone name it selected.
In passing, replace the last few references to the "Olson" database in
code comments with "IANA", as that's been our preferred term since
commit b2cbced9e.
Patch by me, per a suggestion from Robert Haas; review by Michael Paquier
Amit Kapila [Thu, 13 Sep 2018 10:00:26 +0000 (15:30 +0530)]
Attach FPI to the first record after full_page_writes is turned on.
XLogInsert fails to attach a required FPI to the first record after
full_page_writes is turned on by the last checkpoint. This bug got
introduced in 9.5 due to code rearrangement in commits 2c03216d83 and 2076db2aea. Fix it by ensuring that XLogInsertRecord performs a
recomputation when the given record is generated with FPW as off but
found that the flag has been turned on while actually inserting the
record.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi Reviewed-by: Amit Kapila
Backpatch-through: 9.5 where this problem was introduced
Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp
Michael Paquier [Thu, 13 Sep 2018 07:56:57 +0000 (16:56 +0900)]
Simplify static function in extension.c
An extra argument for the filename defining the extension script
location was present, aimed at being used for error reporting, but has
never been used. This was around since extensions have been added in d9572c4.
Peter Eisentraut [Mon, 27 Aug 2018 13:50:50 +0000 (15:50 +0200)]
Simplify index tuple descriptor initialization
We have two code paths for initializing the tuple descriptor for a new
index: For a normal index, we copy the tuple descriptor from the table
and reset a number of fields that are not applicable to indexes. For an
expression index, we make a blank tuple descriptor and fill in the
needed fields based on the provided expressions. As pg_attribute has
grown over time, the number of fields that we need to reset in the first
case is now bigger than the number of fields we actually want to copy,
so it's sensible to do it the other way around: Make a blank descriptor
and copy just the fields we need. This also allows more code sharing
between the two branches, and it avoids having to touch this code for
almost every unrelated change to the pg_attribute structure.
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Andrew Gierth [Wed, 12 Sep 2018 18:31:06 +0000 (19:31 +0100)]
Repair bug in regexp split performance improvements.
Commit c8ea87e4b introduced a temporary conversion buffer for
substrings extracted during regexp splits. Unfortunately the code that
sized it was failing to ignore the effects of ignored degenerate
regexp matches, so for regexp_split_* calls it could under-size the
buffer in such cases.
Fix, and add some regression test cases (though those will only catch
the bug if run in a multibyte encoding).
Backpatch to 9.3 as the faulty code was.
Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the
report (via IRC) and assistance in analysis. Patch by me.
Michael Paquier [Tue, 11 Sep 2018 21:46:01 +0000 (06:46 +0900)]
Parse more strictly integer parameters from connection strings in libpq
The following parameters have been parsed in lossy ways when specified
in a connection string processed by libpq:
- connect_timeout
- keepalives
- keepalives_count
- keepalives_idle
- keepalives_interval
- port
Overflowing values or the presence of incorrect characters were not
properly checked, leading to libpq trying to use such values and fail
with unhelpful error messages. This commit hardens the parsing of those
parameters so as it is possible to find easily incorrect values.
Author: Fabien Coelho Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1808171206180.20841@lancre
Tom Lane [Tue, 11 Sep 2018 20:32:12 +0000 (16:32 -0400)]
Remove ruleutils.c's special case for BIT [VARYING] literals.
Up to now, get_const_expr() insisted on prefixing BIT and VARBIT
literals with 'B'. That's not really necessary, because we always
append explicit-cast syntax to identify the constant's type.
Moreover, it's subtly wrong for VARBIT, because the parser will
interpret B'...' as '...'::"bit"; see make_const() which explicitly
assigns type BITOID for a T_BitString literal. So what had been
a simple VARBIT literal is reconstructed as ('...'::"bit")::varbit,
which is not the same thing, at least not before constant folding.
This results in odd differences after dump/restore, as complained
of by the patch submitter, and it could result in actual failures in
partitioning or inheritance DDL operations (see commit 542320c2b,
which repaired similar misbehaviors for some other data types).
Fixing it is pretty easy: just remove the special case and let the
default code path handle these types. We could have kept the special
case for BIT only, but there seems little point in that.
Like the previous patch, I judge that back-patching this into stable
branches wouldn't be a good idea. However, it seems not quite too
late for v11, so let's fix it there.
Paul Guo, reviewed by Davy Machado and John Naylor, minor adjustments
by me
Andrew Gierth [Tue, 11 Sep 2018 17:14:19 +0000 (18:14 +0100)]
Repair double-free in SP-GIST rescan (bug #15378)
spgrescan would first reset traversalCxt, and then traverse a
potentially non-empty stack containing pointers to traversalValues
which had been allocated in those contexts, freeing them a second
time. This bug originates in commit ccd6eb49a where traversalValue was
introduced.
Repair by traversing the stack before the context reset; this isn't
ideal, since it means doing retail pfree in a context that's about to
be reset, but the freeing of a stack entry is also done in other
places in the code during the scan so it's not worth trying to
refactor it further. Regression test added.
Backpatch to 9.6 where the problem was introduced.
Per bug #15378; analysis and patch by me, originally from a report on
IRC by user velix; see also PostGIS ticket #4174; review by Alexander
Korotkov.
Tom Lane [Tue, 11 Sep 2018 02:22:12 +0000 (22:22 -0400)]
Use -Bsymbolic for shared libraries on HP-UX and Solaris.
These platforms are also subject to the mis-linking problem addressed
in commit e3d77ea6b. It's not clear whether we could solve it with
a solution equivalent to GNU ld's version scripts, but -Bsymbolic
appears to fix it, so let's use that.
Like the previous commit, back-patch as far as v10.
Tom Lane [Mon, 10 Sep 2018 16:47:02 +0000 (12:47 -0400)]
Hide a static inline from FRONTEND code.
For some reason pg_waldump is including tuptable.h, and the recent
addition of a static inline function to it is causing problems on
older buildfarm members that fail to optimize such functions away
completely. I wonder if this situation doesn't mean that some header
refactoring is called for ... but as a band-aid, wrap the static
function in "#ifndef FRONTEND".
Tom Lane [Sun, 9 Sep 2018 19:16:51 +0000 (15:16 -0400)]
Prevent mis-linking of src/port and src/common functions on *BSD.
On ELF-based platforms (and maybe others?) it's possible for a shared
library, when dynamically loaded into the backend, to call the backend
versions of src/port and src/common functions rather than the frontend
versions that are actually linked into the shlib. This is the cause
of bug #15367 from Jeremy Evans, and is likely to lead to more problems
in future; it's accidental that we've failed to notice any bad effects
up to now.
The recommended way to fix this on ELF-based platforms is to use a
linker "version script" that makes the shlib's versions of the functions
local. (Apparently, -Bsymbolic would fix it as well, but with other
side effects that we don't want.) Doing so has the additional benefit
that we can make sure the shlib only exposes the symbols that are meant
to be part of its API, and not ones that are just for cross-file
references within the shlib. So we'd already been using a version
script for libpq on popular platforms, but it's now apparent that it's
necessary for correctness on every ELF-based platform.
Hence, add appropriate logic to the openbsd, freebsd, and netbsd stanzas
of Makefile.shlib; this is just a copy-and-paste from the linux stanza.
There may be additional work to do if commit ed0cdf0e0 reveals that the
problem exists elsewhere, but this is all that is known to be needed
right now.
Back-patch to v10 where SCRAM support came in. The problem is ancient,
but analysis suggests that there were no really severe consequences
in older branches. Hence, I won't take the risk of such a large change
in the build process for older branches.
In passing, remove a rather opaque comment about -Bsymbolic; I don't
think it's very on-point about why we don't use that, if indeed that's
what it's talking about at all.
Patch by me; thanks to Andrew Gierth for helping to diagnose the problem,
and for additional testing.
Improve behavior of to_timestamp()/to_date() functions
to_timestamp()/to_date() functions were introduced mainly for Oracle
compatibility, and became very popular among PostgreSQL users. However, some
behavior of to_timestamp()/to_date() functions are both incompatible with Oracle
and confusing for our users. This behavior is related to handling of spaces and
separators in non FX (fixed format) mode. This commit reworks this behavior
making less confusing, better documented and more compatible with Oracle.
Nevertheless, there are still following incompatibilities with Oracle.
1) We don't insist that there are no format string patterns unmatched to
input string.
2) In FX mode we don't insist space and separators in format string to exactly
match input string.
3) When format string patterns are divided by mix of spaces and separators, we
don't distinguish them, while Oracle takes into account only last group of
spaces/separators.
Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston
ginRedoRecompress() replays actions over compressed segments of posting list
in-place. However, it might lead to write past pg_upper, because intermediate
state during playing the changes can take more space than both original state
and final state. This commit fixes that by refuse from in-place modification.
Instead page tail is copied once modification is started, and then it's used
as the source of original segments. Backpatch to 9.4 where posting list
compression was introduced.
Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com
Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian
Review: Sivasubramanian Ramasubramanian
Backpatch-through: 9.4
Tom Lane [Sun, 9 Sep 2018 16:41:27 +0000 (12:41 -0400)]
Work around stdbool problem in dfmgr.c.
Commit 842cb9fa6 refactored things so that dfmgr.c includes <dlfcn.h>,
which before that had only been directly included in platform-specific
stub files. It turns out that on macOS, <dlfcn.h> includes <stdbool.h>,
and that causes problems on platforms where _Bool is not char-sized ...
which happens to include the PPC versions of macOS. Work around it
much as we have in plperl.h, by #undef'ing bool after including the
problematic file, but only if we're not using stdbool-style booleans.
Tom Lane [Sun, 9 Sep 2018 16:23:23 +0000 (12:23 -0400)]
Install a check for mis-linking of src/port and src/common functions.
On ELF-based platforms (and maybe others?) it's possible for a shared
library, when dynamically loaded into the backend, to call the backend
versions of src/port and src/common functions rather than the frontend
versions that are actually linked into the shlib. This is definitely
not what we want, because the frontend versions often behave slightly
differently. Up to now it's been "slight" enough that nobody noticed;
but with the addition of SCRAM support functions in src/common, we're
observing crashes due to the difference between palloc and malloc
memory allocation rules, as reported in bug #15367 from Jeremy Evans.
The purpose of this patch is to create a direct test for this type of
mis-linking, so that we know whether any given platform requires extra
measures to prevent using the wrong functions. If the test fails, it
will lead to connection failures in the contrib/postgres_fdw regression
test. At the moment, *BSD platforms using ELF format are known to have
the problem and can be expected to fail; but we need to know whether
anything else does, and we need a reliable ongoing check for future
platforms.
Actually fixing the problem will be the subject of later commit(s).
Buildfarm member tern failed src/bin/pg_ctl/t/001_start_stop.pl when a
check_mode_recursive() call overlapped a server's startup-time deletion
of pg_stat/global.stat. Just warn. Also, include errno in the message.
Back-patch to v11, where check_mode_recursive() first appeared.
Tom Lane [Sat, 8 Sep 2018 22:20:36 +0000 (18:20 -0400)]
Minor cleanup/future-proofing for pg_saslprep().
Ensure that pg_saslprep() initializes its output argument to NULL in
all failure paths, and then remove the redundant initialization that
some (not all) of its callers did. This does not fix any live bug,
but it reduces the odds of future bugs of omission.
Also add a comment about why the existing failure-path coding is
adequate.
Back-patch so as to keep the function's API consistent across branches,
again to forestall future bug introduction.
Tom Lane [Sat, 8 Sep 2018 00:09:57 +0000 (20:09 -0400)]
Save/restore SPI's global variables in SPI_connect() and SPI_finish().
This patch removes two sources of interference between nominally
independent functions when one SPI-using function calls another,
perhaps without knowing that it does so.
Chapman Flack pointed out that xml.c's query_to_xml_internal() expects
SPI_tuptable and SPI_processed to stay valid across datatype output
function calls; but it's possible that such a call could involve
re-entrant use of SPI. It seems likely that there are similar hazards
elsewhere, if not in the core code then in third-party SPI users.
Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which
would typically make for a crash in such a situation. Restoring them
to the values they had at SPI_connect() seems like a considerably more
useful behavior, and it still meets the design goal of not leaving any
dangling pointers to tuple tables of the function being exited.
Also, cause SPI_connect() to reset these variables to zeroes/nulls after
saving them. This prevents interference in the opposite direction: it's
possible that a SPI-using function that's only ever been tested standalone
contains assumptions that these variables start out as zeroes. That was
the case as long as you were the outermost SPI user, but not so much for
an inner user. Now it's consistent.
Report and fix suggestion by Chapman Flack, actual patch by me.
Back-patch to all supported branches.
Tom Lane [Fri, 7 Sep 2018 22:13:29 +0000 (18:13 -0400)]
Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.
It's somewhat surprising that we got away with this before. (Actually,
since nobody tests this routinely AFAIK, it might've been broken for
awhile. But it's definitely broken in the wake of commit f868a8143.)
It seems sufficient to limit the forced recursion to a small number
of levels.
Back-patch to all supported branches, like the preceding patch.
Tom Lane [Fri, 7 Sep 2018 22:04:37 +0000 (18:04 -0400)]
Fix longstanding recursion hazard in sinval message processing.
LockRelationOid and sibling routines supposed that, if our session already
holds the lock they were asked to acquire, they could skip calling
AcceptInvalidationMessages on the grounds that we must have already read
any remote sinval messages issued against the relation being locked.
This is normally true, but there's a critical special case where it's not:
processing inside AcceptInvalidationMessages might attempt to access system
relations, resulting in a recursive call to acquire a relation lock.
Hence, if the outer call had acquired that same system catalog lock, we'd
fall through, despite the possibility that there's an as-yet-unread sinval
message for that system catalog. This could, for example, result in
failure to access a system catalog or index that had just been processed
by VACUUM FULL. This is the explanation for buildfarm failures we've been
seeing intermittently for the past three months. The bug is far older
than that, but commits a54e1f158 et al added a new recursion case within
AcceptInvalidationMessages that is apparently easier to hit than any
previous case.
To fix this, we must not skip calling AcceptInvalidationMessages until
we have *finished* a call to it since acquiring a relation lock, not
merely acquired the lock. (There's already adequate logic inside
AcceptInvalidationMessages to deal with being called recursively.)
Fortunately, we can implement that at trivial cost, by adding a flag
to LOCALLOCK hashtable entries that tracks whether we know we have
completed such a call.
There is an API hazard added by this patch for external callers of
LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
into thinking the lock wasn't already held. This should be a fail-soft
condition, though, unless something very bizarre is being done in
response to the test.
Also, I added an additional output argument to LockAcquireExtended,
assuming that that probably isn't called by any outside code given
the very limited usefulness of its additional functionality.
Michael Paquier [Fri, 7 Sep 2018 18:00:16 +0000 (11:00 -0700)]
Improve handling of corrupted two-phase state files at recovery
When a corrupted two-phase state file is found by WAL replay, be it for
crash recovery or archive recovery, then the file is simply skipped and
a WARNING is logged to the user, causing the transaction to be silently
lost. Facing an on-disk WAL file which is corrupted is as likely to
happen as what is stored in WAL records, but WAL records are already
able to fail hard if there is a CRC mismatch. On-disk two-phase state
files, on the contrary, are simply ignored if corrupted. Note that when
restoring the initial two-phase data state at recovery, files newer than
the horizon XID are discarded hence no files present in pg_twophase/
should be torned and have been made durable by a previous checkpoint, so
recovery should never see any corrupted two-phase state file by design.
The situation got better since 978b2f6 which has added two-phase state
information directly in WAL instead of using on-disk files, so the risk
is limited to two-phase transactions which live across at least one
checkpoint for long periods. Backups having legit two-phase state files
on-disk could also lose silently transactions when restored if things
get corrupted.
This behavior exists since two-phase commit has been introduced, no
back-patch is done for now per the lack of complaints about this
problem.
Author: Michael Paquier
Discussion: https://postgr.es/m/20180709050309.GM1467@paquier.xyz
Andrew Gierth [Fri, 7 Sep 2018 12:51:30 +0000 (13:51 +0100)]
Refactor installation of extension headers.
Commit be54b3777 failed on gmake 3.80 due to a chained conditional,
which on closer examination could be removed entirely with some
refactoring elsewhere for a net simplification and more robustness
against empty expansions. Along the way, add some more comments.
Also make explicit in the documentation and comments that built
headers are not removed by 'make clean', since we don't typically want
that for headers generated by a separate ./configure step, and it's
much easier to add your own 'distclean' rule or use EXTRA_CLEAN than
to try and override a deletion rule in pgxs.mk.
Per buildfarm member prariedog and comments by Michael Paquier, though
all the actual changes are my fault.
libpq connection options as returned by PQconndefaults() have a
"dispchar" field that determines (among other things) whether an option
is a "debug" option, which shouldn't be shown by default to clients.
postgres_fdw makes use of that to control which connection options to
accept from a foreign server configuration.
Curiously, the "options" option, which allows passing configuration
settings to the backend server, was listed as a debug option, which
prevented it from being used by postgres_fdw. Maybe it was once meant
for debugging, but it's clearly in general use nowadays.
So change the dispchar for it to be the normal non-debug case. Also
remove the "debug" reference from its label field.
Tom Lane [Thu, 6 Sep 2018 14:49:45 +0000 (10:49 -0400)]
Make contrib/unaccent's unaccent() function work when not in search path.
Since the fixes for CVE-2018-1058, we've advised people to schema-qualify
function references in order to fix failures in code that executes under
a minimal search_path setting. However, that's insufficient to make the
single-argument form of unaccent() work, because it looks up the "unaccent"
text search dictionary using the search path.
The most expedient answer seems to be to remove the search_path dependency
by making it look in the same schema that the unaccent() function itself
is declared in. This will definitely work for the normal usage of this
function with the unaccent dictionary provided by the extension.
It's barely possible that there are people who were relying on the
search-path-dependent behavior to select other dictionaries with the same
name; but if there are any such people at all, they can still get that
behavior by writing unaccent('unaccent', ...), or possibly
unaccent('unaccent'::text::regdictionary, ...) if the lookup has to be
postponed to runtime.
Per complaint from Gunnlaugur Thor Briem. Back-patch to all supported
branches.
Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support. So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive. Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.
Amit Kapila [Thu, 6 Sep 2018 03:57:19 +0000 (09:27 +0530)]
Fix the overrun in hash index metapage for smaller block sizes.
The commit 620b49a1 changed the value of HASH_MAX_BITMAPS with the intent
to allow many non-unique values in hash indexes without worrying to reach
the limit of the number of overflow pages. At that time, this didn't
occur to us that it can overrun the block for smaller block sizes.
Choose the value of HASH_MAX_BITMAPS based on BLCKSZ such that it gives
the same answer as now for the cases where the overrun doesn't occur, and
some other sufficiently-value for the cases where an overrun currently
does occur. This allows us not to change the behavior in any case that
currently works, so there's really no reason for a HASH_VERSION bump.
Author: Dilip Kumar Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAA4eK1LtF4VmU4mx_+i72ff1MdNZ8XaJMGkt2HV8+uSWcn8t4A@mail.gmail.com
Andrew Gierth [Wed, 5 Sep 2018 21:01:21 +0000 (22:01 +0100)]
Allow extensions to install built as well as unbuilt headers.
Commit df163230b overlooked the case that an out-of-tree extension
might need to build its header files (e.g. via ./configure). If it is
also doing a VPATH build, the HEADERS_* rules in the original commit
would then fail to find the files, since they would be looking only
under $(srcdir) and not in the build directory.
Fix by adding HEADERS_built and HEADERS_built_$(MODULE) which behave
like DATA_built in that they look in the build dir rather than the
source dir (and also make the files dependencies of the "all" target).
No Windows support appears to be needed for this, since it is only
relevant to out-of-tree builds (no support exists in Mkvcbuild.pm to
build extension header files in any case).
Tom Lane [Wed, 5 Sep 2018 17:47:15 +0000 (13:47 -0400)]
Make argument names of pg_get_object_address consistent, and fix docs.
pg_get_object_address and pg_identify_object_as_address are supposed
to be inverses, but they disagreed as to the names of the arguments
representing the textual form of an object address. Moreover, the
documented argument names didn't agree with reality at all, either
for these functions or pg_identify_object.
In HEAD and v11, I think we can get away with renaming the input
arguments of pg_get_object_address to match the outputs of
pg_identify_object_as_address. In theory that might break queries
using named-argument notation to call pg_get_object_address, but
it seems really unlikely that anybody is doing that, or that they'd
have much trouble adjusting if they were. In older branches, we'll
just live with the lack of consistency.
Aside from fixing the documentation of these functions to match reality,
I couldn't resist the temptation to do some copy-editing.
Per complaint from Jean-Pierre Pelletier. Back-patch to 9.5 where these
functions were introduced. (Before v11, this is a documentation change
only.)
In the original code, we were storing the pg_inherits row for a
partitioned table too early: enough that we had a hack for relcache to
avoid falling flat on its face while reading such a partial entry. If
we finish the pg_class creation first and *then* store the pg_inherits
entry, we don't need that hack.
Also recognize that pg_class.relpartbound is not marked NOT NULL and
therefore it's entirely possible to read null values, so having only
Assert() protection isn't enough. Change those to if/elog tests
instead. This qualifies as a robustness fix, so backpatch to pg11.
In passing, remove one access that wasn't actually needed, and reword
one message to be like all the others that check for the same thing.
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
Peter Eisentraut [Wed, 29 Aug 2018 09:10:17 +0000 (11:10 +0200)]
PL/Python: Remove use of simple slicing API
The simple slicing API (sq_slice, sq_ass_slice) has been deprecated
since Python 2.0 and has been removed altogether in Python 3, so remove
those functions from the PLyResult class. Instead, the non-slice
mapping functions mp_subscript and mp_ass_subscript can take slice
objects as an index. Since we just pass the index through to the
underlying list object, we already support that. Test coverage was
already in place.
Bruce Momjian [Wed, 5 Sep 2018 02:34:07 +0000 (22:34 -0400)]
docs: improve AT TIME ZONE description
The previous description was unclear. Also add a third example, change
use of time zone acronyms to more verbose descriptions, and add a
mention that using 'time' with AT TIME ZONE uses the current time zone
rules.
Michael Paquier [Tue, 4 Sep 2018 18:06:04 +0000 (11:06 -0700)]
Improve some error message strings and errcodes
This makes a bit less work for translators, by unifying error strings a
bit more with what the rest of the code does, this time for three error
strings in autoprewarm and one in base backup code.
After some code review of slot.c, some file-access errcodes are reported
but lead to an incorrect internal error, while corrupted data makes the
most sense, similarly to the previous work done in e41d0a1. Also,
after calling rmtree(), a WARNING gets reported, which is a duplicate of
what the internal call report, so make the code more consistent with all
other code paths calling this function.
Author: Michael Paquier
Discussion: https://postgr.es/m/20180902200747.GC1343@paquier.xyz
Tom Lane [Tue, 4 Sep 2018 17:45:35 +0000 (13:45 -0400)]
Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.
Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.
To fix, create a unique index on (conrelid, contypid, conname). Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain. (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed. But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.) This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.
Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.
Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André. And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.
While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.
Tom Lane [Tue, 4 Sep 2018 14:52:01 +0000 (10:52 -0400)]
Clean up after TAP tests in oid2name and vacuumlo.
Oversights in commits 1aaf532de and bfea331a5. Unlike the case for
traditional-style REGRESS tests, pgxs.mk doesn't have any builtin support
for TAP tests, so it doesn't realize it should remove tmp_check/.
Maybe we should build some actual pgxs infrastructure for TAP tests ...
but for the moment, just remove explicitly.
Amit Kapila [Tue, 4 Sep 2018 04:36:09 +0000 (10:06 +0530)]
Prohibit pushing subqueries containing window function calculation to
workers.
Allowing window function calculation in workers leads to inconsistent
results because if the input row ordering is not fully deterministic, the
output of window functions might vary across workers. The fix is to treat
them as parallel-restricted.
In the passing, improve the coding pattern in max_parallel_hazard_walker
so that it has a chain of mutually-exclusive if ... else if ... else if
... else if ... IsA tests.
Reported-by: Marko Tiikkaja
Bug: 15324
Author: Amit Kapila Reviewed-by: Tom Lane
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com
Amit Kapila [Tue, 4 Sep 2018 02:43:35 +0000 (08:13 +0530)]
During the split, set checksum on an empty hash index page.
On a split, we allocate a new splitpoint's worth of bucket pages wherein
we initialize the last page with zeros which is fine, but we forgot to set
the checksum for that last page.
We decided to back-patch this fix till 10 because we don't have an easy
way to test it in prior versions. Another reason is that the hash-index
code is changed heavily in 10, so it is not advisable to push the fix
without testing it in prior versions.
Author: Amit Kapila Reviewed-by: Yugo Nagata
Backpatch-through: 10
Discussion: https://postgr.es/m/5d03686d-727c-dbf8-0064-bf8b97ffe850@2ndquadrant.com
This column was added in commit 8224de4f42cc ("Indexes with INCLUDE
columns and their support in B-tree") to ease writing the ruleutils.c
supporting code for that feature, but it turns out to be unnecessary --
we can do the same thing with just one more syscache lookup.
Even the documentation for the new column being removed in this commit
is awkward.
Tomas Vondra [Mon, 3 Sep 2018 00:10:24 +0000 (02:10 +0200)]
Fix memory leak in TRUNCATE decoding
When decoding a TRUNCATE record, the relids array was being allocated in
the main ReorderBuffer memory context, but not released with the change
resulting in a memory leak.
The array was also ignored when serializing/deserializing the change,
assuming all the information is stored in the change itself. So when
spilling the change to disk, we've only we have serialized only the
pointer to the relids array. Thanks to never releasing the array,
the pointer however remained valid even after loading the change back
to memory, preventing an actual crash.
This fixes both the memory leak and (de)serialization. The relids array
is still allocated in the main ReorderBuffer memory context (none of the
existing ones seems like a good match, and adding an extra context seems
like an overkill). The allocation is wrapped in a new ReorderBuffer API
functions, to keep the details within reorderbuffer.c, just like the
other ReorderBufferGet methods do.
Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com
Backpatch: 11, where decoding of TRUNCATE was introduced
Michael Paquier [Sun, 2 Sep 2018 19:40:30 +0000 (12:40 -0700)]
Fix initial sync of slot parent directory when restoring status
At the beginning of recovery, information from replication slots is
recovered from disk to memory. In order to ensure the durability of the
information, the status file as well as its parent directory are
synced. It happens that the sync on the parent directory was done
directly using the status file path, which is logically incorrect, and
the current code has been doing a sync on the same object twice in a
row.
Reported-by: Konstantin Knizhnik Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru
Backpatch-through: 9.4-
Tom Lane [Sat, 1 Sep 2018 20:02:47 +0000 (16:02 -0400)]
Doc: fix oversights in "Client/Server Character Set Conversions" table.
This table claimed that JOHAB could be used as a server encoding, which
was true originally but hasn't been true since 8.3. It also lacked
entries for EUC_JIS_2004 and SHIFT_JIS_2004.
JOHAB problem noted by Lars Kanis, the others by me.
Tom Lane [Sat, 1 Sep 2018 19:27:12 +0000 (15:27 -0400)]
Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd. However, that policy's
been ignored in an increasing number of places. We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway. But this is not
something to rely on. Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.
To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.
I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster. I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.
Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.
Patch by me; thanks to Michael Paquier for review.
Thomas Munro [Sat, 1 Sep 2018 19:12:24 +0000 (07:12 +1200)]
Add Greek characters to unaccent.rules.
Author: Tasos Maschalidis Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/153495048900.1368.11566580687623014380%40wrigleys.postgresql.org
Discussion: https://postgr.es/m/VI1PR01MB38537EBD529FE5EE3FE9A5FEB5370%40VI1PR01MB3853.eurprd01.prod.exchangelabs.com
Currently there are two ways to trigger log rotation in logging collector
process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly
to logging collector process. However, it's nice to have more suitable way
for external tools to do that, which wouldn't require SQL connection or
knowledge of logging collector pid. This commit implements triggering log
rotation by "pg_ctl logrotate" command.
Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov