granicus.if.org Git - postgresql/log

Use wildcard to match parens after CREATE STATISTICS

CREATE STATISTICS completion was checking manually for the start and end
of the parenthesised list of types. That works, but we now have a better
way to do that as commit 121213d9d taught word_matches() to allow '*' in
the middle of an alternative. But it only applied that to tab completion
for EXPLAIN, ANALYZE and VACUUM. Use it for CREATE STATISTICS too.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://www.postgresql.org/message-id/flat/d8jwooziy1s.fsf%40dalvik.ping.uio.no

Don't count zero-filled buffers as 'read' in EXPLAIN.

If you extend a relation, it should count as a block written, not
read (we write a zero-filled block). If you ask for a zero-filled
buffer, it shouldn't be counted as read or written.

Later we might consider counting zero-filled buffers with a separate
counter, if they become more common due to future work.

Author: Thomas Munro
Reviewed-by: Haribabu Kommi, Kyotaro Horiguchi, David Rowley
Discussion: https://postgr.es/m/CAEepm%3D3JytB3KPpvSwXzkY%2Bdwc5zC8P8Lk7Nedkoci81_0E9rA%40mail.gmail.com

Ensure consistent sort order of large objects in pg_dump.

The primary purpose of this commit is to ensure pg_upgrade tests yield
comparable dumps pre/post upgrade, which got broken by 12a53c732 /
578b229718, as the order in pg_largeobject_metadata is likely to
differ pre/post upgrade.

It also seems like a generally good idea to make sure such dumps are
comparable, outside of pg_upgrade tests.

LO metadata already was already dumped in an ordered manner as the
metadata is dumped in a well defined order via
sortDumpableObjectsByTypeName() and sortDumpableObjects(). But large
object data is currently not tracked via that mechanism.

As Tom points out it seems possible that at some point dumpBlobs() was
assumed to dump out objects in a well defined order, due to the use of
DISTINCT, which at that time only was done using sorting.

Per complaint from Andrew Dunstan and discussion with him and Tom
Lane.

Author: Andres Freund
Discussion: https://postgr.es/m/2735.1543333649@sss.pgh.pa.us

Fix jit compilation bug on wide tables.

The function generated to perform JIT compiled tuple deforming failed
when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd
failed to realize that LLVM's getelementptr would treat an int8 index
argument as signed, rather than unsigned. That means that a hoff
larger than 127 would result in a negative offset being applied. Fix
that by widening the index to 32bit.

Add a testcase with a wide table. Don't drop it, as it seems useful to
verify other tools deal properly with wide tables.

Thanks to Justin Pryzby for both reporting a bug and then reducing it
to a reproducible testcase!

Reported-By: Justin Pryzby
Author: Andres Freund
Discussion: https://postgr.es/m/20181115223959.GB10913@telsasoft.com
Backpatch: 11, just as jit compilation was

Update ssl test certificates and keys

Debian testing and newer now require that RSA and DHE keys are at
least 2048 bit long and no longer allow SHA-1 for signatures in
certificates. This is currently causing the ssl tests to fail there
because the test certificates and keys have been created in violation
of those conditions.

Update the parameters to create the test files and create a new set of
test files.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20180917131340.GE31460%40paquier.xyz

Fix ac218aa4f6 to work on versions before 9.5.

Unfortunately ac218aa4f6 missed the fact that a reference to
'pg_catalog.regnamespace'::regclass wouldn't work before that type is
known. Fix that, by replacing the regtype usage with a join to
pg_type.

Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/8863.1543297423@sss.pgh.pa.us
Backpatch: 9.5-, like ac218aa4f6

Update pg_upgrade test for reg* to include regrole and regnamespace.

When the regrole (0c90f6769) and regnamespace (cb9fa802b) types were
added in 9.5, pg_upgrade's check for reg* types wasn't updated. While
regrole currently is safe, regnamespace is not.

It seems unlikely that anybody uses regnamespace inside catalog tables
across a pg_upgrade, but the tests should be correct nevertheless.

While at it, reorder the types checked in the query to be
alphabetical. Otherwise it's annoying to compare existing and tested
for types.

Author: Andres Freund
Discussion: https://postgr.es/m/037e152a-cb25-3bcb-4f35-bdc9988f8204@2ndQuadrant.com
Backpatch: 9.5-, as regrole/regnamespace

doc: fix wording for plpgsql, add "and"

Reported-by: Anthony Greene
Discussion: https://postgr.es/m/CAPRNmnsSZ4QL75FUjcS8ND_oV+WjgyPbZ4ch2RUwmW6PWzF38w@mail.gmail.com

Backpatch-through: 9.4

Fix typo introduced in 578b229718.

Author: Andreas Karlsson
Discussion: https://postgr.es/m/0917c86f-e906-27c0-740e-abc581480823@proxel.se

Fix pg_upgrade for oid removal.

pg_upgrade previously copied pg_largeobject_metadata over from the old
cluster. That doesn't work, because the table has oids before
578b229718. I missed that.

As most pieces of metadata for large objects already were dumped as
DDL (except for comments overwritten by pg_upgrade, due to the copy of
pg_largeobject_metadata) it seems reasonable to just also dump grants
for large objects. If we ever consider this a relevant performance
problem, we'd need to fix the rest of the already emitted DDL
too.

There's still an open discussion about whether we'll want to force a
specific ordering for the dumped objects, as currently
pg_largeobjects_metadata potentially has a different ordering
before/after pg_upgrade, which can make automated testing a bit
harder.

Reported-By: Andrew Dunstan
Author: Andres Freund
Discussion: https://postgr.es/m/91a8a980-41bc-412b-fba2-2ba71a141c2b@2ndQuadrant.com

Fix translation of special characters in psql's LaTeX output modes.

latex_escaped_print() mistranslated \ and failed to provide any translation
for # ^ and ~, all of which would typically lead to LaTeX document syntax
errors.  In addition it didn't translate < > and |, which would typically
render as unexpected characters.

To some extent this represents shortcomings in ancient versions of LaTeX,
which if memory serves had no easy way to render these control characters
as ASCII text.  But that's been fixed for, um, decades.  In any case there
is no value in emitting guaranteed-to-fail output for these characters.

Noted while fooling with test cases added by commit 9a98984f4.  Back-patch
the code change to all supported versions.

Avoid locale-dependent output in numericlocale check.

I'd forgotten that in the buildfarm, parts of the regression tests
may run with psql exposed to a non-default LC_NUMERIC setting.
Hence we can't assume that C locale prevails, nor is there any
accessible way to force the setting for this single test step.
Lobotomize the test case added by commit 9a98984f4 so that it covers as
much as we can of print.c without having any locale-varying output.

Fix sample output for hash_metapage_info query

One output column was duplicated. Couldn't resist fixing the version
number while at it.

Reported-by: Gianni Ciolli

Add CSV table output mode in psql.

"\pset format csv", or --csv, selects comma-separated values table format.
This is compliant with RFC 4180, except that we aren't too picky about
whether the record separator is LF or CRLF; also, the user may choose a
field separator other than comma.

This output format is directly compatible with the server's COPY CSV
format, and will also be useful as input to other programs. It's
considerably safer for that purpose than the old recommendation to
use "unaligned" format, since the latter couldn't handle data
containing the field separator character.

Daniel Vérité, reviewed by Fabien Coelho and David Fetter, some
tweaking by me

Discussion: https://postgr.es/m/a8de371e-006f-4f92-ab72-2bbe3ee78f03@manitou-mail.org

Improve regression test coverage for psql output formats.

As penance for the "\pset format latex" silliness, add some regression
test coverage for the off-the-beaten-path output formats, which formerly
had exactly no coverage, except for some poorly-thought-out (unreadable,
repetitive, and incomplete) tests for asciidoc format.

I make no claims for the behavior exposed here actually being correct;
these test cases are just designed to ensure full code coverage in
fe_utils/print.c. This brings the line coverage for that file up
from ~60% to ~93%.

Fix breakage of "\pset format latex".

Commit eaf746a5b unintentionally made psql's "latex" output format
inaccessible, since not only "latex" but all abbreviations of it
were considered ambiguous against "latex-longtable". Let's go
back to the longstanding behavior that all shortened versions
mean "latex", and you have to write at least "latex-" to get
"latex-longtable". This leaves the only difference from pre-v12
behavior being that "\pset format a" is considered ambiguous.

The fact that the regression tests didn't expose this is pretty bad,
but fixing it is material for a separate commit.

Discussion: https://postgr.es/m/cb7e1caf-3ea6-450d-af28-f524903a030c@manitou-mail.org

Clarify that cross-row constraints are unsupported

Maybe we'll implement them later, or maybe not, but let's make the statu
quo clear for now.

Author: Lætitia Avrot, Patrick Francelle
Reviewers: too many to list
Discussion: https://postgr.es/m/CAB_COdhUuzNFOJfc7SNNso5rOuVA3ui93KMVunEM8Yih+K5A6A@mail.gmail.com

Revert "Fix typo in documentation of toast storage"

This reverts commit 058ef3a, per complains from Magnus Hagander and Vik
Fearing.

Fix typo in documentation of toast storage

Author: Nawaz Ahmed
Discussion: https://postgr.es/m/154319327168.1315.1846953598601966513@wrigleys.postgresql.org

Revert all new recent changes to add PGXS options for TAP and isolation

A set of failures in buildfarm machines are proving that this is not
quite ready yet because of another set of issues:
- MSVC scripts assume that REGRESS_OPTS can only use top_builddir. Some
test suites actually finish by using top_srcdir, like pg_stat_statements
which cause the regression tests to never run.
- Trying to enforce top_builddir does not work either when using VPATH
as this is not recognized properly.
- TAP tests of bloom are unstable on various platforms, causing various
failures.

Fix regression test handling of test_decoding with MSVC

The set of scripts in charge of running the regression tests for MSVC
run currently under the assumption that only $(top_builddir) can used in
option values defined in REGRESS_OPTS, and those options need to have a
specific format as well to be correctly parsed, so fix the Makefile
values so as those are correctly set.

Per complains from buildfarm member dory and whelk, with some extra
testing done on my side with MSVC to check this patch.

Disable temporarily TAP tests for contrib/bloom/

The recent commit 03faa4a8 has enabled those tests, however several
buildfarm members are complaining about their stability on Windows and
macOS. This will keep the buildfarm green, while investigating the
root problem.

Discussion: https://postgr.es/m/20181126003351.GE1776@paquier.xyz

Add PGXS options to control TAP and isolation tests

The following options are added for extensions:
- TAP_TESTS, to allow an extention to run TAP tests which are the ones
present in t/*.pl. A subset of tests can always be run with the
existing PROVE_TESTS for developers.
- ISOLATION, to define a list of isolation tests.
- ISOLATION_OPTS, to pass custom options to isolation_tester.

A couple of custom Makefile targets have been accumulated across the
tree to cover the lack of facility in PGXS for a couple of releases when
using those test suites, which are all now replaced with the new flags,
without reducing the test coverage. This also fixes an issue with
contrib/bloom/, which had a custom target to trigger its TAP tests of
its own not part of the main check runs.

Author: Michael Paquier
Reviewed-by: Adam Berlin, Álvaro Herrera, Tom Lane, Nikolay Shaplov,
Arthur Zakirov
Discussion: https://postgr.es/m/20180906014849.GG2726@paquier.xyz

Integrate recovery.conf into postgresql.conf

recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/

Fix assertion failure for SSL connections.

Commit cfdf4dc4 added an assertion that every WaitLatch() or similar
handles postmaster death. One place did not, but was missed in
review and testing due to the need for an SSL connection. Fix, by
asking for WL_EXIT_ON_PM_DEATH.

Reported-by: Christoph Berg
Discussion: https://postgr.es/m/20181124143845.GA15039%40msg.df7cb.de

Fix hstore hash function for empty hstores upgraded from 8.4.

Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)

Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.

Fix by making hstoreUpgrade unconditionally convert all old values to
new format.

Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing - for
example, a hash join between old- and new-format empty hstores will be
failing to match, or a hash index on an hstore column containing an
old-format empty value will be failing to find the value since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)

Per concerns arising from discussion of commit eb6f29141be. Original
bug is my fault.

Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f777747%402ndquadrant.com

Avoid crashes in contrib/intarray gist__int_ops (bug #15518)

1. Integer overflow in internal_size could result in memory corruption
in decompression since a zero-length array would be allocated and then
written to. This leads to crashes or corruption when traversing an
index which has been populated with sufficiently sparse values. Fix by
using int64 for computations and checking for overflow.

2. Integer overflow in g_int_compress could cause pessimal merge
choices, resulting in unnecessarily large ranges (which would in turn
trigger issue 1 above). Fix by using int64 again.

3. Even without overflow, array sizes could become large enough to
cause unexplained memory allocation errors. Fix by capping the sizes
to a safe limit and report actual errors pointing at gist__intbig_ops
as needed.

4. Large inputs to the compression function always consist of large
runs of consecutive integers, and the compression loop was processing
these one at a time in an O(N^2) manner with a lot of overhead. The
expected runtime of this function could easily exceed 6 months for a
single call as a result. Fix by performing a linear-time first pass,
which reduces the worst case to something on the order of seconds.

Backpatch all the way, since this has been wrong forever.

Per bug #15518 from report from irc user "dymk", analysis and patch by
me.

Discussion: https://postgr.es/m/15518-799e426c3b4f8358@postgresql.org

Adjust new test case for more portability.

Early returns from the buildfarm say that most critters are good with
commit cbdb8b4c0, but gaur gives unexpected results with the test case
involving a float8 that's one-ULP-less-than-2^63.  It appears that that
platform's version of rint() rounds that value up to 2^63 instead of
leaving it be.  This is possibly a bug, and it's also possible that no
other platform anybody is using anywhere behaves likewise.  Still, the
point of the test is not to insist that everybody's rint() behaves exactly
the same.  Let's use two-ULPs-less-than-2^63 instead, which I've tested
to act the same on gaur as on more modern hardware.

(This is, more or less, exactly the portability issue I'd feared might
arise...)

Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org

Fix float-to-integer coercions to handle edge cases correctly.

ftoi4 and its sibling coercion functions did their overflow checks in
a way that looked superficially plausible, but actually depended on an
assumption that the MIN and MAX comparison constants can be represented
exactly in the float4 or float8 domain.  That fails in ftoi4, ftoi8,
and dtoi8, resulting in a possibility that values near the MAX limit will
be wrongly converted (to negative values) when they need to be rejected.

Also, because we compared before rounding off the fractional part,
the other three functions threw errors for values that really ought
to get rounded to the min or max integer value.

Fix by doing rint() first (requiring an assumption that it handles
NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already),
and by comparing to values that should coerce to float exactly, namely
INTxx_MIN and -INTxx_MIN.  Also remove some random cosmetic discrepancies
between these six functions.

Per bug #15519 from Victor Petrovykh.  This should get back-patched,
but first let's see what the buildfarm thinks of it --- I'm not too
sure about portability of some of the regression test cases.

Patch by me; thanks to Andrew Gierth for analysis and discussion.

Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org

Add a 64-bit hash function for type hstore.

There's some question about the correctness of the hash function, but
if it's wrong, the 32-bit version is also wrong.

Amul Sul, reviewed by Hironobu Suzuki

Discussion: https://postgr.es/m/CAAJ_b947JjnNr9Cp45iNjSqKf6PA5mCTmKsRwPjows93YwQrmw@mail.gmail.com

Add a 64-bit hash function for type citext.

Amul Sul, reviewed by Hironobu Suzuki

Discussion: https://postgr.es/m/CAAJ_b947JjnNr9Cp45iNjSqKf6PA5mCTmKsRwPjows93YwQrmw@mail.gmail.com

Clamp semijoin selectivity to be not more than inner-join selectivity.

We should never estimate the output of a semijoin to be more rows than
we estimate for an inner join with the same input rels and join condition;
it's obviously impossible for that to happen.  However, given the
relatively poor quality of our semijoin selectivity estimates ---
particularly, but not only, in cases where we punt and return a default
estimate --- we did often deliver such estimates.  To improve matters,
calculate both estimates inside eqjoinsel() and take the smaller one.

The bulk of this patch is just mechanical refactoring to avoid repetitive
information lookup when we call both eqjoinsel_semi and eqjoinsel_inner.
The actual new behavior is just

selec = Min(selec, inner_rel->rows * selec_inner);

which looks a bit odd but is correct because of our different definitions
for inner and semi join selectivity.

There is one ensuing plan change in the regression tests, but it looks
reasonable enough (and checking the actual row counts shows that the
estimate moved closer to reality, not further away).

Per bug #15160 from Alexey Ermakov.  Although this is arguably a bug fix,
I won't risk destabilizing plan choices in stable branches by
back-patching.

Tom Lane, reviewed by Melanie Plageman

Discussion: https://postgr.es/m/152395805004.19366.3107109716821067806@wrigleys.postgresql.org

Silence compiler warnings

Commit cfdf4dc4fc96 left a few unnecessary assignments, one of which
caused compiler warnings, as reported by Erik Rijkers. Remove them all.

Discussion: https://postgr.es/m/df0dcca2025b3d90d946ecc508ca9678@xs4all.nl

Remove bogus file.

Don't allow partitioned indexes in pg_global tablespace

Missing in dfa608141982.

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-M3NMTCpv=vDfkoqHbMPFf=3-Z1ud=+1DHH00tC+zLaQ@mail.gmail.com

doc: Fix typo

Add WL_EXIT_ON_PM_DEATH pseudo-event.

Users of the WaitEventSet and WaitLatch() APIs can now choose between
asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking
for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death.
This reduces code duplication, since almost all callers want the latter.

Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).

Assert that callers of WaitLatchXXX() under the postmaster remember to
ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent
future bugs.

The only process that doesn't handle postmaster death is syslogger.  It
waits until all backends holding the write end of the syslog pipe
(including the postmaster) have closed it by exiting, to be sure to
capture any parting messages.  By using the WaitEventSet API directly
it avoids the new assertion, and as a by-product it may be slightly
more efficient on platforms that have epoll().

Author: Thomas Munro
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com,
            https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com

Clarify documentation about PASSWORD in CREATE/ALTER ROLE

The documentation of CREATE/ALTER ROLE has been missing two things
related to PASSWORD:
- The password value provided needs to be quoted, some places of the
documentation marked the field with quotes, but not others, which led to
confusion.
- PASSWORD NULL was not provided consistently, with ENCRYPTED being not
compatible with it.

Reported-by: Steven Winfield
Author: Michael Paquier
Reviewed-by: David G. Johnston
Discussion: https://postgr.es/m/154282901979.1316.7418475422120496802@wrigleys.postgresql.org

Fix another crash in json{b}_populate_recordset and json{b}_to_recordset.

populate_recordset_worker() failed to consider the possibility that the
supplied JSON data contains no rows, so that update_cached_tupdesc never
got called.  This led to a null-pointer dereference since commit 9a5e8ed28;
before that it led to a bogus "set-valued function called in context that
cannot accept a set" error.  Fix by forcing the update to happen.

Per bug #15514.  Back-patch to v11 as 9a5e8ed28 was.  (If we were excited
about the bogus error, we could perhaps go back further, but it'd take more
work to figure out how to fix it in older branches.  Given the lack of
field complaints about that aspect, I'm not excited.)

Discussion: https://postgr.es/m/15514-59d5b4c4065b178b@postgresql.org

Doc: rework introductory documentation about covering indexes.

Documenting INCLUDE in the section about unique indexes is confusing,
as complained of by Emilio Platzer.  Furthermore, it entirely failed
to explain why you might want to use the feature.  The section about
index-only scans is really the right place; it already talked about
making such things the hard way.  Rewrite that text to describe INCLUDE
as the normal way to make a covering index.

Also, move that section up a couple of places, as it now seems more
important than some of the stuff we had before it.  It still has to
be after expression and partial indexes, since otherwise some of it
would involve forward references.

Discussion: https://postgr.es/m/154031939560.30897.14677735588262722042@wrigleys.postgresql.org

Fix typo in description of ExecFindPartition

Author: Amit Langote
Discussion: https://postgr.es/m/CA+HiwqHg0=UL+Dhh3gpiwYNA=ufk9Lb7GQ2c=5rs=ZmVTP7xAw@mail.gmail.com

doc: adjust time zone names text, v2

Removed one too many words. Fix for
7906de847f229f391b9e6b5892b4b4a89f29edb4.

Reported-by: Thomas Munro
Backpatch-through: 9.4

doc: adjust time zone names text

Reported-by: Kevin <kcolagio@gmail.com>
Discussion: https://postgr.es/m/154082462281.30897.14043119084654378035@wrigleys.postgresql.org

Backpatch-through: 9.4

Rework the pgbench state machine code for clarity

This commit continues the code improvements started by commit
12788ae49e19. With this commit, state machine transitions are better
contained in the routine that was called doCustom() and is now called
advanceConnectionState -- the resulting code is easier to reason about,
since there are no state changes occuring in the outer layer.

This change is prompted by future patches to add more features to
pgbench, which will need to effect some more surgery to this code.

Fabien's original had all the machine state changes inside one routine,
but I (Álvaro) thought that a subroutine to handle command execution is
more straightforward to review, so this commit does not match Fabien's
submission closely. If something is broken, it's probably my fault.

Author: Fabien Coelho, Álvaro Herrera
Reviewed-by: Kirk Jamison
Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre

Fix typo in commit 6f7d02aa60b7

Per pink buildfarm.

Fix PartitionDispatchData vertical whitespace

Per David Rowley
Discussion: https://postgr.es/m/CAKJS1f-MstvBWdkOzACsOHyBgj2oXcBM8kfv+NhVe-Ux-wq9Sg@mail.gmail.com

instr_time.h: add INSTR_TIME_SET_CURRENT_LAZY

Sets the timestamp to current if not already set. Will acquire more
callers momentarily.

Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre

Blind attempt at fixing sepgsql output for 578b22.

Fix sepgsql compile error caused by oid removal.

Per buildfarm animal rhinoceros. I (Andres) missed replacing a few
uses of ObjectIdAttributeNumber in sepgsql.

It's quite probable that the sepgsql test output will need more
adapting than done in 578b22...

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=2Sk+66HJV8FLDfm_sKTn22j7cWTY_Y1Rok3RxeWL_Y0w@mail.gmail.com

Remove WITH OIDS support, change oid catalog column visibility.

Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de

Improve description of buffer used to store records in WAL reader

The dedicated private buffer to store records is used only for these
crossing a page boundary since 285bd0ac, but its description did not
match completely the reality.

Reported-by: Andrey Lepikhov
Author: Michael Paquier
Discussion: https://postgr.es/m/49518b48-2036-5e43-1818-0f594e375e76@postgrespro.ru

Make detection of SSL_CTX_set_min_proto_version more portable

As already explained in configure.in, using the OpenSSL version number
to detect presence of functions doesn't work, because LibreSSL reports
incompatible version numbers. Fortunately, the functions we need here
are actually macros, so we can just test for them directly.

Add settings to control SSL/TLS protocol version

For example:

ssl_min_protocol_version = 'TLSv1.1'
ssl_max_protocol_version = 'TLSv1.2'

Reviewed-by: Steve Singer <steve@ssinger.info>
Discussion: https://www.postgresql.org/message-id/flat/1822da87-b862-041a-9fc2-d0310c3da173@2ndquadrant.com

Make WAL description output more consistent

The output for record types XLOG_DBASE_CREATE and XLOG_DBASE_DROP used
the order dbid/tablespaceid, whereas elsewhere the order is
tablespaceid/dbid[/relfilenodeid]. Flip the order for those two types
to make it consistent.

Author: Jean-Christophe Arnu <jcarnu@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHZmTm18Ln62KW-G8NYvO1wbBL3QU1E76Zep=DuHmg-zS2XFAg@mail.gmail.com/

doc: Clarify CREATE TYPE ENUM documentation

The documentation claimed that an enum type requires "one or more"
labels, but since 1fd9883ff49, zero labels are also allowed.

Reported-by: Lukas Eder <lukas.eder@gmail.com>
Bug: #15356

Refine some guc.c help texts

These settings apply to communication with the sending server, which
is not necessarily a primary.

Author: Sergei Kornilov <sk@zsrv.org>

Fix issues with TAP tests of pg_verify_checksums

Two issues have been spotted and get fixed here:
- When checking for corrupted files, make sure that pg_verify_checksums
complains about the correct file.  In order to make the logic more
robust, all files created are immediately removed once checks on them
are done.  The error message generated by pg_verify_checksums also now
includes the file name it sees as corrupted.
- Before running corruption-related tests, empty files are generated
which used names mapping with the corrupted files, potentially leading
to conflicts.  So use different set of names for both.

Author: Michael Banck
Discussion: https://postgr.es/m/20181119181119.GC23740@nighthawk.caipicrew.dd-dns.de

Add needed #include.

Per POSIX, WIFSIGNALED and related macros are provided by <sys/wait.h>.
Apparently on Linux they're also pulled in by some other inclusions,
but BSD-ish systems are pickier. Fixes portability issue in ffa4cbd62.

Per buildfarm.

Handle EPIPE more sanely when we close a pipe reading from a program.

Previously, any program launched by COPY TO/FROM PROGRAM inherited the
server's setting of SIGPIPE handling, i.e. SIG_IGN.  Hence, if we were
doing COPY FROM PROGRAM and closed the pipe early, the child process
would see EPIPE on its output file and typically would treat that as
a fatal error, in turn causing the COPY to report error.  Similarly,
one could get a failure report from a query that didn't read all of
the output from a contrib/file_fdw foreign table that uses file_fdw's
PROGRAM option.

To fix, ensure that child programs inherit SIG_DFL not SIG_IGN processing
of SIGPIPE.  This seems like an all-around better situation since if
the called program wants some non-default treatment of SIGPIPE, it would
expect to have to set that up for itself.  Then in COPY, if it's COPY
FROM PROGRAM and we stop reading short of detecting EOF, treat a SIGPIPE
exit from the called program as a non-error condition.  This still allows
us to report an error for any case where the called program gets SIGPIPE
on some other file descriptor.

As coded, we won't report a SIGPIPE if we stop reading as a result of
seeing an in-band EOF marker (e.g. COPY BINARY EOF marker).  It's
somewhat debatable whether we should complain if the called program
continues to transmit data after an EOF marker.  However, it seems like
we should avoid throwing error in any questionable cases, especially in a
back-patched fix, and anyway it would take additional code to make such
an error get reported consistently.

Back-patch to v10.  We could go further back, since COPY FROM PROGRAM
has been around awhile, but AFAICS the only way to reach this situation
using core or contrib is via file_fdw, which has only supported PROGRAM
sources since v10.  The COPY statement per se has no feature whereby
it'd stop reading without having hit EOF or an error already.  Therefore,
I don't see any upside to back-patching further that'd outweigh the
risk of complaints about behavioral change.

Per bug #15449 from Eric Cyr.

Patch by me, review by Etsuro Fujita and Kyotaro Horiguchi

Discussion: https://postgr.es/m/15449-1cf737dd5929450e@postgresql.org

psql: Describe partitioned tables/indexes as such

In \d and \z, instead of conflating partitioned tables and indexes with
plain ones, set the "type" column and table title differently to make
the distinction obvious. A simple ease-of-use improvement.

Author: Pavel Stehule, Michaël Paquier, Álvaro Herrera
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAFj8pRDMWPgijpt_vPj1t702PgLG4Ls2NCf+rEcb+qGPpossmg@mail.gmail.com

Update config/ax_pthread.m4 to latest upstream version.

This change doesn't fix any bugs that we've heard about, but it seems
like a good idea on general principles to track upstream occasionally.

Discussion: https://postgr.es/m/3320.1542647565@sss.pgh.pa.us

Postpone LLVM-related uses of AC_CHECK_DECLS.

Calling AC_CHECK_DECLS before we've finished setting up the compiler's
CFLAGS seems like a pretty risky proposition, especially now that the
first use of that macro will result in a test to see whether the compiler
gives warning or error for undeclared built-in functions.  That answer
could very easily get changed later than where PGAC_LLVM_SUPPORT is
called; furthermore, it's hardly unlikely that flags such as -D_GNU_SOURCE
could change visibility of declarations.  Hence, be a little less cavalier
about where to do LLVM-related tests.  This results in v11 and HEAD doing
the warning-or-error check at the same place in the script as older
branches are doing it, which seems like a good thing.

Per further thought about commits 0b59b0e8b and 16fbac39f.

psql: Show IP address in \conninfo

When hostaddr is given, the actual IP address that psql is connected to
can be totally unexpected for the given host.  The more verbose output
we now generate makes things clearer.  Since the "host" and "hostaddr"
parts of the conninfo could come from different sources (say, one of
them is in the service specification or a URI-style conninfo and the
other is not), this is not as silly as it may first appear.  This is
also definitely useful if the hostname resolves to multiple addresses.

Author: Fabien Coelho
Reviewed-by: Pavel Stehule, Arthur Zakirov
Discussion: https://postgr.es/m/alpine.DEB.2.21.1810261532380.27686@lancre
https://postgr.es/m/alpine.DEB.2.21.1808201323020.13832@lancre

Reduce unnecessary list construction in RelationBuildPartitionDesc.

The 'partoids' list which was constructed by the previous version
of this code was necessarily identical to 'inhoids'.  There's no
point to duplicating the list, so avoid that.  Instead, construct
the array representation directly from the original 'inhoids' list.

Also, use an array rather than a list for 'boundspecs'.  We know
exactly how many items we need to store, so there's really no
reason to use a list.  Using an array instead reduces the number
of memory allocations we perform.

Patch by me, reviewed by Michael Paquier and Amit Langote, the
latter of whom also helped with rebasing.

Fix configure's AC_CHECK_DECLS tests to work correctly with clang.

The test case that Autoconf uses to discover whether a function has
been declared doesn't work reliably with clang, because clang reports
a warning not an error if the name is a known built-in function.
On some platforms, this results in a lot of compile-time warnings about
strlcpy and related functions not having been declared.

There is a fix for this (by Noah Misch) in the upstream Autoconf sources,
but since they've not made a release in years and show no indication of
doing so anytime soon, let's just absorb their fix directly. We can
revert this when and if we update to a newer Autoconf release.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/26819.1542515567@sss.pgh.pa.us

Disallow COPY FREEZE on partitioned tables

This didn't actually work: COPY would fail to flush the right files, and
instead would try to flush a non-existing file, causing the whole
transaction to fail.

Cope by raising an error as soon as the command is sent instead, to
avoid a nasty later surprise. Of course, it would be much better to
make it work, but we don't have a patch for that yet, and we don't know
if we'll want to backpatch one when we do.

Reported-by: Tomas Vondra
Author: David Rowley
Reviewed-by: Amit Langote, Steve Singer, Tomas Vondra

pg_archivecleanup: Update file header comment a bit

PANIC on fsync() failure.

On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure.  In that case the only remaining copy of the
data is in the WAL.  A subsequent fsync() could appear to succeed,
but not have flushed the data.  That means that a future checkpoint
could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure.  Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses.  If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail.  The GUC defaults
to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error.  A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de

Don't forget about failed fsync() requests.

If fsync() fails, md.c must keep the request in its bitmap, so that
future attempts will try again.

Back-patch to all supported releases.

Author: Thomas Munro
Reviewed-by: Amit Kapila
Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk

Rework documentation of pg_promote

This clarifies the behavior of how the "wait" flag works, which is
something that the previous version of the documentation failed to do.

Author: Ian Barwick
Discussion: https://postgr.es/m/cbd38450-2295-10a1-1f73-591a692ae0b0@2ndquadrant.com

Remove unnecessary memcpy when reading WAL record fitting on page

When reading a WAL record, its contents are copied into an intermediate
buffer. However, doing so is not necessary if the record fits fully
into the current page, saving one memcpy for each such record. The
allocation handling of the intermediate buffer is also now done only
when a record crosses a page boundary, shaving some extra cycles when
reading a WAL record.

Author: Andrey Lepikhov
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas
Discussion: https://postgr.es/m/c2ea54dd-a1d3-80eb-ddbf-7e6f258e615e@postgrespro.ru

fix typo

Silence MSVC warnings about redefinition of isnan

Some versions of perl.h define isnan when the compiler is MSVC. To avoid
warnings about this, undefine the symbol before including perl.h and
re-add the definition afterwards if it wasn't recreated.

Discussion: https://postgr.es/m/caf0568e-3c1f-07fd-6914-d903f22560f2@2ndQuadrant.com

Fix AC_REQUIRES breakage in LLVM autoconf tests.

Any Autoconf macro that uses AC_REQUIRES -- directly or indirectly --
must not be inside a plain shell "if" test; if it is, whatever code
gets pulled in by the AC_REQUIRES will also be inside that "if".
Instead of "if" we can use AS_IF, which knows how to get this right
(cf commit 01051a987).

The only immediate problem from getting this wrong was that AC_PROG_AWK
had to be run twice, once inside the "if llvm" block and once in the
main line. However, it broke a different patch I'm about to submit
more thoroughly.

Add valgrind suppressions for wcsrtombs optimizations

wcsrtombs (called through wchar2char from common functions like lower,
upper, etc.) uses various optimizations that may look like access to
uninitialized data, triggering valgrind reports.

For example AVX2 instructions load data in 256-bit chunks, and gconv
does something similar with 32-bit chunks. This is faster than accessing
the bytes one by one, and the uninitialized part of the buffer is not
actually used. So suppress the bogus reports.

The exact stack depends on possible optimizations - it might be AVX, SSE
(as in the report by Aleksander Alekseev) or something else. Hence the
last frame is wildcarded, to deal with this.

Backpatch all the way back to 9.4.

Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/90ac0452-e907-e7a4-b3c8-15bd33780e62%402ndquadrant.com
Discussion: https://www.postgresql.org/message-id/20180220150838.GD18315@e733.localdomain

Avoid defining SIGTTIN/SIGTTOU on Windows.

Setting them to SIG_IGN seems unlikely to have any beneficial effect
on that platform, and given the signal numbering collision with SIGABRT,
it could easily have bad effects.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us

Leave SIGTTIN/SIGTTOU signal handling alone in postmaster child processes.

For reasons lost in the mists of time, most postmaster child processes
reset SIGTTIN/SIGTTOU signal handling to SIG_DFL, with the major exception
that backend sessions do not.  It seems like a pretty bad idea for any
postmaster children to do that: if stderr is connected to the terminal,
and the user has put the postmaster in background, any log output would
result in the child process freezing up.  Hence, switch them all to
doing what backends do, ie, nothing.  This allows them to inherit the
postmaster's SIG_IGN setting.  On the other hand, manually-launched
processes such as standalone backends will have default processing,
which seems fine.

In passing, also remove useless resets of SIGCONT and SIGWINCH signal
processing.  Perhaps the postmaster once changed those to something
besides SIG_DFL, but it doesn't now, so these are just wasted (and
confusing) syscalls.

Basically, this propagates the changes made in commit 8e2998d8a from
backends to other postmaster children.  Probably the only reason these
calls now exist elsewhere is that I missed changing pgstat.c along with
postgres.c at the time.

Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.

Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us

Fix some spurious new compiler warnings in MSVC.

Per buildfarm animal bowerbird.

Discussion: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2018-11-17%2002%3A30%3A20

Make TupleTableSlots extensible, finish split of existing slot type.

This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types.  As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.

To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time.  That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc.  See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.

There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.

There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.

Author: Andres Freund and  Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Avoid re-typedef'ing PartitionTupleRouting

Apparently, gcc on macOS (?) doesn't like it.

Per buildfarm.

pgbench: introduce a RandomState struct

This becomes useful when used to retry a transaction after a
serialization error or deadlock abort. (We don't yet have that feature,
but this is preparation for it.)

While at it, use separate random state for thread administratrivia such
as deciding which script to run, how long to delay for throttling, or
whether to log a message when sampling; this not only makes these tasks
independent of each other, but makes the actual thread run
deterministic.

Author: Marina Polyakova
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/72a0d590d6ba06f242d75c2e641820ec@postgrespro.ru

Inline hot path of slot_getsomeattrs().

This yields a minor speedup, which roughly balances the loss from the
upcoming introduction of callbacks to do some operations on slots.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Redesign initialization of partition routing structures

This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well
as the future MERGE) on partitioned tables.

This changes the setup for tuple routing so that it does far less work
during the initial setup and pushes more work out to when partitions
receive tuples.  PartitionDispatchData structs for sub-partitioned
tables are only created when a tuple gets routed through it.  The
possibly large arrays in the PartitionTupleRouting struct have largely
been removed.  The partitions[] array remains but now never contains any
NULL gaps.  Previously the NULLs had to be skipped during
ExecCleanupTupleRouting(), which could add a large overhead to the
cleanup when the number of partitions was large.  The partitions[] array
is allocated small to start with and only enlarged when we route tuples
to enough partitions that it runs out of space. This allows us to keep
simple single-row partition INSERTs running quickly.  Redesign

The arrays in PartitionTupleRouting which stored the tuple translation maps
have now been removed.  These have been moved out into a
PartitionRoutingInfo struct which is an additional field in ResultRelInfo.

The find_all_inheritors() call still remains by far the slowest part of
ExecSetupPartitionTupleRouting(). This commit just removes the other slow
parts.

In passing also rename the tuple translation maps from being ParentToChild
and ChildToParent to being RootToPartition and PartitionToRoot. The old
names mislead you into thinking that a partition of some sub-partitioned
table would translate to the rowtype of the sub-partitioned table rather
than the root partitioned table.

Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com

Fix slot type assumptions for nodeGather[Merge].

The assumption made in 1a0586de3657c was wrong, as evidenced by
buildfarm failure on locust, which runs with
force_parallel_mode=regress. The tuples accessed in either nodes are
in the outer slot, and we can't trivially rely on the slot type being
known because the leader might execute the subsidiary node directly,
or via the tuple queue on a worker. In the latter case the tuple will
always be a heaptuple slot, but in the former, it'll be whatever the
subsidiary node returns.

Add dummy field to currently empty struct TupleTableSlotOps.

Per MSVC complaint on buildfarm member dory.

Don't generate tuple deforming functions for virtual slots.

Virtual tuple table slots never need tuple deforming. Therefore, if we
know at expression compilation time, that a certain slot will always
be virtual, there's no need to create a tuple deforming routine for
it.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Verify that expected slot types match returned slot types.

This is important so JIT compilation knows what kind of tuple slot the
deforming routine can expect. There's also optimization potential for
expression initialization without JIT compilation. It e.g. seems
plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with
virtual slots.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Compute information about EEOP_*_FETCHSOME at expression init time.

Previously this information was computed when JIT compiling an
expression. But the information is useful for assertions in the
non-JIT case too (for assertions), therefore it makes sense to move
it.

This will, in a followup commit, allow to treat different slot types
differently. E.g. for virtual slots there's no need to generate a JIT
function to deform the slot.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Introduce notion of different types of slots (without implementing them).

Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.

Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots. Thus different types of
slots are needed, which requires adapting creators of slots.

The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.

Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.

But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.

Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().

The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.

This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit. The different types of slots introduced will, for now, still
use the same backing implementation.

While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Rejigger materializing and fetching a HeapTuple from a slot.

Previously materializing a slot always returned a HeapTuple. As
current work aims to reduce the reliance on HeapTuples (so other
storage systems can work efficiently), that needs to change. Thus
split the tasks of materializing a slot (i.e. making it independent
from the underlying storage / other memory contexts) from fetching a
HeapTuple from the slot. For brevity, allow to fetch a HeapTuple from
a slot and materializing the slot at the same time, controlled by a
parameter.

For now some callers of ExecFetchSlotHeapTuple, with materialize =
true, expect that changes to the heap tuple will be reflected in the
underlying slot. Those places will be adapted in due course, so while
not pretty, that's OK for now.

Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTupleDatum and
ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely
that future storage methods will need similar methods. There already
is ExecFetchSlotMinimalTuple, so the new names make the naming scheme
more coherent.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

A small tweak to some comments for PartitionKeyData

It was not really that obvious that there's meant to be exactly 1 item
in the partexprs List for each zero-valued partattrs element. Some
incorrect code using these fields was the cause of CVE-2018-1052, so
it's worthwhile to mention how they should be used in the comments.

Author: David Rowley <david.rowley@2ndquadrant.com>

Correct code comments for PartitionedRelPruneInfo struct

The comments above the PartitionedRelPruneInfo struct incorrectly
document how subplan_map and subpart_map are indexed. This seems to
have snuck in on 4e232364033.

Author: David Rowley <david.rowley@2ndquadrant.com>

Update executor documentation for run-time partition pruning

With run-time partition pruning, there is no longer necessarily an
executor node for each corresponding plan node.

Author: David Rowley <david.rowley@2ndquadrant.com>

Rationalize expression context reset in ExecModifyTable().

The current pattern of reseting expressions both in
ExecProcessReturning() and ExecOnConflictUpdate() makes it harder than
necessary to reason about memory lifetimes. It also requires
materializing slots unnecessarily, although this patch doesn't take
advantage of the fact that that's not necessary anymore.

Instead reset the expression context once for each input tuple.

Author: Ashutosh Bapat
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de

Make reformat-dat-files, reformat-dat-files VPATH safe.

The reformat_dat_file.pl script, added by 372728b0d49552641, supported
all the necessary options to make it work in a VPATH build, but the
makefile invocations didn't take VPATH into account. Fix that.

Discussion: https://postgr.es/m/20181115185303.d2z7wonx23mdfvd3@alap3.anarazel.de
Backpatch: 11-, where 372728b0d49552641 was merged

Improve performance of partition pruning remapping a little.

ExecFindInitialMatchingSubPlans has to update the PartitionPruneState's
subplan mapping data to account for the removal of subplans it prunes.
However, that's only necessary if run-time pruning will also occur,
so we can skip it when that won't happen, which should result in not
needing to do the remapping in many cases. (We now need it only when
some partitions are potentially startup-time prunable and others are
potentially run-time prunable, which seems like an unusual case.)

Also make some marginal performance improvements in the remapping
itself. These will mainly win if most partitions got pruned by
the startup-time pruning, which is perhaps a debatable assumption
in this context.

Also fix some bogus comments, and rearrange code to marginally
reduce space consumption in the executor's query-lifespan context.

David Rowley, reviewed by Yoshikazu Imai

Discussion: https://postgr.es/m/CAKJS1f9+m6-di-zyy4B4AGn0y1B9F8UKDRigtBbNviXOkuyOpw@mail.gmail.com

geo_ops.c: Clarify comments and function arguments

These functions were not crystal clear about what their respective APIs
are. Make an effort to improve that.

Emre's patch was correct AFAICT, but I (Álvaro) felt the need to improve
a few comments a bit more. Any resulting errors are my own.

Per complaint from Coverity, Ning Yu, and Tom Lane.

Author: Emre Hasegeli, Álvaro Herrera
Reviewed-by: Tomas Vondra, Álvaro Herrera
Discussion: https://postgr.es/m/26769.1533090136@sss.pgh.pa.us

Fix the omission in docs.

Commit 5373bc2a08 has added type for background workers but forgot to
update at one place in the documentation.

Reported-by: John Naylor
Author: John Naylor
Reviewed-by: Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJVSVGVmvgJ8Lq4WBxC3zV5wf0txdCqRSgkWVP+jaBF=HgWscA@mail.gmail.com

Further adjustment to random() seed initialization.

Per complaint from Tom Lane, don't chomp the timestamp at 32 bits, so we
can shift in some of its higher bits.

Discussion: https://postgr.es/m/14712.1542253115%40sss.pgh.pa.us

Increase the number of possible random seeds per time period.

Commit 197e4af9 refactored the initialization of the libc random()
seed, but reduced the number of possible seeds values that could be
chosen in a given time period. This negation of the effects of
commit 98c50656c was unintentional. Replace with code that
shifts the fast moving timestamp bits left, similar to the original
algorithm (though not the previous float-tolerating coding, which
is no longer necessary).

Author: Thomas Munro
Reported-by: Noah Misch
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181112083358.GA1073796%40rfd.leadboat.com