granicus.if.org Git - postgresql/log

Reduce cost of test_decoding's new oldest_xmin test

Change a whole-database VACUUM into doing just pg_attribute, which is
the portion that verifies what we want it to do.  The original
formulation wastes a lot of CPU time, which leads the test to fail when
runtime exceeds isolationtester timeout when it's super-slow, such as
under CLOBBER_CACHE_ALWAYS.  Per buildfarm member friarbird.

It turns out that the previous shape of the test doesn't always detect
the condition it is supposed to detect (on unpatched reorderbuffer
code): the reason is that there is a good chance of encountering a
xl_running_xacts record (logged every 15 seconds) before the checkpoint
-- and because we advance the xmin when we receive that WAL record, and
we *don't* advance the xmin twice consecutively without receiving a
client message in between, that means the xmin is not advanced enough
for the tuple to be pruned from pg_attribute by VACUUM.  So the test
would spuriously pass.

The reason this test deficiency wasn't detected earlier is that HOT
pruning removes the tuple anyway, even if vacuum leaves it in place, so
the test correctly fails (detecting the coding mistake), but for the
wrong reason.

To fix this mess, run the s0_get_changes step twice before vacuum
instead of once: this seems to cause the xmin to be advanced reliably,
wreaking havoc with more certainty.

Author: Arseny Sher
Discussion: https://postgr.es/m/87h8lkuxoa.fsf@ars-thinkpad

Prevent references to invalid relation pages after fresh promotion

If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position older than the first end-of-recovery checkpoint while
all the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed.  When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

Pavan Deolasee has reported and diagnosed the failure in the first
place, and the base fix idea to rely on the local copy of
minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to make it faster and more reliable with sleep phases.

Backpatch down to all supported versions where the bug appears, aka 9.3
which is where the end-of-recovery checkpoint is not run by the startup
process anymore.  The test gets easily supported down to 10, still it
has been tested on all branches.

Reported-by: Pavan Deolasee
Diagnosed-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com

Use context with correct lifetime in hypothetical_dense_rank_final.

The query lifetime expression context created in
hypothetical_dense_rank_final() was buggily allocated in the calling
memory context. I (Andres) broke that in bf6c614a2f2.

Reported-By: Rajkumar Raghuwanshi
Author: Amit Langote
Discussion: https://postgr.es/m/CAKcux6kmzWmur5HhA_aU6gYVFu0RLQdgJJ+aC9SLdcOvBSrpfA@mail.gmail.com
Backpatch: 11-

Check for interrupts inside the nbtree page deletion code.

When deleting pages the nbtree code has to walk through siblings of a
tree node. When those sibling links are corrupted that can lead to
endless loops - which are currently not interruptible. This is
especially problematic if autovacuum is repeatedly blocked on such
indexes, as it can be hard to get out of that situation without
resorting to single user mode.

Thus add interrupt checks to appropriate places in such
loops. Unfortunately in one of the cases it's it's not easy to do so.

Between 9.3 and 9.4 the page deletion (and page split) code changed
significantly. Before it was significantly less robust against
interruptions. Therefore don't backpatch to 9.3.

Author: Andres Freund
Discussion: https://postgr.es/m/20180627191629.wkunw2qbibnvlz53@alap3.anarazel.de
Backpatch: 9.4-

Improve the performance of relation deletes during recovery.

When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.

To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.

This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com

Remove dead code for temporary relations in partition planning

Since recent commit 1c7c317c, temporary relations cannot be mixed with
permanent relations within the same partition tree, and the same counts
for temporary relations created by other sessions, which the planner
simply discarded. Instead be paranoid and issue an error, as those
should be blocked at definition time, at least for now.

At the same time, a test case is added to stress what has been moved
when expand_partitioned_rtentry gets called recursively but bumps on a
partitioned relation with no partitions which should be handled the same
way as the non-inheritance case. This code may be reworked in a close
future, and covering this code path will limit surprises.

Reported-by: David Rowley
Author: David Rowley
Reviewed-by: Amit Langote, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f_HyV1txn_4XSdH5EOhBMYaCwsXyAj6bHXk9gOu4JKsbw@mail.gmail.com

Correct comment

Fix libpq example programs

When these programs call pg_catalog.set_config, they need to check for
PGRES_TUPLES_OK instead of PGRES_COMMAND_OK. Fix for
5770172cb0c9df9e6ce27c507b449557e5b45124.

Reported-by: Ideriha, Takeshi <ideriha.takeshi@jp.fujitsu.com>

perltidy run prior to branching

pgindent run prior to branching

Update typedefs list

Documentation spell checking and markup improvements

doc: Replace non-ASCII lines in psql example output

psql: show cloned triggers in partitions

In a partition, row triggers that had been cloned from their parent
partitioned table would not be listed at all in psql's \d, which could
surprise users, per insistent complaint from Ashutosh Bapat (though his
aim was elsewhere). The simplest possible fix, suggested by Peter
Eisentraut, seems to be to list triggers marked as internal if they have
a row in pg_depend that points to some other trigger.

Author: Álvaro Herrera
Discussion: https://postgr.es/m/20180618165910.p26vhk7dpq65ix54@alvherre.pgsql

Fix crash when ALTER TABLE recreates indexes on partitions

The skip_build flag was not being passed correctly when recursing to
indexes on partitions, leading to attempts to rebuild indexes when they
were not yet ready to be rebuilt.

Reported-by: Rajkumar Raghuwanshi
Discussion: https://postgr.es/m/CAKcux6mxNCGsgATwf5CGMF8g4WSupCXicCVMeKUTuWbyxHOMsQ@mail.gmail.com

Replace search.cpan.org with metacpan.org

search.cpan.org has been EOL'd, with metacpan.org being the official
replacement to which URLs now redirect. Update links to match the new
URL. Also update links to CPAN to use https as it will redirect from
http.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/B74C0219-6BA9-46E1-A524-5B9E8CD3BDB3@yesql.se

Make capitalization of term "OpenSSL" more consistent

This includes code comments and documentation. No backpatch as this is
cosmetic even if there are documentation changes which are user-facing.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/BB89928E-2BC7-489E-A5E4-6D204B3954CF@yesql.se

Fix typo in comment

Author: Amit Langote
Discussion: https://postgr.es/m/b23dc88b-df41-ef07-22c5-12f77cf73b57@lab.ntt.co.jp

Fix thinko in comments.

A slot can not be stored in a tuple but it's vice versa.

Reported-by: Ashutosh Bapat
Author: Ashutosh Bapat
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAFjFpRcHhNhXdegyJv3KKDWrwO1_NB_KYZM_ZSDeMOZaL1A5jQ@mail.gmail.com

Change pqformat.h's integer handling functions to take unsigned integers.

As added in 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818 the new functions
all accept signed integers as parameters. That's not great, because
it's perfectly reasonable to call them with unsigned parameters.
Unfortunately unsigned to signed conversion is not well defined, when
exceeding the range of the signed value. That's presently not a
practical issue in postgres (among other reasons because we force
gcc's hand with -fwrapv). But it's clearly not quite right.

Thus change the signatures to accept unsigned integers instead, signed
to unsigned conversion is always well defined. Also change the
backward compat pq_sendint() - while it's deprecated it seems better
to be consistent.

Per discussion between Andrew Gierth, Michael Paquier, Alvaro Herrera
and Tom Lane.

Reported-By: Andrew Gierth
Author: Andres Freund
Discussion: https://postgr.es/m/87r2m10zm2.fsf@news-spur.riddles.org.uk

Remove duplicated return statement from llvmjit code.

The duplicated return clearly doesn't make sense / isn't
reachable. Likely introduced by me (Andres), while revising the code.

Author: Rushabh Lathia
Discussion: https://postgr.es/m/CAGPqQf2raxWOcbuTP36M1rEF3=Rfo7oD29K3psdyHMeE5swBRg@mail.gmail.com

Fix whitespace

doc: Improve wording and fix whitespace

doc: Document some nuances of logical replication of TRUNCATE

Cosmetic improvements for faster column addition.

Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.

Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com

Fix "base" snapshot handling in logical decoding

Two closely related bugs are fixed.  First, xmin of logical slots was
advanced too early.  During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them.  The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.

To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN.  This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.

The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.

Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.

Liberally sprinkle code comments, and rewrite a few existing ones.  This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad

Fix upper limit for vacuum_cleanup_index_scale_factor

6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX. DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.

This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.

Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com

|--- gitweb/email subject limit -----------------|-------------|

doc: PG 11 relnotes: remove channel binding from major features

Also move to the source code section, and expand the paragraph

Correct a comment on logtape.c's leader tape.

randomAccess parallel tuplesorts are disallowed because the leader would
try to write to its own leader tape, not because the leader would try to
write to a worker tape directly.

Cleanup from commit 9da0cc35284.

Remove obsolete comment block in nbtsort.c.

Building a new nbtree index through incremental insertions would always
be slower than our actual approach of sorting using tuplesort,
assembling leaf pages from tuplesort output, and writing and WAL-logging
whole pages. Remove a comment block from the Berkeley days claiming
that incremental insertions might be slightly faster with presorted
input.

Discussion: https://postgr.es/m/CAH2-WzmKs4mLAoFgJ3yHMRYc849efc=dw+pNRb3NEog2oJoCNw@mail.gmail.com

Enable failure to rename a partitioned index

Concurrently with partitioned index development (commit 8b08f7d4820f),
the code to handle failure to rename indexes was refactored (commit
8b9e9644dc6a). Turns out that that particular case was untested, which
naturally led it to be broken. Add tests and the missing code line.

Co-authored-by: David Rowley <dgrowley@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6mfYMS3OX0ywjOiWiGSEKhJf-1zdeTceHFbd0mScUzU5A@mail.gmail.com

Fix documentation bug related to backup history file.

The backup history file has been no longer necessary for recovery
since the version 9.0. It's now basically just for informational purpose.
But previously the documentations still described that a recovery
requests the backup history file to proceed. The commit fixes this
documentation bug.

Back-patch to all supported versions.

Author: Yugo Nagata
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180626174752.0ce505e3.nagata@sraoss.co.jp

Allow direct lookups of AppendRelInfo by child relid

find_appinfos_by_relids had quite a large overhead when the number of
items in the append_rel_list was high, as it had to trawl through the
append_rel_list looking for AppendRelInfos belonging to the given
childrelids. Since there can only be a single AppendRelInfo for each
child rel, it seems much better to store an array in PlannerInfo which
indexes these by child relid, making the function O(1) rather than O(N).
This function was only called once inside the planner, so just replace
that call with a lookup to the new array. find_childrel_appendrelinfo
is now unused and thus removed.

This fixes a planner performance regression new to v11 reported by
Thomas Reiss.

Author: David Rowley
Reported-by: Thomas Reiss
Reviewed-by: Ashutosh Bapat
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com

Increase upper limit for vacuum_cleanup_index_scale_factor

Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36.  However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).

vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics.  And after detailed consideration it appears
that risk of stalled index statistics is low.  And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor.  So, set upper limit for these
GUC and reloption to DBL_MAX.

Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com

Reword SPI_ERROR_TRANSACTION errors in PL/pgSQL

The previous message for SPI_ERROR_TRANSACTION claimed "cannot begin/end
transactions in PL/pgSQL", but that is no longer true. Nevertheless,
the error can still happen, so reword the messages. The error cases in
exec_prepare_plan() could never happen, so remove them.

Move RecoveryLockList into a hash table.

Standbys frequently need to release all locks held by a given xid.
Instead of searching one big list linearly, let's create one list
per xid and put them in a hash table, so we can find what we need
in O(1) time.

Earlier analysis and a prototype were done by David Rowley, though
this isn't his patch.

Back-patch all the way.

Author: Thomas Munro
Diagnosed-by: David Rowley, Andres Freund
Reviewed-by: Andres Freund, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAEepm%3D1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA%40mail.gmail.com
Discussion: https://postgr.es/m/CAKJS1f9vJ841HY%3DwonnLVbfkTWGYWdPN72VMxnArcGCjF3SywA%40mail.gmail.com

Fix description and documentation related to pg_restore --no-comments

These descriptions have been referring to object dump, but a restore
operation is done.

Reported-by: Andrey Lizenko
Author: Andrey Lizenko
Discussion: https://postgr.es/m/152992021588.1268.16786093506650391435@wrigleys.postgresql.org

Correct handling of fsync failures with tar mode of walmethods.c

This file has been missing the fact that it needs to report back to
callers a proper failure on fsync calls. I have spotted the one in
tar_finish() while Kuntal has spotted the one in tar_close().

Backpatch down to 10 where this code has been introduced.

Reported by: Michael Paquier, Kuntal Ghosh
Author: Michael Paquier
Reviewed-by: Kuntal Ghosh, Magnus Hagander
Discussion: https://postgr.es/m/20180625024356.GD1146@paquier.xyz

Update obsolete comments

Commit 9fab40ad32ef removed some pre-allocating logic in
reorderbuffer.c, but left outdated comments in place. Repair.

Author: Álvaro Herrera

Stamp 11beta2.

Translation updates

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 884f33d735870f94357820800840af3e93ff4628

Address set of issues with errno handling

System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set. Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.

This patch uses the brute-force approach of correcting all those code
paths. Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.

Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz

doc: adjust order of NUMERIC arguments to match syntax

Specifically, mention precision before scale

Reported-by: claytonjsalem@gmail.com
Discussion: https://postgr.es/m/152967566691.1268.1062965601465200209@wrigleys.postgresql.org

Backpatch-through: 9.3

doc: show how interval's 3 unit buckets behave using EXTRACT()

This clarifies when justify_days() and justify_hours() are useful.
Paragraph moved too.

Reported-by: vodevsh@gmail.com
Discussion: https://postgr.es/m/152698651482.26744.15456677499485530703@wrigleys.postgresql.org

Backpatch-through: 9.3

doc: update PG 11 release notes, mostly typo fixes

Discussion: https://postgr.es/m/CAJnrtny0mYCMoRanZ1wvGqcPV-UDBoPetavDM1SqxnGVfZRV3g@mail.gmail.com

Author: Brad DeJong

doc: mention use of cross platform logical replication

Discussion: https://postgr.es/m/CAJrrPGfdknoqZcMipPy8XnH3hO3uRic6JTD=jv35oj1DWqL07g@mail.gmail.com

Author: Haribabu Kommi

Mark binary_upgrade_set_missing_value as parallel_unsafe

per buildfarm.

Bump catalog version again although in practice nobody is going to use
this in a parallel query.

When index recurses to a partition, map columns numbers

Two out of three code paths were mapping column numbers correctly if a
partition had different column numbers than parent table, but the most
commonly used one (recursing in CREATE INDEX to a new index on a
partition) failed to map attribute numbers in expressions. Oddly
enough, attnums in WHERE clauses are already handled correctly
everywhere.

Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp
Reviewed-by: Álvaro Herrera

Avoid generating bogus paths with partitionwise aggregate.

Previously, if some or all partitions had no partially aggregated path,
we would still try to generate a partially aggregated path for the
parent, leading to assertion failures or wrong answers.

Report by Rajkumar Raghuwanshi. Patch by Jeevan Chalke, reviewed
by Ashutosh Bapat. A few changes by me.

Discussion: http://postgr.es/m/CAKcux6=q4+Mw8gOOX16ef6ZMFp9Cve7KWFstUsrDa4GiFaXGUQ@mail.gmail.com

Allow for pg_upgrade of attributes with missing values

Commit 16828d5c02 neglected to do this, so upgraded databases would
silently get null instead of the specified default in rows without the
attribute defined.

A new binary upgrade function is provided to perform this and pg_dump is
adjusted to output a call to the function if required in binary upgrade
mode.

Also included is code to drop missing attribute values for dropped
columns. That way if the type is later dropped the missing value won't
have a dangling reference to the type.

Finally the regression tests are adjusted to ensure that there is a row
with a missing value so that this code is exercised in upgrade testing.

Catalog version unfortunately bumped.

Regression test changes from Tom Lane.
Remainder from me, reviewed by Tom Lane, Andres Freund, Alvaro Herrera

Discussion: https://postgr.es/m/19987.1529420110@sss.pgh.pa.us

Fixes for vacuum_cleanup_index_scale_factor GUC option

vacuum_cleanup_index_scale_factor was located in autovacuum group of
GUCs.  However, it affects not only autovacuum, but also manually run
VACUUM.  It appears that "client connection defaults" group of GUCs
is more appropriate for vacuum_cleanup_index_scale_factor, because
vacuum_*_age options are already located there.

Also, vacuum_cleanup_index_scale_factor was missed in
postgresql.conf.sample.  So, add it there with appropriate comment.

Author: Masahiko Sawada with minor editorization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com

Fix typo in comment of commit_ts.c for incorrect reference to CLOG

Author: Shao Bret

Improve coding pattern in Parallel Append code.

The create_append_path code didn't consider that list_concat will
modify it's first argument leading to inconsistent traversal of
resulting list. In practice, it won't lead to any user-visible bug
but changing it for making the code behave consistently.

Reported-by: Tom Lane
Author: Tom Lane
Reviewed-by: Amit Khandekar and Amit Kapila
Discussion: https://postgr.es/m/32365.1528994120@sss.pgh.pa.us

Disclaim support for default namespace in XMLTABLE

Pavel Stehule's original patch had support for default namespace, but I
ripped it out before commit -- hence the docs were correct when written,
and I broke them by omission :-(. Remove the offending phrase.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/1550C5E5-FC70-4493-A226-AA137D831E8D@yesql.se

Fix partial aggregation for variance(int4) and related aggregates.

A typo in numeric_poly_combine caused bogus results for queries using
it, but of course would only manifest if parallel aggregation is
performed. Reported by Rajkumar Raghuwanshi.

David Rowley did the diagnosis and the fix; I editorialized rather
heavily on his regression test additions.

Back-patch to v10 where the breakage was introduced (by 9cca11c91).

Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com

Set correct context for XPath evaluation

According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node.  This becomes visible when a relative path rather than
absolute is used as row expression.  Absolute paths is what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.

Other functions such as xpath() and xpath_exists() also have this
problem.  While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too.  However, this is a backwards-incompatible change.

No backpatch, out of fear of breaking code depending on the original
broken behavior.

Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at

Improve requirements documentation for ldap test suite.

Text by me; data contributed by me, Thomas Munro, Michael Paquier.

Discussion: https://postgr.es/m/20180521013425.GA4476@paquier.xyz

Fix mishandling of sortgroupref labels while splitting SRF targetlists.

split_pathtarget_at_srfs() neglected to worry about sortgroupref labels
in the intermediate PathTargets it constructs.  I think we'd supposed
that their labeling didn't matter, but it does at least for the case that
GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet
step(s).  This results in "ERROR: ORDER/GROUP BY expression not found in
targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.

To fix, make this logic track the sortgroupref labeling of expressions,
not just their contents.  This also restores the pre-v10 behavior that
separate GROUP BY expressions will be kept distinct even if they are
textually equal().

Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com

Improve wording of release notes item

PostgreSQL 11 introduces compress method for SP-GiST opclasses. That
was mistakenly interpreted as compression support for SP-GiST while
actually that allows lossy representation of leaf keys.

Author: Alexander Korotkov, based on proposal by Darafei Praliaskouski
Discussion: https://postgr.es/m/CAC8Q8tKbYmNdiyWr7hE4GfMY4fbqHKkFziKgrUuWHH6HJQs3og%40mail.gmail.com

Update expected XML output with disabled XML

Should have been done in previous commit.

Accept TEXT and CDATA nodes in XMLTABLE's column_expression.

Column expressions that match TEXT or CDATA nodes must return the
contents of the nodes themselves, not the content of non-existing
children (i.e. the empty string).

Author: Markus Winand
Reported-by: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at

Add missing include

Per buildfarm

Consistently use the term 'partitioned rel' in partprune comments

We were using 'partition rel' in a few places, which is quite confusing.

Author: Amit Langote
Reviewed-by: David Rowley
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/fd256561-31a2-4b7e-cd84-d8241e7ebc3f@lab.ntt.co.jp

Fix typo

Reported using the website comment form

Fix a number of typos

Author: Liudmila Mantrova <l.mantrova@postgrespro.ru>

Support long option for --pgdata in pg_verify_checksums

Author: Daniel Gustafsson <daniel@yesql.se>

Document the -D and $PGDATA switch/env for pg_verify_checksums

Author: Daniel Gustafsson <daniel@yesql.se>

Move pg_verify_checksum docs to Server utils

Author: Daniel Gustafsson <daniel@yesql.se>

Don't consider parallel append for parallel unsafe paths.

Commit ab72716778 allowed Parallel Append paths to be generated for a
relation that is not parallel safe. Prevent that from happening.

Initial analysis by Tom Lane.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Kapila and Rajkumar Raghuwanshi
Reviewed-by: Amit Khandekar and Robert Haas
Discussion:https://postgr.es/m/CAKcux6=tPJ6nJ08r__nU_pmLQiC0xY15Fn0HvG1Cprsjdd9s_Q@mail.gmail.com

Clarify use of temporary tables within partition trees

Since their introduction, partition trees have been a bit lossy
regarding temporary relations.  Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also imply that when both parent and
child need to be from the same sessions.

Partitions share many similar patterns with inheritance, however the
handling of the partition bounds make the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc which does not really look after relpersistence.  This
causes for example a temporary partition created by session A to be
visible by another session B, preventing this session B to create an
extra partition which overlaps with the temporary one created by A with
a non-intuitive error message.  There could be use-cases where mixing
permanent partitioned tables with temporary partitions make sense, but
that would be a new feature.  Partitions respect 2), 3) and 4) already.

It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.

Back-patch down to 10, which is where partitioning has been introduced,
except that default partitions do not apply there.  Documentation also
includes limitations related to the use of temporary tables with
partition trees.

Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com

Clarify the README files for the various separate TAP-based test suites.

Explain the difference between "make check" and "make installcheck".
Mention the need for --enable-tap-tests (only some of these did so
before). Standardize their wording about how to run the tests.

README: add URLs for openldap installation

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/20180521013425.GA4476@paquier.xyz

Backpatch-through: head

doc: explain use of json_populate_record{set}()

The set-returning nature of these functions make their use unclear. The
modified paragraph was added in PG 9.4.

Reported-by: yshaladi@denodo.com
Discussion: https://postgr.es/m/152571684246.9460.18059951267371255159@wrigleys.postgresql.org

Backpatch-through: 9.4

Fix typos in release notes

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/8E8CF1F8-BCB2-4D86-A059-4BF5138F6D87%40yesql.se

Track new configure flags introduced for version 11 in pg_config.h.win32

The following set of flags mainly matter when building Postgres code
with MSVC and those have been forgotten with latest developments:
- HAVE_LDAP_INITIALIZE, added by 35c0754f, and marked as disabled.
ldap_initialize() is a non-standard extension that provides a way to use
"ldaps" with OpenLDAP, but it is not supported on Windows, and instead
the non-standard ldap_sslinit() is used if WIN32 is defined.  Per input
from Thomas Munro.
- HAVE_X509_GET_SIGNATURE_NID, added by 054e8c6c, which is used by
SCRAM's channel binding tls-server-end-point.  Having this flag disabled
would cause this channel binding type to be unsupported for Windows
builds.
- HAVE_SSL_CLEAR_OPTIONS, added recently as of a364dfa4 to disable SSL
compression.
- HAVE_ASN1_STRING_GET0_DATA, added by 5c6df67, which is used to track
a new compatibility with OpenSSL 1.1.0.  This was missing from
pg_config.win32.h and is not enabled by default.  HAVE_BIO_GET_DATA,
HAVE_OPENSSL_INIT_SSL and HAVE_BIO_METH_NEW gain the same treatment.

The second and third flags are enabled with this commit, which raises
the bar of OpenSSL support to 1.0.2 on Windows as a minimum.  As this
is the LTS (long-time support) version of OpenSSL community and knowing
that all recent installers referred by OpenSSL upstream don't have
anymore 1.0.1 or older, we could live with that requirement.  In order
to allow the code to compile with OpenSSL 1.1.0, all the flags mentioned
above need to be enabled in pg_config.h.win32.

Author: Michael Paquier
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/20180529211559.GF6632@paquier.xyz

Fix jsonb_plperl to convert Perl UV values correctly.

Values greater than IV_MAX were incorrectly converted to SQL,
for instance ~0 would become -1 rather than 18446744073709551615
(on a 64-bit machine).

Dagfinn Ilmari Mannsåker, adjusted a bit by me

Discussion: https://postgr.es/m/d8jtvskjzzs.fsf@dalvik.ping.uio.no

Fix contrib/hstore_plperl to look through scalar refs.

Bring this transform function into sync with the policy established
by commit 3a382983d.

Also, fix it to make sure that what it drills down to is indeed a
hash, and not some other kind of Perl SV. Previously, the test
cases added here provoked crashes.

Because of the crash hazard, back-patch to 9.5 where this module
was introduced.

Discussion: https://postgr.es/m/28336.1528393969@sss.pgh.pa.us

Allow plperl_sv_to_datum to look through scalar refs.

There seems little reason for the policy of throwing error if we
find a ref to something other than a hash or array.   Recursively
look through the ref, instead.  This makes the behavior in non-transform
cases comparable to what was already instantiated for jsonb_plperl.

Note that because we invoke any available transform function before
considering the ref case, it's up to each transform function whether
it wants to play along with this behavior or do something different.

Because the previous behavior was just to throw a useless error,
this seems unlikely to create any compatibility issues.  Still, given
the lack of field complaints so far, seems best not to back-patch.

Discussion: https://postgr.es/m/28336.1528393969@sss.pgh.pa.us

Avoid platform-dependent output from Data::Dumper.

Per buildfarm, the output from Data::Dumper for an IEEE infinity
is platform-dependent (e.g. "inf" vs "Inf"). Just skip that one
test case in the plperlu test; testing it on the plperl side is
coverage enough. Fixes issue in commit 1731e3741.

Fix excessive enreferencing in jsonb-to-plperl transform.

We want, say, 2 to be transformed as 2, not \\2 which is what the
original coding produced.  Perl's standard seems to be to add an RV
wrapper only for hash and array SVs, so do it like that.

This was missed originally because the test cases only checked what came
out of a round trip back to SQL, and the strip-all-dereferences loop at
the top of SV_to_JsonbValue hides the extra refs from view.  As a better
test, print the Perl value with Data::Dumper, like the hstore_plperlu
tests do.  While we can't do that in the plperl test, only plperlu,
that should be good enough because this code is the same for both PLs.
But also add a simplistic test for extra REFs, which we can do in both.

That strip-all-dereferences behavior is now a bit dubious; it's unlike
what happens for other Perl-to-SQL conversions.  However, the best
thing to do seems to be to leave it alone and make the other conversions
act similarly.  That will be done separately.

Dagfinn Ilmari Mannsåker, adjusted a bit by me

Discussion: https://postgr.es/m/d8jlgbq66t9.fsf@dalvik.ping.uio.no

Remove obsolete prohibition on function name matching a column name.

ProcedureCreate formerly threw an error if the function to be created
has one argument of composite type and the function name matches some
column of the composite type.  This was a (very non-bulletproof) defense
against creating situations where f(x) and x.f are ambiguous.  But we
don't really need such a defense in the wake of commit b97a3465d, which
allows us to deal with such situations fairly cleanly.  This behavior
also created a dump-and-reload hazard, since a function might be
rejected if a conflicting column name had been added to the input
composite type later.  Hence, let's just drop the check.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com

Consider syntactic form when disambiguating function vs column reference.

Postgres has traditionally considered the syntactic forms f(x) and x.f
to be equivalent, allowing tricks such as writing a function and then
using it as though it were a computed-on-demand column.  However, our
behavior when both interpretations are feasible left something to be
desired: we always chose the column interpretation.  This could lead
to very surprising results, as in a recent bug report from Neil Conway.
It also created a dump-and-reload hazard, since what was a function
call in a dumped view could get interpreted as a column reference
at reload, if a matching column name had been added to the underlying
table since the view was created.

What seems better, in ambiguous situations, is to prefer the choice
matching the syntactic form of the reference.  This seems much less
astonishing in general, and it fixes the dump/reload hazard.

Although this could be called a bug fix, there have been few complaints
and there's some small risk of breaking applications that depend on the
old behavior, so no back-patch.  It does seem reasonable to slip it
into v11, though.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com

Add PGTYPESchar_free() to avoid cross-module problems on Windows.

On Windows, it is sometimes important for corresponding malloc() and
free() calls to be made from the same DLL, since some build options can
result in multiple allocators being active at the same time. For that
reason we already provided PQfreemem(). This commit adds a similar
function for freeing string results allocated by the pgtypes library.

Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8AD5D6%40G01JPEXMBYT05

Prevent hard failures of standbys caused by recycled WAL segments

When a standby's WAL receiver stops reading WAL from a WAL stream, it
writes data to the current WAL segment without having priorily zero'ed
the page currently written to, which can cause the WAL reader to read
junk data from a past recycled segment and then it would try to get a
record from it.  While sanity checks in place provide most of the
protection needed, in some rare circumstances, with chances increasing
when a record header crosses a page boundary, then the startup process
could fail violently on an allocation failure, as follows:
FATAL:  invalid memory alloc request size XXX

This is confusing for the user and also unhelpful as this requires in
the worst case a manual restart of the instance, impacting potentially
the availability of the cluster, and this also makes WAL data look like
it is in a corrupted state.

The chances of seeing failures are higher if the connection between the
standby and its root node is unstable, causing WAL pages to be written
in the middle.  A couple of approaches have been discussed, like
zero-ing  new WAL pages within the WAL receiver itself but this has the
disadvantage of impacting performance of any existing instances as this
breaks the sequential writes done by the WAL receiver.  This commit
deals with the problem with a more simple approach, which has no
performance impact without reducing the detection of the problem: if a
record is found with a length higher than 1GB for backends, then do not
try any allocation and report a soft failure which will force the
standby to retry reading WAL.  It could be possible that the allocation
call passes and that an unnecessary amount of memory is allocated,
however follow-up checks on records would just fail, making this
allocation short-lived anyway.

This patch owes a great deal to Tsunakawa Takayuki for reporting the
failure first, and then discussing a couple of potential approaches to
the problem.

Backpatch down to 9.5, which is where palloc_extended has been
introduced.

Reported-by: Tsunakawa Takayuki
Reviewed-by: Tsunakawa Takayuki
Author: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05

Suppress -Wshift-negative-value warnings.

Clean up four places that result in compiler warnings when using recent
gcc with this warning class enabled (as seen on buildfarm members
calliphoridae, skink, and others). In all these places, this is purely
cosmetic, because the shift distance could not be large enough to risk
a change of sign, so there's no chance of implementation-dependent
behavior. Still, it's easy enough to avoid the warning by casting the
shifted value to unsigned, so let's do that.

Patch HEAD only, this isn't worth a back-patch.

Remove INCLUDE attributes section from docs.

Discussing covering indexes in a chapter that is mostly about the
behavior of B-Tree operator classes is unnecessary. The CREATE INDEX
documentation's handling of covering indexes seems sufficient.

Discussion: https://postgr.es/m/CAH2-WzmpU=L_6VjhhOAMfoyHLr-pZd1kDc+jpa3c3a8EOmtcXA@mail.gmail.com

Use -Wno-format-truncation and -Wno-stringop-truncation, if available.

gcc 8 has started emitting some warnings that are largely useless for
our purposes, particularly since they complain about code following
the project-standard coding convention that path names are assumed
to be shorter than MAXPGPATH. Even if we make the effort to remove
that assumption in some future release, the changes wouldn't get
back-patched. Hence, just suppress these warnings, on compilers that
have these switches.

Backpatch to all supported branches.

Discussion: https://postgr.es/m/1524563856.26306.9.camel@gunduz.org

Avoid unnecessary use of strncpy in a couple of places in ecpg.

Use of strncpy with a length limit based on the source, rather than
the destination, is non-idiomatic and draws warnings from gcc 8.
Replace with memcpy, which does exactly the same thing in these cases,
but with less chance for confusion.

Backpatch to all supported branches.

Discussion: https://postgr.es/m/21789.1529170195@sss.pgh.pa.us

Use snprintf not sprintf in pg_waldump's timestamptz_to_str.

This could only cause an issue if strftime returned a ridiculously
long timezone name, which seems unlikely; and it wouldn't qualify
as a security problem even then, since pg_waldump (nee pg_xlogdump)
is a debug tool not part of the server. But gcc 8 has started issuing
warnings about it, so let's use snprintf and be safe.

Backpatch to 9.3 where this code was added.

Discussion: https://postgr.es/m/21789.1529170195@sss.pgh.pa.us

Fix some minor error-checking oversights in ParseFuncOrColumn().

Recent additions to ParseFuncOrColumn to make it reject non-procedure
functions in CALL were neither adequate nor documented.  Reorganize
the code to ensure uniform results for all the cases that should be
rejected.  Also, use ERRCODE_WRONG_OBJECT_TYPE for this case as well
as the converse case of a procedure in a non-CALL context.  The
original coding used ERRCODE_UNDEFINED_FUNCTION which seems wrong,
and is certainly inconsistent with the adjacent wrong-kind-of-routine
errors.

This reorganization also causes the checks for aggregate decoration with
a non-aggregate function to be made in the FUNCDETAIL_COERCION case;
that they were not is a long-standing oversight.

Discussion: https://postgr.es/m/14497.1529089235@sss.pgh.pa.us

Remove AELs from subxids correctly on standby

Issues relate only to subtransactions that hold AccessExclusiveLocks
when replayed on standby.

Prior to PG10, aborting subtransactions that held an
AccessExclusiveLock failed to release the lock until top level commit or
abort. 49bff5300d527 fixed that.

However, 49bff5300d527 also introduced a similar bug where subtransaction
commit would fail to release an AccessExclusiveLock, leaving the lock to
be removed sometimes early and sometimes late. This commit fixes
that bug also. Backpatch to PG10 needed.

Tested by observation. Note need for multi-node isolationtester to improve
test coverage for this and other HS cases.

Reported-by: Simon Riggs
Author: Simon Riggs

Fix memory leak in BufFileCreateShared().

Also this commit unifies some duplicated code in makeBufFile() and
BufFileOpenShared() into new function makeBufFileCommon().

Author: Antonin Houska
Reviewed-By: Thomas Munro, Tatsuo Ishii
Discussion: https://postgr.es/m/16139.1529049566%40localhost

Fix off-by-one bug in XactLogCommitRecord

Commit 1eb6d6527aae introduced zeroed alignment bytes in the GID field
of commit/abort WAL records. Fixup commit cf5a1890592b later changed
that representation into a regular cstring with a single terminating
zero byte, but it also introduced an off-by-one mistake. Fix that.

Author: Nikhil Sontakke
Reported-by: Nikhil Sontakke
Discussion: https://postgr.es/m/CAMGcDxey6dG1DP34_tJMoWPcp5sPJUAL4K5CayUUXLQSx2GQpA@mail.gmail.com

Fix memory leak in PLySequence_ToJsonbValue()

PyObject returned from PySequence_GetItem() is not released. Similar code in PLyMapping_ToJsonbValue() is correct, because according to Python documentation
PyList_GetItem() and PyTuple_GetItem() return a borrowed reference while
PySequence_GetItem() returns new reference. contrib/jsonb_plpython is new
in PostgreSQL 11, no backpatch is needed.

Author: Nikita Glukhov
Discussion: https://postgr.es/m/6001af16-b242-2527-bc7e-84b8a959163b%40postgrespro.ru

Fix memory leak.

Memory is allocated twice for "file" and "files" variables in
BufFileOpenShared().

Author: Antonin Houska
Discussion: https://postgr.es/m/11329.1529045692%40localhost

Fail BRIN control functions during recovery explicitly

They already fail anyway, but prior to this patch they raise an ugly
error message about a lock that cannot be acquired. This just improves
the message.

Author: Masahiko Sawada
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera

Remove spurious code comments in standby related code

GetRunningTransactionData() suggested that subxids were not worth
optimizing away if overflowed, yet they have already been removed
for that case.

Changes to LogAccessExclusiveLock() API forgot to remove the
prior comment when it was copied to LockAcquire().

Remove cut-off bug from RunningTransactionData

32ac7a118fc17f5 tried to fix a Hot Standby issue
reported by Greg Stark, but in doing so caused
a different bug to appear, noted by Andres Freund.

Revoke the core changes from 32ac7a118fc17f5,
leaving in its place a minor change in code
ordering and comments to explain for the future.

Code review for match_clause_to_partition_key().

Fix inconsistent decisions about NOMATCH vs UNSUPPORTED result codes.
If we're going to cater for partkeys that have the same expression and
different collations, surely we should also support partkeys with the
same expression and different opclasses.

Clean up shaky handling of commuted opclauses, eg checking the wrong
operator to see what its negator is. This wouldn't cause any actual
bugs given a sane opclass definition, but it doesn't seem helpful to
expend more code to be less correct.

Improve handling of null elements in ScalarArrayOp arrays: in the
"op ALL" case, we can conclude they result in an unsatisfiable clause.

Minor cosmetic changes and comment improvements.

Fix some ill-chosen names for globally-visible partition support functions.

"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.