granicus.if.org Git - postgresql/log

Rename (new|old)estCommitTs to (new|old)estCommitTsXid

The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user facing output
emitted by pg_controldata and pg_resetxlog.

Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.

Code and docs review for cube kNN support.

Commit 33bd250f6c4cc309f4eeb657da80f1e7743b3e5c could have done with
some more review:

Adjust coding so that compilers unfamiliar with elog/ereport don't complain
about uninitialized values.

Fix misuse of PG_GETARG_INT16 to retrieve arguments declared as "integer"
at the SQL level. (This was evidently copied from cube_ll_coord and
cube_ur_coord, but those were wrong too.)

Fix non-style-guide-conforming error messages.

Fix underparenthesized if statements, which pgindent would have made a
hash of, and remove some unnecessary parens elsewhere.

Run pgindent over new code.

Revise documentation: repeated accretion of more operators without any
rethinking of the text already there had left things in a bit of a mess.
Merge all the cube operators into one table and adjust surrounding text
appropriately.

David Rowley and Tom Lane

Document brin_summarize_new_pages

Pointer out by Jeff Janes

Document the exponentiation operator as associating left to right.

Common mathematical convention is that exponentiation associates right to
left. We aren't going to change the parser for this, but we could note
it in the operator's description. (It's already noted in the operator
precedence/associativity table, but users might not look there.)
Per bug #13829 from Henrik Pauli.

Fix omission of -X (--no-psqlrc) in some psql invocations.

As of commit d5563d7df, psql -c no longer implies -X, but not all of
our regression testing scripts had gotten that memo.

To ensure consistency of results across different developers, make
sure that *all* invocations of psql in all scripts in our tree
use -X, even where this is not what previously happened.

Michael Paquier and Tom Lane

doc: pg_committs -> pg_commit_ts

Reported by: Alain Laporte (#13836)

Update documentation about pseudo-types.

Tone down an overly strong statement about which pseudo-types PLs are
likely to allow. Add "event_trigger" to the list, as well as
"pg_ddl_command" in 9.5/HEAD. Back-patch to 9.3 where event_trigger
was added.

Fix translation domain in pg_basebackup

For some reason, we've been overlooking the fact that pg_receivexlog
and pg_recvlogical are using wrong translation domains all along,
so their output hasn't ever been translated. The right domain is
pg_basebackup, not their own executable names.

Noticed by Ioseph Kim, who's been working on the Korean translation.

Backpatch pg_receivexlog to 9.2 and pg_recvlogical to 9.4.

Add forgotten CHECK_FOR_INTERRUPT calls in pgcrypto's crypt()

Both Blowfish and DES implementations of crypt() can take arbitrarily
long time, depending on the number of rounds specified by the caller;
make sure they can be interrupted.

Author: Andreas Karlsson
Reviewer: Jeff Janes

Backpatch to 9.1.

Include typmod when complaining about inherited column type mismatches.

MergeAttributes() rejects cases where columns to be merged have the same
type but different typmod, which is correct; but the error message it
printed didn't show either typmod, which is unhelpful. Changing this
requires using format_type_with_typemod() in place of TypeNameToString(),
which will have some minor side effects on the way some type names are
printed, but on balance this is an improvement: the old code sometimes
printed one type according to one set of rules and the other type according
to the other set, which could be confusing in its own way.

Oddly, there were no regression test cases covering any of this behavior,
so add some.

Complaint and fix by Amit Langote

Fix brin_summarize_new_values() to check index type and ownership.

brin_summarize_new_values() did not check that the passed OID was for
an index at all, much less that it was a BRIN index, and would fail in
obscure ways if it wasn't (possibly damaging data first?). It also
lacked any permissions test; by analogy to VACUUM, we should only allow
the table's owner to summarize.

Noted by Jeff Janes, fix by Michael Paquier and me

Improve SECURITY LABEL tab completion

Add DATABASE, EVENT TRIGGER, FOREIGN TABLE, ROLE, and TABLESPACE to
tab completion for SECURITY LABEL.

Kyotaro Horiguchi

Improve the gin index scan performance in pg_trgm.

Previous coding assumes too pessimistic upper bound of similarity
in consistent method of GIN.

Author: Fornaroli Christophe with comments by me.

Remove unnecessary row ordering dependency in pg_rewind test suite.

t/002_databases.pl was expecting to see a specific physical order of the
rows in pg_database. I broke that in HEAD with commit 01e386a325549b77,
but I'd say it's a pretty fragile test methodology in any case, so fix
it in 9.5 as well.

Docs: fix erroneously-given function name.

pg_replication_session_is_setup() exists nowhere; apparently this is
meant to refer to pg_replication_origin_session_is_setup().

Adrien Nayrat

Fix factual and grammatical errors in comments for struct _tableInfo.

Amit Langote, further adjusted by me

Docs typo fix.

Michael Paquier

Avoid VACUUM FULL altogether in initdb.

Commit ed7b3b3811c5836a purported to remove initdb's use of VACUUM FULL,
as had been agreed to in a pghackers discussion back in Dec 2014.
But it missed this one ...

Improve handling of password reuse in src/bin/scripts programs.

This reverts most of commit 83dec5a71 in favor of having connectDatabase()
store the possibly-reusable password in a static variable, similar to the
coding we've had for a long time in pg_dump's version of that function.
To avoid possible problems with unwanted password reuse, make callers
specify whether it's reasonable to attempt to re-use the password.
This is a wash for cases where re-use isn't needed, but it is far simpler
for callers that do want that. Functionally there should be no difference.

Even though we're past RC1, it seems like a good idea to back-patch this
into 9.5, like the prior commit. Otherwise, if there are any third-party
users of connectDatabase(), they'll have to deal with an API change in
9.5 and then another one in 9.6.

Michael Paquier

In pg_dump, remember connection passwords no matter how we got them.

When pg_dump prompts the user for a password, it remembers the password
for possible re-use by parallel worker processes.  However, libpq might
have extracted the password from a connection string originally passed
as "dbname".  Since we don't record the original form of dbname but
break it down to host/port/etc, the password gets lost.  Fix that by
retrieving the actual password from the PGconn.

(It strikes me that this whole approach is rather broken, as it will also
lose other information such as options that might have been present in
the connection string.  But we'll leave that problem for another day.)

In passing, get rid of rather silly use of malloc() for small fixed-size
arrays.

Back-patch to 9.3 where parallel pg_dump was introduced.

Report and fix by Zeus Kronion, adjusted a bit by Michael Paquier and me

Read from the same worker repeatedly until it returns no tuple.

The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.

Patch by me.  Dilip Kumar helped with performance testing.

Change Gather not to use a physical tlist.

This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.

Remove unnecessary escaping in C character literals

'\"' is more commonly written simply as '"'.

Allow omitting one or both boundaries in an array slice specifier.

Omitted boundaries represent the upper or lower limit of the corresponding
array subscript. This allows simpler specification of many common
use-cases.

(Revised version of commit 9246af6799819847faa33baf441251003acbb8fe)

YUriy Zhuravlev

Comment improvements for abbreviated keys.

Peter Geoghegan and Robert Haas

postgres_fdw: Consider requesting sorted data so we can do a merge join.

When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.  Commit f18c944b6137329ac4a6b2dce5745c5dc21a8578
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.

For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful.  Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.

Ashutosh Bapat, with very substantial cosmetic revisions by me.  Also
reviewed by Rushabh Lathia.

Fix calculation of space needed for parsed words in tab completion.

Yesterday in commit d854118c8, I had a serious brain fade leading me to
underestimate the number of words that the tab-completion logic could
divide a line into. On input such as "(((((", each character will get
seen as a separate word, which means we do indeed sometimes need more
space for the words than for the original line. Fix that.

Make viewquery a copy in rewriteTargetView()

Rather than expect the Query returned by get_view_query() to be
read-only and then copy bits and pieces of it out, simply copy the
entire structure when we get it. This addresses an issue where
AcquireRewriteLocks, which is called by acquireLocksOnSubLinks(),
scribbles on the parsetree passed in, which was actually an entry
in relcache, leading to segfaults with certain view definitions.
This also future-proofs us a bit for anyone adding more code to this
path.

The acquireLocksOnSubLinks() was added in commit c3e0ddd40.

Back-patch to 9.3 as that commit was.

Remove silly completion for "DELETE FROM tabname ...".

psql offered USING, WHERE, and SET in this context, but SET is not a valid
possibility here. Seems to have been a thinko in commit f5ab0a14ea83eb6c
which added DELETE's USING option.

Teach psql's tab completion to consider the entire input string.

Up to now, the tab completion logic has only examined the last few words
of the current input line; "last few" being originally as few as four
words, but lately up to nine words.  Furthermore, it only looked at what
libreadline considers the current line of input, which made it rather
myopic if you split your command across lines.  This was tolerable,
sort of, so long as the match patterns were only designed to consider the
last few words of input; but with the recent addition of HeadMatches()
and Matches() matching rules, we really have to do better if we want
those to behave sanely.

Hence, change the code to break the entire line down into words, and to
include any previous lines in the command buffer along with the active
readline input buffer.

This will be a little bit slower than the previous coding, but some
measurements say that even a query of several thousand characters can be
parsed in a hundred or so microseconds on modern machines; so it's really
not going to be significant for interactive tab completion.  To reduce
the cost some, I arranged to avoid the per-word malloc calls that used
to occur: all the words are now kept in one malloc'd buffer.

psql: Review of new help output strings

Add missing COSTS OFF to EXPLAIN commands in rowsecurity.sql.

Commit e5e11c8cc added a bunch of EXPLAIN statements without COSTS OFF
to the regression tests. This is contrary to project policy since it
results in unnecessary platform dependencies in the output (it's just
luck that we didn't get buildfarm failures from it). Per gripe from
Mike Wilson.

Adopt a more compact, less error-prone notation for tab completion code.

Replace tests like

    else if (pg_strcasecmp(prev4_wd, "CREATE") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
             (pg_strcasecmp(prev_wd, "BEFORE") == 0 ||
              pg_strcasecmp(prev_wd, "AFTER") == 0))

with new notation like this:

    else if (TailMatches4("CREATE", "TRIGGER", MatchAny, "BEFORE|AFTER"))

In addition, provide some macros COMPLETE_WITH_LISTn() to reduce the amount
of clutter needed to specify a small number of predetermined completion
alternatives.

This makes the code substantially more compact: tab-complete.c gets over a
thousand lines shorter in this patch, despite the addition of a couple of
hundred lines of infrastructure for the new notations.  The new way of
specifying match rules seems a whole lot more readable and less
error-prone, too.

There's a lot more that could be done now to make matching faster and more
reliable; for example I suspect that most of the TailMatches() rules should
now be Matches() rules.  That would allow them to be skipped after a single
integer comparison if there aren't the right number of words on the line,
and it would reduce the risk of unintended matches.  But for now, (mostly)
refrain from reworking any match rules in favor of just converting what
we've got into the new notation.

Thomas Munro, reviewed by Michael Paquier, some adjustments by me

Fix whitespace

Fix tab completion for ALTER ... TABLESPACE ... OWNED BY.

Previously the completion used the wrong word to match 'BY'. This was
introduced brokenly, in b2de2a. While at it, also add completion of
IN TABLESPACE ... OWNED BY and fix comments referencing nonexistent
syntax.

Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: CAB7nPqSHDdSwsJqX0d2XzjqOHr==HdWiubCi4L=Zs7YFTUne8w@mail.gmail.com
Backpatch: 9.4, like the commit introducing the bug

Revert 9246af6799819847faa33baf441251003acbb8fe because
I miss too much. Patch is returned to commitfest process.

pgbench: Change terminology from "threshold" to "parameter".

Per a recommendation from Tomas Vondra, it's more helpful to refer to
the value that determines how skewed a Gaussian or exponential
distribution is as a parameter rather than a threshold.

Since it's not quite too late to get this right in 9.5, where it was
introduced, back-patch this. Most of the patch changes only comments
and documentation, but a few pgbench messages are altered to match.

Fabien Coelho, reviewed by Michael Paquier and by me.

Remove duplicate word.

Kyotaro Horiguchi

Fix TupleQueueReaderNext not to ignore its nowait argument.

This was a silly goof on my (rhaas's) part.

Report and fix by Rushabh Lathia.

Fix copy-and-paste error in logical decoding callback.

This could result in the error context misidentifying where the error
actually occurred.

Craig Ringer

Fix typo in comment.

Amit Langote

Allow to omit boundaries in array subscript

Allow to omiy lower or upper or both boundaries in array subscript
for selecting slice of array.

Author: YUriy Zhuravlev

Cube extension kNN support

Introduce distance operators over cubes:
<#> taxicab distance
<-> euclidean distance
<=> chebyshev distance

Also add kNN support of those distances in GiST opclass.

Author: Stas Kelvich

Remove unreferenced function declarations.

datapagemap_create() and datapagemap_destroy() were declared extern,
but they don't actually exist anywhere. Per YUriy Zhuravlev and
Michael Paquier.

Use just one standalone-backend session for initdb's post-bootstrap steps.

Previously, each subroutine in initdb fired up its own standalone backend
session.  Over time we'd grown as many as fifteen of these sessions,
and the cumulative startup and shutdown work for them was getting pretty
noticeable.  Combining things so that all these steps share a single
backend session cuts a good 10% off the total runtime of initdb, more
if you're not fsync'ing.

The main stumbling block to doing this before was that some of the sessions
were run with -j and some not.  The improved definition of -j mode
implemented by my previous commit makes it possible to fix that by running
all the post-bootstrap steps with -j; we just have to use double instead of
single newlines to end command strings.  (This is only absolutely necessary
around the VACUUM and CREATE DATABASE steps, since those can't be run in a
transaction block.  But it seems best to make them all use double newlines
so that the commands remain separate for error-reporting purposes.)

A minor disadvantage is that since initdb can't tell how much of its
output the backend has executed, we can no longer have the per-step
progress reporting initdb used to print.  But things are fast enough
nowadays that that's not really all that useful anyway.

In passing, add more const decoration to some of the static arrays in
initdb.c.

Adjust behavior of single-user -j mode for better initdb error reporting.

Previously, -j caused the entire input file to be read in and executed as
a single command string.  That's undesirable, not least because any error
causes the entire file to be regurgitated as the "failing query".  Some
experimentation suggests a better rule: end the command string when we see
a semicolon immediately followed by two newlines, ie, an empty line after
a query.  This serves nicely to break up the existing examples such as
information_schema.sql and system_views.sql.  A limitation is that it's
no longer possible to write such a sequence within a string literal or
multiline comment in a file meant to be read with -j; but there are no
instances of such a problem within the data currently used by initdb.
(If someone does make such a mistake in future, it'll be obvious because
they'll get an unterminated-literal or unterminated-comment syntax error.)
Other than that, there shouldn't be any negative consequences; you're not
forced to end statements that way, it's just a better idea in most cases.

In passing, remove src/include/tcop/tcopdebug.h, which is dead code
because it's not included anywhere, and hasn't been for more than
ten years.  One of the debug-support symbols it purported to describe
has been unreferenced for at least the same amount of time, and the
other is removed by this commit on the grounds that it was useless:
forcing -j mode all the time would have broken initdb.  The lack of
complaints about that, or about the missing inclusion, shows that
no one has tried to use TCOP_DONTUSENEWLINE in many years.

Fix improper initialization order for readline.

Turns out we must set rl_basic_word_break_characters *before* we call
rl_initialize() the first time, because it will quietly copy that value
elsewhere --- but only on the first call.  (Love these undocumented
dependencies.)  I broke this yesterday in commit 2ec477dc8108339d;
like that commit, back-patch to all active branches.  Per report from
Pavel Stehule.

Rework internals of changing a type's ownership

This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which caused later other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite.  Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existant user which is bogus.

To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly.  AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.

I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised.  Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.

Per bug #13666 reported by Chris Pacejo.

Backpatch to 9.5.

(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched.  If we
decide that we need this in earlier branches too, we should backpatch
both.)

Cope with Readline's failure to track SIGWINCH events outside of input.

It emerges that libreadline doesn't notice terminal window size change
events unless they occur while collecting input.  This is easy to stumble
over if you resize the window while using a pager to look at query output,
but it can be demonstrated without any pager involvement.  The symptom is
that queries exceeding one line are misdisplayed during subsequent input
cycles, because libreadline has the wrong idea of the screen dimensions.

The safest, simplest way to fix this is to call rl_reset_screen_size()
just before calling readline().  That causes an extra ioctl(TIOCGWINSZ)
for every command; but since it only happens when reading from a tty, the
performance impact should be negligible.  A more valid objection is that
this still leaves a tiny window during entry to readline() wherein delivery
of SIGWINCH will be missed; but the practical consequences of that are
probably negligible.  In any case, there doesn't seem to be any good way to
avoid the race, since readline exposes no functions that seem safe to call
from a generic signal handler --- rl_reset_screen_size() certainly isn't.

It turns out that we also need an explicit rl_initialize() call, else
rl_reset_screen_size() dumps core when called before the first readline()
call.

rl_reset_screen_size() is not present in old versions of libreadline,
so we need a configure test for that.  (rl_initialize() is present at
least back to readline 4.0, so we won't bother with a test for it.)
We would need a configure test anyway since libedit's emulation of
libreadline doesn't currently include such a function.  Fortunately,
libedit seems not to have any corresponding bug.

Merlin Moncure, adjusted a bit by me

Speed up CREATE INDEX CONCURRENTLY's TID sort.

Encode TIDs as 64-bit integers to speed up comparisons. This seems to
speed things up on all platforms, but is even more beneficial when
8-byte integers are passed by value.

Peter Geoghegan. Design suggestions and review by Tom Lane. Review
also by Simon Riggs and by me.

Mark CHECK constraints declared NOT VALID valid if created with table.

FOREIGN KEY constraints have behaved this way for a long time, but for
some reason the behavior of CHECK constraints has been inconsistent up
until now.

Amit Langote and Amul Sul, with assorted tweaks by me.

Document use of Subject Alternative Names in SSL server certificates.

Commit acd08d764 did not bother with updating the documentation.

Update 9.5 release notes through today.

Also do another round of copy-editing, and fix up remaining FIXME items.

Teach mdnblocks() not to create zero-length files.

It's entirely surprising that mdnblocks() has the side effect of
creating new files on disk, so let's make it not do that. One
consequence of the old behavior is that, if running on a damaged
cluster that is missing a file, mdnblocks() can recreate the file
and allow a subsequent _mdfd_getseg() for a higher segment to succeed.
This happens because, while mdnblocks() stops when it finds a segment
that is shorter than 1GB, _mdfd_getseg() has no such check, and thus
the empty file created by mdnblocks() can allow it to continue its
traversal and find higher-numbered segments which remain.

It might be a good idea for _mdfd_getseg() to actually verify that
each segment it finds is exactly 1GB before proceeding to the next
one, but that would involve some additional system calls, so for
now I'm just doing this much.

Patch by me, per off-list analysis by Kevin Grittner and Rahila Syed.
Review by Andres Freund.

Move buffer I/O and content LWLocks out of the main tranche.

Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two. Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).

We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche. This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.

Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.

Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.

Provide a way to predefine LWLock tranche IDs.

It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID. So, for built-in tranches,
use a hard-coded value instead.

This is motivated by an upcoming patch adding further built-in tranches.

Andres Freund and Robert Haas

Improve CREATE POLICY documentation

Clarify that SELECT policies are now applied when SELECT rights
are required for a given query, even if the query is an UPDATE or
DELETE query. Pointed out by Noah.

Additionally, note the risk regarding concurrently open transactions
where a relation which controls access to the rows of another relation
are updated and the rows of the primary relation are also being
modified. Pointed out by Peter Geoghegan.

Back-patch to 9.5.

Collect the global OR of hasRowSecurity flags for plancache

We carry around information about if a given query has row security or
not to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.

Previously, the flag of one of the subqueries was simply being copied
into place to indicate if the query overall included RLS components.
That's wrong as we need the global OR of all subqueries. Fix by
changing the code to match how fireRIRules works, which is results
in OR'ing all of the flags.

Noted by Tom.

Back-patch to 9.5 where RLS was introduced.

Add missing cleanup logic in pg_rewind/t/005_same_timeline.pl test.

Per Michael Paquier

Add missing CHECK_FOR_INTERRUPTS in lseg_inside_poly

Apparently, there are bugs in this code that cause it to loop endlessly.
That bug still needs more research, but in the meantime it's clear that
the loop is missing a check for interrupts so that it can be cancelled
timely.

Backpatch to 9.1 -- this has been missing since 49475aab8d0d.

Remove xmlparse(document '') test

This one test was behaving differently between the ubuntu fix for
CVE-2015-7499 and the base "expected" file. It's not worth having
yet another version of the expected file for this test, so drop it.
Perhaps at some point when all distros have settled down to the
same behavior on this test, it can be restored.

Problem found by me on libxml2 (2.9.1+dfsg1-3ubuntu4.6).
Solution suggested by Tom Lane.
Backpatch to 9.5, where the test was added.

Fix out-of-memory error handling in ParameterDescription message processing.

If libpq ran out of memory while constructing the result set, it would hang,
waiting for more data from the server, which might never arrive. To fix,
distinguish between out-of-memory error and not-enough-data cases, and give
a proper error message back to the client on OOM.

There are still similar issues in handling COPY start messages, but let's
handle that as a separate patch.

Michael Paquier, Amit Kapila and me. Backpatch to all supported versions.

Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure.

Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would
install 0 into MultiXactState->offsetStopLimit if it previously succeeded.
Luckily, there are no known cases where find_multixact_start() will return
an error in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId()
could continue allocating mxids even if close to a wraparound, or it could
erroneously stop allocating mxids, even if no wraparound is looming. The
wrong value would be corrected the next time SetOffsetVacuumLimit() is
called, or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Discussion: 20151210140450.GA22278@alap3.anarazel.de
Backpatch: 9.5, where the bug was introduced as part of 4f627f

Correct statement to actually be the intended assert statement.

e3f4cfc7 introduced a LWLockHeldByMe() call, without the corresponding
Assert() surrounding it.

Spotted by Coverity.

Backpatch: 9.1+, like the previous commit

Docs: document that psql's "\i -" means read from stdin.

This has worked that way for a long time, maybe always, but you would
not have known it from the documentation. Also back-patch the notes
I added to HEAD earlier today about behavior of the "-f -" switch,
which likewise have been valid for many releases.

Code and docs review for multiple -c and -f options in psql.

Commit d5563d7df94488bf drew complaints from Coverity, which quite
correctly complained that one copy of each -c or -f string was being
leaked. What's more, simple_action_list_append was allocating enough space
for still a third copy of each string as part of the SimpleActionListCell,
even though that coding method had been superseded by a separate strdup
operation. There were some other minor coding infelicities too. The
documentation needed more work as well, eg it forgot to explain that -c
causes psql not to accept any interactive input.

Consistently set all fields in pg_stat_replication to null instead of 0

Previously the "sent" field would be set to 0 and all other xlog
pointers be set to NULL if there were no valid values (such as when
in a backup sending walsender).

Properly initialize write, flush and replay locations in walsender slots

These would leak random xlog positions if a walsender used for backup would
a walsender slot previously used by a replication walsender.

In passing also fix a couple of cases where the xlog pointer is directly
compared to zero instead of using XLogRecPtrIsInvalid, noted by
Michael Paquier.

Doc: update external URLs for PostGIS project.

Paul Ramsey

doc: Add some markup

Fix ALTER TABLE ... SET TABLESPACE for unlogged relations.

Changing the tablespace of an unlogged relation did not WAL log the
creation and content of the init fork. Thus, after a standby is
promoted, unlogged relation cannot be accessed anymore, with errors
like:
ERROR: 58P01: could not open file "pg_tblspc/...": No such file or directory
Additionally the init fork was not synced to disk, independent of the
configured wal_level, a relatively small durability risk.

Investigation of that problem also brought to light that, even for
permanent relations, the creation of !main forks was not WAL logged,
i.e. no XLOG_SMGR_CREATE record were emitted. That mostly turns out not
to be a problem, because these files were created when the actual
relation data is copied; nonexistent files are not treated as an error
condition during replay. But that doesn't work for empty files, and
generally feels a bit haphazard. Luckily, outside init and main forks,
empty forks don't occur often or are not a problem.

Add the required WAL logging and syncing to disk.

Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: 20151210163230.GA11331@alap3.anarazel.de
Backpatch: 9.1, where unlogged relations were introduced

Add an expected-file to match behavior of latest libxml2.

Recent releases of libxml2 do not provide error context reports for errors
detected at the very end of the input string. This appears to be a bug, or
at least an infelicity, introduced by the fix for libxml2's CVE-2015-7499.
We can hope that this behavioral change will get undone before too long;
but the security patch is likely to spread a lot faster/further than any
follow-on cleanup, which means this behavior is likely to be present in the
wild for some time to come. As a stopgap, add a variant regression test
expected-file that matches what you get with a libxml2 that acts this way.

pg_rewind: Don't error if the two clusters are already on the same timeline

This previously resulted in an error and a nonzero exit status, but
after discussion this should rather be a noop with a zero exit status.

For REASSIGN OWNED for foreign user mappings

As reported in bug #13809 by Alexander Ashurkov, the code for REASSIGN
OWNED hadn't gotten word about user mappings.  Deal with them in the
same way default ACLs do, which is to ignore them altogether; they are
handled just fine by DROP OWNED.  The other foreign object cases are
already handled correctly by both commands.

Also add a REASSIGN OWNED statement to foreign_data test to exercise the
foreign data objects.  (The changes are just before the "cleanup" phase,
so it shouldn't remove any existing live test.)

Reported by Alexander Ashurkov, then independently by Jaime Casanova.

Install our "missing" script where PGXS builds can find it.

This allows sane behavior in a PGXS build done on a machine where build
tools such as bison are missing.

Jim Nasby

Handle policies during DROP OWNED BY

DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies. Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.

As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.

Get rid of the planner's LateralJoinInfo data structure.

I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cacb6df23 that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency.  The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references.  For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation.  Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code.  This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.

Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.

Handle dependencies properly in ALTER POLICY

ALTER POLICY hadn't fully considered partial policy alternation
(eg: change just the roles on the policy, or just change one of
the expressions) when rebuilding the dependencies. Instead, it
would happily remove all dependencies which existed for the
policy and then only recreate the dependencies for the objects
referred to in the specific ALTER POLICY command.

Correct that by extracting and building the dependencies for all
objects referenced by the policy, regardless of if they were
provided as part of the ALTER POLICY command or were already in
place as part of the pre-existing policy.

Still more fixes for planner's handling of LATERAL references.

More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references.  If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too.  The planner did not understand this and would generate
intermediate joins that could not be used.  While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.

To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent.  This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.

Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations.  This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.

Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1dd02f1.  Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures.  The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen.  I chose to move the
supporting logic into joinrels.c where the latter functions are.

Back-patch to 9.3 where LATERAL support was introduced.

Fix commit timestamp initialization

This module needs explicit initialization in order to replay WAL records
in recovery, but we had broken this recently following changes to make
other (stranger) scenarios work correctly. To fix, rework the
initialization sequence so that it always takes place before WAL replay
commences for both master and standby.

I could have gone for a more localized fix that just added a "startup"
call for the master server, but it seemed better to restructure the
existing callers as well so that the whole thing made more sense. As a
drawback, there is more control logic in xlog.c now than previously, but
doing otherwise meant passing down the ControlFile flag, which seemed
uglier as a whole.

This also meant adding a check to not re-execute ActivateCommitTs if it
had already been called.

Reported by Fujii Masao.

Backpatch to 9.5.

Improve some messages

Improve ALTER POLICY tab completion.

Complete "ALTER POLICY" with a policy name, as we do for DROP POLICY.
And, complete "ALTER POLICY polname ON" with a table name that has such
a policy, as we do for DROP POLICY, rather than with any table name
at all.

Masahiko Sawada

Fix typo.

Etsuro Fujita

Fix ON CONFLICT UPDATE bug breaking AFTER UPDATE triggers.

ExecOnConflictUpdate() passed t_ctid of the to-be-updated tuple to
ExecUpdate(). That's problematic primarily because of two reason: First
and foremost t_ctid could point to a different tuple. Secondly, and
that's what triggered the complaint by Stanislav, t_ctid is changed by
heap_update() to point to the new tuple version. The behavior of AFTER
UPDATE triggers was therefore broken, with NEW.* and OLD.* tuples
spuriously identical within AFTER UPDATE triggers.

To fix both issues, pass a pointer to t_self of a on-stack HeapTuple
instead.

Fixing this bug lead to one change in regression tests, which previously
failed due to the first issue mentioned above. There's a reasonable
expectation that test fails, as it updates one row repeatedly within one
INSERT ... ON CONFLICT statement. That is only possible if the second
update is triggered via ON CONFLICT ... SET, ON CONFLICT ... WHERE, or
by a WITH CHECK expression, as those are executed after
ExecOnConflictUpdate() does a visibility check. That could easily be
prohibited, but given it's allowed for plain UPDATEs and a rare corner
case, it doesn't seem worthwhile.

Reported-By: Stanislav Grozev
Author: Andres Freund and Peter Geoghegan
Discussion: CAA78GVqy1+LisN-8DygekD_Ldfy=BJLarSpjGhytOsgkpMavfQ@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced

Fix bug leading to restoring unlogged relations from empty files.

At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately WAL
replay so far has not necessarily flushed writes from shared buffers to
disk at that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.

To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can centrally be triggered from the code dealing with restoring
full page writes (XLogReadBufferForRedoExtended), in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.

Backpatch to 9.1, even if this currently is only known to trigger in
9.3+. Flushing earlier is more robust, and it is advantageous to keep
the branches similar.

Typical symptoms of this bug are errors like
'ERROR: index "..." contains unexpected zero page at block 0'
shortly after promoting a node.

Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-

Accept flex > 2.5.x on Windows, too.

Commit 32f15d05c fixed this in configure, but missed the similar check
in the MSVC scripts.

Michael Paquier, per report from Victor Wagner

Remove redundant sentence.

Peter Geoghegan

Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.

The original parallel sequential scan commit included only very limited
changes to the EXPLAIN output. Aggregated totals from all workers were
displayed, but there was no way to see what each individual worker did
or to distinguish the effort made by the workers from the effort made by
the leader.

Per a gripe by Thom Brown (and maybe others). Patch by me, reviewed
by Amit Kapila.

Improve performance in freeing memory contexts

The single linked list of memory contexts could result in O(N^2)
performance to free a set of contexts if they were not freed in
reverse order of creation. In many cases the reverse order was
used, but there were some significant exceptions that caused real-
world performance problems. Rather than requiring all callers to
care about the order in which contexts were freed, and hunting down
and changing all existing cases where the wrong order was used, we
add one pointer per memory context so that the implementation
details are not so visible.

Jan Wieck

Make failure to open psql's --log-file fatal.

Commit 344cdff2c made failure to open the target of --output fatal.
For consistency, the --log-file switch should behave similarly.
Like the previous commit, back-patch to 9.5 but no further.

Daniel Verite

Avoid odd portability problem in TestLib.pm's slurp_file function.

For unclear reasons, this function doesn't always read the expected data
in some old Perl versions. Rewriting it to avoid use of ARGV seems to
dodge the problem, and this version is clearer anyway if you ask me.

In passing, also improve error message in adjacent append_to_file function.

psql: Support multiple -c and -f options, and allow mixing them.

To support this, we must reconcile some historical anomalies in the
behavior of -c.  In particular, as a backward-incompatibility, -c no
longer implies --no-psqlrc.

Pavel Stehule (code) and Catalin Iacob (documentation).  Review by
Michael Paquier and myself.  Proposed behavior per Tom Lane.

Allow foreign and custom joins to handle EvalPlanQual rechecks.

Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.

To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).

KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.

Simplify LATERAL-related calculations within add_paths_to_joinrel().

While convincing myself that commit 7e19db0c09719d79 would solve both of
the problems recently reported by Andreas Seltenreich, I realized that
add_paths_to_joinrel's handling of LATERAL restrictions could be made
noticeably simpler and faster if we were to retain the minimum possible
parameterization for each joinrel (that is, the set of relids supplying
unsatisfied lateral references in it). We already retain that for
baserels, in RelOptInfo.lateral_relids, so we can use that field for
joinrels too.

I re-pgindent'd the files touched here, which affects some unrelated
comments.

This is, I believe, just a minor optimization not a bug fix, so no
back-patch.

PostgresNode: wrap correctly around port number range end

Per note from Tom Lane

Fix another oversight in checking if a join with LATERAL refs is legal.

It was possible for the planner to decide to join a LATERAL subquery to
the outer side of an outer join before the outer join itself is completed.
Normally that's fine because of the associativity rules, but it doesn't
work if the subquery contains a lateral reference to the inner side of the
outer join. In such a situation the outer join *must* be done first.
join_is_legal() missed this consideration and would allow the join to be
attempted, but the actual path-building code correctly decided that no
valid join path could be made, sometimes leading to planner errors such as
"failed to build any N-way joins".

Per report from Andreas Seltenreich. Back-patch to 9.3 where LATERAL
support was added.

Cleanup some problems in new Perl test code

Noted by Tom Lane:
- PostgresNode had a BEGIN block which created files, contrary to
perlmod suggestions to do that only on INIT blocks.
- Assign ports randomly rather than starting from 90600.

Noted by Noah Misch:
- Change use of no-longer-set PGPORT environment variable to $node->port
- Don't start a server in pg_controldata test
- PostgresNode was reading the PID file incorrectly; test the right
thing, and chomp the line we read from the PID file.
- Remove an unused $devnull variable
- Use 'pg_ctl kill' instead of "kill" directly, for Windos portability.
- Make server log names more informative.

Author: Michael Paquier

Update xindex.sgml for recent additions to GIST opclass API.

Commit d04c8ed9044ec added another support function to the GIST API,
but overlooked mentioning it in xindex.sgml's summary of index support
functions.

Anastasia Lubennikova

Create TestLib.pm's tempdir underneath tmp_check/, not out in the open.

This way, existing .gitignore entries and makefile clean actions will
automatically apply to the tempdir, should it survive a TAP test run
(which can happen if the user control-C's out of the run, for example).

Michael Paquier, per a complaint from me

Instruct Coverity using an assertion.

This should make Coverity deduce that plperl_call_perl_func() does not
dereference NULL argtypes. Back-patch to 9.5, where the affected code
was introduced.

Michael Paquier