granicus.if.org Git - postgresql/log

Update copyright for 2016

On closer inspection, the reason copyright.pl was missing files is
that it is looking for 'Copyright (c)' and they had 'Copyright (C)'.
Fix that, and update a couple more that grepping for that revealed.

Update copyright for 2016

Manually fix some copyright lines missed by the automated script.

Update copyright for 2016

Backpatch certain files through 9.1

Cover heap_page_prune_opt()'s cleanup lock tactic in README.

Jeff Janes, reviewed by Jim Nasby.

Teach flatten_reloptions() to quote option values safely.

flatten_reloptions() supposed that it didn't really need to do anything
beyond inserting commas between reloption array elements.  However, in
principle the value of a reloption could be nearly anything, since the
grammar allows a quoted string there.  Any restrictions on it would come
from validity checking appropriate to the particular option, if any.

A reloption value that isn't a simple identifier or number could thus lead
to dump/reload failures due to syntax errors in CREATE statements issued
by pg_dump.  We've gotten away with not worrying about this so far with
the core-supported reloptions, but extensions might allow reloption values
that cause trouble, as in bug #13840 from Kouhei Sutou.

To fix, split the reloption array elements explicitly, and then convert
any value that doesn't look like a safe identifier to a string literal.
(The details of the quoting rule could be debated, but this way is safe
and requires little code.)  While we're at it, also quote reloption names
if they're not safe identifiers; that may not be a likely problem in the
field, but we might as well try to be bulletproof here.

It's been like this for a long time, so back-patch to all supported
branches.

Kouhei Sutou, adjusted some by me

Add some more defenses against silly estimates to gincostestimate().

A report from Andy Colson showed that gincostestimate() was not being
nearly paranoid enough about whether to believe the statistics it finds in
the index metapage.  The problem is that the metapage stats (other than the
pending-pages count) are only updated by VACUUM, and in the worst case
could still reflect the index's original empty state even when it has grown
to many entries.  We attempted to deal with that by scaling up the stats to
match the current index size, but if nEntries is zero then scaling it up
still gives zero.  Moreover, the proportion of pages that are entry pages
vs. data pages vs. pending pages is unlikely to be estimated very well by
scaling if the index is now orders of magnitude larger than before.

We can improve matters by expanding the use of the rule-of-thumb estimates
I introduced in commit 7fb008c5ee59b040: if the index has grown by more
than a cutoff amount (here set at 4X growth) since VACUUM, then use the
rule-of-thumb numbers instead of scaling.  This might not be exactly right
but it seems much less likely to produce insane estimates.

I also improved both the scaling estimate and the rule-of-thumb estimate
to account for numPendingPages, since it's reasonable to expect that that
is accurate in any case, and certainly pages that are in the pending list
are not either entry or data pages.

As a somewhat separate issue, adjust the estimation equations that are
concerned with extra fetches for partial-match searches.  These equations
suppose that a fraction partialEntries / numEntries of the entry and data
pages will be visited as a consequence of a partial-match search.  Now,
it's physically impossible for that fraction to exceed one, but our
estimate of partialEntries is mostly bunk, and our estimate of numEntries
isn't exactly gospel either, so we could arrive at a silly value.  In the
example presented by Andy we were coming out with a value of 100, leading
to insane cost estimates.  Clamp the fraction to one to avoid that.

Like the previous patch, back-patch to all supported branches; this
problem can be demonstrated in one form or another in all of them.

Fix comments about WAL rule "write xlog before data" versus pg_multixact.

Recovery does not achieve its goal of zeroing all pg_multixact entries
whose accompanying WAL records never reached disk.  Remove that claim
and justify its expendability.  Detail the need for TrimMultiXact(),
which has little in common with the TrimCLOG() rationale.  Merge two
tightly-related comments.  Stop presenting pg_multixact as specific to
heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update().

Noticed while investigating a report from Andres Freund.

doc: Remove redundant duplicate URLs from ulink elements

Empty ulink elements default to displaying the URL, so there is no need
to specify the URL again. This was already done for most occurrences,
but some cases didn't follow this convention.

doc: Add index entries and better documentation link for Linux OOM

Add a comment noting that FDWs don't have to implement EXCEPT or LIMIT TO.

postgresImportForeignSchema pays attention to IMPORT's EXCEPT and LIMIT TO
options, but only as an efficiency hack, not for correctness' sake. The
FDW documentation does explain that, but someone using postgres_fdw.c
as a coding guide might not remember it, so let's add a comment here.
Per question from Regina Obe.

Fix ALTER OPERATOR to update dependencies properly.

Fix an oversight in commit 321eed5f0f7563a0: replacing an operator's
selectivity functions needs to result in a corresponding update in
pg_depend. We have a function that can handle that, but it was not
called by AlterOperator().

To fix this without enlarging pg_operator.h's #include list beyond
what clients can safely include, split off the function definitions
into a new file pg_operator_fn.h, similarly to what we've done for
some other catalog header files. It's not entirely clear whether
any client-side code needs to include pg_operator.h, but it seems
prudent to assume that there is some such code somewhere.

Dept of second thoughts: the !scan_all exit mustn't increase scanned_pages.

In the extreme edge case where contended pages are the only ones that
escape being scanned, the previous commit would have allowed us to think
that relfrozenxid could be advanced, which is exactly wrong.

Avoid useless truncation attempts during VACUUM.

VACUUM can skip heap pages altogether when there's a run of consecutive
pages that are all-visible according to the visibility map.  This causes it
to not update its nonempty_pages count, just as if those pages were empty,
which means that at the end we will think they are candidates for deletion.
Thus, we may take the table's AccessExclusive lock only to find that no
pages are really truncatable.  This usually causes no real problems on a
master server, thanks to the lock being acquired only conditionally; but on
hot-standby servers, the same lock must be acquired unconditionally which
can result in unnecessary query cancellations.

To improve matters, force examination of the table's last page whenever
we reach there with a nonempty_pages count that would allow a truncation
attempt.  If it's not empty, we'll advance nonempty_pages and thereby
prevent the truncation attempt.

If we are unable to acquire cleanup lock on that page, there's no need to
force it, unless we're doing an anti-wraparound vacuum.  We can just check
for tuples with a shared buffer lock and then give up.  (When we are doing
an anti-wraparound vacuum, and decide it's okay to skip the page because it
contains no freezable tuples, this patch still improves matters because
nonempty_pages is properly updated, which it was not before.)

Since only the last page is special-cased in this way, we might attempt a
truncation that will release many fewer pages than the normal heuristic
would suggest; at worst, only one page would be truncated.  But that seems
all right, because the situation won't repeat during the next vacuum.
The real problem with the old logic is that the useless truncation attempt
happens every time we vacuum, so long as the state of the last few dozen
pages doesn't change.

This is a longstanding deficiency, but since the consequences aren't very
severe in most scenarios, I'm not going to risk a back-patch.

Jeff Janes and Tom Lane

Minor hacking on contrib/cube documentation.

Improve markup, particularly of the table of functions; add or improve
examples for some of the functions; wordsmith some of the function
descriptions.

Add some comments about division of labor between rewriter and planner.

The rationale for the way targetlist processing is done wasn't clearly
stated anywhere, and I for one had forgotten some of the details. Having
just painfully re-learned them, add some breadcrumbs for the next person.

Put back one copyObject() in rewriteTargetView().

Commit 6f8cb1e23485bd6d tried to centralize rewriteTargetView's copying
of a target view's Query struct.  However, it ignored the fact that the
jointree->quals field was used twice.  This only accidentally failed to
fail immediately because the same ChangeVarNodes mutation is applied in
both cases, so that we end up with logically identical expression trees
for both uses (and, as the code stands, the second ChangeVarNodes call
actually does nothing).  However, we end up linking *physically*
identical expression trees into both an RTE's securityQuals list and
the WithCheckOption list.  That's pretty dangerous, mainly because
prepsecurity.c is utterly cavalier about further munging such structures
without copying them first.

There may be no live bug in HEAD as a consequence of the fact that we apply
preprocess_expression in between here and prepsecurity.c, and that will
make a copy of the tree anyway.  Or it may just be that the regression
tests happen to not trip over it.  (I noticed this only because things
fell over pretty badly when I tried to relocate the planner's call of
expand_security_quals to before expression preprocessing.)  In any case
it's very fragile because if anyone tried to make the securityQuals and
WithCheckOption trees diverge before we reach preprocess_expression, it
would not work.  The fact that the current code will preprocess
securityQuals and WithCheckOptions lists at completely different times in
different query levels does nothing to increase my trust that that can't
happen.

In view of the fact that 9.5.0 is almost upon us and the aforesaid commit
has seen exactly zero field testing, the prudent course is to make an extra
copy of the quals so that the behavior is not different from what has been
in the field during beta.

Rename (new|old)estCommitTs to (new|old)estCommitTsXid

The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user facing output
emitted by pg_controldata and pg_resetxlog.

Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.

Code and docs review for cube kNN support.

Commit 33bd250f6c4cc309f4eeb657da80f1e7743b3e5c could have done with
some more review:

Adjust coding so that compilers unfamiliar with elog/ereport don't complain
about uninitialized values.

Fix misuse of PG_GETARG_INT16 to retrieve arguments declared as "integer"
at the SQL level. (This was evidently copied from cube_ll_coord and
cube_ur_coord, but those were wrong too.)

Fix non-style-guide-conforming error messages.

Fix underparenthesized if statements, which pgindent would have made a
hash of, and remove some unnecessary parens elsewhere.

Run pgindent over new code.

Revise documentation: repeated accretion of more operators without any
rethinking of the text already there had left things in a bit of a mess.
Merge all the cube operators into one table and adjust surrounding text
appropriately.

David Rowley and Tom Lane

Document brin_summarize_new_pages

Pointer out by Jeff Janes

Document the exponentiation operator as associating left to right.

Common mathematical convention is that exponentiation associates right to
left. We aren't going to change the parser for this, but we could note
it in the operator's description. (It's already noted in the operator
precedence/associativity table, but users might not look there.)
Per bug #13829 from Henrik Pauli.

Fix omission of -X (--no-psqlrc) in some psql invocations.

As of commit d5563d7df, psql -c no longer implies -X, but not all of
our regression testing scripts had gotten that memo.

To ensure consistency of results across different developers, make
sure that *all* invocations of psql in all scripts in our tree
use -X, even where this is not what previously happened.

Michael Paquier and Tom Lane

doc: pg_committs -> pg_commit_ts

Reported by: Alain Laporte (#13836)

Update documentation about pseudo-types.

Tone down an overly strong statement about which pseudo-types PLs are
likely to allow. Add "event_trigger" to the list, as well as
"pg_ddl_command" in 9.5/HEAD. Back-patch to 9.3 where event_trigger
was added.

Fix translation domain in pg_basebackup

For some reason, we've been overlooking the fact that pg_receivexlog
and pg_recvlogical are using wrong translation domains all along,
so their output hasn't ever been translated. The right domain is
pg_basebackup, not their own executable names.

Noticed by Ioseph Kim, who's been working on the Korean translation.

Backpatch pg_receivexlog to 9.2 and pg_recvlogical to 9.4.

Add forgotten CHECK_FOR_INTERRUPT calls in pgcrypto's crypt()

Both Blowfish and DES implementations of crypt() can take arbitrarily
long time, depending on the number of rounds specified by the caller;
make sure they can be interrupted.

Author: Andreas Karlsson
Reviewer: Jeff Janes

Backpatch to 9.1.

Include typmod when complaining about inherited column type mismatches.

MergeAttributes() rejects cases where columns to be merged have the same
type but different typmod, which is correct; but the error message it
printed didn't show either typmod, which is unhelpful. Changing this
requires using format_type_with_typemod() in place of TypeNameToString(),
which will have some minor side effects on the way some type names are
printed, but on balance this is an improvement: the old code sometimes
printed one type according to one set of rules and the other type according
to the other set, which could be confusing in its own way.

Oddly, there were no regression test cases covering any of this behavior,
so add some.

Complaint and fix by Amit Langote

Fix brin_summarize_new_values() to check index type and ownership.

brin_summarize_new_values() did not check that the passed OID was for
an index at all, much less that it was a BRIN index, and would fail in
obscure ways if it wasn't (possibly damaging data first?). It also
lacked any permissions test; by analogy to VACUUM, we should only allow
the table's owner to summarize.

Noted by Jeff Janes, fix by Michael Paquier and me

Improve SECURITY LABEL tab completion

Add DATABASE, EVENT TRIGGER, FOREIGN TABLE, ROLE, and TABLESPACE to
tab completion for SECURITY LABEL.

Kyotaro Horiguchi

Improve the gin index scan performance in pg_trgm.

Previous coding assumes too pessimistic upper bound of similarity
in consistent method of GIN.

Author: Fornaroli Christophe with comments by me.

Remove unnecessary row ordering dependency in pg_rewind test suite.

t/002_databases.pl was expecting to see a specific physical order of the
rows in pg_database. I broke that in HEAD with commit 01e386a325549b77,
but I'd say it's a pretty fragile test methodology in any case, so fix
it in 9.5 as well.

Docs: fix erroneously-given function name.

pg_replication_session_is_setup() exists nowhere; apparently this is
meant to refer to pg_replication_origin_session_is_setup().

Adrien Nayrat

Fix factual and grammatical errors in comments for struct _tableInfo.

Amit Langote, further adjusted by me

Docs typo fix.

Michael Paquier

Avoid VACUUM FULL altogether in initdb.

Commit ed7b3b3811c5836a purported to remove initdb's use of VACUUM FULL,
as had been agreed to in a pghackers discussion back in Dec 2014.
But it missed this one ...

Improve handling of password reuse in src/bin/scripts programs.

This reverts most of commit 83dec5a71 in favor of having connectDatabase()
store the possibly-reusable password in a static variable, similar to the
coding we've had for a long time in pg_dump's version of that function.
To avoid possible problems with unwanted password reuse, make callers
specify whether it's reasonable to attempt to re-use the password.
This is a wash for cases where re-use isn't needed, but it is far simpler
for callers that do want that. Functionally there should be no difference.

Even though we're past RC1, it seems like a good idea to back-patch this
into 9.5, like the prior commit. Otherwise, if there are any third-party
users of connectDatabase(), they'll have to deal with an API change in
9.5 and then another one in 9.6.

Michael Paquier

In pg_dump, remember connection passwords no matter how we got them.

When pg_dump prompts the user for a password, it remembers the password
for possible re-use by parallel worker processes.  However, libpq might
have extracted the password from a connection string originally passed
as "dbname".  Since we don't record the original form of dbname but
break it down to host/port/etc, the password gets lost.  Fix that by
retrieving the actual password from the PGconn.

(It strikes me that this whole approach is rather broken, as it will also
lose other information such as options that might have been present in
the connection string.  But we'll leave that problem for another day.)

In passing, get rid of rather silly use of malloc() for small fixed-size
arrays.

Back-patch to 9.3 where parallel pg_dump was introduced.

Report and fix by Zeus Kronion, adjusted a bit by Michael Paquier and me

Read from the same worker repeatedly until it returns no tuple.

The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.

Patch by me.  Dilip Kumar helped with performance testing.

Change Gather not to use a physical tlist.

This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.

Remove unnecessary escaping in C character literals

'\"' is more commonly written simply as '"'.

Allow omitting one or both boundaries in an array slice specifier.

Omitted boundaries represent the upper or lower limit of the corresponding
array subscript. This allows simpler specification of many common
use-cases.

(Revised version of commit 9246af6799819847faa33baf441251003acbb8fe)

YUriy Zhuravlev

Comment improvements for abbreviated keys.

Peter Geoghegan and Robert Haas

postgres_fdw: Consider requesting sorted data so we can do a merge join.

When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.  Commit f18c944b6137329ac4a6b2dce5745c5dc21a8578
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.

For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful.  Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.

Ashutosh Bapat, with very substantial cosmetic revisions by me.  Also
reviewed by Rushabh Lathia.

Fix calculation of space needed for parsed words in tab completion.

Yesterday in commit d854118c8, I had a serious brain fade leading me to
underestimate the number of words that the tab-completion logic could
divide a line into. On input such as "(((((", each character will get
seen as a separate word, which means we do indeed sometimes need more
space for the words than for the original line. Fix that.

Make viewquery a copy in rewriteTargetView()

Rather than expect the Query returned by get_view_query() to be
read-only and then copy bits and pieces of it out, simply copy the
entire structure when we get it. This addresses an issue where
AcquireRewriteLocks, which is called by acquireLocksOnSubLinks(),
scribbles on the parsetree passed in, which was actually an entry
in relcache, leading to segfaults with certain view definitions.
This also future-proofs us a bit for anyone adding more code to this
path.

The acquireLocksOnSubLinks() was added in commit c3e0ddd40.

Back-patch to 9.3 as that commit was.

Remove silly completion for "DELETE FROM tabname ...".

psql offered USING, WHERE, and SET in this context, but SET is not a valid
possibility here. Seems to have been a thinko in commit f5ab0a14ea83eb6c
which added DELETE's USING option.

Teach psql's tab completion to consider the entire input string.

Up to now, the tab completion logic has only examined the last few words
of the current input line; "last few" being originally as few as four
words, but lately up to nine words.  Furthermore, it only looked at what
libreadline considers the current line of input, which made it rather
myopic if you split your command across lines.  This was tolerable,
sort of, so long as the match patterns were only designed to consider the
last few words of input; but with the recent addition of HeadMatches()
and Matches() matching rules, we really have to do better if we want
those to behave sanely.

Hence, change the code to break the entire line down into words, and to
include any previous lines in the command buffer along with the active
readline input buffer.

This will be a little bit slower than the previous coding, but some
measurements say that even a query of several thousand characters can be
parsed in a hundred or so microseconds on modern machines; so it's really
not going to be significant for interactive tab completion.  To reduce
the cost some, I arranged to avoid the per-word malloc calls that used
to occur: all the words are now kept in one malloc'd buffer.

psql: Review of new help output strings

Add missing COSTS OFF to EXPLAIN commands in rowsecurity.sql.

Commit e5e11c8cc added a bunch of EXPLAIN statements without COSTS OFF
to the regression tests. This is contrary to project policy since it
results in unnecessary platform dependencies in the output (it's just
luck that we didn't get buildfarm failures from it). Per gripe from
Mike Wilson.

Adopt a more compact, less error-prone notation for tab completion code.

Replace tests like

    else if (pg_strcasecmp(prev4_wd, "CREATE") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
             (pg_strcasecmp(prev_wd, "BEFORE") == 0 ||
              pg_strcasecmp(prev_wd, "AFTER") == 0))

with new notation like this:

    else if (TailMatches4("CREATE", "TRIGGER", MatchAny, "BEFORE|AFTER"))

In addition, provide some macros COMPLETE_WITH_LISTn() to reduce the amount
of clutter needed to specify a small number of predetermined completion
alternatives.

This makes the code substantially more compact: tab-complete.c gets over a
thousand lines shorter in this patch, despite the addition of a couple of
hundred lines of infrastructure for the new notations.  The new way of
specifying match rules seems a whole lot more readable and less
error-prone, too.

There's a lot more that could be done now to make matching faster and more
reliable; for example I suspect that most of the TailMatches() rules should
now be Matches() rules.  That would allow them to be skipped after a single
integer comparison if there aren't the right number of words on the line,
and it would reduce the risk of unintended matches.  But for now, (mostly)
refrain from reworking any match rules in favor of just converting what
we've got into the new notation.

Thomas Munro, reviewed by Michael Paquier, some adjustments by me

Fix whitespace

Fix tab completion for ALTER ... TABLESPACE ... OWNED BY.

Previously the completion used the wrong word to match 'BY'. This was
introduced brokenly, in b2de2a. While at it, also add completion of
IN TABLESPACE ... OWNED BY and fix comments referencing nonexistent
syntax.

Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: CAB7nPqSHDdSwsJqX0d2XzjqOHr==HdWiubCi4L=Zs7YFTUne8w@mail.gmail.com
Backpatch: 9.4, like the commit introducing the bug

Revert 9246af6799819847faa33baf441251003acbb8fe because
I miss too much. Patch is returned to commitfest process.

pgbench: Change terminology from "threshold" to "parameter".

Per a recommendation from Tomas Vondra, it's more helpful to refer to
the value that determines how skewed a Gaussian or exponential
distribution is as a parameter rather than a threshold.

Since it's not quite too late to get this right in 9.5, where it was
introduced, back-patch this. Most of the patch changes only comments
and documentation, but a few pgbench messages are altered to match.

Fabien Coelho, reviewed by Michael Paquier and by me.

Remove duplicate word.

Kyotaro Horiguchi

Fix TupleQueueReaderNext not to ignore its nowait argument.

This was a silly goof on my (rhaas's) part.

Report and fix by Rushabh Lathia.

Fix copy-and-paste error in logical decoding callback.

This could result in the error context misidentifying where the error
actually occurred.

Craig Ringer

Fix typo in comment.

Amit Langote

Allow to omit boundaries in array subscript

Allow to omiy lower or upper or both boundaries in array subscript
for selecting slice of array.

Author: YUriy Zhuravlev

Cube extension kNN support

Introduce distance operators over cubes:
<#> taxicab distance
<-> euclidean distance
<=> chebyshev distance

Also add kNN support of those distances in GiST opclass.

Author: Stas Kelvich

Remove unreferenced function declarations.

datapagemap_create() and datapagemap_destroy() were declared extern,
but they don't actually exist anywhere. Per YUriy Zhuravlev and
Michael Paquier.

Use just one standalone-backend session for initdb's post-bootstrap steps.

Previously, each subroutine in initdb fired up its own standalone backend
session.  Over time we'd grown as many as fifteen of these sessions,
and the cumulative startup and shutdown work for them was getting pretty
noticeable.  Combining things so that all these steps share a single
backend session cuts a good 10% off the total runtime of initdb, more
if you're not fsync'ing.

The main stumbling block to doing this before was that some of the sessions
were run with -j and some not.  The improved definition of -j mode
implemented by my previous commit makes it possible to fix that by running
all the post-bootstrap steps with -j; we just have to use double instead of
single newlines to end command strings.  (This is only absolutely necessary
around the VACUUM and CREATE DATABASE steps, since those can't be run in a
transaction block.  But it seems best to make them all use double newlines
so that the commands remain separate for error-reporting purposes.)

A minor disadvantage is that since initdb can't tell how much of its
output the backend has executed, we can no longer have the per-step
progress reporting initdb used to print.  But things are fast enough
nowadays that that's not really all that useful anyway.

In passing, add more const decoration to some of the static arrays in
initdb.c.

Adjust behavior of single-user -j mode for better initdb error reporting.

Previously, -j caused the entire input file to be read in and executed as
a single command string.  That's undesirable, not least because any error
causes the entire file to be regurgitated as the "failing query".  Some
experimentation suggests a better rule: end the command string when we see
a semicolon immediately followed by two newlines, ie, an empty line after
a query.  This serves nicely to break up the existing examples such as
information_schema.sql and system_views.sql.  A limitation is that it's
no longer possible to write such a sequence within a string literal or
multiline comment in a file meant to be read with -j; but there are no
instances of such a problem within the data currently used by initdb.
(If someone does make such a mistake in future, it'll be obvious because
they'll get an unterminated-literal or unterminated-comment syntax error.)
Other than that, there shouldn't be any negative consequences; you're not
forced to end statements that way, it's just a better idea in most cases.

In passing, remove src/include/tcop/tcopdebug.h, which is dead code
because it's not included anywhere, and hasn't been for more than
ten years.  One of the debug-support symbols it purported to describe
has been unreferenced for at least the same amount of time, and the
other is removed by this commit on the grounds that it was useless:
forcing -j mode all the time would have broken initdb.  The lack of
complaints about that, or about the missing inclusion, shows that
no one has tried to use TCOP_DONTUSENEWLINE in many years.

Fix improper initialization order for readline.

Turns out we must set rl_basic_word_break_characters *before* we call
rl_initialize() the first time, because it will quietly copy that value
elsewhere --- but only on the first call.  (Love these undocumented
dependencies.)  I broke this yesterday in commit 2ec477dc8108339d;
like that commit, back-patch to all active branches.  Per report from
Pavel Stehule.

Rework internals of changing a type's ownership

This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which caused later other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite.  Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existant user which is bogus.

To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly.  AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.

I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised.  Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.

Per bug #13666 reported by Chris Pacejo.

Backpatch to 9.5.

(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched.  If we
decide that we need this in earlier branches too, we should backpatch
both.)

Cope with Readline's failure to track SIGWINCH events outside of input.

It emerges that libreadline doesn't notice terminal window size change
events unless they occur while collecting input.  This is easy to stumble
over if you resize the window while using a pager to look at query output,
but it can be demonstrated without any pager involvement.  The symptom is
that queries exceeding one line are misdisplayed during subsequent input
cycles, because libreadline has the wrong idea of the screen dimensions.

The safest, simplest way to fix this is to call rl_reset_screen_size()
just before calling readline().  That causes an extra ioctl(TIOCGWINSZ)
for every command; but since it only happens when reading from a tty, the
performance impact should be negligible.  A more valid objection is that
this still leaves a tiny window during entry to readline() wherein delivery
of SIGWINCH will be missed; but the practical consequences of that are
probably negligible.  In any case, there doesn't seem to be any good way to
avoid the race, since readline exposes no functions that seem safe to call
from a generic signal handler --- rl_reset_screen_size() certainly isn't.

It turns out that we also need an explicit rl_initialize() call, else
rl_reset_screen_size() dumps core when called before the first readline()
call.

rl_reset_screen_size() is not present in old versions of libreadline,
so we need a configure test for that.  (rl_initialize() is present at
least back to readline 4.0, so we won't bother with a test for it.)
We would need a configure test anyway since libedit's emulation of
libreadline doesn't currently include such a function.  Fortunately,
libedit seems not to have any corresponding bug.

Merlin Moncure, adjusted a bit by me

Speed up CREATE INDEX CONCURRENTLY's TID sort.

Encode TIDs as 64-bit integers to speed up comparisons. This seems to
speed things up on all platforms, but is even more beneficial when
8-byte integers are passed by value.

Peter Geoghegan. Design suggestions and review by Tom Lane. Review
also by Simon Riggs and by me.

Mark CHECK constraints declared NOT VALID valid if created with table.

FOREIGN KEY constraints have behaved this way for a long time, but for
some reason the behavior of CHECK constraints has been inconsistent up
until now.

Amit Langote and Amul Sul, with assorted tweaks by me.

Document use of Subject Alternative Names in SSL server certificates.

Commit acd08d764 did not bother with updating the documentation.

Update 9.5 release notes through today.

Also do another round of copy-editing, and fix up remaining FIXME items.

Teach mdnblocks() not to create zero-length files.

It's entirely surprising that mdnblocks() has the side effect of
creating new files on disk, so let's make it not do that. One
consequence of the old behavior is that, if running on a damaged
cluster that is missing a file, mdnblocks() can recreate the file
and allow a subsequent _mdfd_getseg() for a higher segment to succeed.
This happens because, while mdnblocks() stops when it finds a segment
that is shorter than 1GB, _mdfd_getseg() has no such check, and thus
the empty file created by mdnblocks() can allow it to continue its
traversal and find higher-numbered segments which remain.

It might be a good idea for _mdfd_getseg() to actually verify that
each segment it finds is exactly 1GB before proceeding to the next
one, but that would involve some additional system calls, so for
now I'm just doing this much.

Patch by me, per off-list analysis by Kevin Grittner and Rahila Syed.
Review by Andres Freund.

Move buffer I/O and content LWLocks out of the main tranche.

Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two. Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).

We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche. This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.

Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.

Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.

Provide a way to predefine LWLock tranche IDs.

It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID. So, for built-in tranches,
use a hard-coded value instead.

This is motivated by an upcoming patch adding further built-in tranches.

Andres Freund and Robert Haas

Improve CREATE POLICY documentation

Clarify that SELECT policies are now applied when SELECT rights
are required for a given query, even if the query is an UPDATE or
DELETE query. Pointed out by Noah.

Additionally, note the risk regarding concurrently open transactions
where a relation which controls access to the rows of another relation
are updated and the rows of the primary relation are also being
modified. Pointed out by Peter Geoghegan.

Back-patch to 9.5.

Collect the global OR of hasRowSecurity flags for plancache

We carry around information about if a given query has row security or
not to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.

Previously, the flag of one of the subqueries was simply being copied
into place to indicate if the query overall included RLS components.
That's wrong as we need the global OR of all subqueries. Fix by
changing the code to match how fireRIRules works, which is results
in OR'ing all of the flags.

Noted by Tom.

Back-patch to 9.5 where RLS was introduced.

Add missing cleanup logic in pg_rewind/t/005_same_timeline.pl test.

Per Michael Paquier

Add missing CHECK_FOR_INTERRUPTS in lseg_inside_poly

Apparently, there are bugs in this code that cause it to loop endlessly.
That bug still needs more research, but in the meantime it's clear that
the loop is missing a check for interrupts so that it can be cancelled
timely.

Backpatch to 9.1 -- this has been missing since 49475aab8d0d.

Remove xmlparse(document '') test

This one test was behaving differently between the ubuntu fix for
CVE-2015-7499 and the base "expected" file. It's not worth having
yet another version of the expected file for this test, so drop it.
Perhaps at some point when all distros have settled down to the
same behavior on this test, it can be restored.

Problem found by me on libxml2 (2.9.1+dfsg1-3ubuntu4.6).
Solution suggested by Tom Lane.
Backpatch to 9.5, where the test was added.

Fix out-of-memory error handling in ParameterDescription message processing.

If libpq ran out of memory while constructing the result set, it would hang,
waiting for more data from the server, which might never arrive. To fix,
distinguish between out-of-memory error and not-enough-data cases, and give
a proper error message back to the client on OOM.

There are still similar issues in handling COPY start messages, but let's
handle that as a separate patch.

Michael Paquier, Amit Kapila and me. Backpatch to all supported versions.

Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure.

Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would
install 0 into MultiXactState->offsetStopLimit if it previously succeeded.
Luckily, there are no known cases where find_multixact_start() will return
an error in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId()
could continue allocating mxids even if close to a wraparound, or it could
erroneously stop allocating mxids, even if no wraparound is looming. The
wrong value would be corrected the next time SetOffsetVacuumLimit() is
called, or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Discussion: 20151210140450.GA22278@alap3.anarazel.de
Backpatch: 9.5, where the bug was introduced as part of 4f627f

Correct statement to actually be the intended assert statement.

e3f4cfc7 introduced a LWLockHeldByMe() call, without the corresponding
Assert() surrounding it.

Spotted by Coverity.

Backpatch: 9.1+, like the previous commit

Docs: document that psql's "\i -" means read from stdin.

This has worked that way for a long time, maybe always, but you would
not have known it from the documentation. Also back-patch the notes
I added to HEAD earlier today about behavior of the "-f -" switch,
which likewise have been valid for many releases.

Code and docs review for multiple -c and -f options in psql.

Commit d5563d7df94488bf drew complaints from Coverity, which quite
correctly complained that one copy of each -c or -f string was being
leaked. What's more, simple_action_list_append was allocating enough space
for still a third copy of each string as part of the SimpleActionListCell,
even though that coding method had been superseded by a separate strdup
operation. There were some other minor coding infelicities too. The
documentation needed more work as well, eg it forgot to explain that -c
causes psql not to accept any interactive input.

Consistently set all fields in pg_stat_replication to null instead of 0

Previously the "sent" field would be set to 0 and all other xlog
pointers be set to NULL if there were no valid values (such as when
in a backup sending walsender).

Properly initialize write, flush and replay locations in walsender slots

These would leak random xlog positions if a walsender used for backup would
a walsender slot previously used by a replication walsender.

In passing also fix a couple of cases where the xlog pointer is directly
compared to zero instead of using XLogRecPtrIsInvalid, noted by
Michael Paquier.

Doc: update external URLs for PostGIS project.

Paul Ramsey

doc: Add some markup

Fix ALTER TABLE ... SET TABLESPACE for unlogged relations.

Changing the tablespace of an unlogged relation did not WAL log the
creation and content of the init fork. Thus, after a standby is
promoted, unlogged relation cannot be accessed anymore, with errors
like:
ERROR: 58P01: could not open file "pg_tblspc/...": No such file or directory
Additionally the init fork was not synced to disk, independent of the
configured wal_level, a relatively small durability risk.

Investigation of that problem also brought to light that, even for
permanent relations, the creation of !main forks was not WAL logged,
i.e. no XLOG_SMGR_CREATE record were emitted. That mostly turns out not
to be a problem, because these files were created when the actual
relation data is copied; nonexistent files are not treated as an error
condition during replay. But that doesn't work for empty files, and
generally feels a bit haphazard. Luckily, outside init and main forks,
empty forks don't occur often or are not a problem.

Add the required WAL logging and syncing to disk.

Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: 20151210163230.GA11331@alap3.anarazel.de
Backpatch: 9.1, where unlogged relations were introduced

Add an expected-file to match behavior of latest libxml2.

Recent releases of libxml2 do not provide error context reports for errors
detected at the very end of the input string. This appears to be a bug, or
at least an infelicity, introduced by the fix for libxml2's CVE-2015-7499.
We can hope that this behavioral change will get undone before too long;
but the security patch is likely to spread a lot faster/further than any
follow-on cleanup, which means this behavior is likely to be present in the
wild for some time to come. As a stopgap, add a variant regression test
expected-file that matches what you get with a libxml2 that acts this way.

pg_rewind: Don't error if the two clusters are already on the same timeline

This previously resulted in an error and a nonzero exit status, but
after discussion this should rather be a noop with a zero exit status.

For REASSIGN OWNED for foreign user mappings

As reported in bug #13809 by Alexander Ashurkov, the code for REASSIGN
OWNED hadn't gotten word about user mappings.  Deal with them in the
same way default ACLs do, which is to ignore them altogether; they are
handled just fine by DROP OWNED.  The other foreign object cases are
already handled correctly by both commands.

Also add a REASSIGN OWNED statement to foreign_data test to exercise the
foreign data objects.  (The changes are just before the "cleanup" phase,
so it shouldn't remove any existing live test.)

Reported by Alexander Ashurkov, then independently by Jaime Casanova.

Install our "missing" script where PGXS builds can find it.

This allows sane behavior in a PGXS build done on a machine where build
tools such as bison are missing.

Jim Nasby

Handle policies during DROP OWNED BY

DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies. Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.

As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.

Get rid of the planner's LateralJoinInfo data structure.

I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cacb6df23 that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency.  The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references.  For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation.  Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code.  This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.

Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.

Handle dependencies properly in ALTER POLICY

ALTER POLICY hadn't fully considered partial policy alternation
(eg: change just the roles on the policy, or just change one of
the expressions) when rebuilding the dependencies. Instead, it
would happily remove all dependencies which existed for the
policy and then only recreate the dependencies for the objects
referred to in the specific ALTER POLICY command.

Correct that by extracting and building the dependencies for all
objects referenced by the policy, regardless of if they were
provided as part of the ALTER POLICY command or were already in
place as part of the pre-existing policy.

Still more fixes for planner's handling of LATERAL references.

More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references.  If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too.  The planner did not understand this and would generate
intermediate joins that could not be used.  While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.

To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent.  This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.

Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations.  This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.

Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1dd02f1.  Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures.  The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen.  I chose to move the
supporting logic into joinrels.c where the latter functions are.

Back-patch to 9.3 where LATERAL support was introduced.

Fix commit timestamp initialization

This module needs explicit initialization in order to replay WAL records
in recovery, but we had broken this recently following changes to make
other (stranger) scenarios work correctly. To fix, rework the
initialization sequence so that it always takes place before WAL replay
commences for both master and standby.

I could have gone for a more localized fix that just added a "startup"
call for the master server, but it seemed better to restructure the
existing callers as well so that the whole thing made more sense. As a
drawback, there is more control logic in xlog.c now than previously, but
doing otherwise meant passing down the ControlFile flag, which seemed
uglier as a whole.

This also meant adding a check to not re-execute ActivateCommitTs if it
had already been called.

Reported by Fujii Masao.

Backpatch to 9.5.

Improve some messages

Improve ALTER POLICY tab completion.

Complete "ALTER POLICY" with a policy name, as we do for DROP POLICY.
And, complete "ALTER POLICY polname ON" with a table name that has such
a policy, as we do for DROP POLICY, rather than with any table name
at all.

Masahiko Sawada

Fix typo.

Etsuro Fujita

Fix ON CONFLICT UPDATE bug breaking AFTER UPDATE triggers.

ExecOnConflictUpdate() passed t_ctid of the to-be-updated tuple to
ExecUpdate(). That's problematic primarily because of two reason: First
and foremost t_ctid could point to a different tuple. Secondly, and
that's what triggered the complaint by Stanislav, t_ctid is changed by
heap_update() to point to the new tuple version. The behavior of AFTER
UPDATE triggers was therefore broken, with NEW.* and OLD.* tuples
spuriously identical within AFTER UPDATE triggers.

To fix both issues, pass a pointer to t_self of a on-stack HeapTuple
instead.

Fixing this bug lead to one change in regression tests, which previously
failed due to the first issue mentioned above. There's a reasonable
expectation that test fails, as it updates one row repeatedly within one
INSERT ... ON CONFLICT statement. That is only possible if the second
update is triggered via ON CONFLICT ... SET, ON CONFLICT ... WHERE, or
by a WITH CHECK expression, as those are executed after
ExecOnConflictUpdate() does a visibility check. That could easily be
prohibited, but given it's allowed for plain UPDATEs and a rare corner
case, it doesn't seem worthwhile.

Reported-By: Stanislav Grozev
Author: Andres Freund and Peter Geoghegan
Discussion: CAA78GVqy1+LisN-8DygekD_Ldfy=BJLarSpjGhytOsgkpMavfQ@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced