granicus.if.org Git - postgresql/log

Add new OID alias type regrole

The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.

Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.

Kyotaro Horiguchi

Improve ParseConfigFp comment wrt head/tail

The head_p and tail_p pointers passed to ParseConfigFp() are actually
input/output parameters, not strictly output paramaters. This updates
the function comment to reflect that.

Per discussion with Tom.

Change default for include_realm to 1

The default behavior for GSS and SSPI authentication methods has long
been to strip the realm off of the principal, however, this is not a
secure approach in multi-realm environments and the use-case for the
parameter at all has been superseded by the regex-based mapping support
available in pg_ident.conf.

Change the default for include_realm to be '1', meaning that we do
NOT remove the realm from the principal by default.  Any installations
which depend on the existing behavior will need to update their
configurations (ideally by leaving include_realm set to 1 and adding a
mapping in pg_ident.conf, but alternatively by explicitly setting
include_realm=0 prior to upgrading).  Note that the mapping capability
exists in all currently supported versions of PostgreSQL and so this
change can be done today.  Barring that, existing users can update their
configurations today to explicitly set include_realm=0 to ensure that
the prior behavior is maintained when they upgrade.

This needs to be noted in the release notes.

Per discussion with Magnus and Peter.

Modify pg_stat_get_activity to build a tuplestore

This updates pg_stat_get_activity() to build a tuplestore for its
results instead of using the old-style multiple-call method. This
simplifies the function, though that wasn't the primary motivation for
the change, which is that we may turn it into a helper function which
can filter the results (or not) much more easily.

Bump catversion for pg_file_settings

Pointed out by Andres (thanks!)

Apologies for not including it in the initial patch.

Add pg_file_settings view and function

The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele

Fix two problems in infer_arbiter_indexes().

The first is a pretty simple bug where a relcache entry is used after
the relation is closed. In this particular situation it does not appear
to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.

The second is that infer_arbiter_indexes() skipped indexes that aren't
yet valid according to indcheckxmin. That's not required here, because
uniqueness checks don't care about visibility according to an older
snapshot. While thats not really a bug, it makes things undesirably
non-deterministic. There is some hope that this explains a test failure
on buildfarm member jaguarundi.

Discussion: 9096.1431102730@sss.pgh.pa.us

At promotion, archive last segment from old timeline with .partial suffix.

Previously, we would archive the possible-incomplete WAL segment with its
normal filename, but that causes trouble if the server owning that timeline
is still running, and tries to archive the same segment later. It's not nice
for the standby to trip up the master's archival like that. And it's pretty
confusing, anyway, to have an incomplete segment in the archive that's
indistinguishable from a normal, complete segment.

To avoid such confusion, add a .partial suffix to the file. Or to be more
precise, make a copy of the old segment under the .partial suffix, and
archive that instead of the original file. pg_receivexlog also uses the
.partial suffix for the same purpose, to tell apart incompletely streamed
files from complete ones.

There is no automatic mechanism to use the .partial files at recovery, so
they will go unused, unless the administrator manually copies to them to
the pg_xlog directory (and removes the .partial suffix). Recovery won't
normally need the WAL - when recovering to the new timeline, it will find
the same WAL on the first segment on the new timeline instead - but it
nevertheless feels better to archive the file with the .partial suffix, for
debugging purposes if nothing else.

Add macros to check if a filename is a WAL segment or other such file.

We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.

Fix whitespace

Minor ON CONFLICT related comments and doc fixes.

Geoff Winkless, Stephen Frost, Peter Geoghegan and me.

Teach autovacuum about multixact member wraparound.

The logic introduced in commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c
and repaired in commits 669c7d20e6374850593cb430d332e11a3992bbcf and
7be47c56af3d3013955c91c2877c08f2a0e3e6a2 helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases. This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off. We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas

Remove reference to src/tools/backend/index.html

src/tools/backend was removed back in 63f1ccd, but
backend/storage/lmgr/README didn't get the memo.

Author: Amit Langote

Remove dependency on ordering in logical decoding upsert test.

Buildfarm member magpie sorted the output differently than intended by
Peter. "Resolve" the problem by simply not aggregating, it's not that
many lines.

Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.

The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.

Represent columns requiring insert and update privileges indentently.

Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held. As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund

Improve BRIN infra, minmax opclass and regression test

The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.

Fix incorrect math in DetermineSafeOldestOffset.

The old formula didn't have enough parentheses, so it would do the wrong
thing, and it used / rather than % to find a remainder. The effect of
these oversights is that the stop point chosen by the logic introduced in
commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c might be rather
meaningless.

Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.

Makefile: Add comment that doc uninstall clears man directories

Report by Mario Valdez

Properly send SCM status updates when shutting down service on Windows

The Service Control Manager should be notified regularly during a shutdown
that takes a long time. Previously we would increaes the counter, but forgot
to actually send the notification to the system. The loop counter was also
incorrectly initalized in the event that the startup of the system took long
enough for it to increase, which could cause the shutdown process not to wait
as long as expected.

Krystian Bigaj, reviewed by Michael Paquier

Fix indentation that could mask a future bug

Michael Paquier, spotted using Coverity

Fix minor resource leak in pg_dump

Michael Paquier, spotted using Coverity

Avoid using a C++ keyword as a structure member name.

Per request from Peter Eisentraut.

citext's regexp_matches() functions weren't documented, either.

Fix incorrect declaration of citext's regexp_matches() functions.

These functions should return SETOF TEXT[], like the core functions they
are wrappers for; but they were incorrectly declared as returning just
TEXT[].  This mistake had two results: first, if there was no match you got
a scalar null result, whereas what you should get is an empty set (zero
rows).  Second, the 'g' flag was effectively ignored, since you would get
only one result array even if there were multiple matches, as reported by
Jeff Certain.

While ignoring 'g' is a clear bug, the behavior for no matches might well
have been thought to be the intended behavior by people who hadn't compared
it carefully to the core regexp_matches() functions.  So we should tread
carefully about introducing this change in the back branches.  Still, it
clearly is a bug and so providing some fix is desirable.

After discussion, the conclusion was to introduce the change in a 1.1
version of the citext extension (as we would need to do anyway); 1.0 still
contains the incorrect behavior.  1.1 is the default and only available
version in HEAD, but it is optional in the back branches, where 1.0 remains
the default version.  People wishing to adopt the fix in back branches will
need to explicitly do ALTER EXTENSION citext UPDATE TO '1.1'.  (I also
provided a downgrade script in the back branches, so people could go back
to 1.0 if necessary.)

This should be called out as an incompatible change in the 9.5 release
notes, although we'll also document it in the next set of back-branch
release notes.  The notes should mention that any views or rules that use
citext's regexp_matches() functions will need to be dropped before
upgrading to 1.1, and then recreated again afterwards.

Back-patch to 9.1.  The bug goes all the way back to citext's introduction
in 8.4, but pre-9.1 there is no extension mechanism with which to manage
the change.  Given the lack of previous complaints it seems unnecessary to
change this behavior in 9.0, anyway.

doc: Update installation instructions for new shared libperl/libpython handling

Add geometry/range functions to support BRIN inclusion

This commit adds the following functions:
    box(point) -> box
    bound_box(box, box) -> box
    inet_same_family(inet, inet) -> bool
    inet_merge(inet, inet) -> cidr
    range_merge(anyrange, anyrange) -> anyrange

The first of these is also used to implement a new assignment cast from
point to box.

These functions are the first part of a base to implement an "inclusion"
operator class for BRIN, for multidimensional data types.

Author: Emre Hasegeli
Reviewed by: Andreas Karlsson

Fix some problems with patch to fsync the data directory.

pg_win32_is_junction() was a typo for pgwin32_is_junction(). open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.

Per reports from Andrew Dunstan and David Rowley.

hstore_plpython: Support tests on Python 2.3

Python 2.3 does not have the sorted() function, so do it the long way.

Fix typos

Author: Erik Rijkers <er@xs4all.nl>

Use outerPlanState macro instead of referring to leffttree.

This makes the executor code more consistent. It also removes
an apparently superfluous NULL test in nodeGroup.c.

Qingqing Zhou, reviewed by Tom Lane, and further revised by me.

Improve procost estimates for some text search functions.

The text search functions that involve parsing raw text into lexemes are
remarkably CPU-intensive, so estimating them at the same cost as most other
built-in functions seems like a mistake; moreover, doing so turns out to
discourage the optimizer from using functional indexes on these functions.
After some debate, we've agreed to raise procost from 1 to 100 for
to_tsvector(), plainto_tsvector(), to_tsquery(), ts_headline(),
ts_match_tt(), and ts_match_tq(), which are all the text search functions
that parse raw text.

Also increase procost for the 2-argument form of ts_rewrite()
(tsquery_rewrite_query); while this function doesn't do text parsing,
it does execute a user-supplied SQL query, so its previous procost of 1 is
clearly a drastic underestimate. It seems reasonable to assign it the same
cost we assign to PL functions by default, so 100 is the number here too.

I did not bother bumping catversion for this change, since it does not
break catalog compatibility with the server executable nor result in
any regression test changes.

Per complaint from Andrew Gierth and subsequent discussion.

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.

Fix the same-rel optimization when creating WAL records.

prev_regbuf was never set, and therefore the same-rel flag was never set on
WAL records.

Report and fix by Zhanq Zq

Fix two small bugs in json's populate_record_worker

The first bug is not releasing a tupdesc when doing an early return out
of the function. The second bug is a logic error in choosing when to do
an early return if given an empty jsonb object.

Bug reports from Pavel Stehule and Tom Lane respectively.

Backpatch to 9.4 where these were introduced.

Second try at fixing warnings caused by commit 9b43d73b3f9bef27.

Commit ef3f9e642d2b2bba suppressed one cause of warnings here, but
recent clang on OS X is still unhappy because we're passing a "long"
to abs().  The fact that tm_gmtoff is declared as long is no doubt a
hangover from days when int might be only 16 bits; but Postgres has
never been able to run on such machines, so we can just cast it to int
with no worries.  For consistency, also cast to int in the other
uses of tm_gmtoff in this stanza.

Note: this code is still broken on machines that don't follow C99
integer-division-truncates-towards-zero rules.  Given the lack of
complaints about it, I don't feel a large desire to complicate things
enough to cope with the pre-C99 rules.

Fix overlooked relcache invalidation in ALTER TABLE ... ALTER CONSTRAINT.

When altering the deferredness state of a foreign key constraint, we
correctly updated the catalogs and then invalidated the relcache state for
the target relation ... but that's not the only relation with relevant
triggers. Must invalidate the other table as well, or the state change
fails to take effect promptly for operations triggered on the other table.
Per bug #13224 from Christian Ullrich.

In passing, reorganize regression test case for this feature so that it
isn't randomly injected into the middle of an unrelated test sequence.

Oversight in commit f177cbfe676dc2c7ca2b206c54d6bf819feeea8b. Back-patch
to 9.4 where the faulty code was added.

fix escaping of brackets for m4 broken in b6b2149e48aa61981ae0199c963d5145a37c258c

Enable transforms modules to build and run with Mingw builds.

These modules were all missing essential Windows scaffolding, including
resources files and descriptions, and links to the relevant library
import files. This latter item means that the modules can't be built
with pgxs on Windows, as we don't install the import files. If we ever
decide to install them this restriction could probably be removed.

Also, as with plperl we need to make sure that perl's CORE directory is
last on the include list, as on Windows it appears to contain some
headers with names that clash with names of some headers we include.

Fix python_includespec on Windows at configure time

By converting to using forward slashes at configure time we avoid
having to repeat the logic anywhere that this is needed, such as
in transforms modules for plpython.

Combine initdb tests that successfully create a data directory.

This eliminates many seconds of test duration and the cause to invoke
"rm -rf", which is typically unavailable on Windows.

Michael Paquier and Noah Misch

Fix one more TAP test to use standard command-line argument ordering.

Commit c67a86f7da90c30b81f91957023fb752f06f0598 caught most of these,
but this negative test escaped notice. The test did pass, for the wrong
reason, under affected configurations.

Michael Paquier

Rename coerce_type() local variable.

coerce_type() has local variables named targetTypeId, baseTypeId, and
targetType. targetType has been the Type structure for baseTypeId, so
rename it to baseType.

Windows also needs an override of the shared libpython detection

hstore_plperl: Move port-specific parts to later in the makefile

PORTNAME isn't set until the global makefiles have been included.

Fix shared libpython detection on OS X

Apparently, looking for an appropriately named file doesn't work on some
older versions, so put the back the explicit platform detection.

Make hstore_plperl's build even more like plperl's

Combine the two places that set CPPFLAGS into one. Also, some settings
should be restricted to Windows only. More precisely, -Wno-comment is
a GCC-only option, but Windows in a makefile implies GCC at the moment.

Also, since -Wno-comment is more properly a preprocessor option, move it
to CPPFLAGS to simplify things a bit.

Move interpreter shared library detection to configure

For building PL/Perl, PL/Python, and PL/Tcl, we need a shared library of
libperl, libpython, and libtcl, respectively.  Previously, this was
checked in the makefiles, skipping the PL build with a warning if no
shared library was available.  Now this is checked in configure, with an
error if no shared library is available.

The previous situation arose because in the olden days, the configure
options --with-perl, --with-python, and --with-tcl controlled whether
frontend interfaces for those languages would be built.  The procedural
languages were added later, and shared libraries were often not
available in the beginning.  So it was decided skip the builds of the
procedural languages in those cases.  The frontend interfaces have since
been removed from the tree, and shared libraries are now available most
of the time, so that setup makes much less sense now.

Also, the new setup allows contrib modules and pgxs users to rely on the
respective PLs being available based on configure flags.

Make hstore_plperl's build more like plperl's

This involves moving perl's CORE library to the end of the include list,
and adding other compilation settings that plperl uses. This won't
completely fix the breakage currently being seen by gcc builds on
Windows, but it will let the build get further, and should be wholly
benign, if not beneficial, on *nix.

Mark views created from tables as replication identity 'nothing'

pg_dump turns tables into views using a method that was not setting
pg_class.relreplident properly.

Patch by Marko Tiikkaja

Backpatch through 9.4

Deparse named arguments to use the new => operator instead of :=

Tom Lane pointed out that this wasn't done, and asked whether that was
intentional. Subsequent discussion was in favor of making the change,
so here we go.

Allow FDWs and custom scan providers to replace joins with scans.

Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way. Previously,
it was only possible for a custom scan provider to scan a single
relation. Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.

Copy editing of the replication origins patch.

Michael Paquier and myself.

Fix unaligned memory access in xlog parsing due to replication origin patch.

ParseCommitRecord() accessed xl_xact_origin directly. But the chunks in
the commit record's data only have 4 byte alignment, whereas
xl_xact_origin's members require 8 byte alignment on some
platforms. Update comments to make not of that and copy the record to
stack local storage before reading.

With help from Stefan Kaltenbrunner in pinning down the buildfarm and
verifying the fix.

Fix pg_rewind regression failure after "fast promotion"

pg_rewind looks at the control file to determine the server's timeline. If
the standby performs a "fast promotion", the timeline ID in the control
file is not updated until the next checkpoint. The startup process requests
a checkpoint immediately after promotion, so this is unlikely to be an
issue in the real world, but the regression suite ran pg_rewind so quickly
after promotion that the checkpoint had not yet completed.

Reported by Stephen Frost

Fix up some loose ends for CURRENT_USER as RoleSpec

In commit 31eae6028eca4, some documents were not updated to show the new
capability; fix that. Also, the error message you get when CURRENT_USER
and SESSION_USER are used in a context that doesn't accept them could be
clearer about it being a problem only in those contexts; so add the
word "here".

Author: Kyotaro HORIGUCHI

His patch submission also included changes to GRANT/REVOKE, but those
seemed more controversial, so I left them out. We can reconsider these
changes later.

Create an infrastructure for parallel computation in PostgreSQL.

This does four basic things.  First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers.  Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes.  Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.

Fix pg_upgrade's multixact handling (again)

We need to create the pg_multixact/offsets file deleted by pg_upgrade
much earlier than we originally were: it was in TrimMultiXact(), which
runs after we exit recovery, but it actually needs to run earlier than
the first call to SetMultiXactIdLimit (before recovery), because that
routine already wants to read the first offset segment.

Per pg_upgrade trouble report from Jeff Janes.

While at it, silence a compiler warning about a pointless assert that an
unsigned variable was being tested non-negative. This was a signed
constant in Thomas Munro's patch which I changed to unsigned before
commit. Pointed out by Andres Freund.

Fix typo

Amit Langote

Fix parallel make risk with new check temp-install setup

The "check" target no longer needs to depend on "all", because it now
runs "install" directly, which in turn depends on "all". Doing both
will cause problems with parallel make, because two builds will run next
to each other.

Also remove the redirection of the temp-install output into a log file.
This was appropriate when this was done from within pg_regress, but now
it's just a regular make run, and especially with the above changes this
will now take the place of running the "all" target before the test
suites.

problem report by Jeff Janes, patch in part by Michael Paquier

Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.

We can't rely on UINT16_MAX being present, which is why we introduced
PG_UINT16_MAX...

Buildfarm animal bowerbird via Andrew Gierth.

Add <literal> markup, for consistency.

This file isn't entirely consistent about whether "on" and "off"
should be marked up with <literal>, but it doesn't make much sense
to be inconsistent within a single sentence.

Etsuro Fujita

Update .gitignore for new rmgr, changed paths.

Remove enum-related special cases for catalog scans.

When this code was written, catalog scans were normally performed using
SnapshotNow, making special handling necessary here. Now, however, all
catalog scans use MVCC snapshots, so we can change these cases to look
more like what we do for catalog scans elsewhere in the code.

Per discussion with Tom Lane and a reminder from Bruce Momjian.

Attempt to fix some compiler warnings.

Enable transforms tests for python 2 on MSVC builds

Currently regression tests for python 3 are disabled on MSVC, and these
tests fail with python 3, too, so we have some work to do to enable
both. Meanwhile, all the buildfarm hosts seem to be building with python
2 anyway, so this at least gets us some coverage.

Original patch from Michael Paquier, significantly modified by me.

Introduce replication progress tracking infrastructure.

When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consist out of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in a efficient and
   crash safe manner.
3) The ability to filter out changes performed on the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies. E.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de

psql: Improve tab completion for ALTER FOREIGN TABLE.

Etsuro Fujita

to_char(): have format 'OF' only show the leading negative sign

Previously both hours and minutes displayed as negative.

Report by David Pozsar

doc: recommend use of GUC server_version_num for version checks

Patch by Craig Ringer

pg_basebackup: canonicalize old and new tablespace paths

This avoids problems with double-slash-specified paths.

Patch by Ian Barwick

Warn about tablespace creation in PGDATA

Also add warning to pg_upgrade

Report by Josh Berkus

Fix another test for RELKIND_RELATION that should allow foreign tables now.

I thought I'd gone through all of these before, but a fresh review found
this one too. (Perhaps it would be better to just delete this test and
let the failure occur later, but for the moment I'll preserve the logic.)

The case that this was rejecting is like
CREATE FOREIGN TABLE ft (f1 int ...) ...;
CREATE TABLE c1 (UNIQUE(f1)) INHERITS(ft);

Fix ATSimpleRecursion() to allow recursion from a foreign table.

This is necessary in view of the changes to allow foreign tables to be
full members of inheritance hierarchies, but I (tgl) unaccountably missed
it in commit cb1ca4d800621dcae67ca6c799006de99fa4f0a5.

Noted by Amit Langote, patch by Etsuro Fujita

Code review for multixact bugfix

Reword messages, rename a confusingly named function.

Per Robert Haas.

Fix MSVC builds for contrib transforms modules.

With this patch the MSVC build and installation will work correctly with
the transforms. However the python transform tests for hstore and ltree
are still disabled pending some further adjustments.

Michael Paquier with some tweaks from me.

Protect against multixact members wraparound

Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore.  This leads to loss of files that
are critical to interpret existing tuples's Xmax values.

To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten.  This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough.  We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)

On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum.  If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact.  Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts.  (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely.  The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).

Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.

Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner

Use a fd opened for read/write when syncing slots during startup.

Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.

If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor

Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug. Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.

This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.

Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org

Improve qual pushdown for RLS and SB views

The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view. This optimization improves on that situation by, instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions and allows functions which do not
accept arguments or whose arguments are not from the current query level
(eg: constants can be particularly useful) to be pushed down.

As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting a certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks. That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.

Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.

Fix vcbuild failures and chkpass dependency caused by 854adb8

Switching the Windows build scripts to use forward slashes instead of
backslashes has caused a couple of issues in VC builds:
- The file tree list was not correctly generated, build script
  generating vcproj file missing tree dependencies when listing items in
  Filter.
- VC builds do not accept file paths with forward slashes, perhaps it
  could be possible to use a Condition but it seems safer to simply
  enforce the file paths to use backslashes in the vcproj files.
- chkpass had an unneeded dependency with libpgport and libpgcommon to
  make build succeed but actually it is not necessary as crypt.c is
  already listed for this project and should be replaced with a fake name
  as it is a unique file.

Michael Paquier

Fix hstore_plperl regression tests on some platforms

On some platforms, plperl and plperlu cannot be loaded at the same
time. So split the test into two separate test files.

Also correct therefor to therefore.

Since both forms are arguably legal I wasn't sure about changing
this. But then Tom argued for 'therefore'...

Author: Dmitriy Olshevskiy
Discussion: 34789.1430067832@sss.pgh.pa.us

Fix various typos and grammar errors in comments.

Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru

Fix possible division by zero in pg_xlogdump.

When displaying stats it was possible that a floating point division by
zero occured when no FPIs were issued for a type of record.

Author: Abhijit Menon-Sen
Discussion: 20150417091811.GA14008@toroid.org

Add transforms feature

This provides a mechanism for specifying conversions between SQL data
types and procedural languages. As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund

Fix typo in linux startup script.

Missed a "$" in what was meant to be a variable substitution. Careless
mistake in commit f23425fa950fec3aff458de117037c9caadbc35c.

Add comments warning against generalizing default_with_oids.

pg_dump has historically assumed that default_with_oids affects only plain
tables and not other relkinds. Conceivably we could make it apply to some
newly invented relkind if we did so from the get-go, but changing the
behavior for existing object types will break existing dump scripts.
Add code comments warning about this interaction.

Also, make sure that default_with_oids doesn't cause parse_utilcmd.c to
think that CREATE FOREIGN TABLE will create an OID column. I think this is
only a latent bug right now, since we don't allow UNIQUE/PKEY constraints
in CREATE FOREIGN TABLE, but it's better to be consistent and future-proof.

Try to unbreak some MSVC builds following forward slash change.

Michael Paquier.

Revert: Honor OID status of CREATE LIKE'd tables

Reverts d992f8a8961c09ec219373ffe2b5e6473febd065

Report by Tom Lane

Don't overwrite EXTRA_INSTALL

The temp-install target sets EXTRA_INSTALL to install the current
directory. But when doing so, it should append instead of overwrite,
otherwise settings of EXTRA_INSTALL from a makefile won't take effect.
This would cause the earthdistance test to fail when called directly,
because it would miss installing the cube module.

Prevent improper reordering of antijoins vs. outer joins.

An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that. In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks. It's broken as far back as we had antijoins, so back-patch to all
supported branches.

Replace backslashes by forward slashes in MSVC build code

This makes it possible to run some stages of these build scripts on
non-Windows systems. That way, we can more easily test whether file
moves or makefile changes might break the MSVC build.

Peter Eisentraut and Michael Paquier

Fix file comment for test_rls_hooks.c

The file-level comment wasn't updated when it was copied from the shared
memory queue test module. Fixed.

Noted by Dean Rasheed.

Perform RLS WITH CHECK before constraints, etc

The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views, however, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.

This patch reworks the structure which holds the WCOs a bit to be
explicitly either VIEW or RLS checks and the RLS-related checks are
done prior to the constraint and uniqueness checks. This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.

The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.

Author: Dean Rasheed, with some kabitzing and comment changes by me.

Remove obsolete -I options from ECPG library compilation.

The MSVC build system already omitted these.

Remove superfluous -DFRONTEND.

The majority practice is to add -DFRONTEND in directories building files
that are, at other times, built for the backend. Some directories
lacking that property added a noise -DFRONTEND in one build system.
Remove the excess flags, for consistency.

Build every ECPG library with -DFRONTEND.

Each of the libraries incorporates src/port files, which often check
FRONTEND. Build systems disagreed on whether to build libpgtypes this
way. Only libecpg incorporates files that rely on it today. Back-patch
to 9.0 (all supported versions) to forestall surprises.

Fix up .gitignore and cleanup actions in some src/test/ subdirectories.

examples/, locale/, and thread/ lacked .gitignore files and were also
not connected up to top-level "make clean" etc. This had escaped notice
because none of those directories are built in normal scenarios. Still,
they have working Makefiles, so if someone does a "make" in one of these
directories it would be good if (a) git doesn't bleat about the product
files and (b) cleaning up removes them.

This is a longstanding oversight, but since this behavior is probably
only of interest to developers, there seems no need for back-patching.

Michael Paquier and Tom Lane

Fix obsolete comment in set_rel_size().

The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80ba571d4de8992de6386536867250474, which split what
had been set_rel_pathlist() and child routines into two sets of
functions. But I (tgl) evidently missed updating this comment.

Back-patch to 9.2 to avoid unnecessary divergence among branches.

Amit Langote

Add comments explaining how unique and exclusion constraints are enforced.