Fujii Masao reported that the previous commit caused failures in psql on
OS X, since if one exits the pager program early while viewing a query
result, psql sees an EPIPE error from fprintf --- and the wrapper function
thought that was reason to panic. (It's a bit surprising that the same
does not happen on Linux.) Further discussion among the security list
concluded that the risk of other such failures was far too great, and
that the one-size-fits-all approach to error handling embodied in the
previous patch is unlikely to be workable.
This leaves us again exposed to the possibility of the type of failure
envisioned in CVE-2015-3166. However, that failure mode is strictly
hypothetical at this point: there is no concrete reason to believe that
an attacker could trigger information disclosure through the supposed
mechanism. In the first place, the attack surface is fairly limited,
since so much of what the backend does with format strings goes through
stringinfo.c or psprintf(), and those already had adequate defenses.
In the second place, even granting that an unprivileged attacker could
control the occurrence of ENOMEM with some precision, it's a stretch to
believe that he could induce it just where the target buffer contains some
valuable information. So we concluded that the risk of non-hypothetical
problems induced by the patch greatly outweighs the security risks.
We will therefore revert, and instead undertake closer analysis to
identify specific calls that may need hardening, rather than attempt a
universal solution.
We have kept the portion of the previous patch that improved snprintf.c's
handling of errors when it calls the platform's sprintf(). That seems to
be an unalloyed improvement.
Andres Freund [Tue, 19 May 2015 19:17:52 +0000 (21:17 +0200)]
Refactor ON CONFLICT index inference parse tree representation.
Defer lookup of the opfamily and input type of a user-specified opclass
until the optimizer selects among the available unique indexes, and store
the opclass in the parse-analyzed tree instead. The primary reason for
doing this is that for rule deparsing it's easier to use the opclass
than the previous representation.
While at it also rename a variable in the inference code to better fit
its purpose.
This is separate from the actual fixes for deparsing to make review
easier.
The point of the assertion is to ensure that the arrays allocated on the
stack are large enough, but the check was one item short.
This won't matter in practice because MaxIndexTuplesPerPage is an
overestimate, so you can't have that many items on a page in reality.
But let's be tidy.
Spotted by Anastasia Lubennikova. Backpatch to all supported versions, like
the patch that added the assertion.
Tom Lane [Tue, 19 May 2015 15:47:42 +0000 (11:47 -0400)]
Avoid collation dependence in indexes of system catalogs.
No index in template0 should have collation-dependent ordering, especially
not indexes on shared catalogs. For most textual columns we avoid this
issue by using type "name" (which sorts per strcmp()). However there are a
few indexed columns that we'd prefer to use "text" for, and for that, the
default opclass text_ops is unsafe. Fortunately, text_pattern_ops is safe
(it sorts per memcmp()), and it has no real functional disadvantage for our
purposes. So change the indexes on pg_seclabel.provider and
pg_shseclabel.provider to use text_pattern_ops.
In passing, also mark pg_replication_origin.roname as using
text_pattern_ops --- for some reason it was labeled varchar_pattern_ops
which is just wrong, even though it accidentally worked.
Add regression test queries to catch future errors of these kinds.
We still can't do anything about the misdeclared pg_seclabel and
pg_shseclabel indexes in back branches :-(
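For illustration, the opclass choice is made per column at index creation
time; a sketch with a made-up table (not the actual catalog definitions):

    -- the default text_ops opclass orders per the database collation,
    -- while text_pattern_ops compares bytes with memcmp()
    CREATE TABLE providers (provider text NOT NULL);
    CREATE INDEX providers_collation_idx ON providers (provider);  -- text_ops
    CREATE INDEX providers_memcmp_idx ON providers (provider text_pattern_ops);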
Peter Eisentraut [Tue, 19 May 2015 02:55:14 +0000 (22:55 -0400)]
Fix parse tree of DROP TRANSFORM and COMMENT ON TRANSFORM
The plain C string language name needs to be wrapped in makeString() so
that the parse tree is copyable. This is detectable by
-DCOPY_PARSE_PLAN_TREES. Add a test case for the COMMENT case.
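A test along the lines of the one added might look like this (the type and
language are illustrative, not necessarily those used in the regression
test):

    -- the language name after LANGUAGE is the string that must be wrapped
    -- in makeString() for the parse tree to be copyable
    COMMENT ON TRANSFORM FOR hstore LANGUAGE plpythonu IS 'test comment';
    DROP TRANSFORM FOR hstore LANGUAGE plpythonu;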
Also make the quoting in the error messages more consistent.
Tom Lane [Tue, 19 May 2015 00:07:44 +0000 (20:07 -0400)]
Change pg_seclabel.provider and pg_shseclabel.provider to type "name".
These were "text", but that's a bad idea because it has collation-dependent
ordering. No index in template0 should have collation-dependent ordering,
especially not indexes on shared catalogs. There was general agreement
that provider names don't need to be longer than other identifiers, so we
can fix this at a small waste of table space by changing from text to name.
There's no way to fix the problem in the back branches, but we can hope
that security labels don't yet have widespread-enough usage to make it
urgent to fix.
There needs to be a regression sanity test to prevent us from making this
same mistake again; but before putting that in, we'll need to get rid of
similar brain fade in the recently-added pg_replication_origin catalog.
Note: for lack of a suitable testing environment, I've not really exercised
this change. I trust the buildfarm will show up any mistakes.
Andres Freund [Mon, 18 May 2015 23:55:10 +0000 (01:55 +0200)]
Attach ON CONFLICT SET ... WHERE to the correct planstate.
The previous coding was a leftover from attempting to hang all the ON
CONFLICT logic onto ModifyTable's child nodes. It appears not to have
actually caused problems except for EXPLAIN.
Add a test exercising the broken code path and some others.
Tom Lane [Mon, 18 May 2015 22:34:37 +0000 (18:34 -0400)]
Put back a backwards-compatible version of sampling support functions.
Commit 83e176ec18d2a91dbea1d0d1bd94c38dc47cd77c removed the longstanding
support functions for block sampling without any consideration of the
impact this would have on third-party FDWs. The new API is not notably
more functional for FDWs than the old, so forcing them to change doesn't
seem like a good thing. We can provide the old API as a wrapper (more
or less) around the new one for a minimal amount of extra code.
Noah Misch [Mon, 18 May 2015 14:02:31 +0000 (10:02 -0400)]
pgcrypto: Report errant decryption as "Wrong key or corrupt data".
This has been the predominant outcome. When the output of decrypting
with a wrong key coincidentally resembled an OpenPGP packet header,
pgcrypto could instead report "Corrupt data", "Not text data" or
"Unsupported compression algorithm". The distinct "Corrupt data"
message added no value. The latter two error messages misled when the
decrypted payload also exhibited fundamental integrity problems. Worse,
error message variance in other systems has enabled cryptologic attacks;
see RFC 4880 section "14. Security Considerations". Whether these
pgcrypto behaviors are likewise exploitable is unknown.
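For example, a symmetric decryption with the wrong passphrase now reports
the generic message (illustrative session; requires CREATE EXTENSION
pgcrypto):

    SELECT pgp_sym_decrypt(pgp_sym_encrypt('secret data', 'right key'),
                           'wrong key');
    ERROR:  Wrong key or corrupt data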
In passing, document that pgcrypto does not resist side-channel attacks.
Back-patch to 9.0 (all supported versions).
Noah Misch [Mon, 18 May 2015 14:02:31 +0000 (10:02 -0400)]
Check return values of sensitive system library calls.
PostgreSQL already checked the vast majority of these, missing this
handful that nearly cannot fail. If putenv() failed with ENOMEM in
pg_GSS_recvauth(), authentication would proceed with the wrong keytab
file. If strftime() returned zero in cache_locale_time(), using the
unspecified buffer contents could lead to information exposure or a
crash. Back-patch to 9.0 (all supported versions).
Other unchecked calls to these functions, especially those in frontend
code, pose negligible security concern. This patch does not address
them. Nonetheless, it is always better to check return values whose
specification provides for indicating an error.
In passing, fix an off-by-one error in strftime_win32()'s invocation of
WideCharToMultiByte(). Upon retrieving a value of exactly MAX_L10N_DATA
bytes, strftime_win32() would overrun the caller's buffer by one byte.
MAX_L10N_DATA is chosen to exceed the length of every possible value, so
the vulnerable scenario probably does not arise.
Noah Misch [Mon, 18 May 2015 14:02:31 +0000 (10:02 -0400)]
Add error-throwing wrappers for the printf family of functions.
All known standard library implementations of these functions can fail
with ENOMEM. A caller neglecting to check for failure would experience
missing output, information exposure, or a crash. Check return values
within wrappers and code, currently just snprintf.c, that bypasses the
wrappers. The wrappers do not return after an error, so their callers
need not check. Back-patch to 9.0 (all supported versions).
Popular free software standard library implementations do take pains to
bypass malloc() in simple cases, but they risk ENOMEM for floating point
numbers, positional arguments, large field widths, and large precisions.
No specification demands such caution, so this commit regards every call
to a printf family function as a potential threat.
Injecting the wrappers implicitly is a compromise between patch scope
and design goals. I would prefer to edit each call site to name a
wrapper explicitly. libpq and the ECPG libraries would, ideally, convey
errors to the caller rather than abort(). All that would be painfully
invasive for a back-patched security fix, hence this compromise.
Noah Misch [Mon, 18 May 2015 14:02:31 +0000 (10:02 -0400)]
Prevent a double free by not reentering be_tls_close().
Reentering this function with the right timing caused a double free,
typically crashing the backend. By synchronizing a disconnection with
the authentication timeout, an unauthenticated attacker could achieve
this somewhat consistently. Call be_tls_close() solely from within
proc_exit_prepare(). Back-patch to 9.0 (all supported versions).
Tom Lane [Mon, 18 May 2015 01:22:12 +0000 (21:22 -0400)]
Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c.
This oversight results in a crash at executor startup if the plan has
been copied. outfuncs.c was missed as well.
While we could probably have taught both those files to cope with the
originally chosen representation of an Oid array, it would have been
painful, not least because there'd be no easy way to verify the array
length. An Oid List is far easier to work with. And AFAICS, there is
no particular notational benefit to using an array rather than a list
in the existing parts of the patch either. So just change it to a list.
Tom Lane [Mon, 18 May 2015 00:04:42 +0000 (20:04 -0400)]
Use += not = to set makefile variables after including base makefiles.
The previous coding in hstore_plpython and ltree_plpython wiped out any
values set by the base makefiles. This at least had the effect of running
the tests in "regression" not "contrib_regression" as expected. These
being pretty new modules, there might be other bad effects we'd not
noticed yet.
Bruce Momjian [Sat, 16 May 2015 04:40:18 +0000 (00:40 -0400)]
pg_upgrade: force timeline 1 in the new cluster
Previously, this prevented promoted standby servers from being upgraded
because of a missing WAL history file. (Timeline 1 doesn't need a
history file, and we don't copy WAL files anyway.)
Bruce Momjian [Sat, 16 May 2015 04:10:03 +0000 (00:10 -0400)]
pg_upgrade: only allow template0 to be non-connectable
This patch causes pg_upgrade to error out during its check phase if:
(1) template0 is marked connectable
or
(2) any other database is marked non-connectable
This is done because, in the first case, pg_upgrade would fail because
the pg_dumpall --globals restore would fail, and in the second case, the
database would not be restored, leading to data loss.
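If the check trips, the flags can be inspected and corrected in the old
cluster before retrying; a rough sketch (run as superuser):

    -- databases whose connectability settings would now fail the check
    SELECT datname, datallowconn
    FROM pg_database
    WHERE (datname = 'template0' AND datallowconn)
       OR (datname <> 'template0' AND NOT datallowconn);

    -- restore the expected setting for template0
    UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';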
Andres Freund [Sat, 16 May 2015 01:40:59 +0000 (03:40 +0200)]
Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several
different GROUP BY clauses at once. Each grouping set returns rows in which
the columns grouped by only in other sets are set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
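For instance, with made-up table and column names, these two queries are
equivalent, but the first needs only one scan (ROLLUP and CUBE are
shorthands for common collections of grouping sets):

    SELECT brand, size, sum(sales)
    FROM items_sold
    GROUP BY GROUPING SETS ((brand), (size), ());

    -- the old way:
    SELECT brand, NULL AS size, sum(sales) FROM items_sold GROUP BY brand
    UNION ALL
    SELECT NULL, size, sum(sales) FROM items_sold GROUP BY size
    UNION ALL
    SELECT NULL, NULL, sum(sales) FROM items_sold;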
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced from
the output of the previous sort step, thus avoiding repeated scans of the
source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg nodes to describe each
phase, but they're not part of the plan tree; instead they're additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow EXPLAINs), and makes it easy to
skip the sorting step if the underlying data is already sorted by other
means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, nor has cube been made a
reserved keyword. Instead, precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra,
Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
Tom Lane [Fri, 15 May 2015 23:35:29 +0000 (19:35 -0400)]
Update time zone data files to tzdata release 2015d.
DST law changes in Egypt, Mongolia, Palestine.
Historical corrections for Canada and Chile.
Revised zone abbreviation for America/Adak (HST/HDT not HAST/HADT).
Alvaro Herrera [Fri, 15 May 2015 21:05:22 +0000 (18:05 -0300)]
Add BRIN infrastructure for "inclusion" opclasses
This lets BRIN be used with R-Tree-like indexing strategies.
Also provided are operator classes for range types, box and inet/cidr.
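A usage sketch (the table is made up, and the opclass names follow the
*_inclusion_ops convention used here; check pg_opclass for the exact names
and defaults):

    CREATE TABLE reservations (during tsrange, client_addr inet);
    CREATE INDEX reservations_during_brin ON reservations
        USING brin (during range_inclusion_ops);
    CREATE INDEX reservations_addr_brin ON reservations
        USING brin (client_addr inet_inclusion_ops);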
The infrastructure provided here should be sufficient to create operator
classes for similar datatypes; for instance, opclasses for PostGIS
geometries should be doable, though we didn't try to implement one.
(A box/point opclass was also submitted, but we ripped it out before
commit because the handling of floating point comparisons in existing
code is inconsistent and would generate corrupt indexes.)
Author: Emre Hasegeli. Cosmetic changes by me
Review: Andreas Karlsson
Alvaro Herrera [Fri, 15 May 2015 20:03:16 +0000 (17:03 -0300)]
Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place. Since there's nothing appropriate, create
it. The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h). skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new include
of stratnum.h.
A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.
Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org
Authored by Emre Hasegeli and Álvaro.
(It's not clear to me why bootscanner.l has any #include lines at all.)
Tom Lane [Fri, 15 May 2015 19:01:59 +0000 (15:01 -0400)]
Extend GB18030 encoding conversion to cover full Unicode range.
Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code
points up to U+FFFF, but the actual spec defines conversions for all code
points up to U+10FFFF. That would be rather impractical as a lookup table,
but fortunately there is a simple algorithmic conversion between the
additional code points and the equivalent GB18030 byte patterns. Make use
of the just-added callback facility in LocalToUtf/UtfToLocal to perform the
additional conversions.
Having created the infrastructure to do that, we can use the same code to
map certain linearly-related subranges of the Unicode space below U+FFFF,
allowing removal of the corresponding lookup table entries. This more
than halves the lookup table size, which is a substantial savings;
utf8_and_gb18030.so drops from nearly a megabyte to about half that.
In support of doing that, replace ISO10646-GB18030.TXT with the data file
gb-18030-2000.xml (retrieved from
http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ )
in which these subranges have been deleted from the simple lookup entries.
Per bug #12845 from Arjen Nienhuis. The conversion code added here is
based on his proposed patch, though I whacked it around rather heavily.
Simon Riggs [Fri, 15 May 2015 18:37:10 +0000 (14:37 -0400)]
TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows the
user to specify random row-level BERNOULLI sampling or block-level
SYSTEM sampling. The implementation allows extensible
sampling functions to be written, using a standard API.
The basic version follows the SQL standard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
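Usage sketch (table name is illustrative):

    -- roughly 10% of the table, sampled at the block level (fast)
    SELECT * FROM orders TABLESAMPLE SYSTEM (10);

    -- each row kept independently with 1% probability, with a fixed seed
    SELECT * FROM orders TABLESAMPLE BERNOULLI (1) REPEATABLE (42);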
The expected output contained some floating point values which might get
rounded slightly differently on different platforms. The exact output isn't
very interesting in this test, so just round it.
Fix datatype confusion with the new lossy GiST distance functions.
We can only support a lossy distance function when the distance function's
datatype is comparable with the original ordering operator's datatype.
The distance function always returns a float8, so we are limited to float8,
and float4 (by a hard-coded cast of the float8 to float4).
In light of this limitation, it seems like a good idea to have a separate
'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy
distance function, it still works with lossy quals. There are no such cases
among the built-in or contrib opclasses at the moment, but it's plausible.
There was a hidden assumption that the ORDER BY values returned by GiST
match the original ordering operator's return type, but there are plenty
of examples where that's not true, e.g. in btree_gist and pg_trgm. As long
as the distance function is not lossy, we can tolerate that and just not
return the distance to the executor (or rather, always return NULL). The
executor doesn't need the distances if there are no lossy results.
There was another little bug: the recheck variable was not initialized
before calling the distance function. That revealed the bigger issue,
as the executor tried to reorder tuples that didn't need reordering, and
that failed because of the datatype mismatch.
The previous coding effectively only verified that the second byte of a
multibyte character was in the expected range; moreover, it wasn't careful
to make sure that the second byte even exists in the buffer before touching
it. The latter seems unlikely to cause any real problems in the field
(in particular, it could never be a problem with null-terminated input),
but it's still a bug.
Since GB18030 is not a supported backend encoding, the only thing we'd
really be doing with GB18030 text is converting it to UTF8 in LocalToUtf,
which would fail anyway on any invalid character for lack of a match in
its lookup table. So the only user-visible consequence of this change
should be that you'll get "invalid byte sequence for encoding" rather than
"character has no equivalent" for malformed GB18030 input. However,
impending changes to the GB18030 conversion code will require these tighter
up-front checks to avoid producing bogus results.
Stephen Frost [Fri, 15 May 2015 14:41:53 +0000 (10:41 -0400)]
Remove useless pg_audit.conf
No need to have pg_audit.conf any longer since the regression tests are
just loading the module at the start of each session (to simulate being
in shared_preload_libraries, which isn't something we can actually make
happen on the buildfarm itself, it seems).
Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.
This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.
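For example, assuming a GiST index on a circle column whose opclass supports
the <-> ordering operator against a point (names illustrative):

    CREATE INDEX shapes_c_gist ON shapes USING gist (c);
    -- the index orders by bounding-box distance; the executor rechecks
    -- the exact distances and reorders the results on the fly
    SELECT * FROM shapes ORDER BY c <-> point '(0,0)' LIMIT 5;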
Tom Lane [Fri, 15 May 2015 02:27:07 +0000 (22:27 -0400)]
Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.
Until now, these functions have only supported encoding conversions using
lookup tables, which is fine as long as there's not too many code points
to convert. However, GB18030 expects all 1.1 million Unicode code points
to be convertible, which would require a ridiculously-sized lookup table.
Fortunately, a large fraction of those conversions can be expressed through
arithmetic, ie the conversions are one-to-one in certain defined ranges.
To support that, provide a callback function that is used after consulting
the lookup tables. (This patch doesn't actually change anything about the
GB18030 conversion behavior, it just provides infrastructure for fixing it.)
Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway,
take the opportunity to rearrange their argument lists into what seems
to me a saner order. And beautify the call sites by using lengthof()
instead of error-prone sizeof() arithmetic.
In passing, also mark all the lookup tables used by these calls "const".
This moves an impressive amount of stuff into the text segment, at least
on my machine, and is safer anyhow.
Stephen Frost [Thu, 14 May 2015 19:41:39 +0000 (15:41 -0400)]
Make repeated 'make installcheck' runs work
In pg_audit, set client_min_messages up to warning, then reset the role
attributes, to completely reset the session while not making the
regression tests depend on being run by any particular user.
Stephen Frost [Thu, 14 May 2015 19:16:27 +0000 (15:16 -0400)]
Improve pg_audit regression tests
Instead of creating a new superuser role, extract out what the current
user is and use that user instead. Further, clean up and drop all
objects created by the regression test.
Tom Lane [Thu, 14 May 2015 16:08:40 +0000 (12:08 -0400)]
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
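The kind of plpgsql loop that benefits looks roughly like this (the function
is illustrative):

    CREATE FUNCTION fill_array(n int) RETURNS float8[]
    LANGUAGE plpgsql AS $$
    DECLARE
        a float8[] := '{}';
    BEGIN
        -- with the expanded representation, each element assignment no
        -- longer has to copy and re-form the whole array value
        FOR i IN 1..n LOOP
            a[i] := i * 0.5;
        END LOOP;
        RETURN a;
    END
    $$;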
Stephen Frost [Thu, 14 May 2015 15:55:36 +0000 (11:55 -0400)]
Further fixes for the buildfarm for pg_audit
Also, use a function to load the extension ahead of all other calls,
simulating load from shared_preload_libraries, to make sure the
hooks are in place before logging starts.
Stephen Frost [Thu, 14 May 2015 14:57:12 +0000 (10:57 -0400)]
Fix buildfarm with regard to pg_audit
Remove the check that pg_audit be installed by
shared_preload_libraries as that's not going to work when running the
regression tests in the buildfarm. That check was primarily a nice-to-have
and isn't required anyway.
Stephen Frost [Thu, 14 May 2015 14:36:16 +0000 (10:36 -0400)]
Add pg_audit, an auditing extension
This extension provides detailed logging classes, ability to control
logging at a per-object level, and includes fully-qualified object
names for logged statements (DML and DDL) in independent fields of the
log output.
Authors: Ian Barwick, Abhijit Menon-Sen, David Steele
Reviews by: Robert Haas, Tatsuo Ishii, Sawada Masahiko, Fujii Masao,
Simon Riggs
Discussion with: Josh Berkus, Jaime Casanova, Peter Eisentraut,
David Fetter, Yeb Havinga, Alvaro Herrera, Petr Jelinek, Tom Lane,
MauMau, Bruce Momjian, Jim Nasby, Michael Paquier,
Fabrízio de Royes Mello, Neil Tiffin
Tom Lane [Wed, 13 May 2015 22:48:05 +0000 (18:48 -0400)]
Fix distclean/maintainer-clean targets to remove top-level tmp_install dir.
The top-level makefile removes tmp_install in its "clean" target, but the
distclean and maintainer-clean targets overlooked that (and they don't
simply invoke clean, because that would result in an extra tree traversal).
While at it, let's just make sure that removing GNUmakefile itself is the
very last step of the recipe.
Tom Lane [Wed, 13 May 2015 18:05:17 +0000 (14:05 -0400)]
Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.
If a postgres_fdw foreign table is a non-locked source relation in an
UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its
ctid column, the wrong value would be returned if an EvalPlanQual
recheck occurred. This happened because the foreign table's result row
was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks
just unconditionally set the reconstructed tuple's t_self to "invalid".
To fix that, we can have EvalPlanQualFetchRowMarks copy the composite
datum's t_ctid field, and be sure to initialize that along with t_self
when postgres_fdw constructs a tuple to return.
If we just did that much then EvalPlanQualFetchRowMarks would start
returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps
does not matter much, but then again maybe it might. The cause of that is
that heap_form_tuple, which is the ultimate source of all composite datums,
simply leaves t_ctid as zeroes in newly constructed tuples. That seems
like a bad idea on general principles: a field that's really not been
initialized shouldn't appear to have a valid value. So let's eat the
trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))"
in heap_form_tuple.
This closes out our handling of Etsuro Fujita's report that tableoid and
ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the
way we did a great deal of work to improve FDWs' ability to control row
locking behavior; which was not wasted effort by any means, but it didn't
end up being a fix for this problem because that feature would be too
expensive for postgres_fdw to use all the time.
Although the fix for the tableoid misbehavior was back-patched, I'm
hesitant to do so here; it seems far less likely that people would care
about remote ctid than tableoid, and even such a minor behavioral change
as this in heap_form_tuple is perhaps best not back-patched. So commit
to HEAD only, at least for the moment.
Andrew Dunstan [Wed, 13 May 2015 17:52:08 +0000 (13:52 -0400)]
Fix jsonb replace and delete on scalars and empty structures
These operations now error out if attempted on scalars, and simply
return the input if attempted on empty arrays or objects. Along the way
we remove the unnecessary cloning of the input when it's known to be
unchanged. Regression tests covering these cases are added.
Andres Freund [Wed, 13 May 2015 05:31:04 +0000 (07:31 +0200)]
Add pgstattuple_approx() to the pgstattuple extension.
The new function estimates bloat and other table-level statistics
in a faster, but approximate, way. It does so by using information from
the free space map for pages marked as all-visible in the visibility
map. The rest of the table is actually read and free space/bloat is
measured accurately. In many cases that yields bloat information
much more quickly, with less I/O.
Author: Abhijit Menon-Sen
Reviewed-By: Andres Freund, Amit Kapila and Tomas Vondra
Discussion: 20140402214144.GA28681@kea.toroid.org
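A minimal usage sketch (table name is illustrative):

    CREATE EXTENSION pgstattuple;
    -- approximate table-level statistics; only pages not marked
    -- all-visible are actually read
    SELECT * FROM pgstattuple_approx('some_table'::regclass);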
Peter Eisentraut [Wed, 13 May 2015 02:52:18 +0000 (22:52 -0400)]
PL/Python: Remove procedure cache invalidation
This was added to react to changes in the pg_transform catalog, but
building with CLOBBER_CACHE_ALWAYS showed that PL/Python was not
prepared for having its procedure cache cleared. Since this is a
marginal use case, and we don't do this for other catalogs anyway, we
can postpone this to another day.
Andres Freund [Tue, 12 May 2015 22:13:22 +0000 (00:13 +0200)]
Fix ON CONFLICT bugs that manifest when used in rules.
Specifically the tlist and rti of the pseudo "excluded" relation weren't
properly treated by expression_tree_walker, which led to errors when
excluded was referenced inside a rule because the varnos were not
properly adjusted. Similar omissions in OffsetVarNodes and
expression_tree_mutator had less impact, but should obviously be fixed
nonetheless.
A couple of tests for ON CONFLICT UPDATE into INSERT-rule-bearing
relations have been added.
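The previously broken shape looks roughly like this (names are
illustrative):

    CREATE TABLE counts (k text PRIMARY KEY, n int);
    CREATE TABLE events (k text, n int);
    -- an INSERT rule whose action itself uses ON CONFLICT and references
    -- the pseudo "excluded" relation
    CREATE RULE events_upsert AS ON INSERT TO events DO INSTEAD
        INSERT INTO counts VALUES (NEW.k, NEW.n)
        ON CONFLICT (k) DO UPDATE SET n = counts.n + excluded.n;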
Andrew Dunstan [Tue, 12 May 2015 19:52:45 +0000 (15:52 -0400)]
Additional functions and operators for jsonb
jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.
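For example (literals are illustrative; results shown as comments):

    SELECT jsonb_pretty('{"a": 1, "b": [1, 2]}');
    SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;          -- {"a": 1, "b": 2}
    SELECT '{"a": 1, "b": 2}'::jsonb - 'b';                 -- {"a": 1}
    SELECT '["x", "y", "z"]'::jsonb - 1;                    -- ["x", "z"]
    SELECT jsonb_replace('{"a": {"b": 1}}', '{a,b}', '2');  -- {"a": {"b": 2}}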
Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.
Tom Lane [Tue, 12 May 2015 18:10:10 +0000 (14:10 -0400)]
Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later. This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.
To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier. Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw. We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.
This commit does not modify postgres_fdw to use late locking. We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server. The extra round trips required are likely
to outweigh any benefit from improved concurrency.
Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
Stephen Frost [Tue, 12 May 2015 17:13:12 +0000 (13:13 -0400)]
pgbench: Don't fail during startup
In pgbench, report, but ignore, any errors returned when attempting to
vacuum/truncate the default tables during startup. If the tables are
needed, we'll error out soon enough anyway.
Per discussion with Tatsuo, David Rowley, Jim Nasby, Robert, Andres,
Fujii, Fabrízio de Royes Mello, Tomas Vondra, Michael Paquier, Peter,
based on a suggestion from Jeff Janes, patch from Robert, additional
message wording from Tom.