granicus.if.org Git - postgresql/log

Improve estimate of distinct values in estimate_num_groups().

When adjusting the estimate for the number of distinct values from a
rel in a grouped query to take into account the selectivity of the
rel's restrictions, use a formula that is less likely to produce
under-estimates.

The old formula simply multiplied the number of distinct values in the
rel by the restriction selectivity, which would be correct if the
restrictions were fully correlated with the grouping expressions, but
can produce significant under-estimates in cases where they are not
well correlated.

The new formula is based on the random selection probability, and so
assumes that the restrictions are not correlated with the grouping
expressions. This is guaranteed to produce larger estimates, and of
course risks over-estimating in cases where the restrictions are
correlated, but that has less severe consequences than
under-estimating, which might lead to a HashAgg that consumes an
excessive amount of memory.

This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.

Author: Tomas Vondra, with some hacking be me
Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane
Discussion: http://www.postgresql.org/message-id/flat/56CD0381.5060502@2ndquadrant.com

Avoid archiving XLOG_RUNNING_XACTS on idle server

If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle.

Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier,
though this is not his proposed fix.
20151016203031.3019.72930@wrigleys.postgresql.org

Simple non-invasive patch to allow later backpatch to 9.4 and 9.5

Clean up dubious code in contrib/seg.

The restore() function assumed that the result of sprintf() with %e format
would necessarily contain an 'e', which is false: what if the supplied
number is an infinity or NaN?  If that did happen, we'd get a
null-pointer-dereference core dump.  The case appears impossible currently,
because seg_in() does not accept such values, and there are no seg-creating
functions that would create one.  But it seems unwise to rely on it never
happening in future.

Quite aside from that, the code was pretty ugly: it relied on modifying a
static format string when it could use a "*" precision argument, and it
used strtok() entirely gratuitously, and it stripped off trailing spaces
by hand instead of just not asking for them to begin with.

Coverity noticed the potential null pointer dereference (though I wonder
why it didn't complain years ago, since this code is ancient).

Since this is just code cleanup and forestalling a hypothetical future
bug, there seems no need for back-patching.

Fix contrib/bloom to not fail under CLOBBER_CACHE_ALWAYS.

The code was supposing that rd_amcache wouldn't disappear from under it
during a scan; which is wrong. Copy the data out of the relcache rather
than trying to reference it there.

Clean up some stuff in new contrib/bloom module.

Coverity complained about implicit sign-extension in the
BloomPageGetFreeSpace macro, probably because sizeOfBloomTuple isn't wide
enough for size calculations.  No overflow is really possible as long as
maxoff and sizeOfBloomTuple are small enough to represent a realistic
situation, but it seems like a good idea to declare sizeOfBloomTuple as
Size not int32.

Add missing check on BloomPageAddItem() result, again from Coverity.

Avoid core dump due to not allocating so->sign array when
scan->numberOfKeys is zero.  Also thanks to Coverity.

Use FLEXIBLE_ARRAY_MEMBER rather than declaring an array as size 1
when it isn't necessarily.

Very minor beautification of related code.

Unfortunately, none of the Coverity-detected mistakes look like they
could account for the remaining buildfarm unhappiness with this
module.  It's barely possible that the FLEXIBLE_ARRAY_MEMBER mistake
does account for that, if it's enabling bogus compiler optimizations;
but I'm not terribly optimistic.  We probably still have bugs to
find here.

Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases

Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require
complex interlocking that matched the requirements on the master. This required
an O(N) operation that became a significant problem with large indexes, causing
replication delays of seconds or in some cases minutes while the
XLOG_BTREE_VACUUM was replayed.

This commit skips the pin scan that was previously required, by observing in
detail when and how it is safe to do so, with full documentation. The pin
scan is skipped only in replay; the VACUUM code path on master is not
touched here and WAL is identical.

The current commit applies in all cases, effectively replacing commit
687f2cd7a0150647794efe432ae0397cb41b60ff.

Add psql \errverbose command to see last server error at full verbosity.

Often, upon getting an unexpected error in psql, one's first wish is that
the verbosity setting had been higher; for example, to be able to see the
schema-name field or the server code location info.  Up to now the only way
has been to adjust the VERBOSITY variable and repeat the failing query.
That's a pain, and it doesn't work if the error isn't reproducible.

This commit adds a psql feature that redisplays the most recent server
error at full verbosity, without needing to make any variable changes or
re-execute the failed command.  We just need to hang onto the latest error
PGresult in case the user executes \errverbose, and then apply libpq's
new PQresultVerboseErrorMessage() function to it.  This will consume
some trivial amount of psql memory, but otherwise the cost when the
feature isn't used should be negligible.

Alex Shulgin, reviewed by Daniel Vérité, some improvements by me

Add libpq support for recreating an error message with different verbosity.

Often, upon getting an unexpected error in psql, one's first wish is that
the verbosity setting had been higher; for example, to be able to see the
schema-name field or the server code location info.  Up to now the only way
has been to adjust the VERBOSITY variable and repeat the failing query.
That's a pain, and it doesn't work if the error isn't reproducible.

This commit adds support in libpq for regenerating the error message for
an existing error PGresult at any desired verbosity level.  This is almost
just a matter of refactoring the existing code into a subroutine, but there
is one bit of possibly-needed information that was not getting put into
PGresults: the text of the last query sent to the server.  We must add that
string to the contents of an error PGresult.  But we only need to save it
if it might be used, which with the existing error-formatting code only
happens if there is a PG_DIAG_STATEMENT_POSITION error field, which is
probably pretty rare for errors in production situations.  So really the
overhead when the feature isn't used should be negligible.

Alex Shulgin, reviewed by Daniel Vérité, some improvements by me

Add missing "static".

Per buildfarm member pademelon.

Make all the declarations of WaitEventSetWaitBlock be marked "inline".

The inconsistency here triggered compiler warnings on some buildfarm
members, and it's surely pretty pointless.

Suppress compiler warning.

Some buildfarm members are showing "comparison is always false due to
limited range of data type" complaints on this test, so #ifdef it out
on machines with 32-bit int.

Fix condition in e9e441c9fac6cbc0510cded6abb9d0e6b646ecaf

Comment is right, but if - not.

Fix typo in pg_regress.c

s/afer/after

Pointed out by Andreas 'ads' Scherbaum

Prevent mark as deleted and as 'has free space' page in bloom module

Vacuum might put page into list of pages with some free space and mark as
deleted at the same time.

Fixes in bloom contrib module

Looking at result of buildfarm member jaguarundi it seems to me that
BloomOptions isn't inited sometime, but I don't see yet how it's possible.
Nevertheless, check of signature length's is missed, so, add
a limit of it. Also add missed GenericXLogAbort() in case of already
deleted page in vacuum + minor code refactoring.

Refer to a TOKEN_USER payload as a "token user," not as a "user token".

This corrects messages for can't-happen errors. The corresponding "user
token" appears in the HANDLE argument of GetTokenInformation().

Copyedit comments and documentation.

test_slot_timelines: Fix alternate expected output

Omit null rows when setting the threshold for what's a most-common value.

As with the previous patch, large numbers of null rows could skew this
calculation unfavorably, causing us to discard values that have a
legitimate claim to be MCVs, since our definition of MCV is that it's
most common among the non-null population of the column. Hence, make
the numerator of avgcount be the number of non-null sample values not
the number of sample rows; likewise for maxmincount in the
compute_scalar_stats variant.

Also, make the denominator be the number of distinct values actually
observed in the sample, rather than reversing it back out of the computed
stadistinct. This avoids depending on the accuracy of the Haas-Stokes
approximation, and really it's what we want anyway; the threshold should
depend only on what we see in the sample, not on what we extrapolate
about the contents of the whole column.

Alex Shulgin, reviewed by Tomas Vondra and myself

pgbench: Remove unused parameter

For some reason this parameter was introduced as unused in 3da0dfb4b146,
and has never been used for anything. Remove it.

Author: Fabien Coelho

Omit null rows when applying the Haas-Stokes estimator for ndistinct.

Previously, we included null rows in the values of n and N that went
into the formula, which amounts to considering null as a value in its
own right; but the d and f1 values do not include nulls. This is
inconsistent, and it contributes to significant underestimation of
ndistinct when the column is mostly nulls. In any case stadistinct
is defined as the number of distinct non-null values, so we should
exclude nulls when doing this computation.

This is an aboriginal bug in our application of the Haas-Stokes formula,
but we'll refrain from back-patching for fear of destabilizing plan
choices in released branches.

While at it, make the code a bit more readable by omitting unnecessary
casts and intermediate variables.

Observation and original patch by Tomas Vondra, adjusted to fix both
uses of the formula by Alex Shulgin, cosmetic improvements by me

Fix logical_decoding_timelines test crashes

In the test_slot_timelines test module, we were abusing passing NULL
values which was received as zeroes in x86, but this breaks in ARM
(buildfarm member hamster) by crashing instead. Fix the breakage by
marking these functions as STRICT; the InvalidXid value that was
previously implicit in NULL values (on x86 at least) can now be passed
as 0. Failing to follow the fmgr protocol to check for NULLs beforehand
was causing ARM to fail, as evidenced by segmentation faults in
buildfarm member hamster.

In order to use the new functionality in the test script, use COALESCE
in the right spot to avoid forwarding NULL values.

This was diagnosed from the hamster crash by Craig Ringer, who also
proposed a different patch (checking for NULL values explicitely in the
C function code, and keeping the non-strictness in the C functions).
I decided to go with this approach instead.

Fixes in bloom contrib module missed during review

- macroses llike (var & FLAG) are changed to ((var & FLAG) != 0)
- do not copy uninitialized part of notFullPage array to page

Type names should not be quoted

Our actual convention, contrary to what I said in 59a2111b23f, is not to
quote type names, as evidenced by unquoted use of format_type_be()
result value in error messages. Remove quotes from recently tweaked
messages accordingly.

Per note from Tom Lane

Get rid of minus zero in box regression test.

Commit acdf2a8b added a test case involving minus zero as a box endpoint.
This is not very portable, as evidenced by the several older buildfarm
members that are failing on the test because they print minus zero as
just "0". If there were any significant reason to test this behavior,
we could consider carrying a separate expected-file; but it doesn't look
to me like there's adequate justification to accept such a maintenance
burden. Just change the test to use plain zero, instead.

Fix oversight in getParamDescriptions(), and improve comments.

When getParamDescriptions was changed to handle out-of-memory better
by cribbing error recovery logic from getRowDescriptions/getAnotherTuple,
somebody omitted to copy the stanza about checking for excess data in
the message.  But you need to do that, since continue'ing out of the
switch in pqParseInput3 means no such check gets applied there anymore.
Noted while looking at Michael Paquier's patch that made yet another
copy of this advance_and_error logic.

(This whole business desperately needs refactoring, because I sure don't
want to see a dozen copies of this code, but that's where we seem to be
headed.  What's more, the "suspend parsing on EOF return" convention is a
holdover from protocol 2 and shouldn't exist at all in protocol 3, because
we don't process partial messages anymore.  But for now, just fix the
obvious bug.)

Also, fix some wrong/missing comments about what the API spec is
for these three functions.

This doesn't seem worthy of back-patching, even though it's a bug;
the case shouldn't ever arise in the field.

Fix English in bloom module documentation

Author: Erik Rijkers

Bloom index contrib module

Module provides new access method. It is actually a simple Bloom filter
implemented as pgsql's index. It could give some benefits on search
with large number of columns.

Module is a single way to test generic WAL interface committed earlier.

Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby

Fix typo in generic wal docs

Markus Nullmeier

Add Generic WAL interface

This interface is designed to give an access to WAL for extensions which
could implement new access method, for example. Previously it was
impossible because restoring from custom WAL would need to access system
catalog to find a redo custom function. This patch suggests generic way
to describe changes on page with standart layout.

Bump XLOG_PAGE_MAGIC because of new record type.

Author: Alexander Korotkov with a help of Petr Jelinek, Markus Nullmeier and
minor editorization by my
Reviewers: Petr Jelinek, Alvaro Herrera, Teodor Sigaev, Jim Nasby,
Michael Paquier

Another zic portability fix.

I should have remembered that we can't use INT64_MODIFIER with sscanf():
configure chooses that to work with snprintf(), but it might be for our
src/port/snprintf.c implementation and so not compatible with the
platform's sscanf(). This appears to be the explanation for buildfarm
member frogmouth's continuing unhappiness with the tzcode update.

Fortunately, in all of the places where zic is attempting to read into
an int64 variable, it's reading a year which certainly will fit just fine
into an int. So make it read into an int with %d, and then cast or copy
as necessary.

Fix recovery_min_apply_delay test

Previously this test was relying too much on WAL replay to occur in the
exact configured interval, which was unreliable on slow or overly busy
servers. Use a custom loop instead of poll_query_until, which is
hopefully more reliable.

Per continued failures on buildfarm member hamster (which is probably
the only one running this test suite)

Author: Michaël Paquier

Support using index-only scans with partial indexes in more cases.

Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate.  This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all.  So index-only scans were
essentially unavailable with such partial indexes.

To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner.  This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes.  However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes.  We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed.  We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.

Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me

Fix broken variable declaration

Author: Konstantin Knizhnik

Blind attempt at fixing Win32 issue on 24c5f1a103c

As best as I can tell, MyReplicationSlot needs to be PGDLLIMPORT in
order for the new test_slot_timelines test module to compile.

Per buildfarm

Use proper format specifier %X/%X for LSN.

I forgot the alternate expected file in previous commit

Without this, the test_slot_timelines modules fails "make installcheck"
because the required feature is not enabled in a stock server.

Per buildfarm

Enable logical slots to follow timeline switches

When decoding from a logical slot, it's necessary for xlog reading to be
able to read xlog from historical (i.e. not current) timelines;
otherwise, decoding fails after failover, because the archives are in
the historical timeline. This is required to make "failover logical
slots" possible; it currently has no other use, although theoretically
it could be used by an extension that creates a slot on a standby and
continues to replay from the slot when the standby is promoted.

This commit includes a module in src/test/modules with functions to
manipulate the slots (which is not otherwise possible in SQL code) in
order to enable testing, and a new test in src/test/recovery to ensure
that the behavior is as expected.

Author: Craig Ringer
Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek

XLogReader general code cleanup

Some minor tweaks and comment additions, for cleanliness sake and to
avoid having the upcoming timeline-following patch be polluted with
unrelated cleanup.

Extracted from a larger patch by Craig Ringer, reviewed by Andres
Freund, with some additions by myself.

Improve portability of I/O behavior for the geometric types.

Formerly, the geometric I/O routines such as box_in and point_out relied
directly on strtod() and sprintf() for conversion of the float8 component
values of their data types.  However, the behavior of those functions is
pretty platform-dependent, especially for edge-case values such as
infinities and NaNs.  This was exposed by commit acdf2a8b372aec1d, which
added test cases involving boxes with infinity endpoints, and immediately
failed on Windows and AIX buildfarm members.  We solved these problems
years ago in the main float8in and float8out functions, so let's fix it
by making the geometric types use that code instead of depending directly
on the platform-supplied functions.

To do this, refactor the float8in code so that it can be used to parse
just part of a string, and as a convenience make the guts of float8out
usable without going through DirectFunctionCall.

While at it, get rid of geo_ops.c's fairly shaky assumptions about the
maximum output string length for a double, by having it build results in
StringInfo buffers instead of fixed-length strings.

In passing, convert all the "invalid input syntax for type foo" messages
in this area of the code into "invalid input syntax for type %s" to reduce
the number of distinct translatable strings, per recent discussion.
We would have needed a fair number of the latter anyway for code-sharing
reasons, so we might as well just go whole hog.

Note: this patch is by no means intended to guarantee that the geometric
types uniformly behave sanely for infinity or NaN component values.
But any bugs we have in that line were there all along, they were just
harder to reach in a platform-independent way.

Suppress uninitialized-variable warnings.

My compiler doesn't like the lack of initialization of "flag", and
I think it's right: if there were zero keys we'd have an undefined
result. The AND of zero items is TRUE, so initialize to TRUE.

Bump catalog version, forget in acdf2a8b372aec1da09370fca77ff7dccac7646d

Introduce SP-GiST operator class over box.

Patch implements quad-tree over boxes, naive approach of 2D quad tree will not
work for any non-point objects because splitting space on node is not
efficient. The idea of pathc is treating 2D boxes as 4D points, so,
object will not overlap (in 4D space).

The performance tests reveal that this technique especially beneficial
with too much overlapping objects, so called "spaghetti data".

Author: Alexander Lebedev with editorization by Emre Hasegeli and me

Use traversalValue in SP-GiST range opclass.

Author: Alexander Lebedev

Introduce traversalValue for SP-GiST scan

During scan sometimes it would be very helpful to know some information about
parent node or all ancestor nodes. Right now reconstructedValue could be used
but it's not a right usage of it (range opclass uses that).

traversalValue is arbitrary piece of memory in separate MemoryContext while
reconstructedVale should have the same type as indexed column.

Subsequent patches for range opclass and quad4d tree will use it.

Author: Alexander Lebedev, Teodor Sigaev

Add missing gss option to msvc config template

Michael Paquier

Remove just-added tests for to_timestamp(float8) with out-of-range inputs.

Reporting the specific out-of-range input value produces platform-dependent
results. We could skip reporting the value, but that's contrary to our
message style guidelines and unhelpful to users. Or we could add a
separate expected-output file for Windows, but that would be a substantial
maintenance burden, and these test cases seem unlikely to be worth it.

Per buildfarm.

Remove TZ environment-variable entry from postgres reference page.

The server hasn't paid attention to the TZ environment variable since
commit ca4af308c32d03db, but that commit missed removing this documentation
reference, as did commit d883b916a947a3c6 which added the reference where
it now belongs (initdb).

Back-patch to 9.2 where the behavior changed. Also back-patch
d883b916a947a3c6 as needed.

Matthew Somerville

Add new replication mode synchronous_commit = 'remote_apply'.

In this mode, the master waits for the transaction to be applied on
the remote side, not just written to disk.  That means that you can
count on a transaction started on the standby to see all commits
previously acknowledged by the master.

To make this work, the standby sends a reply after replaying each
commit record generated with synchronous_commit >= 'remote_apply'.
This introduces a small inefficiency: the extra replies will be sent
even by standbys that aren't the current synchronous standby.  But
previously-existing synchronous_commit levels make no attempt at all
to optimize which replies are sent based on what the primary cares
about, so this is no worse, and at least avoids any extra replies for
people not using the feature at all.

Thomas Munro, reviewed by Michael Paquier and by me.  Some additional
tweaks by me.

Fix interval_mul() to not produce insane results.

interval_mul() attempts to prevent its calculations from producing silly
results, but it forgot that zero times infinity yields NaN in IEEE
arithmetic.  Hence, a case like '1 second'::interval * 'infinity'::float8
produced a NaN for the months product, which didn't trigger the range
check, resulting in bogus and possibly platform-dependent output.

This isn't terribly obvious to the naked eye because if you try that
exact case, you get "interval out of range" which is what you expect
--- but if you look closer, the error is coming from interval_out not
interval_mul.  interval_mul has allowed a bogus value into the system.

Fix by adding isnan tests.

Noted while testing Vitaly Burovoy's fix for infinity input to
to_timestamp().  Given the lack of field complaints, I doubt this
is worth a back-patch.

Allow to_timestamp(float8) to convert float infinity to timestamp infinity.

With the original SQL-function implementation, such cases failed because
we don't support infinite intervals. Converting the function to C lets
us bypass the interval representation, which should be a bit faster as
well as more flexible.

Vitaly Burovoy, reviewed by Anastasia Lubennikova

Fix bug in aggregate (de)serialization commit.

resulttypeLen and resulttypeByVal must be set correctly when serializing
aggregates, not just when finalizing them. This was in David's final
patch but I downloaded the wrong version by mistake and failed to spot
the error.

David Rowley

Allow aggregate transition states to be serialized and deserialized.

This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal". Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.

Improve pgbench docs regarding per-transaction logging.

The old documentation didn't know about the new -b flag, only about -f.

Fabien Coelho

Fix pgbench documentation error.

The description of what the per-transaction log file says for skipped
transactions is just plain wrong.

Report and patch by Tomas Vondra, reviewed by Fabien Coelho and
modified by me.

pgbench: allow a script weight of zero

This refines the previous weight range and allows a script to be "turned
off" by passing a zero weight, which is useful when scripting multiple
pgbench runs.

I did not apply the suggested warning when a script uses zero weight; we
use the principle elsewhere that if there's nothing to be done, do
nothing quietly.

Adjust docs accordingly.

Author: Jeff Janes, Fabien Coelho

pgbench: Remove \setrandom.

You can now do the same thing via \set using the appropriate function,
either random(), random_gaussian(), or random_exponential(), depending
on the desired distribution. This is not backward-compatible, but per
discussion, it's worth it to avoid having the old syntax hang around
forever.

Fabien Coelho, reviewed by Michael Paquier, and adjusted by me.

Avoid possibly-unsafe use of Windows' FormatMessage() function.

Whenever this function is used with the FORMAT_MESSAGE_FROM_SYSTEM flag,
it's good practice to include FORMAT_MESSAGE_IGNORE_INSERTS as well.
Otherwise, if the message contains any %n insertion markers, the function
will try to fetch argument strings to substitute --- which we are not
passing, possibly leading to a crash. This is exactly analogous to the
rule about not giving printf() a format string you're not in control of.

Noted and patched by Christian Ullrich.
Back-patch to all supported branches.

Fix support of digits in email/hostnames.

When tsearch was implemented I did several mistakes in hostname/email
definition rules:
1) allow underscore in hostname what prohibited by RFC
2) forget to allow leading digits separated by hyphen (like 123-x.com)
in hostname
3) do no allow underscore/hyphen after leading digits in localpart of email

Artur's patch resolves two last issues, but by the way allows hosts name like
123_x.com together with 123-x.com. RFC forbids underscore usage in hostname
but pg allows that since initial tsearch version in core, although only
for non-digits. Patch syncs support digits and nondigits in both hostname and
email.

Forbidding underscore in hostname may break existsing usage of tsearch and,
anyhow, it should be done by separate patch.

Author: Artur Zakirov
BUG: #13964

Rework custom scans to work more like the new extensible node stuff.

Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5. Rework the latter to be more like the former.

This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.

KaiGai Kohei, reviewed by Petr Jelinek and me. Some further
cosmetic changes by me.

Protect zic's symlink() call with #ifdef HAVE_SYMLINK.

The IANA crew seem to think that symlink() exists everywhere nowadays,
and they may well be right. But we use #ifdef HAVE_SYMLINK elsewhere
so for consistency we should do it here too. Noted by Michael Paquier.

Fix zic for Windows.

The new coding of dolink() is dependent on link() returning an on-point
errno when it fails; but the quick-hack implementation of link() that
we'd put in for Windows didn't bother with setting errno. Fix that.

Analysis and patch by Christian Ullrich.

Fix portability issues in 86c43f4e22c0771fd0cc6bce2799802c894ee2ec.

INT64_MIN/MAX should be spelled PG_INT64_MIN/MAX, per well established
convention in our sources. Less obviously, a symbol named DOUBLE causes
problems on Windows builds, so rename that to DOUBLE_CONST; and rename
INTEGER to INTEGER_CONST for consistency.

Also, get rid of incorrect/obsolete hand-munging of yycolumn, and fix
the grammar for float constants to handle expected cases such as ".1".

First two items by Michael Paquier, second two by me.

Don't require a user mapping for FDWs to work.

Commit fbe5a3fb73102c2cfec11aaaa4a67943f4474383 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.

Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.

On all Windows platforms, not just Cygwin, use _timezone and _tzname.

Up until now, we've been using timezone and tzname, but Visual Studio
2015 (for which we wish to add support) no longer declares those
symbols. All versions since Visual Studio 2003 apparently support the
underscore-equipped names, and we don't support anything older than
Visual Studio 2005, so this should work OK everywhere. But let's see
what the buildfarm thinks.

Michael Paquier, reviewed by Petr Jelinek

Fix typo in comment.

Thomas Munro

pgbench: Support double constants and functions.

The new functions are pi(), random(), random_exponential(),
random_gaussian(), and sqrt(). I was worried that this would be
slower than before, but, if anything, it actually turns out to be
slightly faster, because we now express the built-in pgbench scripts
using fewer lines; each \setrandom can be merged into a subsequent
\set.

Fabien Coelho

PostgresNode: initialize $timed_out if passed

Corrects an oversight in 2c83f435a3 where the $timed_out reference var
isn't initialized; using it would require the caller to initialize it
beforehand, which is cumbersome.

Author: Craig Ringer

Mention BRIN as able to do multi-column indexes

Documentation mentioned B-tree, GiST and GIN as able to do multicolumn
indexes; I failed to add BRIN to the list.

Author: Petr Jediný
Reviewed-By: Fujii Masao, Emre Hasegeli

Sync tzload() and tzparse() APIs with IANA release tzcode2016c.

This brings us a bit closer to matching upstream, but since it affects
files outside src/timezone/, we might choose not to back-patch it.
Hence keep it separate from the main update patch.

Fix MSVC build for changes in zic.

zic now only needs zic.c, but I didn't realize knowledge about it was
hardwired into Mkvcbuild.pm. Per buildfarm.

Sync our copy of the timezone library with IANA release tzcode2016c.

We hadn't done this in about six years, which proves to have been a mistake
because there's been a lot of code churn upstream, making the merge rather
painful.  But putting it off any further isn't going to lessen the pain,
and there are at least two incompatible changes that we need to absorb
before someone starts complaining that --with-system-tzdata doesn't work
at all on their platform, or we get blindsided by a tzdata release that
our out-of-date zic can't compile.  Last week's "time zone abbreviation
differs from POSIX standard" mess was a wake-up call in that regard.

This is a sufficiently large patch that I'm afraid to back-patch it
immediately, though the foregoing considerations imply that we probably
should do so eventually.  For the moment, just put it in HEAD so that
it can get some testing.  Maybe we can wait till the end of the 9.6
beta cycle before deeming it okay.

Document errhidecontext() where it ought to be documented.

Seems to have been missed when this function was added. Noted while
looking at David Steele's proposal to add another similar function.

Update expected file from quoting change

I neglected to update this in 59a2111b23f.

Per buildfarm

pg_rewind: Improve internationalization

This is mostly cosmetic since two of the three changes are debug
messages, and the third one is just a progress indicator.

Author: Michaël Paquier

Fix minor leak in pg_dump for ACCESS METHOD.

Bug reported by Coverity.

Author: Michaël Paquier

Improve internationalization of messages involving type names

Change the slightly different variations of the message
function FOO must return type BAR
to a single wording, removing the variability in type name so that they
all create a single translation entry; since the type name is not to be
translated, there's no point in it being part of the message anyway.

Also, change them all to use the same quoting convention, namely that
the function name is not to be quoted but the type name is. (I'm not
quite sure why this is so, but it's the clear majority.)

Some similar messages such as "encoding conversion function FOO must ..."
are also changed.

psql tab-complete for CREATE/DROP ACCESS METHOD

Alexander Korotkov

Fix comment in pg_dump.

It was missed in 473b93287040b20017cc25a157cffdc5b978c254,
CREATE ACCESS METHOD

Alexander Korotkov

Last-minute updates for release notes.

Security: CVE-2016-2193, CVE-2016-3065

Add missing checks to some of pageinspect's BRIN functions

brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a file kilobytes from where the input bytea is located.

Have them throw errors if called by a non-superuser.

Report and initial patch: Andreas Seltenreich

Security: CVE-2016-3065

Reset plan->row_security_env and planUserId

In the plancache, we check if the environment we planned the query under
has changed in a way which requires us to re-plan, such as when the user
for whom the plan was prepared changes and RLS is being used (and,
therefore, there may be different policies to apply).

Unfortunately, while those values were set and checked, they were not
being reset when the query was re-planned and therefore, in cases where
we change role, re-plan, and then change role again, we weren't
re-planning again. This leads to potentially incorrect policies being
applied in cases where role-specific policies are used and a given query
is planned under one role and then executed under other roles, which
could happen under security definer functions or when a common user and
query is planned initially and then re-used across multiple SET ROLEs.

Further, extensions which made use of CopyCachedPlan() may suffer from
similar issues as the RLS-related fields were not properly copied as
part of the plan and therefore RevalidateCachedQuery() would copy in the
current settings without invalidating the query.

Fix by using the same approach used for 'search_path', where we set the
correct values in CompleteCachedPlan(), check them early on in
RevalidateCachedQuery() and then properly reset them if re-planning.
Also, copy through the values during CopyCachedPlan().

Pointed out by Ashutosh Bapat. Reviewed by Michael Paquier.

Back-patch to 9.5 where RLS was introduced.

Security: CVE-2016-2193

Code and docs review for commit 3187d6de0e5a9e805b27c48437897e8c39071d45.

Fix up check for high-bit-set characters, which provoked "comparison is
always true due to limited range of data type" warnings on some compilers,
and was unlike the way we do it elsewhere anyway.  Fix omission of "$"
from the set of valid identifier continuation characters.  Get rid of
sanitize_text(), which was utterly inconsistent with any other error report
anywhere in the system, and wasn't even well designed on its own terms
(double-quoting the result string without escaping contained double quotes
doesn't seem very well thought out).  Fix up error messages, which didn't
follow the message style guidelines very well, and were overly specific in
situations where the actual mistake might not be what they said.  Improve
documentation.

(I started out just intending to fix the compiler warning, but the more
I looked at the patch the less I liked it.)

Release notes for 9.5.2, 9.4.7, 9.3.12, 9.2.16, 9.1.21.

Guard against zero vardata.rel->tuples in estimate_hash_bucketsize().

If the referenced rel was proven empty, we'd compute 0/0 here, which
results in the function returning NaN.  That's a bit more serious
than the other zero-divide case.  Still, it only seems to be possible
in HEAD, so no back-patch.

Per report from Piotr Stefaniak.  I looked through the rest of selfuncs.c
and found no other likely trouble spots.

Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize().

This avoids a possible divide-by-zero in the following calculation,
and rounding the number to an integer seems like saner behavior anyway.
Assuming IEEE math, the division would yield +Infinity which would get
replaced by 1.0 at the bottom of the function, so nothing really
interesting would ensue; but avoiding divide-by-zero seems like a
good idea on general principles.

Per report from Piotr Stefaniak. No back-patch since this seems
mostly cosmetic.

pg_rewind: fsync target data directory.

Previously pg_rewind did not fsync any files. That's problematic, given
that the target directory is modified. If the database was started
afterwards, 2ce439f33 luckily already caused the data directory to be
synced to disk at postmaster startup; reducing the scope of the problem.

To fix, use initdb -S, at the end of the pg_rewind run. It doesn't seem
worthwhile to duplicate the code into pg_rewind, and initdb -S is
already used that way by pg_upgrade.

Reported-By: Andres Freund
Author: Michael Paquier, somewhat edited by me
Discussion: 20160310034352.iuqgvpmg5qmnxtkz@alap3.anarazel.de
CAB7nPqSytVG1o4S3S2pA1O=692ekurJ+fckW2PywEG3sNw54Ow@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced

Fix LWLockReportWaitEnd() parameter list to be (void).

Previously it was an "old style" function declaration.

pg_rewind: Close backup_label file descriptor.

This was a relatively harmless leak, as createBackupLabel() is only
called once per pg_rewind invocation.

Author: Michael Paquier
Reported-By: Michael Paquier
Discussion: CAB7nPqRnOw30gOXe2_SPLjh37bgm4V+txbYAPwoXb97nGQ297w@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced

Don't use !! but != 0/NULL to force boolean evaluation.

I introduced several uses of !! to force bit arithmetic to be boolean,
but per discussion the project prefers != 0/NULL.

Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com

Change various Gin*Is* macros to return 0/1.

Returning the direct result of bit arithmetic, in a macro intended to be
used in a boolean manner, can be problematic if the return value is
stored in a variable of type 'bool'. If bool is implemented using C99's
_Bool, that can lead to comparison failures if the variable is then
compared again with the expression (see ginStepRight() for an example
that fails), as _Bool forces the result to be 0/1. That happens in some
configurations of newer MSVC compilers. It's also problematic when
storing the result of such an expression in a narrower type.

Several gin macros have been declared in that style since gin's initial
commit in 8a3631f8d86.

There's a lot more macros like this, but this is the only one causing
regression test failures; and I don't want to commit and backpatch a
larger patch with lots of conflicts just before the next set of minor
releases.

Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches

First-draft release notes for 9.5.2.

As usual, the release notes for other branches will be made by cutting
these down, but put them up for community review first.

Modernize zic's test for valid timezone abbreviations.

We really need to sync all of our IANA-derived timezone code with upstream,
but that's going to be a large patch and I certainly don't care to shove
such a thing into stable branches immediately before a release. As a
stopgap, copy just the tzcode2016c logic that checks validity of timezone
abbreviations. This prevents getting multiple "time zone abbreviation
differs from POSIX standard" bleats with tzdata 2014b and later.

Avoid a couple of zero-divide scenarios in the planner.

cost_subplan() supposed that the given subplan must have plan_rows > 0,
which as far as I can tell was true until recent refactoring of the
code in createplan.c; but now that code allows the Result for a provably
empty subquery to have plan_rows = 0.  Rather than undo that change,
put in a clamp to prevent zero divide.

get_cheapest_fractional_path() likewise supposed that best_path->rows > 0.
This assumption has been wrong for longer.  It's actually harmless given
IEEE float math, because a positive value divided by zero gives +Infinity
and compare_fractional_path_costs() will do the right thing with that.
Still, best not to assume that.

final_cost_nestloop() also seems to have some risks in this area, so
borrow the clamping logic already present in the mergejoin cost functions.

Lastly, remove unnecessary clamp_row_est() in planner.c's calls to
get_number_of_groups().  The only thing that function does with path_rows
is pass it to estimate_num_groups() which already has an internal clamp,
so we don't need the extra call; and if we did, the callers are arguably
the wrong place for it anyway.

First two items reported by Piotr Stefaniak, the others are products
of my nosing around for similar problems.  No back-patch since there's
no evidence that problems arise in the back branches.

Update time zone data files to tzdata release 2016c.

DST law changes in Azerbaijan, Chile, Haiti, Palestine, and Russia (Altai,
Astrakhan, Kirov, Sakhalin, Ulyanovsk regions).  Historical corrections
for Lithuania, Moldova, Russia (Kaliningrad, Samara, Volgograd).

As of 2015b, the keepers of the IANA timezone database started to use
numeric time zone abbreviations (e.g., "+04") instead of inventing
abbreviations not found in the wild like "ASTT".  This causes our rather
old copy of zic to whine "warning: time zone abbreviation differs from
POSIX standard" several times during "make install".  This warning is
harmless according to the IANA folk, and I don't see any problems with
these abbreviations in some simple tests; but it seems like now would be
a good time to update our copy of the tzcode stuff.  I'll look into that
soon.

Fix PL/Tcl for vpath builds.

Commit cd37bb78599dcf24 works for in-tree builds, but not so much for
VPATH. Per buildfarm.

Improve PL/Tcl errorCode facility by providing decoded name for SQLSTATE.

We don't really want to encourage people to write numeric SQLSTATEs in
programs; that's unreadable and error-prone. Copy plpgsql's infrastructure
for converting between SQLSTATEs and exception names shown in Appendix A,
and modify examples in tests and documentation to do it that way.

In PL/Tcl, make database errors return additional info in the errorCode.

Tcl has a convention for returning additional info about an error in a
global variable named errorCode. Up to now PL/Tcl has ignored that,
but this patch causes database errors caught by PL/Tcl to fill in
errorCode with useful information from the ErrorData struct.

Jim Nasby, reviewed by Pavel Stehule and myself

Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator.

This avoids leaving dangling links in pg_operator; which while fairly
harmless are also unsightly.

While we're at it, simplify OperatorUpd, which went through
heap_modify_tuple for no very good reason considering it had already made
a tuple copy it could just scribble on.

Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
and myself.

Don't split up SRFs when choosing to postpone SELECT output expressions.

In commit 9118d03a8cca3d97 we taught the planner to postpone evaluation of
set-returning functions in a SELECT's targetlist until after any sort done
to satisfy ORDER BY.  However, if we postpone some SRFs this way while
others do not get postponed (because they're sort or group key columns)
we will break the traditional behavior by which all SRFs in the tlist run
in-step during ExecTargetList(), so that you get the least common multiple
of their periods not the product.  Fix make_sort_input_target() so it will
not split up SRF evaluation in such cases.

There is still a hazard of similar odd behavior if there's a SRF in a
grouping column and another one that isn't, but that was true before
and we're just trying to preserve bug-compatibility with the traditional
behavior.  This whole area is overdue to be rethought and reimplemented,
but we'll try to avoid changing behavior until then.

Per report from Regina Obe.