Alvaro Herrera [Wed, 30 Mar 2016 23:07:05 +0000 (20:07 -0300)]
Enable logical slots to follow timeline switches
When decoding from a logical slot, it's necessary for xlog reading to be
able to read xlog from historical (i.e. not current) timelines;
otherwise, decoding fails after failover, because the archives are in
the historical timeline. This is required to make "failover logical
slots" possible; it currently has no other use, although theoretically
it could be used by an extension that creates a slot on a standby and
continues to replay from the slot when the standby is promoted.
This commit includes a module in src/test/modules with functions to
manipulate the slots (which is not otherwise possible in SQL code) in
order to enable testing, and a new test in src/test/recovery to ensure
that the behavior is as expected.
Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
Alvaro Herrera [Wed, 30 Mar 2016 21:56:13 +0000 (18:56 -0300)]
XLogReader general code cleanup
Some minor tweaks and comment additions, for cleanliness sake and to
avoid having the upcoming timeline-following patch be polluted with
unrelated cleanup.
Extracted from a larger patch by Craig Ringer, reviewed by Andres
Freund, with some additions by myself.
Tom Lane [Wed, 30 Mar 2016 21:25:03 +0000 (17:25 -0400)]
Improve portability of I/O behavior for the geometric types.
Formerly, the geometric I/O routines such as box_in and point_out relied
directly on strtod() and sprintf() for conversion of the float8 component
values of their data types. However, the behavior of those functions is
pretty platform-dependent, especially for edge-case values such as
infinities and NaNs. This was exposed by commit acdf2a8b372aec1d, which
added test cases involving boxes with infinity endpoints, and immediately
failed on Windows and AIX buildfarm members. We solved these problems
years ago in the main float8in and float8out functions, so let's fix it
by making the geometric types use that code instead of depending directly
on the platform-supplied functions.
To do this, refactor the float8in code so that it can be used to parse
just part of a string, and as a convenience make the guts of float8out
usable without going through DirectFunctionCall.
While at it, get rid of geo_ops.c's fairly shaky assumptions about the
maximum output string length for a double, by having it build results in
StringInfo buffers instead of fixed-length strings.
In passing, convert all the "invalid input syntax for type foo" messages
in this area of the code into "invalid input syntax for type %s" to reduce
the number of distinct translatable strings, per recent discussion.
We would have needed a fair number of the latter anyway for code-sharing
reasons, so we might as well just go whole hog.
Note: this patch is by no means intended to guarantee that the geometric
types uniformly behave sanely for infinity or NaN component values.
But any bugs we have in that line were there all along, they were just
harder to reach in a platform-independent way.
Tom Lane [Wed, 30 Mar 2016 17:36:18 +0000 (13:36 -0400)]
Suppress uninitialized-variable warnings.
My compiler doesn't like the lack of initialization of "flag", and
I think it's right: if there were zero keys we'd have an undefined
result. The AND of zero items is TRUE, so initialize to TRUE.
Teodor Sigaev [Wed, 30 Mar 2016 15:42:36 +0000 (18:42 +0300)]
Introduce SP-GiST operator class over box.
Patch implements quad-tree over boxes, naive approach of 2D quad tree will not
work for any non-point objects because splitting space on node is not
efficient. The idea of pathc is treating 2D boxes as 4D points, so,
object will not overlap (in 4D space).
The performance tests reveal that this technique especially beneficial
with too much overlapping objects, so called "spaghetti data".
Author: Alexander Lebedev with editorization by Emre Hasegeli and me
Teodor Sigaev [Wed, 30 Mar 2016 15:29:28 +0000 (18:29 +0300)]
Introduce traversalValue for SP-GiST scan
During scan sometimes it would be very helpful to know some information about
parent node or all ancestor nodes. Right now reconstructedValue could be used
but it's not a right usage of it (range opclass uses that).
traversalValue is arbitrary piece of memory in separate MemoryContext while
reconstructedVale should have the same type as indexed column.
Subsequent patches for range opclass and quad4d tree will use it.
Tom Lane [Wed, 30 Mar 2016 02:23:32 +0000 (22:23 -0400)]
Remove just-added tests for to_timestamp(float8) with out-of-range inputs.
Reporting the specific out-of-range input value produces platform-dependent
results. We could skip reporting the value, but that's contrary to our
message style guidelines and unhelpful to users. Or we could add a
separate expected-output file for Windows, but that would be a substantial
maintenance burden, and these test cases seem unlikely to be worth it.
Tom Lane [Wed, 30 Mar 2016 01:38:14 +0000 (21:38 -0400)]
Remove TZ environment-variable entry from postgres reference page.
The server hasn't paid attention to the TZ environment variable since
commit ca4af308c32d03db, but that commit missed removing this documentation
reference, as did commit d883b916a947a3c6 which added the reference where
it now belongs (initdb).
Back-patch to 9.2 where the behavior changed. Also back-patch d883b916a947a3c6 as needed.
Robert Haas [Wed, 30 Mar 2016 01:16:12 +0000 (21:16 -0400)]
Add new replication mode synchronous_commit = 'remote_apply'.
In this mode, the master waits for the transaction to be applied on
the remote side, not just written to disk. That means that you can
count on a transaction started on the standby to see all commits
previously acknowledged by the master.
To make this work, the standby sends a reply after replaying each
commit record generated with synchronous_commit >= 'remote_apply'.
This introduces a small inefficiency: the extra replies will be sent
even by standbys that aren't the current synchronous standby. But
previously-existing synchronous_commit levels make no attempt at all
to optimize which replies are sent based on what the primary cares
about, so this is no worse, and at least avoids any extra replies for
people not using the feature at all.
Thomas Munro, reviewed by Michael Paquier and by me. Some additional
tweaks by me.
Tom Lane [Tue, 29 Mar 2016 21:21:12 +0000 (17:21 -0400)]
Fix interval_mul() to not produce insane results.
interval_mul() attempts to prevent its calculations from producing silly
results, but it forgot that zero times infinity yields NaN in IEEE
arithmetic. Hence, a case like '1 second'::interval * 'infinity'::float8
produced a NaN for the months product, which didn't trigger the range
check, resulting in bogus and possibly platform-dependent output.
This isn't terribly obvious to the naked eye because if you try that
exact case, you get "interval out of range" which is what you expect
--- but if you look closer, the error is coming from interval_out not
interval_mul. interval_mul has allowed a bogus value into the system.
Fix by adding isnan tests.
Noted while testing Vitaly Burovoy's fix for infinity input to
to_timestamp(). Given the lack of field complaints, I doubt this
is worth a back-patch.
Tom Lane [Tue, 29 Mar 2016 21:09:21 +0000 (17:09 -0400)]
Allow to_timestamp(float8) to convert float infinity to timestamp infinity.
With the original SQL-function implementation, such cases failed because
we don't support infinite intervals. Converting the function to C lets
us bypass the interval representation, which should be a bit faster as
well as more flexible.
Robert Haas [Tue, 29 Mar 2016 19:21:57 +0000 (15:21 -0400)]
Fix bug in aggregate (de)serialization commit.
resulttypeLen and resulttypeByVal must be set correctly when serializing
aggregates, not just when finalizing them. This was in David's final
patch but I downloaded the wrong version by mistake and failed to spot
the error.
Robert Haas [Tue, 29 Mar 2016 19:04:05 +0000 (15:04 -0400)]
Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal". Such values
can't be passed between cooperating processes, because they are
just pointers.
Alvaro Herrera [Tue, 29 Mar 2016 17:13:51 +0000 (14:13 -0300)]
pgbench: allow a script weight of zero
This refines the previous weight range and allows a script to be "turned
off" by passing a zero weight, which is useful when scripting multiple
pgbench runs.
I did not apply the suggested warning when a script uses zero weight; we
use the principle elsewhere that if there's nothing to be done, do
nothing quietly.
Robert Haas [Tue, 29 Mar 2016 16:08:49 +0000 (12:08 -0400)]
pgbench: Remove \setrandom.
You can now do the same thing via \set using the appropriate function,
either random(), random_gaussian(), or random_exponential(), depending
on the desired distribution. This is not backward-compatible, but per
discussion, it's worth it to avoid having the old syntax hang around
forever.
Fabien Coelho, reviewed by Michael Paquier, and adjusted by me.
Tom Lane [Tue, 29 Mar 2016 15:54:57 +0000 (11:54 -0400)]
Avoid possibly-unsafe use of Windows' FormatMessage() function.
Whenever this function is used with the FORMAT_MESSAGE_FROM_SYSTEM flag,
it's good practice to include FORMAT_MESSAGE_IGNORE_INSERTS as well.
Otherwise, if the message contains any %n insertion markers, the function
will try to fetch argument strings to substitute --- which we are not
passing, possibly leading to a crash. This is exactly analogous to the
rule about not giving printf() a format string you're not in control of.
Noted and patched by Christian Ullrich.
Back-patch to all supported branches.
Teodor Sigaev [Tue, 29 Mar 2016 14:59:58 +0000 (17:59 +0300)]
Fix support of digits in email/hostnames.
When tsearch was implemented I did several mistakes in hostname/email
definition rules:
1) allow underscore in hostname what prohibited by RFC
2) forget to allow leading digits separated by hyphen (like 123-x.com)
in hostname
3) do no allow underscore/hyphen after leading digits in localpart of email
Artur's patch resolves two last issues, but by the way allows hosts name like
123_x.com together with 123-x.com. RFC forbids underscore usage in hostname
but pg allows that since initial tsearch version in core, although only
for non-digits. Patch syncs support digits and nondigits in both hostname and
email.
Forbidding underscore in hostname may break existsing usage of tsearch and,
anyhow, it should be done by separate patch.
Robert Haas [Tue, 29 Mar 2016 15:00:18 +0000 (11:00 -0400)]
Rework custom scans to work more like the new extensible node stuff.
Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5. Rework the latter to be more like the former.
This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.
KaiGai Kohei, reviewed by Petr Jelinek and me. Some further
cosmetic changes by me.
Tom Lane [Tue, 29 Mar 2016 15:06:44 +0000 (11:06 -0400)]
Protect zic's symlink() call with #ifdef HAVE_SYMLINK.
The IANA crew seem to think that symlink() exists everywhere nowadays,
and they may well be right. But we use #ifdef HAVE_SYMLINK elsewhere
so for consistency we should do it here too. Noted by Michael Paquier.
Tom Lane [Tue, 29 Mar 2016 14:40:08 +0000 (10:40 -0400)]
Fix zic for Windows.
The new coding of dolink() is dependent on link() returning an on-point
errno when it fails; but the quick-hack implementation of link() that
we'd put in for Windows didn't bother with setting errno. Fix that.
INT64_MIN/MAX should be spelled PG_INT64_MIN/MAX, per well established
convention in our sources. Less obviously, a symbol named DOUBLE causes
problems on Windows builds, so rename that to DOUBLE_CONST; and rename
INTEGER to INTEGER_CONST for consistency.
Also, get rid of incorrect/obsolete hand-munging of yycolumn, and fix
the grammar for float constants to handle expected cases such as ".1".
First two items by Michael Paquier, second two by me.
Robert Haas [Tue, 29 Mar 2016 00:59:25 +0000 (20:59 -0400)]
On all Windows platforms, not just Cygwin, use _timezone and _tzname.
Up until now, we've been using timezone and tzname, but Visual Studio
2015 (for which we wish to add support) no longer declares those
symbols. All versions since Visual Studio 2003 apparently support the
underscore-equipped names, and we don't support anything older than
Visual Studio 2005, so this should work OK everywhere. But let's see
what the buildfarm thinks.
Robert Haas [Tue, 29 Mar 2016 00:45:57 +0000 (20:45 -0400)]
pgbench: Support double constants and functions.
The new functions are pi(), random(), random_exponential(),
random_gaussian(), and sqrt(). I was worried that this would be
slower than before, but, if anything, it actually turns out to be
slightly faster, because we now express the built-in pgbench scripts
using fewer lines; each \setrandom can be merged into a subsequent
\set.
Alvaro Herrera [Mon, 28 Mar 2016 22:17:06 +0000 (19:17 -0300)]
PostgresNode: initialize $timed_out if passed
Corrects an oversight in 2c83f435a3 where the $timed_out reference var
isn't initialized; using it would require the caller to initialize it
beforehand, which is cumbersome.
Tom Lane [Mon, 28 Mar 2016 21:19:29 +0000 (17:19 -0400)]
Sync tzload() and tzparse() APIs with IANA release tzcode2016c.
This brings us a bit closer to matching upstream, but since it affects
files outside src/timezone/, we might choose not to back-patch it.
Hence keep it separate from the main update patch.
Tom Lane [Mon, 28 Mar 2016 19:10:17 +0000 (15:10 -0400)]
Sync our copy of the timezone library with IANA release tzcode2016c.
We hadn't done this in about six years, which proves to have been a mistake
because there's been a lot of code churn upstream, making the merge rather
painful. But putting it off any further isn't going to lessen the pain,
and there are at least two incompatible changes that we need to absorb
before someone starts complaining that --with-system-tzdata doesn't work
at all on their platform, or we get blindsided by a tzdata release that
our out-of-date zic can't compile. Last week's "time zone abbreviation
differs from POSIX standard" mess was a wake-up call in that regard.
This is a sufficiently large patch that I'm afraid to back-patch it
immediately, though the foregoing considerations imply that we probably
should do so eventually. For the moment, just put it in HEAD so that
it can get some testing. Maybe we can wait till the end of the 9.6
beta cycle before deeming it okay.
Alvaro Herrera [Mon, 28 Mar 2016 17:12:00 +0000 (14:12 -0300)]
Improve internationalization of messages involving type names
Change the slightly different variations of the message
function FOO must return type BAR
to a single wording, removing the variability in type name so that they
all create a single translation entry; since the type name is not to be
translated, there's no point in it being part of the message anyway.
Also, change them all to use the same quoting convention, namely that
the function name is not to be quoted but the type name is. (I'm not
quite sure why this is so, but it's the clear majority.)
Some similar messages such as "encoding conversion function FOO must ..."
are also changed.
Alvaro Herrera [Mon, 28 Mar 2016 13:57:42 +0000 (10:57 -0300)]
Add missing checks to some of pageinspect's BRIN functions
brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a file kilobytes from where the input bytea is located.
Have them throw errors if called by a non-superuser.
Stephen Frost [Mon, 28 Mar 2016 13:03:20 +0000 (09:03 -0400)]
Reset plan->row_security_env and planUserId
In the plancache, we check if the environment we planned the query under
has changed in a way which requires us to re-plan, such as when the user
for whom the plan was prepared changes and RLS is being used (and,
therefore, there may be different policies to apply).
Unfortunately, while those values were set and checked, they were not
being reset when the query was re-planned and therefore, in cases where
we change role, re-plan, and then change role again, we weren't
re-planning again. This leads to potentially incorrect policies being
applied in cases where role-specific policies are used and a given query
is planned under one role and then executed under other roles, which
could happen under security definer functions or when a common user and
query is planned initially and then re-used across multiple SET ROLEs.
Further, extensions which made use of CopyCachedPlan() may suffer from
similar issues as the RLS-related fields were not properly copied as
part of the plan and therefore RevalidateCachedQuery() would copy in the
current settings without invalidating the query.
Fix by using the same approach used for 'search_path', where we set the
correct values in CompleteCachedPlan(), check them early on in
RevalidateCachedQuery() and then properly reset them if re-planning.
Also, copy through the values during CopyCachedPlan().
Pointed out by Ashutosh Bapat. Reviewed by Michael Paquier.
Fix up check for high-bit-set characters, which provoked "comparison is
always true due to limited range of data type" warnings on some compilers,
and was unlike the way we do it elsewhere anyway. Fix omission of "$"
from the set of valid identifier continuation characters. Get rid of
sanitize_text(), which was utterly inconsistent with any other error report
anywhere in the system, and wasn't even well designed on its own terms
(double-quoting the result string without escaping contained double quotes
doesn't seem very well thought out). Fix up error messages, which didn't
follow the message style guidelines very well, and were overly specific in
situations where the actual mistake might not be what they said. Improve
documentation.
(I started out just intending to fix the compiler warning, but the more
I looked at the patch the less I liked it.)
Tom Lane [Sun, 27 Mar 2016 22:21:03 +0000 (18:21 -0400)]
Guard against zero vardata.rel->tuples in estimate_hash_bucketsize().
If the referenced rel was proven empty, we'd compute 0/0 here, which
results in the function returning NaN. That's a bit more serious
than the other zero-divide case. Still, it only seems to be possible
in HEAD, so no back-patch.
Per report from Piotr Stefaniak. I looked through the rest of selfuncs.c
and found no other likely trouble spots.
Tom Lane [Sun, 27 Mar 2016 22:06:53 +0000 (18:06 -0400)]
Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize().
This avoids a possible divide-by-zero in the following calculation,
and rounding the number to an integer seems like saner behavior anyway.
Assuming IEEE math, the division would yield +Infinity which would get
replaced by 1.0 at the bottom of the function, so nothing really
interesting would ensue; but avoiding divide-by-zero seems like a
good idea on general principles.
Per report from Piotr Stefaniak. No back-patch since this seems
mostly cosmetic.
Andres Freund [Sun, 27 Mar 2016 21:46:25 +0000 (23:46 +0200)]
pg_rewind: fsync target data directory.
Previously pg_rewind did not fsync any files. That's problematic, given
that the target directory is modified. If the database was started
afterwards, 2ce439f33 luckily already caused the data directory to be
synced to disk at postmaster startup; reducing the scope of the problem.
To fix, use initdb -S, at the end of the pg_rewind run. It doesn't seem
worthwhile to duplicate the code into pg_rewind, and initdb -S is
already used that way by pg_upgrade.
Reported-By: Andres Freund
Author: Michael Paquier, somewhat edited by me
Discussion: 20160310034352.iuqgvpmg5qmnxtkz@alap3.anarazel.de
CAB7nPqSytVG1o4S3S2pA1O=692ekurJ+fckW2PywEG3sNw54Ow@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
Andres Freund [Sun, 27 Mar 2016 20:48:31 +0000 (22:48 +0200)]
pg_rewind: Close backup_label file descriptor.
This was a relatively harmless leak, as createBackupLabel() is only
called once per pg_rewind invocation.
Author: Michael Paquier Reported-By: Michael Paquier
Discussion: CAB7nPqRnOw30gOXe2_SPLjh37bgm4V+txbYAPwoXb97nGQ297w@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
Andres Freund [Sun, 27 Mar 2016 14:59:58 +0000 (16:59 +0200)]
Change various Gin*Is* macros to return 0/1.
Returning the direct result of bit arithmetic, in a macro intended to be
used in a boolean manner, can be problematic if the return value is
stored in a variable of type 'bool'. If bool is implemented using C99's
_Bool, that can lead to comparison failures if the variable is then
compared again with the expression (see ginStepRight() for an example
that fails), as _Bool forces the result to be 0/1. That happens in some
configurations of newer MSVC compilers. It's also problematic when
storing the result of such an expression in a narrower type.
Several gin macros have been declared in that style since gin's initial
commit in 8a3631f8d86.
There's a lot more macros like this, but this is the only one causing
regression test failures; and I don't want to commit and backpatch a
larger patch with lots of conflicts just before the next set of minor
releases.
Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches
Tom Lane [Sat, 26 Mar 2016 19:58:44 +0000 (15:58 -0400)]
Modernize zic's test for valid timezone abbreviations.
We really need to sync all of our IANA-derived timezone code with upstream,
but that's going to be a large patch and I certainly don't care to shove
such a thing into stable branches immediately before a release. As a
stopgap, copy just the tzcode2016c logic that checks validity of timezone
abbreviations. This prevents getting multiple "time zone abbreviation
differs from POSIX standard" bleats with tzdata 2014b and later.
Tom Lane [Sat, 26 Mar 2016 16:03:12 +0000 (12:03 -0400)]
Avoid a couple of zero-divide scenarios in the planner.
cost_subplan() supposed that the given subplan must have plan_rows > 0,
which as far as I can tell was true until recent refactoring of the
code in createplan.c; but now that code allows the Result for a provably
empty subquery to have plan_rows = 0. Rather than undo that change,
put in a clamp to prevent zero divide.
get_cheapest_fractional_path() likewise supposed that best_path->rows > 0.
This assumption has been wrong for longer. It's actually harmless given
IEEE float math, because a positive value divided by zero gives +Infinity
and compare_fractional_path_costs() will do the right thing with that.
Still, best not to assume that.
final_cost_nestloop() also seems to have some risks in this area, so
borrow the clamping logic already present in the mergejoin cost functions.
Lastly, remove unnecessary clamp_row_est() in planner.c's calls to
get_number_of_groups(). The only thing that function does with path_rows
is pass it to estimate_num_groups() which already has an internal clamp,
so we don't need the extra call; and if we did, the callers are arguably
the wrong place for it anyway.
First two items reported by Piotr Stefaniak, the others are products
of my nosing around for similar problems. No back-patch since there's
no evidence that problems arise in the back branches.
Tom Lane [Fri, 25 Mar 2016 23:03:08 +0000 (19:03 -0400)]
Update time zone data files to tzdata release 2016c.
DST law changes in Azerbaijan, Chile, Haiti, Palestine, and Russia (Altai,
Astrakhan, Kirov, Sakhalin, Ulyanovsk regions). Historical corrections
for Lithuania, Moldova, Russia (Kaliningrad, Samara, Volgograd).
As of 2015b, the keepers of the IANA timezone database started to use
numeric time zone abbreviations (e.g., "+04") instead of inventing
abbreviations not found in the wild like "ASTT". This causes our rather
old copy of zic to whine "warning: time zone abbreviation differs from
POSIX standard" several times during "make install". This warning is
harmless according to the IANA folk, and I don't see any problems with
these abbreviations in some simple tests; but it seems like now would be
a good time to update our copy of the tzcode stuff. I'll look into that
soon.
Tom Lane [Fri, 25 Mar 2016 20:54:52 +0000 (16:54 -0400)]
Improve PL/Tcl errorCode facility by providing decoded name for SQLSTATE.
We don't really want to encourage people to write numeric SQLSTATEs in
programs; that's unreadable and error-prone. Copy plpgsql's infrastructure
for converting between SQLSTATEs and exception names shown in Appendix A,
and modify examples in tests and documentation to do it that way.
Tom Lane [Fri, 25 Mar 2016 19:52:53 +0000 (15:52 -0400)]
In PL/Tcl, make database errors return additional info in the errorCode.
Tcl has a convention for returning additional info about an error in a
global variable named errorCode. Up to now PL/Tcl has ignored that,
but this patch causes database errors caught by PL/Tcl to fill in
errorCode with useful information from the ErrorData struct.
Tom Lane [Fri, 25 Mar 2016 16:33:16 +0000 (12:33 -0400)]
Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator.
This avoids leaving dangling links in pg_operator; which while fairly
harmless are also unsightly.
While we're at it, simplify OperatorUpd, which went through
heap_modify_tuple for no very good reason considering it had already made
a tuple copy it could just scribble on.
Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
and myself.
Tom Lane [Fri, 25 Mar 2016 15:19:51 +0000 (11:19 -0400)]
Don't split up SRFs when choosing to postpone SELECT output expressions.
In commit 9118d03a8cca3d97 we taught the planner to postpone evaluation of
set-returning functions in a SELECT's targetlist until after any sort done
to satisfy ORDER BY. However, if we postpone some SRFs this way while
others do not get postponed (because they're sort or group key columns)
we will break the traditional behavior by which all SRFs in the tlist run
in-step during ExecTargetList(), so that you get the least common multiple
of their periods not the product. Fix make_sort_input_target() so it will
not split up SRF evaluation in such cases.
There is still a hazard of similar odd behavior if there's a SRF in a
grouping column and another one that isn't, but that was true before
and we're just trying to preserve bug-compatibility with the traditional
behavior. This whole area is overdue to be rethought and reimplemented,
but we'll try to avoid changing behavior until then.
Tom Lane [Fri, 25 Mar 2016 00:45:31 +0000 (20:45 -0400)]
Link libpq after libpgfeutils to satisfy Windows linker.
Some of the non-MSVC Windows buildfarm members seem to need this to avoid
getting "undefined symbol" errors on libpgfeutils' references to libpq.
I could understand that if libpq were a static library, but surely it is
not? Oh well, at least the extra reference is no more harmful than it is
for libpgcommon or libpgport.
Tom Lane [Fri, 25 Mar 2016 00:28:47 +0000 (20:28 -0400)]
Move psql's psqlscan.l into src/fe_utils.
This completes (at least for now) the project of getting rid of ad-hoc
linkages among the src/bin/ subdirectories. Everything they share is now
in src/fe_utils/ and is included from a static library at link time.
A side benefit is that we can restore the FLEX_NO_BACKUP check for
psqlscanslash.l. We might need to think of another way to do that check
if we ever need to build two lexers with that property in the same source
directory, but there's no foreseeable reason to need that.
Tom Lane [Thu, 24 Mar 2016 19:55:44 +0000 (15:55 -0400)]
Create src/fe_utils/, and move stuff into there from pg_dump's dumputils.
Per discussion, we want to create a static library and put the stuff into
it that until now has been shared across src/bin/ directories by ad-hoc
methods like symlinking a source file. This commit creates the library and
populates it with a couple of files that contain the widely-useful portions
of pg_dump's dumputils.c file. dumputils.c survives, because it has some
stuff that didn't seem appropriate for fe_utils, but it's significantly
smaller and is no longer referenced from any other directory.
Follow-on patches will move more stuff into fe_utils.
The Mkvcbuild.pm hacking here is just a best guess; we'll see how the
buildfarm likes it.
Alvaro Herrera [Thu, 24 Mar 2016 02:01:35 +0000 (23:01 -0300)]
Support CREATE ACCESS METHOD
This enables external code to create access methods. This is useful so
that extensions can add their own access methods which can be formally
tracked for dependencies, so that DROP operates correctly. Also, having
explicit support makes pg_dump work correctly.
Currently only index AMs are supported, but we expect different types to
be added in the future.
Authors: Alexander Korotkov, Petr Jelínek Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby
Commitfest-URL: https://commitfest.postgresql.org/9/353/
Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
Tom Lane [Thu, 24 Mar 2016 00:22:08 +0000 (20:22 -0400)]
Move keywords.c/kwlookup.c into src/common/.
Now that we have src/common/ for code shared between frontend and backend,
we can get rid of (most of) the klugy ways that the keyword table and
keyword lookup code were formerly shared between different uses.
This is a first step towards a more general plan of getting rid of
special-purpose kluges for sharing code in src/bin/.
I chose to merge kwlookup.c back into keywords.c, as it once was, and
always has been so far as keywords.h is concerned. We could have
kept them separate, but there is noplace that uses ScanKeywordLookup
without also wanting access to the backend's keyword list, so there
seems little point.
ecpg is still a bit weird, but at least now the trickiness is documented.
I think that the MSVC build script should require no adjustments beyond
what's done here ... but we'll soon find out.
Robert Haas [Wed, 23 Mar 2016 19:58:34 +0000 (15:58 -0400)]
Disable abbreviated keys for string-sorting in non-C locales.
Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard. This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative. Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.
Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.
Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
Robert Haas [Wed, 23 Mar 2016 16:28:01 +0000 (12:28 -0400)]
postgres_fdw: Fix crash when pushing down multiple joins.
A join clause might mention multiple relations on either side, so it
need not be the case that a given joinrel's constituent relations are
all on one side of the join clause or all on the other.
Report by Rajkumar Raghuwanshi. Analysis and fix by Michael Paquier
and Ashutosh Bapat.
Tom Lane [Wed, 23 Mar 2016 15:00:39 +0000 (11:00 -0400)]
Code review for error reports in jsonb_set().
User-facing (even tested by regression tests) error conditions were thrown
with elog(), hence had wrong SQLSTATE and were untranslatable. And the
error message texts weren't up to project style, either.
Tom Lane [Wed, 23 Mar 2016 14:43:13 +0000 (10:43 -0400)]
Fix unsafe use of strtol() on a non-null-terminated Text datum.
jsonb_set() could produce wrong answers or incorrect error reports, or in
the worst case even crash, when trying to convert a path-array element into
an integer for use as an array subscript. Per report from Vitaly Burovoy.
Back-patch to 9.5 where the faulty code was introduced (in commit c6947010ceb42143).
Tom Lane [Tue, 22 Mar 2016 21:56:06 +0000 (17:56 -0400)]
Fix EvalPlanQual bug when query contains both locked and not-locked rels.
In commit afb9249d06f47d7a, we (probably I) made ExecLockRows assign
null test tuples to all relations of the query while setting up to do an
EvalPlanQual recheck for a newly-updated locked row. This was sheerest
brain fade: we should only set test tuples for relations that are lockable
by the LockRows node, and in particular empty test tuples are only sensible
for inheritance child relations that weren't the source of the current
tuple from their inheritance tree. Setting a null test tuple for an
unrelated table causes it to return NULLs when it should not, as exhibited
in bug #14034 from Bronislav Houdek. To add insult to injury, doing it the
wrong way required two loops where one would suffice; so the corrected code
is even a bit shorter and faster.
Add a regression test case based on his example, and back-patch to 9.5
where the bug was introduced.
Tom Lane [Mon, 21 Mar 2016 22:34:18 +0000 (18:34 -0400)]
Allow the delay in psql's \watch command to be a fractional second.
Instead of just "2" seconds, allow eg. "2.5" seconds. Per request
from Alvaro Herrera. No docs change since the docs didn't say you
couldn't do this already.
Tom Lane [Mon, 21 Mar 2016 22:18:13 +0000 (18:18 -0400)]
Improve header output from psql's \watch command.
Include the \pset title string if there is one, and shorten the prefab
part of the header to be "timestamp (every Ns)". Per suggestion by
David Johnston.
Tom Lane [Mon, 21 Mar 2016 15:59:49 +0000 (11:59 -0400)]
Clean up some Coverity complaints about commit 0bf3ae88af330496.
The two get_tle_by_resno() calls introduced by this commit lacked any
check for a NULL return, unlike any other calls of that function anywhere
in our tree. Coverity quite properly complained about it. Also fix a
misindented line in process_query_params(), which Coverity also complained
about on the grounds that the bad indentation suggested possible programmer
misinterpretation.
Robert Haas [Mon, 21 Mar 2016 13:20:53 +0000 (09:20 -0400)]
Support parallel aggregation.
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.
David Rowley, based on earlier work by Haribabu Kommi. Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
Andres Freund [Mon, 21 Mar 2016 08:56:39 +0000 (09:56 +0100)]
Introduce WaitEventSet API.
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large linux systems. Constantly
adding the same postmaster_alive_fds to the OSs internal datastructures
for implementing poll/select can cause significant contention; leading
to a performance regression of nearly 3x in one example.
This can be avoided by using e.g. linux' epoll, which avoids having to
add/remove file descriptors to the wait datastructures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.
Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file-descriptor to wait on. The latter has
been added because there's upcoming postgres features where that will be
helpful.
In addition to the previously existing poll(2), select(2),
WaitForMultipleObjects() implementations also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.
To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication. There are additional
places that would benefit from a long-lived set, but that's a task for
another day.
Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.
Andres Freund [Mon, 21 Mar 2016 08:56:39 +0000 (09:56 +0100)]
Combine win32 and unix latch implementations.
Previously latches for windows and unix had been implemented in
different files. A later patch introduce an expanded wait
infrastructure, keeping the implementation separate would introduce too
much duplication.
This basically just moves the functions, without too much change. The
reason to keep this separate is that it allows blame to continue working
a little less badly; and to make review a tiny bit easier.
Tom Lane [Mon, 21 Mar 2016 01:59:03 +0000 (21:59 -0400)]
Use %option bison-bridge in psql/pgbench lexers.
The point of this change is to use %pure-parser in pgbench's exprparse.y.
The immediate reason is that it turns out very ancient versions of bison
have a bug with the combination of a reentrant lexer and non-reentrant
parser. We could consider dropping support for such ancient bisons; but
considering that we might well need exprparse.y to be reentrant some day,
it seems better to make it so right now than to move the portability
goalposts. (AFAICT there's no particular performance consequence to this
change, either, so there's no good reason not to do it.)
Now, %pure-parser assumes that the called lexer is built with %option
bison-bridge. Because we're assuming bitwise compatibility of yyscan_t
(yyguts_t) data structures among all the psql/pgbench lexers, that
requirement propagates back to psql's lexers as well. But it's just a
few lines of change on that side too; and if psqlscan.l is to set the
baseline for a possibly-large family of lexers, it should err on the
side of including not omitting useful features.
pgbench now needs to use src/bin/psql/psqlscan.l, but it's not very clear
how to fit that into the MSVC build system. If this doesn't work I'm going
to need some help from somebody who actually understands those scripts ...
Tom Lane [Sun, 20 Mar 2016 16:58:44 +0000 (12:58 -0400)]
SQL commands in pgbench scripts are now ended by semicolons, not newlines.
To allow multiline SQL commands in scripts, adopt the same rules psql uses
to decide what is the end of a SQL command, to wit, an unquoted semicolon
not encased in parentheses. Do this by importing the same flex lexer that
psql uses, since coping with stuff like dollar-quoted literals is hard to
get right without going the full nine yards.
This makes use of the infrastructure added in commit 0ea9efbe9ec1bf07 to
support independently-written flex lexers scanning the same PsqlScanState
input-buffer data structure. Since that infrastructure isn't very
friendly to ad-hoc parsing code such as strtok(), improve exprscan.l
so that it can parse either whitespace-separated words or expression
tokens, on demand, and rewrite pgbench.c's backslash-command parsing
code to always use the lexer to fetch tokens.
It's still the case that pgbench backslash commands extend to the end
of the line, no more and no less. That could be changed in a fairly
localized way now, and there was some interest in doing so, but it
seems like material for a separate patch.
In passing, make some marginal cleanups in syntax error reporting,
const-ify a few data structures that could use it, and run some of
this code through pgindent.
I can't tell whether the MSVC build scripts need to be taught explicitly
about the changes here or not, but the buildfarm will soon tell us.
Andrew Dunstan [Sat, 19 Mar 2016 22:36:35 +0000 (18:36 -0400)]
Remove dependency on psed for MSVC builds.
Modern Perl has removed psed from its core distribution, so it might not
be readily available on some build platforms. We therefore replace its
use with a Perl script generated by s2p, which is equivalent to the sed
script. The latter is retained for non-MSVC builds to avoid creating a
new hard dependency on Perl for non-Windows tarball builds.