Fix race between GetNewTransactionId and GetOldestActiveTransactionId.
The race condition goes like this:
1. GetNewTransactionId advances nextXid e.g. from 100 to 101
2. GetOldestActiveTransactionId reads the new nextXid, 101
3. GetOldestActiveTransactionId loops through the proc array. There are no
active XIDs there, so it returns 101 as the oldest active XID.
4. GetNewTransactionid stores XID 100 to MyPgXact->xid
So, GetOldestActiveTransactionId returned XID 101, even though 100 only
just started and is surely still running.
This would be hard to hit in practice, and even harder to spot any ill
effect if it happens. GetOldestActiveTransactionId is only used when
creating a checkpoint in a master server, and the race condition can only
happen on an online checkpoint, as there are no backends running during a
shutdown checkpoint. The oldestActiveXid value of an online checkpoint is
only used when starting up a hot standby server, to determine the starting
point where pg_subtrans is initialized from. For the race condition to
happen, there must be no other XIDs in the proc array that would hold back
the oldest-active XID value, which means that the missed XID must be a top
transaction's XID. However, pg_subtrans is not used for top XIDs, so I
believe an off-by-one error is in fact inconsequential. Nevertheless, let's
fix it, as it's clearly wrong and the fix is simple.
This has been wrong ever since hot standby was introduced, so backport to
all supported versions.
Tom Lane [Wed, 12 Jul 2017 22:00:04 +0000 (18:00 -0400)]
Fix ruleutils.c for domain-over-array cases, too.
Further investigation shows that ruleutils isn't quite up to speed either
for cases where we have a domain-over-array: it needs to be prepared to
look past a CoerceToDomain at the top level of field and element
assignments, else it decompiles them incorrectly. Potentially this would
result in failure to dump/reload a rule, if it looked like the one in the
new test case. (I also added a test for EXPLAIN; that output isn't broken,
but clearly we need more test coverage here.)
Like commit b1cb32fb6, this bug is reachable in cases we already support,
so back-patch all the way.
Reduce memory usage of tsvector type analyze function.
compute_tsvector_stats() detoasted and kept in memory every tsvector value
in the sample, but that can be a lot of memory. The original bug report
described a case using over 10 gigabytes, with statistics target of 10000
(the maximum).
To fix, allocate a separate copy of just the lexemes that we keep around,
and free the detoasted tsvector values as we go. This adds some palloc/pfree
overhead, when you have a lot of distinct lexemes in the sample, but it's
better than running out of memory.
Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to
all supported versions.
Tom Lane [Tue, 11 Jul 2017 20:48:59 +0000 (16:48 -0400)]
Fix multiple assignments to a column of a domain type.
We allow INSERT and UPDATE commands to assign to the same column more than
once, as long as the assignments are to subfields or elements rather than
the whole column. However, this failed when the target column was a domain
over array rather than plain array. Fix by teaching process_matched_tle()
to look through CoerceToDomain nodes, and add relevant test cases.
Also add a group of test cases exercising domains over array of composite.
It's doubtless accidental that CREATE DOMAIN allows this case while not
allowing straight domain over composite; but it does, so we'd better make
sure we don't break it. (I could not find any documentation mentioning
either side of that, so no doc changes.)
It's been like this for a long time, so back-patch to all supported
branches.
Tom Lane [Mon, 10 Jul 2017 15:00:09 +0000 (11:00 -0400)]
On Windows, retry process creation if we fail to reserve shared memory.
We've heard occasional reports of backend launch failing because
pgwin32_ReserveSharedMemoryRegion() fails, indicating that something
has already used that address space in the child process. It's not
very clear what, given that we disable ASLR in Windows builds, but
suspicion falls on antivirus products. It'd be better if we didn't
have to disable ASLR, anyway. So let's try to ameliorate the problem
by retrying the process launch after such a failure, up to 100 times.
Patch by me, based on previous work by Amit Kapila and others.
This is a longstanding issue, so back-patch to all supported branches.
Tom Lane [Mon, 10 Jul 2017 04:08:19 +0000 (00:08 -0400)]
Doc: clarify wording about tool requirements in sourcerepo.sgml.
Original wording had confusingly vague antecedent for "they", so replace
that with a more repetitive but clearer formulation. In passing, make the
link to the installation requirements section more specific. Per gripe
from Martin Mai, though this is not the fix he initially proposed.
Tom Lane [Wed, 28 Jun 2017 16:30:16 +0000 (12:30 -0400)]
Second try at fixing tcp_keepalives_idle option on Solaris.
Buildfarm evidence shows that TCP_KEEPALIVE_THRESHOLD doesn't exist
after all on Solaris < 11. This means we need to take positive action to
prevent the TCP_KEEPALIVE code path from being taken on that platform.
I've chosen to limit it with "&& defined(__darwin__)", since it's unclear
that anyone else would follow Apple's precedent of spelling the symbol
that way.
Also, follow a suggestion from Michael Paquier of eliminating code
duplication by defining a couple of intermediate symbols for the
socket option.
In passing, make some effort to reduce the number of translatable messages
by replacing "setsockopt(foo) failed" with "setsockopt(%s) failed", etc,
throughout the affected files. And update relevant documentation so
that it doesn't claim to provide an exhaustive list of the possible
socket option names.
Like the previous commit (f0256c774), back-patch to all supported branches.
Tom Lane [Tue, 27 Jun 2017 22:47:57 +0000 (18:47 -0400)]
Support tcp_keepalives_idle option on Solaris.
Turns out that the socket option for this is named TCP_KEEPALIVE_THRESHOLD,
at least according to the tcp(7P) man page for Solaris 11. (But since that
text refers to "SunOS", it's likely pretty ancient.) It appears that the
symbol TCP_KEEPALIVE does get defined on that platform, but it doesn't
seem to represent a valid protocol-level socket option. This leads to
bleats in the postmaster log, and no tcp_keepalives_idle functionality.
Per bug #14720 from Andrey Lizenko, as well as an earlier report from
Dhiraj Chawla that nobody had followed up on. The issue's been there
since we added the TCP_KEEPALIVE code path in commit 5acd417c8, so
back-patch to all supported branches.
Tom Lane [Mon, 26 Jun 2017 21:31:56 +0000 (17:31 -0400)]
Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send
a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new
walreceiver to be launched. There's a race condition, which at least
in HEAD is very easy to hit, whereby the postmaster might see that
signal before it processes the SIGCHLD from the walreceiver process.
In that situation, sigusr1_handler() just dropped the start request
on the floor, reasoning that it must be redundant. Eventually, after
10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a
fresh request --- but that's a long time if the connection could have
been re-established almost immediately.
Fix it by setting a state flag inside the postmaster that we won't
clear until we do launch a walreceiver. In cases where that results
in an extra walreceiver launch, it's up to the walreceiver to realize
it's unwanted and go away --- but we have, and need, that logic anyway
for the opposite race case.
I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there in test cases where
a master server is stopped and restarted while leaving streaming
slaves active.
This logic has been broken all along, so back-patch to all supported
branches.
Tom Lane [Mon, 26 Jun 2017 20:17:06 +0000 (16:17 -0400)]
Ignore old stats file timestamps when starting the stats collector.
The stats collector disregards inquiry messages that bear a cutoff_time
before when it last wrote the relevant stats file. That's fine, but at
startup when it reads the "permanent" stats files, it absorbed their
timestamps as if they were the times at which the corresponding temporary
stats files had been written. In reality, of course, there's no data
out there at all. This led to disregarding inquiry messages soon after
startup if the postmaster had been shut down and restarted within less
than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for
testing and in the field. Requesting backends would hang for 10 seconds
and then report failure to read statistics, unless they got bailed out
by some other backend coming along and making a newer request within
that interval.
I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there because the autovacuum
launcher hangs for 10 seconds when it can't get statistics at startup,
thus preventing a second shutdown from occurring promptly. We might
want to do some things in the autovac code to make it less prone to
getting stuck that way, but this change is a good bug fix regardless.
In passing, also fix pgstat_read_statsfiles() to ensure that it
re-zeroes its global stats variables if they are corrupted by a
short read from the stats file. (Other reads in that function
go into temp variables, so that the issue doesn't arise.)
This has been broken since we created the separation between permanent
and temporary stats files in 8.4, so back-patch to all supported branches.
Andres Freund [Wed, 21 Jun 2017 20:51:40 +0000 (13:51 -0700)]
Fix possibility of creating a "phantom" segment after promotion.
When promoting a standby just after a XLOG_SWITCH record was replayed,
and next segment(s) are already are locally available (via walsender,
restore_command + trigger/recovery target), that segment could
accidentally be recycled onto the past of the new timeline. Later
checkpointer would create a .ready file for it, assuming there was an
error during creation, and it would get archived. That causes trouble
if another standby is later brought up from a basebackup from before
the timeline creation, because it would try to read the
segment, because XLogFileReadAnyTLI just tries all possible timelines,
which doesn't have valid contents. Thus replay would fail.
The problem, if already occurred, can be fixed by removing the segment
and/or having restore_command filter it out.
The reason for the creation of such "phantom" segments was, that after
an XLOG_SWITCH record the EndOfLog variable points to the beginning of
the next segment, and RemoveXlogFile() used XLByteToPrevSeg().
Normally RemoveXlogFile() doing so is harmless, because the last
segment will still exist preventing InstallXLogFileSegment() from
causing harm, but just after promotion there's no previous segment on
the new timeline.
Fix that by using XLByteToSeg() instead of XLByteToPrevSeg().
Author: Andres Freund Reported-By: Greg Burek
Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de
Backpatch: 9.2-, bug older than all supported versions
Bruce Momjian [Tue, 20 Jun 2017 17:20:02 +0000 (13:20 -0400)]
pg_upgrade: start/stop new server after pg_resetwal
When commit 0f33a719fdbb5d8c43839ea0d2c90cd03e2af2d2 removed the
instructions to start/stop the new cluster before running rsync, it was
now possible for pg_resetwal/pg_resetxlog to leave the final WAL record
at wal_level=minimum, preventing upgraded standby servers from
reconnecting.
This patch fixes that by having pg_upgrade unconditionally start/stop
the new cluster after pg_resetwal/pg_resetxlog has run.
Backpatch through 9.2 since, though the instructions were added in PG
9.5, they worked all the way back to 9.2.
Tom Lane [Mon, 19 Jun 2017 15:02:45 +0000 (11:02 -0400)]
On Windows, make pg_dump use binary mode for compressed plain text output.
The combination of -Z -Fp and output to stdout resulted in corrupted
output data, because we left stdout in text mode, resulting in newline
conversion being done on the compressed stream. Switch stdout to binary
mode for this case, at the same place where we do it for non-text output
formats.
Report and patch by Kuntal Ghosh, tested by Ashutosh Sharma and Neha
Sharma. Back-patch to all supported branches.
Fix dependency, when changing a function's argument/return type.
When a new base type is created using the old-style procedure of first
creating the input/output functions with "opaque" in place of the base
type, the "opaque" argument/return type is changed to the final base type,
on CREATE TYPE. However, we did not create a pg_depend record when doing
that, so the functions were left not depending on the type.
Tom Lane [Thu, 15 Jun 2017 19:03:39 +0000 (15:03 -0400)]
Fix low-probability leaks of PGresult objects in the backend.
We had three occurrences of essentially the same coding pattern
wherein we tried to retrieve a query result from a libpq connection
without blocking. In the case where PQconsumeInput failed (typically
indicating a lost connection), all three loops simply gave up and
returned, forgetting to clear any previously-collected PGresult
object. Since those are malloc'd not palloc'd, the oversight results
in a process-lifespan memory leak.
One instance, in libpqwalreceiver, is of little significance because
the walreceiver process would just quit anyway if its connection fails.
But we might as well fix it.
The other two instances, in postgres_fdw, are somewhat more worrisome
because at least in principle the scenario could be repeated, allowing
the amount of memory leaked to build up to something worth worrying
about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS
calls, as well as other calls that could potentially elog(ERROR),
providing another way to exit without having cleared the PGresult.
Here we need to add PG_TRY logic similar to what exists in quite a
few other places in postgres_fdw.
Coverity noted the libpqwalreceiver bug; I found the other two cases
by checking all calls of PQconsumeInput.
Back-patch to all supported versions as appropriate (9.2 lacks
postgres_fdw, so this is really quite unexciting for that branch).
Tatsuo Ishii [Thu, 15 Jun 2017 01:01:39 +0000 (10:01 +0900)]
Fix document bug regarding read only transactions.
It was explained that read only transactions (not in standby) allow to
update sequences. This had been wrong since the commit: 05d8a561ff85db1545f5768fe8d8dc9d99ad2ef7
Andres Freund [Tue, 6 Jun 2017 01:53:42 +0000 (18:53 -0700)]
Unify SIGHUP handling between normal and walsender backends.
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload. Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).
This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.
A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.
Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
Tom Lane [Thu, 1 Jun 2017 17:32:56 +0000 (13:32 -0400)]
Always use -fPIC, not -fpic, when building shared libraries with gcc.
On some platforms, -fpic fails for sufficiently large shared libraries.
We've mostly not hit that boundary yet, but there are some extensions
such as Citus and pglogical where it's becoming a problem. A bit of
research suggests that the penalty for -fPIC is small, in the
single-digit-percentage range --- and there's none at all on popular
platforms such as x86_64. So let's just default to -fPIC everywhere
and provide one less thing for extension developers to worry about.
Per complaint from Christoph Berg. Back-patch to all supported branches.
(I did not bother to touch the recently-removed Makefiles for sco and
unixware in the back branches, though. We'd have no way to test that
it doesn't break anything on those platforms.)
Tom Lane [Mon, 29 May 2017 21:08:16 +0000 (17:08 -0400)]
Prevent running pg_resetwal/pg_resetxlog against wrong-version data dirs.
pg_resetwal (formerly pg_resetxlog) doesn't insist on finding a matching
version number in pg_control, and that seems like an important thing to
preserve since recovering from corrupt pg_control is a prime reason to
need to run it. However, that means you can try to run it against a
data directory of a different major version, which is at best useless
and at worst disastrous. So as to provide some protection against that
type of pilot error, inspect PG_VERSION at startup and refuse to do
anything if it doesn't match. PG_VERSION is read-only after initdb,
so it's unlikely to get corrupted, and even if it were corrupted it would
be easy to fix by hand.
This hazard has been there all along, so back-patch to all supported
branches.
Tom Lane [Mon, 29 May 2017 19:19:07 +0000 (15:19 -0400)]
Allow NumericOnly to be "+ FCONST".
The NumericOnly grammar production accepted ICONST, + ICONST, - ICONST,
FCONST, and - FCONST, but for some reason not + FCONST. This led to
strange inconsistencies like
regression=# set random_page_cost = +4;
SET
regression=# set random_page_cost = 4000000000;
SET
regression=# set random_page_cost = +4000000000;
ERROR: syntax error at or near "4000000000"
(because 4000000000 is too large to be an ICONST). While there's
no actual functional reason to need to write a "+", if we allow
it for integers it seems like we should allow it for numerics too.
It's been like that forever, so back-patch to all supported branches.
Tom Lane [Fri, 26 May 2017 19:16:59 +0000 (15:16 -0400)]
Move autogenerated array types out of the way during ALTER ... RENAME.
Commit 9aa3c782c added code to allow CREATE TABLE/CREATE TYPE to not fail
when the desired type name conflicts with an autogenerated array type, by
dint of renaming the array type out of the way. But I (tgl) overlooked
that the same case arises in ALTER TABLE/TYPE RENAME. Fix that too.
Back-patch to all supported branches.
Report and patch by Vik Fearing, modified a bit by me
Tom Lane [Fri, 26 May 2017 16:51:06 +0000 (12:51 -0400)]
Fix pg_dump to not emit invalid SQL for an empty operator class.
If an operator class has no operators or functions, and doesn't need
a STORAGE clause, we emitted "CREATE OPERATOR CLASS ... AS ;" which
is syntactically invalid. Fix by forcing a STORAGE clause to be
emitted anyway in this case.
(At some point we might consider changing the grammar to allow CREATE
OPERATOR CLASS without an opclass_item_list. But probably we'd want to
omit the AS in that case, so that wouldn't fix this pg_dump issue anyway.)
It's been like this all along, so back-patch to all supported branches.
Daniel Gustafsson, tweaked by me to avoid a dangling-pointer bug
Tom Lane [Wed, 24 May 2017 19:28:35 +0000 (15:28 -0400)]
Tighten checks for whitespace in functions that parse identifiers etc.
This patch replaces isspace() calls with scanner_isspace() in functions
that are likely to be presented with non-ASCII input. isspace() has
the small advantage that it will correctly recognize no-break space
in single-byte encodings (such as LATIN1); but it cannot work successfully
for any multibyte character, and depending on platform it might return
false positive results for some fragments of multibyte characters. That's
disastrous for functions that are trying to discard whitespace between
valid strings, as noted in bug #14662 from Justin Muise. Even treating
no-break space as whitespace is pretty questionable for the usages touched
here, because the core scanner would think it is an identifier character.
Affected functions are parse_ident(), parseNameAndArgTypes (underlying
regprocedurein() and siblings), SplitIdentifierString (used for parsing
GUCs and options that are qualified names or lists of names), and
SplitDirectoriesString (used for parsing GUCs that are lists of
directories).
All the functions adjusted here are parsing SQL identifiers and similar
constructs, so it's reasonable to insist that their definition of
whitespace match the core scanner. So we can hope that this won't cause
many backwards-compatibility problems. I've left alone isspace() calls
in places that aren't really expecting any non-ASCII input characters,
such as float8in().
Tom Lane [Sun, 21 May 2017 17:05:17 +0000 (13:05 -0400)]
Fix precision and rounding issues in money multiplication and division.
The cash_div_intX functions applied rint() to the result of the division.
That's not merely useless (because the result is already an integer) but
it causes precision loss for values larger than 2^52 or so, because of
the forced conversion to float8.
On the other hand, the cash_mul_fltX functions neglected to apply rint() to
their multiplication results, thus possibly causing off-by-one outputs.
Per C standard, arithmetic between any integral value and a float value is
performed in float format. Thus, cash_mul_flt4 and cash_div_flt4 produced
answers good to only about six digits, even when the float value is exact.
We can improve matters noticeably by widening the float inputs to double.
(It's tempting to consider using "long double" arithmetic if available,
but that's probably too much of a stretch for a back-patched fix.)
Also, document that cash_div_intX operators truncate rather than round.
Per bug #14663 from Richard Pistole. Back-patch to all supported branches.
Tom Lane [Wed, 17 May 2017 16:24:19 +0000 (12:24 -0400)]
Make psql handle EOF during COPY FROM STDIN properly on all platforms.
When stdin is a terminal, it's possible to end a COPY FROM STDIN with
a keyboard EOF signal (typically control-D), and then keep on issuing
SQL commands. One would expect another COPY FROM STDIN to work as well,
but on some platforms it did not. This turns out to be because we were
not resetting the stream's feof() flag, and BSD-ish versions of fread()
and fgets() won't attempt to read more data if that's set.
The misbehavior is observed on BSDen (including macOS), but not Linux,
Windows, or SysV-ish Unixen, which makes this a portability bug not
just a missing feature.
Add a clearerr() call to fix the behavior, and improve the prompt that's
issued when copying from a TTY to mention that EOF signals work.
It's been like this forever, so back-patch to all supported branches.
Andrew Dunstan [Fri, 12 May 2017 14:17:54 +0000 (10:17 -0400)]
Add libxml2 include path for MSVC builds
On Unix this path is detected via the use of xml2-config, but that's not
available on Windows. This means that users building with libxml2 will
no longer need to move things around from the standard libxml2
installation for MSVC builds.
Noah Misch [Mon, 8 May 2017 14:24:24 +0000 (07:24 -0700)]
Match pg_user_mappings limits to information_schema.user_mapping_options.
Both views replace the umoptions field with NULL when the user does not
meet qualifications to see it. They used different qualifications, and
pg_user_mappings documented qualifications did not match its implemented
qualifications. Make its documentation and implementation match those
of user_mapping_options. One might argue for stronger qualifications,
but these have long, documented tenure. pg_user_mappings has always
exhibited this problem, so back-patch to 9.2 (all supported versions).
Michael Paquier and Feike Steenbergen. Reviewed by Jeff Janes.
Reported by Andrew Wheelwright.
Add security checks to selectivity estimation functions
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables. Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof. If neither is satisfied, planning
will proceed as if there are no statistics available.
At least one of these is satisfied in most cases in practice. The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Tom Lane [Sun, 7 May 2017 16:33:12 +0000 (12:33 -0400)]
Guard against null t->tm_zone in strftime.c.
The upstream IANA code does not guard against null TM_ZONE pointers in this
function, but in our code there is such a check in the other pre-existing
use of t->tm_zone. We do have some places that set pg_tm.tm_zone to NULL.
I'm not entirely sure it's possible to reach strftime with such a value,
but I'm not sure it isn't either, so be safe.
Tom Lane [Sun, 7 May 2017 15:57:41 +0000 (11:57 -0400)]
Install the "posixrules" timezone link in MSVC builds.
Somehow, we'd missed ever doing this. The consequences aren't too
severe: basically, the timezone library would fall back on its hardwired
notion of the DST transition dates to use for a POSIX-style zone name,
rather than obeying US/Eastern which is the intended behavior. The net
effect would only be to obey current US DST law further back than it
ought to apply; so it's not real surprising that nobody noticed.
Tom Lane [Sun, 7 May 2017 15:34:31 +0000 (11:34 -0400)]
Restore fullname[] contents before falling through in pg_open_tzfile().
Fix oversight in commit af2c5aa88: if the shortcut open() doesn't work,
we need to reset fullname[] to be just the name of the toplevel tzdata
directory before we fall through into the pre-existing code. This failed
to be exposed in my (tgl's) testing because the fall-through path is
actually never taken under normal circumstances.
Alvaro Herrera [Fri, 5 May 2017 15:05:34 +0000 (12:05 -0300)]
Allow MSVC to build with Tcl 8.6.
Commit eaba54c20c5 added support for Tcl 8.6 for configure-supported
platforms after verifying that pltcl works without further changes, but
the MSVC tooling wasn't updated accordingly. Update MSVC to match,
restructuring the code to avoid duplicating the logic for every Tcl
version supported.
Backpatch to all live branches, like eaba54c20c5. In 9.4 and previous,
change the patch to use backslashes rather than forward, as in the rest
of the file.
Reported by Paresh More, who also tested the patch I provided.
Discussion: https://postgr.es/m/CAAgiCNGVw3ssBtSi3ZNstrz5k00ax=UV+_ZEHUeW_LMSGL2sew@mail.gmail.com
Give nicer error message when connecting to a v10 server requiring SCRAM.
This is just to give the user a hint that they need to upgrade, if they try
to connect to a v10 server that uses SCRAM authentication, with an older
client.
Tom Lane [Wed, 3 May 2017 01:50:35 +0000 (21:50 -0400)]
Improve performance of timezone loading, especially pg_timezone_names view.
tzparse() would attempt to load the "posixrules" timezone database file on
each call. That might seem like it would only be an issue when selecting a
POSIX-style zone name rather than a zone defined in the timezone database,
but it turns out that each zone definition file contains a POSIX-style zone
string and tzload() will call tzparse() to parse that. Thus, when scanning
the whole timezone file tree as we do in the pg_timezone_names view,
"posixrules" was read repetitively for each zone definition file. Fix
that by caching the file on first use within any given process. (We cache
other zone definitions for the life of the process, so there seems little
reason not to cache this one as well.) This probably won't help much in
processes that never run pg_timezone_names, but even one additional SET
of the timezone GUC would come out ahead.
An even worse problem for pg_timezone_names is that pg_open_tzfile()
has an inefficient way of identifying the canonical case of a zone name:
it basically re-descends the directory tree to the zone file. That's not
awful for an individual "SET timezone" operation, but it's pretty horrid
when we're inspecting every zone in the database. And it's pointless too
because we already know the canonical spelling, having just read it from
the filesystem. Fix by teaching pg_open_tzfile() to avoid the directory
search if it's not asked for the canonical name, and backfilling the
proper result in pg_tzenumerate_next().
In combination these changes seem to make the pg_timezone_names view
about 3x faster to read, for me. Since a scan of pg_timezone_names
has up to now been one of the slowest queries in the regression tests,
this should help some little bit for buildfarm cycle times.
Back-patch to all supported branches, not so much because it's likely
that users will care much about the view's performance as because
tracking changes in the upstream IANA timezone code is really painful
if we don't keep all the branches in sync.
Tom Lane [Tue, 2 May 2017 22:05:54 +0000 (18:05 -0400)]
Ensure commands in extension scripts see the results of preceding DDL.
Due to a missing CommandCounterIncrement() call, parsing of a non-utility
command in an extension script would not see the effects of the immediately
preceding DDL command, unless that command's execution ends with
CommandCounterIncrement() internally ... which some do but many don't.
Report by Philippe Beaudoin, diagnosis by Julien Rouhaud.
Rather remarkably, this bug has evaded detection since extensions were
invented, so back-patch to all supported branches.
Tom Lane [Mon, 1 May 2017 15:52:59 +0000 (11:52 -0400)]
Update time zone data files to tzdata release 2017b.
DST law changes in Chile, Haiti, and Mongolia. Historical corrections for
Ecuador, Kazakhstan, Liberia, and Spain.
The IANA crew continue their campaign to replace invented time zone
abbrevations with numeric GMT offsets. This update changes numerous zones
in South America, the Pacific and Indian oceans, and some Asian and Middle
Eastern zones. I kept these abbreviations in the tznames/ data files,
however, so that we will still accept them for input. (We may want to
start trimming those files someday, but I think we should wait for the
upstream dust to settle before deciding what to do.)
In passing, add MESZ (Mitteleuropaeische Sommerzeit) to the tznames lists;
since we accept MEZ (Mitteleuropaeische Zeit) it seems rather strange not
to take the other one. And fix some incorrect, or at least obsolete,
comments that certain abbreviations are not traceable to the IANA data.
Tom Lane [Sun, 30 Apr 2017 19:13:51 +0000 (15:13 -0400)]
Sync our copy of the timezone library with IANA release tzcode2017b.
zic no longer mishandles some transitions in January 2038 when it
attempts to work around Qt bug 53071. This fixes a bug affecting
Pacific/Tongatapu that was introduced in zic 2016e. localtime.c
now contains a workaround, useful when loading a file generated by
a buggy zic.
There are assorted cosmetic changes as well, notably relocation
of a bunch of #defines.
Robert Haas [Fri, 28 Apr 2017 18:48:38 +0000 (14:48 -0400)]
Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute.
Currently, trying to validate a NO INHERIT constraint on the parent will
search for the constraint in child tables (where it is not supposed to
exist), wrongly causing a "constraint does not exist" error.
Tom Lane [Sun, 23 Apr 2017 17:10:58 +0000 (13:10 -0400)]
Fix order of arguments to SubTransSetParent().
ProcessTwoPhaseBuffer (formerly StandbyRecoverPreparedTransactions)
mixed up the parent and child XIDs when calling SubTransSetParent to
record the transactions' relationship in pg_subtrans.
Remarkably, analysis by Simon Riggs suggests that this doesn't lead to
visible problems (at least, not in non-Assert builds). That might
explain why we'd not noticed it before. Nonetheless, it's surely wrong.
This code was born broken, so back-patch to all supported branches.
Tom Lane [Fri, 21 Apr 2017 19:55:56 +0000 (15:55 -0400)]
Avoid depending on non-POSIX behavior of fcntl(2).
The POSIX standard does not say that the success return value for
fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1.
We had several calls that were making the stronger assumption. Adjust
them to test specifically for -1 for strict spec compliance.
The standard further leaves open the possibility that the O_NONBLOCK
flag bit is not the only active one in F_SETFL's argument. Formally,
therefore, one ought to get the current flags with F_GETFL and store
them back with only the O_NONBLOCK bit changed when trying to change
the nonblock state. In port/noblock.c, we were doing the full pushup
in pg_set_block but not in pg_set_noblock, which is just weird. Make
both of them do it properly, since they have little business making
any assumptions about the socket they're handed. The other places
where we're issuing F_SETFL are working with FDs we just got from
pipe(2), so it's reasonable to assume the FDs' properties are all
default, so I didn't bother adding F_GETFL steps there.
Also, while pg_set_block deserves some points for trying to do things
right, somebody had decided that it'd be even better to cast fcntl's
third argument to "long". Which is completely loony, because POSIX
clearly says the third argument for an F_SETFL call is "int".
Given the lack of field complaints, these missteps apparently are not
of significance on any common platforms. But they're still wrong,
so back-patch to all supported branches.
Tom Lane [Mon, 17 Apr 2017 17:52:42 +0000 (13:52 -0400)]
Support OpenSSL 1.1.0 in 9.3 and 9.2.
This commit back-patches the equivalent of the 9.5-branch commits e2838c580 and 48e5ba61e, so that we can work with OpenSSL 1.1.0
in all supported branches.
Original patches by Andreas Karlsson and Heikki Linnakangas,
back-patching work by Andreas Karlsson.
Tom Lane [Mon, 17 Apr 2017 16:51:40 +0000 (12:51 -0400)]
Back-patch 9.4-era SSL renegotiation code into 9.3 and 9.2.
This back-patches 9.4 commits 31cf1a1a4, 86029b31e, and 36a3be654 into
the prior branches, along with relevant bits of b1aebbb6a and 7ce2a45ae.
We had foreseen doing this once the code was proven, but that never did
happen, probably because we got sufficiently fed up with renegotiation
to disable it by default. However, we have to do something now because
the prior code doesn't even compile against OpenSSL 1.1. Per discussion,
the best solution seems to be to make the older branches look like 9.4.
Tom Lane [Sat, 15 Apr 2017 21:27:38 +0000 (17:27 -0400)]
Provide a way to control SysV shmem attach address in EXEC_BACKEND builds.
In standard non-Windows builds, there's no particular reason to care what
address the kernel chooses to map the shared memory segment at. However,
when building with EXEC_BACKEND, there's a risk that the chosen address
won't be available in all child processes. Linux with ASLR enabled (which
it is by default) seems particularly at risk because it puts shmem segments
into the same area where it maps shared libraries. We can work around
that by specifying a mapping address that's outside the range where
shared libraries could get mapped. On x86_64 Linux, 0x7e0000000000
seems to work well.
This is only meant for testing/debugging purposes, so it doesn't seem
necessary to go as far as providing a GUC (or any user-visible
documentation, though we might change that later). Instead, it's just
controlled by setting an environment variable PG_SHMEM_ADDR to the
desired attach address.
Back-patch to all supported branches, since the point here is to
remove intermittent buildfarm failures on EXEC_BACKEND animals.
Owners of affected animals will need to add a suitable setting of
PG_SHMEM_ADDR to their build_env configuration.
Tom Lane [Mon, 10 Apr 2017 17:51:29 +0000 (13:51 -0400)]
Improve castNode notation by introducing list-extraction-specific variants.
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type. For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.
As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.
Joe Conway [Thu, 6 Apr 2017 21:22:01 +0000 (14:22 -0700)]
Silence compiler warning in sepgsql
<selinux/label.h> includes <stdbool.h>, which creates an incompatible
We don't care if <stdbool.h> redefines "true"/"false"; those are close
enough.
Complaint and initial patch by Mike Palmiotto. Final approach per
Tom Lane's suggestion, as discussed on hackers. Backpatching to
all supported branches.
Remove dead code and fix comments in fast-path function handling.
HandleFunctionRequest() is no longer responsible for reading the protocol
message from the client, since commit 2b3a8b20c2. Fix the outdated
comments.
HandleFunctionRequest() now always returns 0, because the code that used
to return EOF was moved in 2b3a8b20c2. Therefore, the caller no longer
needs to check the return value.
Reported by Andres Freund. Backpatch to all supported versions, even though
this doesn't have any user-visible effect, to make backporting future
patches in this area easier.
Fujii Masao [Thu, 30 Mar 2017 16:31:15 +0000 (01:31 +0900)]
Simplify the example of VACUUM in documentation.
Previously a detailed activity report by VACUUM VERBOSE ANALYZE was
described as an example of VACUUM in docs. But it had been obsolete
for a long time. For example, commit feb4f44d296b88b7f0723f4a4f3945a371276e0b
updated the content of that activity report in 2003, but we had
forgotten to update the example.
So basically we need to update the example. But since no one cared
about the details of VACUUM output and complained about that mistake
for such long time, per discussion on hackers, we decided to get rid
of the detailed activity report from the example and simplify it.
Back-patch to all supported versions.
Reported by Masahiko Sawada, patch by me.
Discussion: https://postgr.es/m/CAD21AoAGA2pB3p-CWmTkxBsbkZS1bcDGBLcYVcvcDxspG_XAfA@mail.gmail.com
Tom Lane [Sun, 26 Mar 2017 21:35:35 +0000 (17:35 -0400)]
Fix unportable disregard of alignment requirements in RADIUS code.
The compiler is entitled to store a char[] local variable with no
particular alignment requirement. Our RADIUS code cavalierly took such
a local variable and cast its address to a struct type that does have
alignment requirements. On an alignment-picky machine this would lead
to bus errors. To fix, declare the local variable honestly, and then
cast its address to char * for use in the I/O calls.
Given the lack of field complaints, there must be very few if any
people affected; but nonetheless this is a clear portability issue,
so back-patch to all supported branches.
Noted while looking at a Coverity complaint in the same code.
Revert Windows service check refactoring, and replace with a different fix.
This reverts commit 38bdba54a64bacec78e3266f0848b0b4a824132a, "Fix and
simplify check for whether we're running as Windows service". It turns out
that older versions of MinGW - like that on buildfarm member narwhal - do
not support the CheckTokenMembership() function. This replaces the
refactoring with a much smaller fix, to add a check for SE_GROUP_ENABLED to
pgwin32_is_service().
Only apply to back-branches, and keep the refactoring in HEAD. It's
unlikely that anyone is still really using such an old version of MinGW -
aside from narwhal - but let's not change the minimum requirements in
minor releases.
Fix and simplify check for whether we're running as Windows service.
If the process token contains SECURITY_SERVICE_RID, but it has been
disabled by the SE_GROUP_USE_FOR_DENY_ONLY attribute, win32_is_service()
would incorrectly report that we're running as a service. That situation
arises, e.g. if postmaster is launched with a restricted security token,
with the "Log in as Service" privilege explicitly removed.
Replace the broken code with CheckProcessTokenMembership(), which does
this correctly. Also replace similar code in win32_is_admin(), even
though it got this right, for simplicity and consistency.
Per bug #13755, reported by Breen Hagan. Back-patch to all supported
versions. Patch by Takayuki Tsunakawa, reviewed by Michael Paquier.
Andrew Gierth [Thu, 16 Mar 2017 22:33:38 +0000 (22:33 +0000)]
Avoid having vacuum set reltuples to 0 on non-empty relations in the
presence of page pins, which leads to serious estimation errors in the
planner. This particularly affects small heavily-accessed tables,
especially where locking (e.g. from FK constraints) forces frequent
vacuums for mxid cleanup.
Fix by keeping separate track of pages whose live tuples were actually
counted vs. pages that were only scanned for freezing purposes. Thus,
reltuples can only be set to 0 if all pages of the relation were
actually counted.
Backpatch to all supported versions.
Per bug #14057 from Nicolas Baccelli, analyzed by me.
Robert Haas [Tue, 14 Mar 2017 15:51:11 +0000 (11:51 -0400)]
Fix failure to mark init buffers as BM_PERMANENT.
This could result in corruption of the init fork of an unlogged index
if the ambuildempty routine for that index used shared buffers to
create the init fork, which was true for brin, gin, gist, and hash
indexes.
Patch by me, based on an earlier patch by Michael Paquier, who also
reviewed this one. This also incorporates an idea from Artur
Zakirov.
Tom Lane [Mon, 13 Mar 2017 20:46:32 +0000 (16:46 -0400)]
Remove unnecessary dependency on statement_timeout in prepared_xacts test.
Rather than waiting around for statement_timeout to expire, we can just
try to take the table's lock in nowait mode. This saves some fraction
under 4 seconds when running this test with prepared xacts available,
and it guards against timeout-expired-anyway failures on very slow
machines when prepared xacts are not available, as seen in a recent
failure on axolotl for instance.
This approach could fail if autovacuum were to take an exclusive lock
on the test table concurrently, but there's no reason for it to do so.
Since the main point here is to improve stability in the buildfarm,
back-patch to all supported branches.
Michael Meskes [Mon, 13 Mar 2017 19:44:13 +0000 (20:44 +0100)]
Ecpg should support COMMIT PREPARED and ROLLBACK PREPARED.
The problem was that "begin transaction" was issued automatically
before executing COMMIT/ROLLBACK PREPARED if not in auto commit. This fix by
Masahiko Sawada fixes this.
Noah Misch [Sun, 12 Mar 2017 23:35:31 +0000 (19:35 -0400)]
Fix pg_file_write() error handling.
Detect fclose() failures; given "ln -s /dev/full $PGDATA/devfull",
"pg_file_write('devfull', 'x', true)" now fails as it should. Don't
leak a stream when fwrite() fails. Remove a born-ineffective test that
aimed to skip zero-length writes. Back-patch to 9.2 (all supported
versions).
Joe Conway [Sat, 11 Mar 2017 21:33:30 +0000 (13:33 -0800)]
Fix ancient connection leak in dblink
When using unnamed connections with dblink, every time a new
connection is made, the old one is leaked. Fix that.
This has been an issue probably since dblink was first committed.
Someone complained almost ten years ago, but apparently I decided
not to pursue it at the time, and neither did anyone else, so it
slipped between the cracks. Now that someone else has complained,
fix in all supported branches.
Tom Lane [Fri, 10 Mar 2017 19:15:09 +0000 (14:15 -0500)]
Sanitize newlines in object names in "pg_restore -l" output.
Commits 89e0bac86 et al replaced newlines with spaces in object names
printed in SQL comments, but we neglected to consider that the same
names are also printed by "pg_restore -l", and a newline would render
the output unparseable by "pg_restore -L". Apply the same replacement
in "-l" output. Since "pg_restore -L" doesn't actually examine any
object names, only the dump ID field that starts each line, this is
enough to fix things for its purposes.
The previous fix was treated as a security issue, and we might have
done that here as well, except that the issue was reported publicly
to start with. Anyway it's hard to see how this could be exploited
for SQL injection; "pg_restore -L" doesn't do much with the file
except parse it for leading integers.
Per bug #14587 from Milos Urbanek. Back-patch to all supported versions.
Tom Lane [Thu, 9 Mar 2017 22:20:11 +0000 (17:20 -0500)]
Fix timestamptz regression test to still work with latest IANA zone data.
The IANA timezone crew continues to chip away at their project of removing
timezone abbreviations that have no real-world currency from their
database. The tzdata2017a update removes all such abbreviations for
South American zones, as well as much of the Pacific. This breaks some
test cases in timestamptz.sql that were expecting America/Santiago and
America/Caracas to have non-numeric abbreviations.
The test cases involving America/Santiago seem to have selected that
zone more or less at random, so just replace it with America/New_York,
which is of similar longitude. The cases involving America/Caracas are
harder since they were chosen to test a time-varying zone abbreviation
around a point where it changed meaning in the backwards direction.
Fortunately, Europe/Moscow has a similar case in 2014, and the MSK/MSD
abbreviations are well enough attested that IANA seems unlikely to
decide to remove them from the database in future.
With these changes, this regression test should pass when using any IANA
zone database from 2015 or later. One could wish that there were a few
years more daylight on how out-of-date your zone database can be ... but
really the --with-system-tzdata option is only meant for use on platforms
where the zone database is kept up-to-date pretty faithfully, so I do not
think this is a big objection.
Tom Lane [Tue, 7 Mar 2017 00:33:59 +0000 (19:33 -0500)]
Repair incorrect pg_dump labeling for some comments and security labels.
We attached no schema label to comments for procedural languages, casts,
transforms, operator classes, operator families, or text search objects.
The first three categories of objects don't really have schemas, but
pg_dump treats them as if they do, and it seems like the TocEntry fields
for their comments had better match the TocEntry fields for the parent
objects. (As an example of a possible hazard, the type names in a CAST
will be formatted with the assumption of a particular search_path, so
failing to ensure that this same path is active for the COMMENT ON command
could lead to an error or to attaching the comment to the wrong cast.)
In the last six cases, this was a flat-out error --- possibly mine to
begin with, but it was a long time ago.
The security label for a procedural language was likewise not correctly
labeled as to schema, and both the comment and security label for a
procedural language were not correctly labeled as to owner.
In simple cases the restore would accidentally work correctly anyway, since
these comments and security labels would normally get emitted right after
the owning object, and so the search path and active user would be correct
anyhow. But it could fail in corner cases; for example a schema-selective
restore would omit comments it should include.
Giuseppe Broccolo noted the oversight, and proposed the correct fix, for
text search dictionary objects; I found the rest by cross-checking other
dumpComment() calls. These oversights are ancient, so back-patch all
the way.
Stephen Frost [Mon, 6 Mar 2017 22:04:55 +0000 (17:04 -0500)]
pg_upgrade: Fix large object COMMENTS, SECURITY LABELS
When performing a pg_upgrade, we copy the files behind pg_largeobject
and pg_largeobject_metadata, allowing us to avoid having to dump out and
reload the actual data for large objects and their ACLs.
Unfortunately, that isn't all of the information which can be associated
with large objects. Currently, we also support COMMENTs and SECURITY
LABELs with large objects and these were being silently dropped during a
pg_upgrade as pg_dump would skip everything having to do with a large
object and pg_upgrade only copied the tables mentioned to the new
cluster.
As the file copies happen after the catalog dump and reload, we can't
simply include the COMMENTs and SECURITY LABELs in pg_dump's binary-mode
output but we also have to include the actual large object definition as
well. With the definition, comments, and security labels in the pg_dump
output and the file copies performed by pg_upgrade, all of the data and
metadata associated with large objects is able to be successfully pulled
forward across a pg_upgrade.
In 9.6 and master, we can simply adjust the dump bitmask to indicate
which components we don't want. In 9.5 and earlier, we have to put
explciit checks in in dumpBlob() and dumpBlobs() to not include the ACL
or the data when in binary-upgrade mode.
Adjustments made to the privileges regression test to allow another test
(large_object.sql) to be added which explicitly leaves a large object
with a comment in place to provide coverage of that case with
pg_upgrade.
Tom Lane [Tue, 21 Feb 2017 22:51:28 +0000 (17:51 -0500)]
Fix sloppy handling of corner-case errors in fd.c.
Several places in fd.c had badly-thought-through handling of error returns
from lseek() and close(). The fact that those would seldom fail on valid
FDs is probably the reason we've not noticed this up to now; but if they
did fail, we'd get quite confused.
LruDelete and LruInsert actually just Assert'd that lseek never fails,
which is pretty awful on its face.
In LruDelete, we indeed can't throw an error, because that's likely to get
called during error abort and so throwing an error would probably just lead
to an infinite loop. But by the same token, throwing an error from the
close() right after that was ill-advised, not to mention that it would've
left the LRU state corrupted since we'd already unlinked the VFD from the
list. I also noticed that really, most of the time, we should know the
current seek position and it shouldn't be necessary to do an lseek here at
all. As patched, if we don't have a seek position and an lseek attempt
doesn't give us one, we'll close the file but then subsequent re-open
attempts will fail (except in the somewhat-unlikely case that a
FileSeek(SEEK_SET) call comes between and allows us to re-establish a known
target seek position). This isn't great but it won't result in any state
corruption.
Meanwhile, having an Assert instead of an honest test in LruInsert is
really dangerous: if that lseek failed, a subsequent read or write would
read or write from the start of the file, not where the caller expected,
leading to data corruption.
In both LruDelete and FileClose, if close() fails, just LOG that and mark
the VFD closed anyway. Possibly leaking an FD is preferable to getting
into an infinite loop or corrupting the VFD list. Besides, as far as I can
tell from the POSIX spec, it's unspecified whether or not the file has been
closed, so treating it as still open could be the wrong thing anyhow.
I also fixed a number of other places that were being sloppy about
behaving correctly when the seekPos is unknown.
Also, I changed FileSeek to return -1 with EINVAL for the cases where it
detects a bad offset, rather than throwing a hard elog(ERROR). It seemed
pretty inconsistent that some bad-offset cases would get a failure return
while others got elog(ERROR). It was missing an offset validity check for
the SEEK_CUR case on a closed file, too.
Back-patch to all supported branches, since all this code is fundamentally
identical in all of them.
Tom Lane [Mon, 20 Feb 2017 15:05:01 +0000 (10:05 -0500)]
Fix documentation of to_char/to_timestamp TZ, tz, OF formatting patterns.
These are only supported in to_char, not in the other direction, but the
documentation failed to mention that. Also, describe TZ/tz as printing the
time zone "abbreviation", not "name", because what they print is elsewhere
referred to that way. Per bug #14558.
Tom Lane [Sun, 19 Feb 2017 21:14:53 +0000 (16:14 -0500)]
Adjust PL/Tcl regression test to dodge a possible bug or zone dependency.
One case in the PL/Tcl tests is observed to fail on RHEL5 with a Turkish
time zone setting. It's not clear if this is an old Tcl bug or something
odd about the zone data, but in any case that test is meant to see if the
Tcl [clock] command works at all, not what its corner-case behaviors are.
Therefore we have no need to test exactly which week a Sunday midnight is
considered to fall into. Probe the following Tuesday instead.
Tom Lane [Fri, 17 Feb 2017 21:58:26 +0000 (16:58 -0500)]
Back-patch 9.4-era compiler warning fixes into older branches.
This applies portions of commits b64b5ccb6 and b1aebbb6a to the older
branches, in hopes of getting -Werror builds to succeed there. The
applied changes simply remove useless tests, eg checking an unsigned
variable to see if it is >= 0. Recent versions of clang warn about
such tests by default.
Tom Lane [Wed, 15 Feb 2017 21:40:06 +0000 (16:40 -0500)]
Make sure that hash join's bulk-tuple-transfer loops are interruptible.
The loops in ExecHashJoinNewBatch(), ExecHashIncreaseNumBatches(), and
ExecHashRemoveNextSkewBucket() are all capable of iterating over many
tuples without ever doing a CHECK_FOR_INTERRUPTS, so that the backend
might fail to respond to SIGINT or SIGTERM for an unreasonably long time.
Fix that. In the case of ExecHashJoinNewBatch(), it seems useful to put
the added CHECK_FOR_INTERRUPTS into ExecHashJoinGetSavedTuple() rather
than directly in the loop, because that will also ensure that both
principal code paths through ExecHashJoinOuterGetTuple() will do a
CHECK_FOR_INTERRUPTS, which seems like a good idea to avoid surprises.
Noah Misch [Sun, 12 Feb 2017 21:03:41 +0000 (16:03 -0500)]
Ignore tablespace ACLs when ignoring schema ACLs.
The ALTER TABLE ALTER TYPE implementation can issue DROP INDEX and
CREATE INDEX to refit existing indexes for the new column type. Since
this CREATE INDEX is an implementation detail of an index alteration,
the ensuing DefineIndex() should skip ACL checks specific to index
creation. It already skips the namespace ACL check. Make it skip the
tablespace ACL check, too. Back-patch to 9.2 (all supported versions).
Tom Lane [Tue, 7 Feb 2017 15:24:25 +0000 (10:24 -0500)]
Correct thinko in last-minute release note item.
The CREATE INDEX CONCURRENTLY bug can only be triggered by row updates,
not inserts, since the problem would arise from an update incorrectly
being made HOT. Noted by Alvaro.
Stephen Frost [Tue, 7 Feb 2017 15:17:02 +0000 (10:17 -0500)]
Initialize number_of_jobs in NewRestoreOptions
Now that we're checking that the number_of_jobs passed in isn't zero or
negative, we need to actually initialize number_of_jobs to '1' when it
isn't set.
Pointed out by Rushabh Lathia, though not his patch.
Tom Lane [Mon, 6 Feb 2017 18:19:51 +0000 (13:19 -0500)]
Avoid returning stale attribute bitmaps in RelationGetIndexAttrBitmap().
The problem with the original coding here is that we might receive (and
clear) a relcache invalidation signal for the target relation down inside
one of the index_open calls we're doing. Since the target is open, we
would not drop the relcache entry, just reset its rd_indexvalid and
rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and
would eventually cache and return potentially-obsolete attribute bitmaps.
The case where this matters is where the inval signal was from a CREATE
INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed
column. (In all other cases, the lock we hold on the target rel should
prevent any concurrent change in index state.) Even just returning the
stale attribute bitmap is not such a problem, because it shouldn't matter
during the transaction in which we receive the signal. What hurts is
caching the stale data, because it can survive into later transactions,
breaking CREATE INDEX CONCURRENTLY's expectation that later transactions
will not create new broken HOT chains. The upshot is that there's a window
for building corrupted indexes during CREATE INDEX CONCURRENTLY.
This patch fixes the problem by rechecking that the set of index OIDs
is still the same at the end of RelationGetIndexAttrBitmap() as it was
at the start. If not, we loop back and try again. That's a little
more than is strictly necessary to fix the bug --- in principle, we
could return the stale data but not cache it --- but it seems like a
bad idea on general principles for relcache to return data it knows
is stale.
There might be more hazards of the same ilk, or there might be a better
way to fix this one, but this patch definitely improves matters and seems
unlikely to make anything worse. So let's push it into today's releases
even as we continue to study the problem.