This removes some info about support procedures being used, which was
obsoleted by commit db5f98ab4f, as well as add some more documentation
on how to create new opclasses using the Minmax infrastructure.
(Hopefully we can get something similar for Inclusion as well.)
In passing, fix some obsolete mentions of "mmtuples" in source code
comments.
Tom Lane [Sat, 18 Jul 2015 15:47:13 +0000 (11:47 -0400)]
Make WaitLatchOrSocket's timeout detection more robust.
In the previous coding, timeout would be noticed and reported only when
poll() or socket() returned zero (or the equivalent behavior on Windows).
Ordinarily that should work well enough, but it seems conceivable that we
could get into a state where poll() always returns a nonzero value --- for
example, if it is noticing a condition on one of the file descriptors that
we do not think is reason to exit the loop. If that happened, we'd be in a
busy-wait loop that would fail to terminate even when the timeout expires.
We can make this more robust at essentially no cost, by deciding to exit
of our own accord if we compute a zero or negative time-remaining-to-wait.
Previously the code noted this but just clamped the time-remaining to zero,
expecting that we'd detect timeout on the next loop iteration.
Back-patch to 9.2. While 9.1 had a version of WaitLatchOrSocket, it was
primitive compared to later versions, and did not guarantee reliable
detection of timeouts anyway. (Essentially, this is a refinement of
commit 3e7fdcffd6f77187, which was back-patched only as far as 9.2.)
Andrew Dunstan [Sat, 18 Jul 2015 00:56:13 +0000 (20:56 -0400)]
Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting. Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator). This inconsistency seemed likely to be
counter-intuitive to users. To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators. This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.
For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count. From there on, it's as
if a positive-wise value was originally provided.
Note that there is still a minor inconsistency here across jsonb
deletion operators. Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized by as an integer
literal) is being used on an object, which is wrong-headed. The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements. However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).
In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte. This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator). Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.
Tom Lane [Fri, 17 Jul 2015 19:53:10 +0000 (15:53 -0400)]
Repair mishandling of cached cast-expression trees in plpgsql.
In commit 1345cc67bbb014209714af32b5681b1e11eaf964, I introduced caching
of expressions representing type-cast operations into plpgsql. However,
I supposed that I could cache both the expression trees and the evaluation
state trees derived from them for the life of the session. This doesn't
work, because we execute the expressions in plpgsql's simple_eval_estate,
which has an ecxt_per_query_memory that is only transaction-lifespan.
Therefore we can end up putting pointers into the evaluation state tree
that point to transaction-lifespan memory; in particular this happens if
the cast expression calls a SQL-language function, as reported by Geoff
Winkless.
The minimum-risk fix seems to be to treat the state trees the same way
we do for "simple expression" trees in plpgsql, ie create them in the
simple_eval_estate's ecxt_per_query_memory, which means recreating them
once per transaction.
Since I had to introduce bookkeeping overhead for that anyway, I bought
back some of the added cost by sharing the read-only expression trees
across all functions in the session, instead of using a per-function
table as originally. The simple-expression bookkeeping takes care of
the recursive-usage risk that I was concerned about avoiding before.
At some point we should take a harder look at how all this works,
and see if we can't reduce the amount of tree reinitialization needed.
But that won't happen for 9.5.
xlc provides "long long" unconditionally at C99-compatible language
levels, and this option provokes a warning. The warning interferes with
"configure" tests that fail in response to any warning. Notably, before
commit 85a2a8903f7e9151793308d0638621003aded5ae, it interfered with the
test for -qnoansialias. Back-patch to 9.0 (all supported versions).
Tom Lane [Fri, 17 Jul 2015 02:57:46 +0000 (22:57 -0400)]
Fix a low-probability crash in our qsort implementation.
It's standard for quicksort implementations, after having partitioned the
input into two subgroups, to recurse to process the smaller partition and
then handle the larger partition by iterating. This method guarantees
that no more than log2(N) levels of recursion can be needed. However,
Bentley and McIlroy argued that checking to see which partition is smaller
isn't worth the cycles, and so their code doesn't do that but just always
recurses on the left partition. In most cases that's fine; but with
worst-case input we might need O(N) levels of recursion, and that means
that qsort could be driven to stack overflow. Such an overflow seems to
be the only explanation for today's report from Yiqing Jin of a SIGSEGV
in med3_tuple while creating an index of a couple billion entries with a
very large maintenance_work_mem setting. Therefore, let's spend the few
additional cycles and lines of code needed to choose the smaller partition
for recursion.
Also, fix up the qsort code so that it properly uses size_t not int for
some intermediate values representing numbers of items. This would only
be a live risk when sorting more than INT_MAX bytes (in qsort/qsort_arg)
or tuples (in qsort_tuple), which I believe would never happen with any
caller in the current core code --- but perhaps it could happen with
call sites in third-party modules? In any case, this is trouble waiting
to happen, and the corrected code is probably if anything shorter and
faster than before, since it removes sign-extension steps that had to
happen when converting between int and size_t.
In passing, move a couple of CHECK_FOR_INTERRUPTS() calls so that it's
not necessary to preserve the value of "r" across them, and prettify
the output of gen_qsort_tuple.pl a little.
Back-patch to all supported branches. The odds of hitting this issue
are probably higher in 9.4 and up than before, due to the new ability
to allocate sort workspaces exceeding 1GB, but there's no good reason
to believe that it's impossible to crash older branches this way.
AIX: Link TRANSFORM modules with their dependencies.
The result closely resembles linking of these modules for the "win32"
port. Augment the $(exports_file) header so the file is also usable as
an import file. Unfortunately, relocating an AIX installation will now
require adding $(pkglibdir) to LD_LIBRARY_PATH. Back-patch to 9.5,
where the modules were introduced.
AIX: Link the postgres executable with -Wl,-brtllib.
This allows PostgreSQL modules and their dependencies to have undefined
symbols, resolved at runtime. Perl module shared objects rely on that
in Perl 5.8.0 and later. This fixes the crash when PL/PerlU loads such
modules, as the hstore_plperl test suite does. Module authors can link
using -Wl,-G to permit undefined symbols; by default, linking will fail
as it has. Back-patch to 9.0 (all supported versions).
In the test query I added for ALTER TABLE retaining comments, the order of
the result rows was not stable, and varied across systems. Add an ORDER BY
to make the order predictable. This should fix the buildfarm failures.
Retain comments on indexes and constraints at ALTER TABLE ... TYPE ...
When a column's datatype is changed, ATExecAlterColumnType() rebuilds all
the affected indexes and constraints, and the comments from the old
indexes/constraints were not carried over.
To fix, create a synthetic COMMENT ON command in the work queue, to re-add
any comments on constraints. For indexes, there's a comment field in
IndexStmt that is used.
This fixes bug #13126, reported by Kirill Simonov. Original patch by
Michael Paquier, reviewed by Petr Jelinek and me. This bug is present in
all versions, but only backpatch to 9.5. Given how minor the issue is, it
doesn't seem worth the work and risk to backpatch further than that.
The code in ATPostAlterTypeParse was very deeply indented, mostly because
there were two nested switch-case statements, which add a lot of
indentation. Use if-else blocks instead, to make the code less indented
and more readable.
This is in preparation for next patch that makes some actualy changes to
the function. These cosmetic parts have been separated to make it easier
to see the real changes in the other patch.
Tom Lane [Sun, 12 Jul 2015 20:25:51 +0000 (16:25 -0400)]
Fix assorted memory leaks.
Per Coverity (not that any of these are so non-obvious that they should not
have been caught before commit). The extent of leakage is probably minor
to unnoticeable, but a leak is a leak. Back-patch as necessary.
Andres Freund [Sun, 12 Jul 2015 20:06:27 +0000 (22:06 +0200)]
Optionally don't error out due to preexisting slots in commandline utilities.
pg_receivexlog and pg_recvlogical error out when --create-slot is
specified and a slot with the same name already exists. In some cases,
especially with pg_receivexlog, that's rather annoying and requires
additional scripting.
Backpatch to 9.5 as slot control functions have newly been added to
pg_receivexlog, and there doesn't seem much point leaving it in a less
useful state.
Joe Conway [Sat, 11 Jul 2015 21:20:01 +0000 (14:20 -0700)]
Add assign_expr_collations() to CreatePolicy() and AlterPolicy().
As noted by Noah Misch, CreatePolicy() and AlterPolicy() omit to call
assign_expr_collations() on the node trees. Fix the omission and add
his test case to the rowsecurity regression test.
Copy-edit the docs changes of OWNER TO CURRENT/SESSION_USER additions.
Commit 31eae602 added new syntax to many DDL commands to use CURRENT_USER
or SESSION_USER instead of role name in ALTER ... OWNER TO, but because
of a misplaced '{', the syntax in the docs implied that the syntax was
"ALTER ... CURRENT_USER", instead of "ALTER ... OWNER TO CURRENT_USER".
Fix that, and also the funny indentation in some of the modified syntax
blurps.
Tom Lane [Thu, 9 Jul 2015 22:50:31 +0000 (18:50 -0400)]
Improve documentation about array concat operator vs. underlying functions.
The documentation implied that there was seldom any reason to use the
array_append, array_prepend, and array_cat functions directly. But that's
not really true, because they can help make it clear which case is meant,
which the || operator can't do since it's overloaded to represent all three
cases. Add some discussion and examples illustrating the potentially
confusing behavior that can ensue if the parser misinterprets what was
meant.
Per a complaint from Michael Herold. Back-patch to 9.2, which is where ||
started to behave this way.
Tom Lane [Thu, 9 Jul 2015 17:22:23 +0000 (13:22 -0400)]
Fix postmaster's handling of a startup-process crash.
Ordinarily, a failure (unexpected exit status) of the startup subprocess
should be considered fatal, so the postmaster should just close up shop
and quit. However, if we sent the startup process a SIGQUIT or SIGKILL
signal, the failure is hardly "unexpected", and we should attempt restart;
this is necessary for recovery from ordinary backend crashes in hot-standby
scenarios. I attempted to implement the latter rule with a two-line patch
in commit 442231d7f71764b8c628044e7ce2225f9aa43b67, but it now emerges that
that patch was a few bricks shy of a load: it failed to distinguish the
case of a signaled startup process from the case where the new startup
process crashes before reaching database consistency. That resulted in
infinitely respawning a new startup process only to have it crash again.
To handle this properly, we really must track whether we have sent the
*current* startup process a kill signal. Rather than add yet another
ad-hoc boolean to the postmaster's state, I chose to unify this with the
existing RecoveryError flag into an enum tracking the startup process's
state. That seems more consistent with the postmaster's general state
machine design.
Make wal_compression PGC_SUSET rather than PGC_USERSET.
When enabling wal_compression, there is a risk to leak data similarly to
the BREACH and CRIME attacks on SSL where the compression ratio of
a full page image gives a hint of what is the existing data of this page.
This vulnerability is quite cumbersome to exploit in practice, but doable.
So this patch makes wal_compression PGC_SUSET in order to prevent
non-superusers from enabling it and exploiting the vulnerability while
DBA thinks the risk very seriously and disables it in postgresql.conf.
Back-patch to 9.5 where wal_compression was introduced.
The AIX 7.1 libm is static, and AIX postgres executables do not export
symbols acquired from libraries. Back-patch to 9.5, where commit cfe12763c32437bc708a64ce88a90c7544f16185 added a sqrt() call.
Revoke support for strxfrm() that write past the specified array length.
This formalizes a decision implicit in commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23 and adds clean detection of
affected systems. Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.
POSIX does not specify the -q option, and many implementations do not
offer it. Don't bother changing the MSVC build system, because having
non-GNU diff on Windows is vanishingly unlikely. Back-patch to 9.2,
where this invocation was introduced.
Fix null pointer dereference in "\c" psql command.
The psql crash happened when no current connection existed. (The second
new check is optional given today's undocumented NULL argument handling
in PQhost() etc.) Back-patch to 9.0 (all supported versions).
Move pthread-tests earlier in the autoconf script.
On some Linux systems, "-lrt" exposed pthread-functions, so that linking
with -lrt was seemingly enough to make a program that uses pthreads to
work. However, when linking libpq, the dependency to libpthread was not
marked correctly, so that when an executable was linked with -lpq but
without -pthread, you got errors about undefined pthread_* functions from
libpq.
To fix, test for the flags required to use pthreads earlier in the autoconf
script, before checking any other libraries.
This should fix the failure on buildfarm member shearwater. gharial is also
failing; hopefully this fixes that too although the failure looks somewhat
different.
Replace our hacked version of ax_pthread.m4 with latest upstream version.
Our version was different from the upstream version in that we tried to use
all possible pthread-related flags that the compiler accepts, rather than
just the first one that works. That change was made in commit e48322a6d6cfce1ec52ab303441df329ddbc04d1, to work-around a bug affecting GCC
versions 3.2 and below (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8888),
although we didn't realize that it was a GCC bug at the time. We hardly care
about that old GCC versions anymore, so we no longer need that workaround.
This fixes the macro for compilers that print warnings with the chosen
flags. That's pretty annoying on its own right, but it also inconspicuously
disabled thread-safety, because we refused to use any pthread-related flags
if the compiler produced warnings. Max Filippov reported that problem when
linking with uClibc and OpenSSL. The warnings-check was added because the
workaround for the GCC bug caused warnings otherwise, so it's no longer
needed either. We can just use the upstream version as is.
If you really want to compile with GCC version 3.2 or older, you can still
work-around it manually by setting PTHREAD_CFLAGS="-pthread -lpthread"
manually on the configure command line.
Backpatch to 9.5. I don't want to unnecessarily rock the boat on stable
branches, but 9.5 seems like fair game.
Joe Conway [Tue, 7 Jul 2015 21:36:03 +0000 (14:36 -0700)]
Improve regression test coverage of table lock modes vs permissions.
Test the interactions with permissions and LOCK TABLE. Specifically
ROW EXCLUSIVE, ACCESS SHARE, and ACCESS EXCLUSIVE modes against
SELECT, INSERT, UPDATE, DELETE, and TRUNCATE permissions. Discussed
by Stephen Frost and Michael Paquier, patch by the latter. Backpatch
to 9.5 where matching behavior was first committed.
Tom Lane [Tue, 7 Jul 2015 16:49:18 +0000 (12:49 -0400)]
Fix portability issue in pg_upgrade test script: avoid $PWD.
SUSv2-era shells don't set the PWD variable, though anything more modern
does. In the buildfarm environment this could lead to test.sh executing
with PWD pointing to $HOME or another high-level directory, so that there
were conflicts between concurrent executions of the test in different
branch subdirectories. This appears to be the explanation for recent
intermittent failures on buildfarm members binturong and dingo (and might
well have something to do with the buildfarm script's failure to capture
log files from pg_upgrade tests, too).
To fix, just use `pwd` in place of $PWD. AFAICS test.sh is the only place
in our source tree that depended on $PWD. Back-patch to all versions
containing this script.
Per buildfarm. Thanks to Oskari Saarenmaa for diagnosing the problem.
If an allocation fails in the main message handling loop, pqParseInput3
or pqParseInput2, it should not be treated as "not enough data available
yet". Otherwise libpq will wait indefinitely for more data to arrive from
the server, and gets stuck forever.
This isn't a complete fix - getParamDescriptions and getCopyStart still
have the same issue, but it's a step in the right direction.
Michael Paquier and me. Backpatch to all supported versions.
Turn install.bat into a pure one line wrapper fort he perl script.
Build.bat and vcregress.bat got similar treatment years ago. I'm not sure
why install.bat wasn't treated at the same time, but it seems like a good
idea anyway.
The immediate problem with the old install.bat was that it had quoting
issues, and wouldn't work if the target directory's name contained spaces.
This fixes that problem.
I committed this to master yesterday, this is a backpatch of the same for
all supported versions.
Andres Freund [Tue, 7 Jul 2015 11:13:15 +0000 (13:13 +0200)]
Fix logical decoding bug leading to inefficient reopening of files.
When spilling transaction data to disk a simple typo caused the output
file to be closed and reopened for every serialized change. That happens
to not have a huge impact on linux, which is why it probably wasn't
noticed so far, but on windows that appears to trigger actual disk
writes after every change. Not fun.
The bug fortunately does not have any impact besides speed. A change
could end up being in the wrong segment (last instead of next), but
since we read all files to the end, that's just ugly, not really
problematic. It's not a problem to upgrade, since transaction spill
files do not persist across restarts.
Andres Freund [Tue, 7 Jul 2015 10:47:44 +0000 (12:47 +0200)]
Fix pg_recvlogical not to fsync output when it's a tty or pipe.
The previous coding tried to handle possible failures when fsyncing a
tty or pipe fd by accepting EINVAL - but apparently some
platforms (windows, OSX) don't reliably return that. So instead check
whether the output fd refers to a pipe or a tty when opening it.
Reported-By: Olivier Gosseaume, Marko Tiikkaja
Discussion: 559AF98B.3050901@joh.to
Remove incorrect warning from pg_archivecleanup document.
The .backup file name can be passed to pg_archivecleanup even if
it includes the extension which is specified in -x option.
However, previously the document incorrectly warned a user
not to do that.
Back-patch to 9.2 where pg_archivecleanup's -x option and
the warning were added.
Tom Lane [Sun, 5 Jul 2015 16:01:01 +0000 (12:01 -0400)]
Make numeric form of PG version number readily available in Makefiles.
Expose PG_VERSION_NUM (e.g., "90600") as a Make variable; but for
consistency with the other Make variables holding similar info,
call the variable just VERSION_NUM not PG_VERSION_NUM.
There was some discussion of making this value available as a pg_config
value as well. However, that would entail substantially more work than
this two-line patch. Given that there was not exactly universal consensus
that we need this at all, let's just do a minimal amount of work for now.
Fix pgbench progress report behaviour when pgbench or a query gets stuck.
There were two issues here. First, if a query got stuck so that it took
e.g. 5 seconds, and progress interval was 1 second, no progress reports were
printed until the query returned. Fix so that we wake up specifically to
print the progress report. Secondly, if pgbench got stuck so that it would
nevertheless not print a progress report on time, and enough time passes
that it's already time to print the next progress report, just skip the one
that was missed. Before this patch, it would print the missed one with 0 TPS
immediately after the previous one.
Fabien Coelho. Backpatch to 9.4, where progress reports were added.
Make WAL-related utilities handle .partial WAL files properly.
Commit de76884 changed an archive recovery so that the last WAL
segment with old timeline was renamed with suffix .partial. It should
have updated WAL-related utilities so that they can handle such
.paritial WAL files, but we forgot that.
This patch changes pg_archivecleanup so that it can clean up even
archived WAL files with .partial suffix. Also it allows us to specify
.partial WAL file name as the command-line argument "oldestkeptwalfile".
This patch also changes pg_resetxlog so that it can remove .partial
WAL files in pg_xlog directory.
pg_xlogdump cannot handle .partial WAL files. Per discussion,
we decided only to document that limitation instead of adding the fix.
Because a user can easily work around the limitation (i.e., just remove
.partial suffix from the file name) and the fix seems complicated for
very narrow use case.
Back-patch to 9.5 where the problem existed.
Review by Michael Paquier.
Discussion: http://www.postgresql.org/message-id/CAHGQGwGxMKnVHGgTfiig2Bt_2djec0in3-DLJmtg7+nEiidFdQ@mail.gmail.com
Tom Lane [Thu, 2 Jul 2015 21:02:08 +0000 (17:02 -0400)]
Fix misuse of TextDatumGetCString().
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum. Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere. Make pg_get_object_address() fall in line with other uses.
Noted while reviewing two-arg current_setting() patch.
Use appendStringInfoString/Char et al where appropriate.
Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in
9.5, and keeping the code in sync with master makes future backpatching
easier.
Make use of xlog_internal.h's macros in WAL-related utilities.
Commit 179cdd09 added macros to check if a filename is a WAL segment
or other such file. However there were still some instances of the
strlen + strspn combination to check for that in WAL-related utilities
like pg_archivecleanup. Those checks can be replaced with the macros.
This patch makes use of the macros in those utilities and
which would make the code a bit easier to read.
Tom Lane [Wed, 1 Jul 2015 22:07:48 +0000 (18:07 -0400)]
Make sampler_random_fract() actually obey its API contract.
This function is documented to return a value in the range (0,1),
which is what its predecessor anl_random_fract() did. However, the
new version depends on pg_erand48() which returns a value in [0,1).
The possibility of returning zero creates hazards of division by zero
or trying to compute log(0) at some call sites, and it might well
break third-party modules using anl_random_fract() too. So let's
change it to never return zero. Spotted by Coverity.
XLogFileCopy() was changed heavily in commit de76884. However it was
partially reverted in commit 7abc685 and most of those changes to
XLogFileCopy() were no longer needed. Then commit 7cbee7c removed
those unnecessary code, but XLogFileCopy() looked different in master
and 9.4 though the contents are almost the same.
This patch makes XLogFileCopy() look the same in master and back-branches,
which makes back-patching easier, per discussion on pgsql-hackers.
Back-patch to 9.5.
Don't call PageGetSpecialPointer() on page until it's been initialized.
After calling XLogInitBufferForRedo(), the page might be all-zeros if it was
not in page cache already. btree_xlog_unlink_page initialized the page
correctly, but it called PageGetSpecialPointer before initializing it, which
would lead to a corrupt page at WAL replay, if the unlinked page is not in
page cache.
Backpatch to 9.4, the bug came with the rewrite of B-tree page deletion.
Initialize GIN metapage correctly when replaying metapage-update WAL record.
I broke this with my WAL format refactoring patch. Before that, the metapage
was read from disk, and modified in-place regardless of the LSN. That was
always a bit silly, as there's no need to read the old page version from
disk disk when we're overwriting it anyway. So that was changed in 9.5, but
I failed to add a GinInitPage call to initialize the page-headers correctly.
Usually you wouldn't notice, because the metapage is already in the page
cache and is not zeroed.
One way to reproduce this is to perform a VACUUM on an already vacuumed
table (so that the vacuum has no real work to do), immediately after a
checkpoint, and then perform an immediate shutdown. After recovery, the
page headers of the metapage will be incorrectly all-zeroes.
Tom Lane [Mon, 29 Jun 2015 19:38:46 +0000 (15:38 -0400)]
Desultory review of 9.5 release notes.
Minor corrections and clarifications. Notably, for stuff that got moved
out of contrib, make sure it's documented somewhere other than "Additional
Modules".
I'm sure these need more work, but that's all I have time for today.
Tom Lane [Mon, 29 Jun 2015 16:42:52 +0000 (12:42 -0400)]
Code + docs review for escaping of option values (commit 11a020eb6).
Avoid memory leak from incorrect choice of how to free a StringInfo
(resetStringInfo doesn't do it). Now that pg_split_opts doesn't scribble
on the optstr, mark that as "const" for clarity. Attach the commentary in
protocol.sgml to the right place, and add documentation about the
user-visible effects of this change on postgres' -o option and libpq's
PGOPTIONS option.
Andres Freund [Mon, 29 Jun 2015 12:53:32 +0000 (14:53 +0200)]
Replace ia64 S_UNLOCK compiler barrier with a full memory barrier.
_Asm_sched_fence() is just a compiler barrier, not a memory barrier. But
spinlock release on IA64 needs, at the very least, release
semantics. Use a full barrier instead.
This might be the cause for the occasional failures on buildfarm member
anole.
Tom Lane [Sun, 28 Jun 2015 22:06:14 +0000 (18:06 -0400)]
Improve design and implementation of pg_file_settings view.
As first committed, this view reported on the file contents as they were
at the last SIGHUP event. That's not as useful as reporting on the current
contents, and what's more, it didn't work right on Windows unless the
current session had serviced at least one SIGHUP. Therefore, arrange to
re-read the files when pg_show_all_settings() is called. This requires
only minor refactoring so that we can pass changeVal = false to
set_config_option() so that it won't actually apply any changes locally.
In addition, add error reporting so that errors that would prevent the
configuration files from being loaded, or would prevent individual settings
from being applied, are visible directly in the view. This makes the view
usable for pre-testing whether edits made in the config files will have the
desired effect, before one actually issues a SIGHUP.
I also added an "applied" column so that it's easy to identify entries that
are superseded by later entries; this was the main use-case for the original
design, but it seemed unnecessarily hard to use for that.
Also fix a 9.4.1 regression that allowed multiple entries for a
PGC_POSTMASTER variable to cause bogus complaints in the postmaster log.
(The issue here was that commit bf007a27acd7b2fb unintentionally reverted 3e3f65973a3c94a6, which suppressed any duplicate entries within
ParseConfigFp. However, since the original coding of the pg_file_settings
view depended on such suppression *not* happening, we couldn't have fixed
this issue now without first doing something with pg_file_settings.
Now we suppress duplicates by marking them "ignored" within
ProcessConfigFileInternal, which doesn't hide them in the view.)
Lesser changes include:
Drive the view directly off the ConfigVariable list, instead of making a
basically-equivalent second copy of the data. There's no longer any need
to hang onto the data permanently, anyway.
Convert show_all_file_settings() to do its work in one call and return a
tuplestore; this avoids risks associated with assuming that the GUC state
will hold still over the course of query execution. (I think there were
probably latent bugs here, though you might need something like a cursor
on the view to expose them.)
Arrange to run SIGHUP processing in a short-lived memory context, to
forestall process-lifespan memory leaks. (There is one known leak in this
code, in ProcessConfigDirectory; it seems minor enough to not be worth
back-patching a specific fix for.)
Remove mistaken assignment to ConfigFileLineno that caused line counting
after an include_dir directive to be completely wrong.
Add missed failure check in AlterSystemSetConfigFile(). We don't really
expect ParseConfigFp() to fail, but that's not an excuse for not checking.
Also trigger restartpoints based on max_wal_size on standby.
When archive recovery and restartpoints were initially introduced,
checkpoint_segments was ignored on the grounds that the files restored from
archive don't consume any space in the recovery server. That was changed in
later releases, but even then it was arguably a feature rather than a bug,
as performing restartpoints as often as checkpoints during normal operation
might be excessive, but you might nevertheless not want to waste a lot of
space for pre-allocated WAL by setting checkpoint_segments to a high value.
But now that we have separate min_wal_size and max_wal_size settings, you
can bound WAL usage with max_wal_size, and still avoid consuming excessive
space usage by setting min_wal_size to a lower value, so that argument is
moot.
There are still some issues with actually limiting the space usage to
max_wal_size: restartpoints in recovery can only start after seeing the
checkpoint record, while a checkpoint starts flushing buffers as soon as
the redo-pointer is set. Restartpoint is paced to happen at the same
leisurily speed, determined by checkpoint_completion_target, as checkpoints,
but because they are started later, max_wal_size can be exceeded by upto
one checkpoint cycle's worth of WAL, depending on
checkpoint_completion_target. But that seems better than not trying at all,
and max_wal_size is a soft limit anyway.
The documentation already claimed that max_wal_size is obeyed in recovery,
so this just fixes the behaviour to match the docs. However, add some
weasel-words there to mention that max_wal_size may well be exceeded by
some amount in recovery.
Promote the assertion that XLogBeginInsert() is not called twice into ERROR.
Seems like cheap insurance for WAL bugs. A spurious call to
XLogBeginInsert() in itself would be fairly harmless, but if there is any
data registered and the insertion is not completed/cancelled properly, there
is a risk that the data ends up in a wrong WAL record.
Fix double-XLogBeginInsert call in GIN page splits.
If data checksums or wal_log_hints is on, and a GIN page is split, the code
to find a new, empty, block was called after having already called
XLogBeginInsert(). That causes an assertion failure or PANIC, if finding the
new block involves updating a FSM page that had not been modified since last
checkpoint, because that update is WAL-logged, which calls XLogBeginInsert
again. Nested XLogBeginInsert calls are not supported.
To fix, rearrange GIN code so that XLogBeginInsert is called later, after
finding the victim buffers.
Don't choke on files that are removed while pg_rewind runs.
If a file is removed from the source server, while pg_rewind is running, the
invocation of pg_read_binary_file() will fail. Use the just-added missing_ok
option to that function, to have it return NULL instead, and handle that
gracefully. And similarly for pg_ls_dir and pg_stat_file.
Add missing_ok option to the SQL functions for reading files.
This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)
The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.
Tom Lane [Sat, 27 Jun 2015 21:47:39 +0000 (17:47 -0400)]
Avoid passing NULL to memcmp() in lookups of zero-argument functions.
A few places assumed they could pass NULL for the argtypes array when
looking up functions known to have zero arguments. At first glance
it seems that this should be safe enough, since memcmp() is surely not
allowed to fetch any bytes if its count argument is zero. However,
close reading of the C standard says that such calls have undefined
behavior, so we'd probably best avoid it.
Since the number of places doing this is quite small, and some other
places looking up zero-argument functions were already passing dummy
arrays, let's standardize on the latter solution rather than hacking
the function lookup code to avoid calling memcmp() in these cases.
I also added Asserts to catch any future violations of the new rule.
Given the utter lack of any evidence that this actually causes any
problems in the field, I don't feel a need to back-patch this change.
Per report from Piotr Stefaniak, though this is not his patch.
Andres Freund [Sat, 27 Jun 2015 16:49:00 +0000 (18:49 +0200)]
Fix test_decoding's handling of nonexistant columns in old tuple versions.
test_decoding used fastgetattr() to extract column values. That's wrong
when decoding updates and deletes if a table's replica identity is set
to FULL and new columns have been added since the old version of the
tuple was created. Due to the lack of a crosscheck with the datum's
natts values an invalid value will be output, leading to errors or
worse.
Bug: #13470 Reported-By: Krzysztof Kotlarski
Discussion: 20150626100333.3874.90852@wrigleys.postgresql.org
Backpatch to 9.4, where the feature, including the bug, was added.
Kevin Grittner [Sat, 27 Jun 2015 14:55:06 +0000 (09:55 -0500)]
Add opaque declaration of HTAB to tqual.h.
Commit b89e151054a05f0f6d356ca52e3b725dd0505e53 added the
ResolveCminCmaxDuringDecoding declaration to tqual.h, which uses an
HTAB parameter, without declaring HTAB. It accidentally fails to
fail to build with current sources because a declaration happens to
be included, directly or indirectly, in all source files that
currently use tqual.h before tqual.h is first included, but we
shouldn't count on that. Since an opaque declaration is enough
here, just use that, as was done in snapmgr.h.
Backpatch to 9.4, where the HTAB reference was added to tqual.h.
Simon Riggs [Fri, 26 Jun 2015 23:41:47 +0000 (00:41 +0100)]
Avoid hot standby cancels from VAC FREEZE
VACUUM FREEZE generated false cancelations of standby queries on an
otherwise idle master. Caused by an off-by-one error on cutoff_xid
which goes back to original commit.
Alvaro Herrera [Fri, 26 Jun 2015 21:17:54 +0000 (18:17 -0300)]
Fix DDL command collection for TRANSFORM
Commit b488c580ae, which added the DDL command collection feature,
neglected to update the code that commit cac76582053e had previously
added two weeks earlier for the TRANSFORM feature.
Alvaro Herrera [Fri, 26 Jun 2015 21:13:05 +0000 (18:13 -0300)]
Fix BRIN xlog replay
There was a confusion about which block number to use when storing an
item's pointer in the revmap -- the revmap page's blkno was being used,
not the data page's blkno.