Replace our hacked version of ax_pthread.m4 with latest upstream version.
Our version was different from the upstream version in that we tried to use
all possible pthread-related flags that the compiler accepts, rather than
just the first one that works. That change was made in commit e48322a6d6cfce1ec52ab303441df329ddbc04d1, to work-around a bug affecting GCC
versions 3.2 and below (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8888),
although we didn't realize that it was a GCC bug at the time. We hardly care
about that old GCC versions anymore, so we no longer need that workaround.
This fixes the macro for compilers that print warnings with the chosen
flags. That's pretty annoying on its own right, but it also inconspicuously
disabled thread-safety, because we refused to use any pthread-related flags
if the compiler produced warnings. Max Filippov reported that problem when
linking with uClibc and OpenSSL. The warnings-check was added because the
workaround for the GCC bug caused warnings otherwise, so it's no longer
needed either. We can just use the upstream version as is.
If you really want to compile with GCC version 3.2 or older, you can still
work-around it manually by setting PTHREAD_CFLAGS="-pthread -lpthread"
manually on the configure command line.
Backpatch to 9.5. I don't want to unnecessarily rock the boat on stable
branches, but 9.5 seems like fair game.
Joe Conway [Tue, 7 Jul 2015 21:35:35 +0000 (14:35 -0700)]
Improve regression test coverage of table lock modes vs permissions.
Test the interactions with permissions and LOCK TABLE. Specifically
ROW EXCLUSIVE, ACCESS SHARE, and ACCESS EXCLUSIVE modes against
SELECT, INSERT, UPDATE, DELETE, and TRUNCATE permissions. Discussed
by Stephen Frost and Michael Paquier, patch by the latter. Backpatch
to 9.5 where matching behavior was first committed.
Tom Lane [Tue, 7 Jul 2015 16:49:18 +0000 (12:49 -0400)]
Fix portability issue in pg_upgrade test script: avoid $PWD.
SUSv2-era shells don't set the PWD variable, though anything more modern
does. In the buildfarm environment this could lead to test.sh executing
with PWD pointing to $HOME or another high-level directory, so that there
were conflicts between concurrent executions of the test in different
branch subdirectories. This appears to be the explanation for recent
intermittent failures on buildfarm members binturong and dingo (and might
well have something to do with the buildfarm script's failure to capture
log files from pg_upgrade tests, too).
To fix, just use `pwd` in place of $PWD. AFAICS test.sh is the only place
in our source tree that depended on $PWD. Back-patch to all versions
containing this script.
Per buildfarm. Thanks to Oskari Saarenmaa for diagnosing the problem.
If an allocation fails in the main message handling loop, pqParseInput3
or pqParseInput2, it should not be treated as "not enough data available
yet". Otherwise libpq will wait indefinitely for more data to arrive from
the server, and gets stuck forever.
This isn't a complete fix - getParamDescriptions and getCopyStart still
have the same issue, but it's a step in the right direction.
Michael Paquier and me. Backpatch to all supported versions.
Andres Freund [Tue, 7 Jul 2015 11:05:41 +0000 (13:05 +0200)]
Fix logical decoding bug leading to inefficient reopening of files.
When spilling transaction data to disk a simple typo caused the output
file to be closed and reopened for every serialized change. That happens
to not have a huge impact on linux, which is why it probably wasn't
noticed so far, but on windows that appears to trigger actual disk
writes after every change. Not fun.
The bug fortunately does not have any impact besides speed. A change
could end up being in the wrong segment (last instead of next), but
since we read all files to the end, that's just ugly, not really
problematic. It's not a problem to upgrade, since transaction spill
files do not persist across restarts.
Andres Freund [Tue, 7 Jul 2015 10:47:44 +0000 (12:47 +0200)]
Fix pg_recvlogical not to fsync output when it's a tty or pipe.
The previous coding tried to handle possible failures when fsyncing a
tty or pipe fd by accepting EINVAL - but apparently some
platforms (windows, OSX) don't reliably return that. So instead check
whether the output fd refers to a pipe or a tty when opening it.
Reported-By: Olivier Gosseaume, Marko Tiikkaja
Discussion: 559AF98B.3050901@joh.to
Turn install.bat into a pure one line wrapper fort he perl script.
Build.bat and vcregress.bat got similar treatment years ago. I'm not sure
why install.bat wasn't treated at the same time, but it seems like a good
idea anyway.
The immediate problem with the old install.bat was that it had quoting
issues, and wouldn't work if the target directory's name contained spaces.
This fixes that problem.
We're interested in the buffer size of the socket that's connected to the
client, not the one that's listening for new connections. It happened to
work, as default buffer size is the same on both, but it was clearly not
wrong.
Don't set SO_SNDBUF on recent Windows versions that have a bigger default.
It's unnecessary to set it if the default is higher in the first place.
Furthermore, setting SO_SNDBUF disables the so-called "dynamic send
buffering" feature, which hurts performance further. This can be seen
especially when the network between the client and the server has high
latency.
Remove incorrect warning from pg_archivecleanup document.
The .backup file name can be passed to pg_archivecleanup even if
it includes the extension which is specified in -x option.
However, previously the document incorrectly warned a user
not to do that.
Back-patch to 9.2 where pg_archivecleanup's -x option and
the warning were added.
Tom Lane [Sun, 5 Jul 2015 16:57:17 +0000 (12:57 -0400)]
Further reduce overhead for passing plpgsql variables to the executor.
This builds on commit 21dcda2713656a7483e3280ac9d2ada20a87a9a9 by keeping
a plpgsql function's shared ParamListInfo's entries for simple variables
(PLPGSQL_DTYPE_VARs) valid at all times. That adds a few cycles to each
assignment to such variables, but saves significantly more cycles each time
they are used; so except in the pathological case of many dead stores, this
should always be a win. Initial testing says it's good for about a 10%
speedup of simple calculations; more in large functions with many datums.
We can't use this method for row/record references unfortunately, so what
we do for those is reset those ParamListInfo slots after use; which we
can skip doing unless some of them were actually evaluated during the
previous evaluation call. So this should frequently be a win as well,
while worst case is that it's similar cost to the previous approach.
Also, closer study suggests that the previous method of instantiating a
new ParamListInfo array per evaluation is actually probably optimal for
cursor-opening executor calls. The reason is that whatever is visible in
the array is going to get copied into the cursor portal via copyParamList.
So if we used the function's main ParamListInfo for those calls, we'd end
up with all of its DTYPE_VAR vars getting copied, which might well include
large pass-by-reference values that the cursor actually has no need for.
To avoid a possible net degradation in cursor cases, go back to creating
and filling a private ParamListInfo in those cases (which therefore will be
exactly the same speed as before 21dcda271365). We still get some benefit
out of this though, because this approach means that we only have to defend
against copyParamList's try-to-fetch-every-slot behavior in the case of an
unshared ParamListInfo; so plpgsql_param_fetch() can skip testing
expr->paramnos in the common case.
To ensure that the main ParamListInfo's image of a DTYPE_VAR datum is
always valid, all assignments to such variables are now funneled through
assign_simple_var(). But this makes for cleaner and shorter code anyway.
You can no longer use pgbench with multiple threads when compiled without
--enable-thread-safety. That's an acceptable limitation these days; it
still works fine with -j1, and all modern platforms support threads anyway.
This makes future maintenance and development of the code easier.
Fix pgbench progress report behaviour when pgbench or a query gets stuck.
There were two issues here. First, if a query got stuck so that it took
e.g. 5 seconds, and progress interval was 1 second, no progress reports were
printed until the query returned. Fix so that we wake up specifically to
print the progress report. Secondly, if pgbench got stuck so that it would
nevertheless not print a progress report on time, and enough time passes
that it's already time to print the next progress report, just skip the one
that was missed. Before this patch, it would print the missed one with 0 TPS
immediately after the previous one.
Fabien Coelho. Backpatch to 9.4, where progress reports were added.
Make WAL-related utilities handle .partial WAL files properly.
Commit de76884 changed an archive recovery so that the last WAL
segment with old timeline was renamed with suffix .partial. It should
have updated WAL-related utilities so that they can handle such
.paritial WAL files, but we forgot that.
This patch changes pg_archivecleanup so that it can clean up even
archived WAL files with .partial suffix. Also it allows us to specify
.partial WAL file name as the command-line argument "oldestkeptwalfile".
This patch also changes pg_resetxlog so that it can remove .partial
WAL files in pg_xlog directory.
pg_xlogdump cannot handle .partial WAL files. Per discussion,
we decided only to document that limitation instead of adding the fix.
Because a user can easily work around the limitation (i.e., just remove
.partial suffix from the file name) and the fix seems complicated for
very narrow use case.
Back-patch to 9.5 where the problem existed.
Review by Michael Paquier.
Discussion: http://www.postgresql.org/message-id/CAHGQGwGxMKnVHGgTfiig2Bt_2djec0in3-DLJmtg7+nEiidFdQ@mail.gmail.com
Tom Lane [Thu, 2 Jul 2015 22:13:34 +0000 (18:13 -0400)]
Improve pg_restore's -t switch to match all types of relations.
-t will now match views, foreign tables, materialized views, and sequences,
not only plain tables. This is more useful, and also more consistent with
the behavior of pg_dump's -t switch, which has always matched all relation
types.
We're still not there on matching pg_dump's behavior entirely, so mention
that in the docs.
Tom Lane [Thu, 2 Jul 2015 21:24:36 +0000 (17:24 -0400)]
Make numeric form of PG version number readily available in Makefiles.
Expose PG_VERSION_NUM (e.g., "90600") as a Make variable; but for
consistency with the other Make variables holding similar info,
call the variable just VERSION_NUM not PG_VERSION_NUM.
There was some discussion of making this value available as a pg_config
value as well. However, that would entail substantially more work than
this two-line patch. Given that there was not exactly universal consensus
that we need this at all, let's just do a minimal amount of work for now.
Tom Lane [Thu, 2 Jul 2015 21:02:08 +0000 (17:02 -0400)]
Fix misuse of TextDatumGetCString().
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum. Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere. Make pg_get_object_address() fall in line with other uses.
Noted while reviewing two-arg current_setting() patch.
These variants used the old-style 'n'/' ' NULL indicators. The new-style
functions have been available since version 8.1. That should be long enough
that if there is still any old external code using these functions, they
can just switch to the new functions without worrying about backwards
compatibility
Plug some trivial memory leaks in pg_dump and pg_upgrade.
There's no point in trying to free every small allocation in these
programs that are used in a one-shot fashion, but these ones seems like
an improvement on readability grounds.
Use appendStringInfoString/Char et al where appropriate.
Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in
9.5, and keeping the code in sync with master makes future backpatching
easier.
Make use of xlog_internal.h's macros in WAL-related utilities.
Commit 179cdd09 added macros to check if a filename is a WAL segment
or other such file. However there were still some instances of the
strlen + strspn combination to check for that in WAL-related utilities
like pg_archivecleanup. Those checks can be replaced with the macros.
This patch makes use of the macros in those utilities and
which would make the code a bit easier to read.
Tom Lane [Wed, 1 Jul 2015 22:55:39 +0000 (18:55 -0400)]
Don't leave pg_hba and pg_ident data lying around in running backends.
Free the contexts holding this data after we're done using it, by the
expedient of attaching them to the PostmasterContext which we were
already taking care to delete (and where, indeed, this data used to live
before commits e5e2fc842c418432 and 7c45e3a3c682f855). This saves a
probably-usually-negligible amount of space per running backend. It also
avoids leaving potentially-security-sensitive data lying around in memory
in processes that don't need it. You'd have to be unusually paranoid to
think that that amounts to a live security bug, so I've not gone so far as
to forcibly zero the memory; but there surely isn't a good reason to keep
this data around.
Arguably this is a memory management bug in the aforementioned commits,
but it doesn't seem important enough to back-patch.
Tom Lane [Wed, 1 Jul 2015 22:07:48 +0000 (18:07 -0400)]
Make sampler_random_fract() actually obey its API contract.
This function is documented to return a value in the range (0,1),
which is what its predecessor anl_random_fract() did. However, the
new version depends on pg_erand48() which returns a value in [0,1).
The possibility of returning zero creates hazards of division by zero
or trying to compute log(0) at some call sites, and it might well
break third-party modules using anl_random_fract() too. So let's
change it to never return zero. Spotted by Coverity.
XLogFileCopy() was changed heavily in commit de76884. However it was
partially reverted in commit 7abc685 and most of those changes to
XLogFileCopy() were no longer needed. Then commit 7cbee7c removed
those unnecessary code, but XLogFileCopy() looked different in master
and 9.4 though the contents are almost the same.
This patch makes XLogFileCopy() look the same in master and back-branches,
which makes back-patching easier, per discussion on pgsql-hackers.
Back-patch to 9.5.
Andres Freund [Tue, 30 Jun 2015 19:00:12 +0000 (21:00 +0200)]
Improve 9.5 release notes.
1) Add sgml comments referencing commits. This is useful to search for
missing items etc.
The comments containing the commit notes are an excerpt from:
git log --date=short \
--pretty='format:%cd [%h] %<(8,trunc)%cN: %<(48,trunc)%s%n%n%w(,4,4)%b%n' \
$(git merge-base origin/master upstream/REL9_4_STABLE)..origin/master
2) Improve a handful of existing notes
3) Add missing entries about a couple features.
4) Add a bunch of straight-forward FIXMEs
Don't call PageGetSpecialPointer() on page until it's been initialized.
After calling XLogInitBufferForRedo(), the page might be all-zeros if it was
not in page cache already. btree_xlog_unlink_page initialized the page
correctly, but it called PageGetSpecialPointer before initializing it, which
would lead to a corrupt page at WAL replay, if the unlinked page is not in
page cache.
Backpatch to 9.4, the bug came with the rewrite of B-tree page deletion.
Initialize GIN metapage correctly when replaying metapage-update WAL record.
I broke this with my WAL format refactoring patch. Before that, the metapage
was read from disk, and modified in-place regardless of the LSN. That was
always a bit silly, as there's no need to read the old page version from
disk disk when we're overwriting it anyway. So that was changed in 9.5, but
I failed to add a GinInitPage call to initialize the page-headers correctly.
Usually you wouldn't notice, because the metapage is already in the page
cache and is not zeroed.
One way to reproduce this is to perform a VACUUM on an already vacuumed
table (so that the vacuum has no real work to do), immediately after a
checkpoint, and then perform an immediate shutdown. After recovery, the
page headers of the metapage will be incorrectly all-zeroes.
Tom Lane [Mon, 29 Jun 2015 19:38:46 +0000 (15:38 -0400)]
Desultory review of 9.5 release notes.
Minor corrections and clarifications. Notably, for stuff that got moved
out of contrib, make sure it's documented somewhere other than "Additional
Modules".
I'm sure these need more work, but that's all I have time for today.
Tom Lane [Mon, 29 Jun 2015 16:42:52 +0000 (12:42 -0400)]
Code + docs review for escaping of option values (commit 11a020eb6).
Avoid memory leak from incorrect choice of how to free a StringInfo
(resetStringInfo doesn't do it). Now that pg_split_opts doesn't scribble
on the optstr, mark that as "const" for clarity. Attach the commentary in
protocol.sgml to the right place, and add documentation about the
user-visible effects of this change on postgres' -o option and libpq's
PGOPTIONS option.
Andres Freund [Mon, 29 Jun 2015 12:53:32 +0000 (14:53 +0200)]
Replace ia64 S_UNLOCK compiler barrier with a full memory barrier.
_Asm_sched_fence() is just a compiler barrier, not a memory barrier. But
spinlock release on IA64 needs, at the very least, release
semantics. Use a full barrier instead.
This might be the cause for the occasional failures on buildfarm member
anole.
Tom Lane [Sun, 28 Jun 2015 22:06:14 +0000 (18:06 -0400)]
Improve design and implementation of pg_file_settings view.
As first committed, this view reported on the file contents as they were
at the last SIGHUP event. That's not as useful as reporting on the current
contents, and what's more, it didn't work right on Windows unless the
current session had serviced at least one SIGHUP. Therefore, arrange to
re-read the files when pg_show_all_settings() is called. This requires
only minor refactoring so that we can pass changeVal = false to
set_config_option() so that it won't actually apply any changes locally.
In addition, add error reporting so that errors that would prevent the
configuration files from being loaded, or would prevent individual settings
from being applied, are visible directly in the view. This makes the view
usable for pre-testing whether edits made in the config files will have the
desired effect, before one actually issues a SIGHUP.
I also added an "applied" column so that it's easy to identify entries that
are superseded by later entries; this was the main use-case for the original
design, but it seemed unnecessarily hard to use for that.
Also fix a 9.4.1 regression that allowed multiple entries for a
PGC_POSTMASTER variable to cause bogus complaints in the postmaster log.
(The issue here was that commit bf007a27acd7b2fb unintentionally reverted 3e3f65973a3c94a6, which suppressed any duplicate entries within
ParseConfigFp. However, since the original coding of the pg_file_settings
view depended on such suppression *not* happening, we couldn't have fixed
this issue now without first doing something with pg_file_settings.
Now we suppress duplicates by marking them "ignored" within
ProcessConfigFileInternal, which doesn't hide them in the view.)
Lesser changes include:
Drive the view directly off the ConfigVariable list, instead of making a
basically-equivalent second copy of the data. There's no longer any need
to hang onto the data permanently, anyway.
Convert show_all_file_settings() to do its work in one call and return a
tuplestore; this avoids risks associated with assuming that the GUC state
will hold still over the course of query execution. (I think there were
probably latent bugs here, though you might need something like a cursor
on the view to expose them.)
Arrange to run SIGHUP processing in a short-lived memory context, to
forestall process-lifespan memory leaks. (There is one known leak in this
code, in ProcessConfigDirectory; it seems minor enough to not be worth
back-patching a specific fix for.)
Remove mistaken assignment to ConfigFileLineno that caused line counting
after an include_dir directive to be completely wrong.
Add missed failure check in AlterSystemSetConfigFile(). We don't really
expect ParseConfigFp() to fail, but that's not an excuse for not checking.
Also trigger restartpoints based on max_wal_size on standby.
When archive recovery and restartpoints were initially introduced,
checkpoint_segments was ignored on the grounds that the files restored from
archive don't consume any space in the recovery server. That was changed in
later releases, but even then it was arguably a feature rather than a bug,
as performing restartpoints as often as checkpoints during normal operation
might be excessive, but you might nevertheless not want to waste a lot of
space for pre-allocated WAL by setting checkpoint_segments to a high value.
But now that we have separate min_wal_size and max_wal_size settings, you
can bound WAL usage with max_wal_size, and still avoid consuming excessive
space usage by setting min_wal_size to a lower value, so that argument is
moot.
There are still some issues with actually limiting the space usage to
max_wal_size: restartpoints in recovery can only start after seeing the
checkpoint record, while a checkpoint starts flushing buffers as soon as
the redo-pointer is set. Restartpoint is paced to happen at the same
leisurily speed, determined by checkpoint_completion_target, as checkpoints,
but because they are started later, max_wal_size can be exceeded by upto
one checkpoint cycle's worth of WAL, depending on
checkpoint_completion_target. But that seems better than not trying at all,
and max_wal_size is a soft limit anyway.
The documentation already claimed that max_wal_size is obeyed in recovery,
so this just fixes the behaviour to match the docs. However, add some
weasel-words there to mention that max_wal_size may well be exceeded by
some amount in recovery.
Promote the assertion that XLogBeginInsert() is not called twice into ERROR.
Seems like cheap insurance for WAL bugs. A spurious call to
XLogBeginInsert() in itself would be fairly harmless, but if there is any
data registered and the insertion is not completed/cancelled properly, there
is a risk that the data ends up in a wrong WAL record.
Fix double-XLogBeginInsert call in GIN page splits.
If data checksums or wal_log_hints is on, and a GIN page is split, the code
to find a new, empty, block was called after having already called
XLogBeginInsert(). That causes an assertion failure or PANIC, if finding the
new block involves updating a FSM page that had not been modified since last
checkpoint, because that update is WAL-logged, which calls XLogBeginInsert
again. Nested XLogBeginInsert calls are not supported.
To fix, rearrange GIN code so that XLogBeginInsert is called later, after
finding the victim buffers.
Don't choke on files that are removed while pg_rewind runs.
If a file is removed from the source server, while pg_rewind is running, the
invocation of pg_read_binary_file() will fail. Use the just-added missing_ok
option to that function, to have it return NULL instead, and handle that
gracefully. And similarly for pg_ls_dir and pg_stat_file.
Add missing_ok option to the SQL functions for reading files.
This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)
The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.
Tom Lane [Sat, 27 Jun 2015 21:47:39 +0000 (17:47 -0400)]
Avoid passing NULL to memcmp() in lookups of zero-argument functions.
A few places assumed they could pass NULL for the argtypes array when
looking up functions known to have zero arguments. At first glance
it seems that this should be safe enough, since memcmp() is surely not
allowed to fetch any bytes if its count argument is zero. However,
close reading of the C standard says that such calls have undefined
behavior, so we'd probably best avoid it.
Since the number of places doing this is quite small, and some other
places looking up zero-argument functions were already passing dummy
arrays, let's standardize on the latter solution rather than hacking
the function lookup code to avoid calling memcmp() in these cases.
I also added Asserts to catch any future violations of the new rule.
Given the utter lack of any evidence that this actually causes any
problems in the field, I don't feel a need to back-patch this change.
Per report from Piotr Stefaniak, though this is not his patch.
Andres Freund [Sat, 27 Jun 2015 16:49:00 +0000 (18:49 +0200)]
Fix test_decoding's handling of nonexistant columns in old tuple versions.
test_decoding used fastgetattr() to extract column values. That's wrong
when decoding updates and deletes if a table's replica identity is set
to FULL and new columns have been added since the old version of the
tuple was created. Due to the lack of a crosscheck with the datum's
natts values an invalid value will be output, leading to errors or
worse.
Bug: #13470 Reported-By: Krzysztof Kotlarski
Discussion: 20150626100333.3874.90852@wrigleys.postgresql.org
Backpatch to 9.4, where the feature, including the bug, was added.
Kevin Grittner [Sat, 27 Jun 2015 14:55:06 +0000 (09:55 -0500)]
Add opaque declaration of HTAB to tqual.h.
Commit b89e151054a05f0f6d356ca52e3b725dd0505e53 added the
ResolveCminCmaxDuringDecoding declaration to tqual.h, which uses an
HTAB parameter, without declaring HTAB. It accidentally fails to
fail to build with current sources because a declaration happens to
be included, directly or indirectly, in all source files that
currently use tqual.h before tqual.h is first included, but we
shouldn't count on that. Since an opaque declaration is enough
here, just use that, as was done in snapmgr.h.
Backpatch to 9.4, where the HTAB reference was added to tqual.h.
Simon Riggs [Fri, 26 Jun 2015 23:41:47 +0000 (00:41 +0100)]
Avoid hot standby cancels from VAC FREEZE
VACUUM FREEZE generated false cancelations of standby queries on an
otherwise idle master. Caused by an off-by-one error on cutoff_xid
which goes back to original commit.
Alvaro Herrera [Fri, 26 Jun 2015 21:17:54 +0000 (18:17 -0300)]
Fix DDL command collection for TRANSFORM
Commit b488c580ae, which added the DDL command collection feature,
neglected to update the code that commit cac76582053e had previously
added two weeks earlier for the TRANSFORM feature.
Alvaro Herrera [Fri, 26 Jun 2015 21:13:05 +0000 (18:13 -0300)]
Fix BRIN xlog replay
There was a confusion about which block number to use when storing an
item's pointer in the revmap -- the revmap page's blkno was being used,
not the data page's blkno.
Robert Haas [Fri, 26 Jun 2015 19:53:13 +0000 (15:53 -0400)]
Be more conservative about removing tablespace "symlinks".
Don't apply rmtree(), which will gleefully remove an entire subtree,
and don't even apply unlink() unless it's symlink or a directory,
the only things that we expect to find.
Amit Kapila, with minor tweaks by me, per extensive discussions
involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas,
at least some of whom also reviewed the code.
Robert Haas [Fri, 26 Jun 2015 18:45:32 +0000 (14:45 -0400)]
Remove unnecessary NULL test.
Spotted by Coverity and reported by Michael Paquier. Per discussion,
we don't necessarily care about making Coverity happy in all such
instances, but we can go ahead and change them where it otherwise
seems to improve the code.
Andres Freund [Fri, 26 Jun 2015 15:00:01 +0000 (17:00 +0200)]
Fix the fallback memory barrier implementation to be reentrant.
This was essentially "broken" since 0c8eda62; but until more
recently (14e8803f) barriers usage in signal handlers was infrequent.
The failure to be reentrant was noticed because the test_shm_mq, which
uses memory barriers at a high frequency, occasionally got stuck on some
solaris buildfarm animals. Turns out, those machines use sun studio
12.1, which doesn't yet have efficient memory barrier support. A machine
with a newer sun studio did not fail. Forcing the barrier fallback to
be used on x86 allows to reproduce the problem.
The new fallback is to use kill(PostmasterPid, 0) based on the theory
that that'll always imply a barrier due to checking the liveliness of
PostmasterPid on systems old enough to need fallback support. It's hard
to come up with a good and performant fallback.
I'm not backpatching this for now - the problem isn't active in the back
branches, and we haven't backpatched barrier changes for
now. Additionally master looks entirely different than the back branches
due to the new atomics abstraction. It seems better to let this rest in
master, where the non-reentrancy actively causes a problem, and then
consider backpatching.
Found-By: Robert Haas
Discussion: 55626265.3060800@dunslane.net
Robert Haas [Fri, 26 Jun 2015 13:40:47 +0000 (09:40 -0400)]
Improve handling of CustomPath/CustomPlan(State) children.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.
KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
1. Replay of the WAL record for setting a bit in the visibility map
contained an assertion that a full-page image of that record type can only
occur with checksums enabled. But it can also happen with wal_log_hints, so
remove the assertion. Unlike checksums, wal_log_hints can be changed on the
fly, so it would be complicated to figure out if it was enabled at the time
that the WAL record was generated.
2. wal_log_hints has the same effect on the locking needed to read the LSN
of a page as data checksums. BufferGetLSNAtomic() didn't get the memo.
Tom Lane [Thu, 25 Jun 2015 18:39:05 +0000 (14:39 -0400)]
Fix the logic for putting relations into the relcache init file.
Commit f3b5565dd4e59576be4c772da364704863e6a835 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache. However, we have historically nailed that index into cache for
performance reasons. The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.
To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.
Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path. Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set. So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.
Back-patch to all supported branches, like the previous commit.
Tom Lane [Thu, 25 Jun 2015 14:44:03 +0000 (10:44 -0400)]
Docs: fix claim that to_char('FM') removes trailing zeroes.
Of course, what it removes is leading zeroes. Seems to have been a thinko
in commit ffe92d15d53625d5ae0c23f4e1984ed43614a33d. Noted by Hubert Depesz
Lubaczewski.
Tom Lane [Mon, 22 Jun 2015 22:53:27 +0000 (18:53 -0400)]
Improve inheritance_planner()'s performance for large inheritance sets.
Commit c03ad5602f529787968fa3201b35c119bbc6d782 introduced a planner
performance regression for UPDATE/DELETE on large inheritance sets.
It required copying the append_rel_list (which is of size proportional to
the number of inherited tables) once for each inherited table, thus
resulting in O(N^2) time and memory consumption. While it's difficult to
avoid that in general, the extra work only has to be done for
append_rel_list entries that actually reference subquery RTEs, which
inheritance-set entries will not. So we can buy back essentially all of
the loss in cases without subqueries in FROM; and even for those, the added
work is mainly proportional to the number of UNION ALL subqueries.
Back-patch to 9.2, like the previous commit.
Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
Noah Misch [Mon, 22 Jun 2015 00:04:36 +0000 (20:04 -0400)]
Truncate strings in tarCreateHeader() with strlcpy(), not sprintf().
This supplements the GNU libc bug #6530 workarounds introduced in commit 54cd4f04576833abc394e131288bf3dd7dcf4806. On affected systems, a
tar-format pg_basebackup failed when some filename beneath the data
directory was not valid character data in the postmaster/walsender
locale. Back-patch to 9.1, where pg_basebackup was introduced. Extant,
bug-prone conversion specifications receive only ASCII bytes or involve
low-importance messages.
Andres Freund [Sun, 21 Jun 2015 16:57:28 +0000 (18:57 +0200)]
Improve multixact emergency autovacuum logic.
Previously autovacuum was not necessarily triggered if space in the
members slru got tight. The first problem was that the signalling was
tied to values in the offsets slru, but members can advance much
faster. Thats especially a problem if old sessions had been around that
previously prevented the multixact horizon to increase. Secondly the
skipping logic doesn't work if the database was restarted after
autovacuum was triggered - that knowledge is not preserved across
restart. This is especially a problem because it's a common
panic-reaction to restart the database if it gets slow to
anti-wraparound vacuums.
Fix the first problem by separating the logic for members from
offsets. Trigger autovacuum whenever a multixact crosses a segment
boundary, as the current member offset increases in irregular values, so
we can't use a simple modulo logic as for offsets. Add a stopgap for
the second problem, by signalling autovacuum whenver ERRORing out
because of boundaries.
Andres Freund [Sun, 21 Jun 2015 16:35:59 +0000 (18:35 +0200)]
Add missing check for wal_debug GUC.
9a20a9b2 added a new elog(), enabled when WAL_DEBUG is defined. The
other WAL_DEBUG dependant messages check for the wal_debug GUC, but this
one did not. While at it replace 'upto' with 'up to'.