Tom Lane [Mon, 14 May 2012 01:11:31 +0000 (21:11 -0400)]
Add some temporary instrumentation to pgstat.c.
Log main-loop blocking events and the results of inquiry messages.
This is to get some clarity as to what's happening on those Windows
buildfarm members that still don't like the latch-ified stats collector.
This bulks up the postmaster log a tad, so I won't leave it in place for
long.
Tom Lane [Sun, 13 May 2012 22:06:52 +0000 (18:06 -0400)]
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
Tom Lane [Sun, 13 May 2012 18:44:39 +0000 (14:44 -0400)]
Re-revert stats collector latch changes.
This reverts commit cb2f2873d6b81ad7f0a9733ba738bfac0746fb7b, restoring
the latch-ified stats collector logic. We'll soon see if this works any
better on the Windows buildfarm machines.
Tom Lane [Sun, 13 May 2012 18:35:40 +0000 (14:35 -0400)]
Attempt to fix some issues in our Windows socket code.
Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition.
We might want to tweak this further, but it was surely wrong as-is.
Make pgwin32_waitforsinglesocket detach its private event object from the
passed socket before returning. I suspect that failure to do so leads
to race conditions when other code (such as WaitLatchOrSocket) attaches
a different event object to the same socket. Moreover, the existing
coding meant that repeated calls to pgwin32_waitforsinglesocket would
perform ResetEvent on an event actively connected to a socket, which
is rumored to be an unsafe practice; the WSAEventSelect documentation
appears to recommend against this, though it does not say not to do it
in so many words.
Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to
detach events from sockets, rather than passing the event in the second
parameter. The WSAEventSelect documentation says that the second parameter
is ignored if the third is 0, so theoretically this should make no
difference. However, elsewhere on the same reference page the use of NULL
in this context is recommended, and I have found suggestions on the net
that some versions of Windows have bugs with a non-NULL second parameter
in this usage.
Some other mostly-cosmetic cleanup, such as using the right one of
WSAGetLastError and GetLastError for reporting errors from these functions.
Tom Lane [Sun, 13 May 2012 04:30:32 +0000 (00:30 -0400)]
Fix bogus declaration of local variable.
rc should be an int here, not a pgsocket. Fairly harmless as long as
pgsocket is an integer type, but nonetheless wrong. Error introduced
in commit 87091cb1f1ed914e2ddca424fa28f94fdf8461d2.
Tom Lane [Sat, 12 May 2012 23:21:54 +0000 (19:21 -0400)]
Avoid unnecessary process wakeups in the log collector.
syslogger was coded to wake up once per second whether there was anything
useful to do or not. As part of our campaign to reduce the server's idle
power consumption, change it to use a latch for waiting. Now, in the
absence of any data to log or any signals to service, it will only wake up
at the programmed logfile rotation times (if any).
Peter Eisentraut [Sat, 12 May 2012 20:29:07 +0000 (23:29 +0300)]
Remove unused AC_SUBST variables
These were apparently never used. The AC_SUBST was probably just
added in a copy-and-paste manner. (The shell variables continue to be
used inside configure. The change is just that we don't need them
outside of configure.)
Tom Lane [Sat, 12 May 2012 20:36:47 +0000 (16:36 -0400)]
Fix WaitLatchOrSocket to handle EOF on socket correctly.
When using poll(), EOF on a socket is reported with the POLLHUP not
POLLIN flag (at least on Linux). WaitLatchOrSocket failed to check
this bit, causing it to go into a busy-wait loop if EOF occurs.
We earlier fixed the same mistake in the test for the state of the
postmaster_alive socket, but missed it for the caller-supplied socket.
Fortunately, this error is new in 9.2, since 9.1 only had a select()
based code path not a poll() based one.
Tom Lane [Fri, 11 May 2012 22:33:39 +0000 (18:33 -0400)]
Update example of process titles shown by "ps".
This example was quite old: it lacked the WAL writer and autovac launcher
as well as the more recently added checkpointer. Linux "ps" seems to show
slightly different stuff now too.
Peter Eisentraut [Fri, 11 May 2012 20:01:15 +0000 (23:01 +0300)]
PL/Python: Adjust the regression tests for Python 3.3
The string representation of ImportError changed. Remove printing
that; it's not necessary for the test.
The order in which members of a dict are printed changed. But this
was always implementation-dependent, so we have just been lucky for a
long time. Do the printing the hard way to ensure sorted order.
Tom Lane [Fri, 11 May 2012 19:22:30 +0000 (15:22 -0400)]
Fix contrib/citext's upgrade script to handle array and domain cases.
We previously recognized that citext wouldn't get marked as collatable
during pg_upgrade from a pre-9.1 installation, and hacked its
create-from-unpackaged script to manually perform the necessary catalog
adjustments. However, we overlooked the fact that domains over citext,
as well as the citext[] array type, need the same adjustments. Extend
the script to handle those cases.
Also, the documentation suggested that this was only an issue in pg_upgrade
scenarios, which is quite wrong; loading any dump containing citext from a
pre-9.1 server will also result in the type being wrongly marked.
I approached the documentation problem by changing the 9.1.2 release note
paragraphs about this issue, which is historically inaccurate. But it
seems better than having the information scattered in multiple places, and
leaving incorrect info in the 9.1.2 notes would be bad anyway. We'll still
need to mention the issue again in the 9.1.4 notes, but perhaps they can
just reference 9.1.2 for fix instructions.
Per report from Evan Carroll. Back-patch into 9.1.
On GiST page split, release the locks on child pages before recursing up.
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.
This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.
This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
Tom Lane [Fri, 11 May 2012 03:01:28 +0000 (23:01 -0400)]
Improve discussion of setting server parameters.
Rewrite description of "include_if_exists" for clarity. Add subsection
headings to make the structure of the page a little clearer. A couple
other minor improvements too.
Tom Lane [Thu, 10 May 2012 22:02:37 +0000 (18:02 -0400)]
Tweak documentation wording to avoid "pdfendlink" failure.
HEAD documentation was failing to build as US PDF for me, because a link
to "CREATE CAST" was getting split across pages. Adjust wording to
remove this rather gratuitous cross-reference.
Tom Lane [Thu, 10 May 2012 21:26:08 +0000 (17:26 -0400)]
Temporarily revert stats collector latch changes so we can ship beta1.
This patch reverts commit 49340037ee3ab46cb24144a86705e35f272c24d5 and some
follow-on tweaking in pgstat.c. While the basic scheme of latch-ifying the
stats collector seems sound enough, it's failing on most Windows buildfarm
members for unknown reasons, and there's no time left to debug that before
9.2beta1. Better to ship a beta version without this improvement. I hope
to re-revert this once beta1 is out, though.
Tom Lane [Thu, 10 May 2012 18:34:22 +0000 (14:34 -0400)]
Make WaitLatch's WL_POSTMASTER_DEATH result trustworthy; simplify callers.
Per a suggestion from Peter Geoghegan, make WaitLatch responsible for
verifying that the WL_POSTMASTER_DEATH bit it returns is truthful (by
testing PostmasterIsAlive). Then simplify its callers, who no longer
need to do that for themselves. Remove weasel wording about falsely-set
result bits from WaitLatch's API contract.
Peter Eisentraut [Thu, 10 May 2012 17:38:17 +0000 (20:38 +0300)]
PL/Python: Fix slicing support for result objects for Python 3
The old way of implementing slicing support by implementing
PySequenceMethods.sq_slice no longer works in Python 3. You now have
to implement PyMappingMethods.mp_subscript. Do this by simply
proxying the call to the wrapped list of result dictionaries.
Consolidate some of the subscripting regression tests.
Tom Lane [Thu, 10 May 2012 17:36:14 +0000 (13:36 -0400)]
Fix Windows implementation of PGSemaphoreLock.
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point. This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.
Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set. This should make the
behavior more like Unix.
Tom Lane [Thu, 10 May 2012 17:26:47 +0000 (13:26 -0400)]
Improve Windows implementation of WaitLatch/WaitLatchOrSocket.
Ensure that signal handlers are serviced before this function returns.
This should make the behavior more like Unix. Also, add some more
error checking, and make some other cosmetic improvements.
No back-patch since it's not clear whether this is fixing any live bug
that would affect 9.1. I'm more concerned about 9.2 anyway given our
considerable recent expansions in the usage of WaitLatch.
Peter Eisentraut [Thu, 10 May 2012 16:58:35 +0000 (19:58 +0300)]
Python 2.2 is no longer supported
It was already on its last legs, and it turns out that it was
accidentally broken in commit 89e850e6fda9e4e441712012abe971fe938d595a
and no one cared. So remove the rest the support for it and update
the documentation to indicate that Python 2.3 is now required.
Tom Lane [Thu, 10 May 2012 16:22:22 +0000 (12:22 -0400)]
Remove unportable use of SGML character-code entity.
It'd be nice to be able to spell Jan Urbanski's name with the correct
accent marks, but we haven't yet found a way that works in everybody's
docs toolchain. This way definitely doesn't.
Joe Conway [Thu, 10 May 2012 05:57:19 +0000 (22:57 -0700)]
PL/pgSQL RETURN NEXT was leaking converted tuples, causing
out of memory when looping through large numbers of rows.
Flag the converted tuples to be freed. Complaint and patch
by Joe.
Tom Lane [Thu, 10 May 2012 04:54:32 +0000 (00:54 -0400)]
Improve tests for postmaster death in auxiliary processes.
In checkpointer and walwriter, avoid calling PostmasterIsAlive unless
WaitLatch has reported WL_POSTMASTER_DEATH. This saves a kernel call per
iteration of the process's outer loop, which is not all that much, but a
cycle shaved is a cycle earned. I had already removed the unconditional
PostmasterIsAlive calls in bgwriter and pgstat in previous patches, but
forgot that WL_POSTMASTER_DEATH is supposed to be treated as untrustworthy
(per comment in unix_latch.c); so adjust those two cases to match.
There are a few other places where the same idea might be applied, but only
after substantial code rearrangement, so I didn't bother.
Tom Lane [Thu, 10 May 2012 04:01:10 +0000 (00:01 -0400)]
Further tweaking of nomenclature in checkpointer.c.
Get rid of some more naming choices that only make sense if you know that
this code used to be in the bgwriter, as well as some stray comments
referencing the bgwriter.
Tom Lane [Thu, 10 May 2012 03:36:01 +0000 (23:36 -0400)]
Improve control logic for bgwriter hibernation mode.
Commit 6d90eaaa89a007e0d365f49d6436f35d2392cfeb added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption. However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out. That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening. Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop. To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.
In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes. I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately. In any case it did nothing to improve the readability or
robustness of the code.
In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant. I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
Add make dependency so that postgres.bki is rebuilt in major version change
Every time since the current rule for postgres.bki was put in place
when we change the major version, people complain that their tests
fail in strange ways. This is because the version number in
postgres.bki is not updated, because it has no dependency for that.
And you can't even force the rebuild manually if you don't happen to
know which file has the problem. Fix that now before it will happen
again.
The only remaining problem with switching major versions, as far as
the regression tests are concerned, is that contrib needs to be
rebuilt. But that's easily invoked, and in any case the failure modes
are more friendly if you forget that.
Split contrib documentation into extensions and programs
Create separate appendixes for contrib extensions and other server
plugins on the one hand, and utility programs on the other. Recast
the documentation of the latter as refentries, so that man pages are
generated.
Tom Lane [Wed, 9 May 2012 03:05:58 +0000 (23:05 -0400)]
Fix an issue in recent walwriter hibernation patch.
Users of asynchronous-commit mode expect there to be a guaranteed maximum
delay before an async commit's WAL records get flushed to disk. The
original version of the walwriter hibernation patch broke that. Add an
extra shared-memory flag to allow async commits to kick the walwriter out
of hibernation mode, without adding any noticeable overhead in cases where
no action is needed.
Tom Lane [Wed, 9 May 2012 01:26:46 +0000 (21:26 -0400)]
Reduce idle power consumption of stats collector process.
Latch-ify the stats collector, so that it does not need an arbitrary wakeup
cycle to check for postmaster death. The incremental savings in idle power
is pretty marginal, since we only had it waking every two seconds; but I
believe that this patch may also improve the collector's performance under
load, by reducing the number of kernel calls made per message when messages
are arriving constantly (we now avoid a select/poll call except when we
need to sleep). The change also reduces the time needed for a normal
database shutdown on platforms where signals don't interrupt select().
Tom Lane [Wed, 9 May 2012 00:03:26 +0000 (20:03 -0400)]
Reduce idle power consumption of walwriter and checkpointer processes.
This patch modifies the walwriter process so that, when it has not found
anything useful to do for many consecutive wakeup cycles, it extends its
sleep time to reduce the server's idle power consumption. It reverts to
normal as soon as it's done any successful flushes. It's still true that
during any async commit, backends check for completed, unflushed pages of
WAL and signal the walwriter if there are any; so that in practice the
walwriter can get awakened and returned to normal operation sooner than the
sleep time might suggest.
Also, improve the checkpointer so that it uses a latch and a computed delay
time to not wake up at all except when it has something to do, replacing a
previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for
reducing the server's power consumption when idle.
In passing, get rid of the dedicated latch for signaling the walwriter in
favor of using its procLatch, since that comports better with possible
generic signal handlers using that latch. Also, fix a pre-existing bug
with failure to save/restore errno in walwriter's signal handlers.
psql: Add variable to control keyword case in tab completion
This adds the variable COMP_KEYWORD_CASE, which controls in what case
keywords are completed. This is partially to let users configure the
change from commit 69f4f1c3576abc535871c6cfa95539e32a36120f, but it
also offers more behaviors than were available before.
Fix dependency tracking for src/port/%_srv.o files
Because they use their own compilation rule, they don't use the
dependency tracking logic from Makefile.global. To make sure that
dependency tracking works anyway for the *_srv.o files, depend on
their *.o siblings as well, which do have proper dependencies. It's a
hack that might fail someday if there is a *_srv.o without a
corresponding *.o, but it works for now (and those would probably go
into src/backend/port/ anyway).
According to the Autoconf documentation, there should be a make rule
pg_config.h: stamp-h
so that with the right setup around this, a change in pg_config.h.in
will trigger a rebuild of everything that depends on pg_config.h. But
this doesn't always work, sometimes you need to run make twice to get
everything up to date after a change of pg_config.h.in.
The fix is to write the rule as
pg_config.h: stamp-h ;
instead (with an empty command instead of no command). This is what
Automake-generated makefiles effectively do, so it seems safe to be on
this side.
It's not actually clear why this is (apparently) more correct. It's
been posted to
<http://lists.gnu.org/archive/html/help-make/2012-04/msg00058.html>
without response so far.
Magnus Hagander [Mon, 7 May 2012 16:39:37 +0000 (18:39 +0200)]
Make "unexpected EOF" messages DEBUG1 unless in an open transaction
"Unexpected EOF on client connection" without an open transaction
is mostly noise, so turn it into DEBUG1. With an open transaction it's
still indicating a problem, so keep those as ERROR, and change the message
to indicate that it happened in a transaction.
Tom Lane [Fri, 4 May 2012 21:43:27 +0000 (17:43 -0400)]
Overdue code review for transaction-level advisory locks patch.
Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
doc: Fix for too many brackets in command synopses on man pages
The default for the choice attribute of the <arg> element is "opt",
which would normally put the argument inside brackets. But the DSSSL
stylesheets contain a hack that treats <arg> directly inside <group>
specially, so that <group><arg>-x</arg><arg>-y</arg></group> comes out
as [ -x | -y ] rather than [ [-x] | [-y] ], which it would technically
be. But when building man pages, this doesn't work, and so the
command synopses on the man pages contain lots of extra brackets.
By putting choice="opt" or choice="plain" explicitly on every <arg>
and <group> element, we avoid any toolchain dependencies like that,
and it also makes it clearer in the source code what is meant.
In passing, make some small corrections in the documentation about
which arguments are really optional or not.
Add test cases for inline handler of plython2u (when using that
language name), and for result object element assignment. There is
now at least one test case for every top-level functionality, except
plpy.Fatal (annoying to use in regression tests) and result object
slice retrieval and slice assignment (which are somewhat broken).
PL/Python: Fix crash in functions returning SETOF and using SPI
Allocate PLyResultObject.tupdesc in TopMemoryContext, because its
lifetime is the lifetime of the Python object and it shouldn't be
freed by some other memory context, such as one controlled by SPI. We
trust that the Python object will clean up its own memory.
Before, this would crash the included regression test case by trying
to use memory that was already freed.
Robert Haas [Wed, 2 May 2012 16:40:07 +0000 (12:40 -0400)]
Avoid repeated CLOG access from heap_hot_search_buffer.
At the time we check whether the tuple is dead to all running
transactions, we've already verified that it isn't visible to our
scan, setting hint bits if appropriate. So there's no need to
recheck CLOG for the all-dead test we do just a moment later.
So, add HeapTupleIsSurelyDead() to test the appropriate condition
under the assumption that all relevant hit bits are already set.
Peter Eisentraut [Mon, 30 Apr 2012 18:15:48 +0000 (21:15 +0300)]
Improve markup of cmdsynopsis elements
Add more markup in particular so that the command options appear
consistently in monospace in the HTML output.
On the vacuumdb reference page, remove listing all the possible
options in the synopsis. They have become too many now; we have the
detailed options list for that.
Peter Eisentraut [Mon, 30 Apr 2012 18:12:28 +0000 (21:12 +0300)]
Fix display of <command> elements on man pages
We had changed this from the default bold to monospace for all output
formats, but for man pages, this creates visual inconsistencies, so
revert to the default for man pages.
Tom Lane [Mon, 30 Apr 2012 18:02:47 +0000 (14:02 -0400)]
Converge all SQL-level statistics timing values to float8 milliseconds.
This patch adjusts the core statistics views to match the decision already
taken for pg_stat_statements, that values representing elapsed time should
be represented as float8 and measured in milliseconds. By using float8,
we are no longer tied to a specific maximum precision of timing data.
(Internally, it's still microseconds, but we could now change that without
needing changes at the SQL level.)
The columns affected are
pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time
pg_stat_xact_user_functions.total_time
pg_stat_xact_user_functions.self_time
The first four of these are new in 9.2, so there is no compatibility issue
from changing them. The others require a release note comment that they
are now double precision (and can show a fractional part) rather than
bigint as before; also their underlying statistics functions now match
the column definitions, instead of returning bigint microseconds.