Tom Lane [Mon, 10 Aug 2009 16:10:19 +0000 (16:10 +0000)]
Adjust extract(epoch) example to clarify that it includes fractional
seconds, per gripe from Richard Neill. Also, add a cross-reference to
the to_timestamp function.
Documentation files in HTML and man formats are now prepared for
distribution using the distprep make target, like everything else. They
are placed in doc/src/sgml/html and manX and installed from there by
make install, if present. The business with the tarballs in the tarball
is gone.
Tom Lane [Fri, 7 Aug 2009 22:48:34 +0000 (22:48 +0000)]
Modify parallel pg_restore to track pending and ready items by means of
two new lists, rather than repeatedly rescanning the main TOC list.
This avoids a potential O(N^2) slowdown, although you'd need a *lot*
of tables to make that really significant; and it might simplify future
improvements in the scheduling algorithm by making the set of ready
items more easily inspectable. The original thought that it would
in itself result in a more efficient job dispatch order doesn't seem
to have been borne out in testing, but it seems worth doing anyway.
Test coverage support now covers the entire source tree, including
contrib, instead of just src/backend. In a related but independent
development, the commands make coverage and make coverage-html can be run
in any directory.
This turned out to be much easier than feared. Besides a few ad hoc fixes
to pass the make target down the tree, change all affected makefiles to
list their directories in the SUBDIRS variable, changed from variants like
DIRS and WANTED_DIRS. MSVC build fix was attempted as well.
Tom Lane [Fri, 7 Aug 2009 20:16:11 +0000 (20:16 +0000)]
Try to defend against the possibility that libpq is still in COPY_IN state
when we reach the post-COPY "pump it dry" error recovery code that was added
2006-11-24. Per a report from Neil Best, there is at least one code path
in which this occurs, leading to an infinite loop in code that's supposed
to be making it more robust not less so. A reasonable response seems to be
to call PQputCopyEnd() again, so let's try that.
Back-patch to all versions that contain the cleanup loop.
Tom Lane [Fri, 7 Aug 2009 19:29:49 +0000 (19:29 +0000)]
rm_cleanup functions need to be allowed to write WAL entries. This oversight
appears to explain the recent reports of "PANIC: cannot make new WAL entries
during recovery".
Fast shutdown stop should forcibly disconnect any active backends, even
if a smart shutdown is already in progress. Backpatch to 8.3, this was broken
in the patch that introduced "dead-end backends".
Per report by Itagaki Takahiro, patch by Fujii Masao.
Tom Lane [Thu, 6 Aug 2009 20:44:32 +0000 (20:44 +0000)]
Improve plpgsql's ability to cope with rowtypes containing dropped columns,
by supporting conversions in places that used to demand exact rowtype match.
Since this issue is certain to come up elsewhere (in fact, already has,
in ExecEvalConvertRowtype), factor out the support code into new core
functions for tuple conversion. I chose to put these in a new source
file since heaptuple.c is already overly long.
Heavily revised version of a patch by Pavel Stehule.
Magnus Hagander [Thu, 6 Aug 2009 09:50:22 +0000 (09:50 +0000)]
Avoid terminating the postmaster on a number of "can't happen" cases during
backend startup on Win32. Instead, log the error and just forget about
the potentially dangling process, since we can't do anything about it anyway.
Improve error messages in md.c. When a filesystem operation like open() or
fsync() fails, say "file" rather than "relation" when printing the filename.
This makes messages that display block numbers a bit confusing. For example,
in message 'could not read block 150000 of file "base/1234/5678.1"', 150000
is the block number from the beginning of the relation, ie. segment 0, not
150000th block within that segment. Per discussion, users aren't usually
interested in the exact location within the file, so we can live with that.
To ease constructing error messages, add FilePathName(File) function to
return the pathname of a virtual fd.
Joe Conway [Wed, 5 Aug 2009 16:11:07 +0000 (16:11 +0000)]
Implement dblink_get_notify().
Adds the ability to retrieve async notifications using dblink,
via the addition of the function dblink_get_notify(). Original patch
by Marcus Kempe, suggestions by Tom Lane and Alvaro Herrera, patch
review and adjustments by Joe Conway.
This switches the man page building process to use the DocBook XSL stylesheet
toolchain. The previous targets for Docbook2X are removed. configure has been
updated to look for the new tools. The Documentation appendix contains the
new build instructions. There are also a few isolated tweaks in the
documentation to improve places that came out strangely in the man pages.
Tom Lane [Tue, 4 Aug 2009 21:56:09 +0000 (21:56 +0000)]
Fix pg_dump to do the right thing when escaping the contents of large objects.
The previous implementation got it right in most cases but failed in one:
if you pg_dump into an archive with standard_conforming_strings enabled, then
pg_restore to a script file (not directly to a database), the script will set
standard_conforming_strings = on but then emit large object data as
nonstandardly-escaped strings.
At the moment the code is made to emit hex-format bytea strings when dumping
to a script file. We might want to change to old-style escaping for backwards
compatibility, but that would be slower and bulkier. If we do, it's just a
matter of reimplementing appendByteaLiteral().
This has been broken for a long time, but given the lack of field complaints
I'm not going to worry about back-patching.
Tom Lane [Tue, 4 Aug 2009 18:49:50 +0000 (18:49 +0000)]
Ooops, missed that a couple of contrib modules have calls to byteacmp.
Add bytea.h inclusions as needed. Some of the contrib regression tests
need to be de-hexified, too. Per buildfarm.
Tom Lane [Tue, 4 Aug 2009 16:08:37 +0000 (16:08 +0000)]
Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically
handled on input. The output format is selected by the new GUC variable
bytea_output.
As committed, bytea_output defaults to HEX, which is an *incompatible
change*. We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.
Tom Lane [Tue, 4 Aug 2009 04:04:12 +0000 (04:04 +0000)]
Cause pg_proc.probin to be declared as text, not bytea. Everything was
already treating it as text anyway, to the point that I couldn't find anything
to change except the datatype markings in catalog/*.h. The only effect that
the bytea declaration had was to cause byteaout() to be invoked when pg_dump
(or another client program) inspected the column value. Since pg_dump wasn't
expecting that, but just treating what it got as text, the net result is that
dump and reload would mangle any backslashes or non-ASCII characters in the
filename string for a C-language function. That is a very long-standing bug,
but given the lack of field complaints it doesn't seem worth trying to find
a back-patchable fix. We'll just make this change to fix it going forward.
This change will also forestall problems after the planned change to let bytea
emit hex output instead of escaped characters.
Joe Conway [Mon, 3 Aug 2009 21:11:40 +0000 (21:11 +0000)]
Implement has_sequence_privilege()
Add family of functions that did not exist earlier,
mainly due to historical omission. Original patch by
Abhijit Menon-Sen, with review and modifications by
Joe Conway. catversion.h bumped.
Tatsuo Ishii [Mon, 3 Aug 2009 15:18:14 +0000 (15:18 +0000)]
Multi-threaded version of pgbench contributed by ITAGAKI Takahiro,
reviewed by Greg Smith and Josh Williams.
Following is the proposal from ITAGAKI Takahiro:
Pgbench is a famous tool to measure postgres performance, but nowadays
it does not work well because it cannot use multiple CPUs. On the other
hand, postgres server can use CPUs very well, so the bottle-neck of
workload is *in pgbench*.
Multi-threading would be a solution. The attached patch adds -j
(number of jobs) option to pgbench. If the value N is greater than 1,
pgbench runs with N threads. Connections are equally-divided into
them (ex. -c64 -j4 => 4 threads with 16 connections each). It can
run on POSIX platforms with pthread and on Windows with win32 threads.
Here are results of multi-threaded pgbench runs on Fedora 11 with intel
core i7 (8 logical cores = 4 physical cores * HT). -j8 (8 threads) was
the best and the tps is 4.5 times of -j1, that is a traditional result.
Tom Lane [Sat, 1 Aug 2009 20:59:17 +0000 (20:59 +0000)]
Department of second thoughts: let's show the exact key during unique index
build failures, too. Refactor a bit more since that error message isn't
spelled the same.
Tom Lane [Fri, 31 Jul 2009 20:26:23 +0000 (20:26 +0000)]
Create a multiplexing structure for signals to Postgres child processes.
This patch gets us out from under the Unix limitation of two user-defined
signal types. We already had done something similar for signals directed to
the postmaster process; this adds multiplexing for signals directed to
backends and auxiliary processes (so long as they're connected to shared
memory).
As proof of concept, replace the former usage of SIGUSR1 and SIGUSR2
for backends with use of the multiplexing mechanism. There are still some
hard-wired definitions of SIGUSR1 and SIGUSR2 for other process types,
but getting rid of those doesn't seem interesting at the moment.
Tom Lane [Thu, 30 Jul 2009 02:45:38 +0000 (02:45 +0000)]
Merge the Constraint and FkConstraint node types into a single type.
This was foreseen to be a good idea long ago, but nobody had got round
to doing it. The recent patch for deferred unique constraints made
transformConstraintAttrs() ugly enough that I decided it was time.
This change will also greatly simplify parsing of deferred CHECK constraints,
if anyone ever gets around to implementing that.
While at it, add a location field to Constraint, and use that to provide
an error cursor for some of the constraint-related error messages.
Tom Lane [Wed, 29 Jul 2009 22:19:18 +0000 (22:19 +0000)]
Fix time_part and timetz_part (ie, EXTRACT() for those datatypes) to
include a fractional part in the output for MILLISECOND and SECOND cases,
rather than truncating the source value. This is what the float-timestamp
code has always done, and it was clearly the code author's intent to do
the same for integer timestamps, but he forgot about integer division in C.
The other datatypes supported by EXTRACT() already do this correctly.
Backpatch to 8.4, so that the default (integer) behavior of that branch will
match the default (float) behavior of older branches. Arguably we should
patch further back, but it's possible that applications are expecting the
broken behavior in older branches. 8.4 is new enough that expectations
shouldn't be too settled.
Tom Lane [Wed, 29 Jul 2009 20:56:21 +0000 (20:56 +0000)]
Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion. This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments. Improving that case
is a TODO item.
Tom Lane [Wed, 29 Jul 2009 15:57:11 +0000 (15:57 +0000)]
Fix a thinko introduced into CountActiveBackends by a recent patch:
we should ignore NULL array entries, not non-NULL ones. This had the
effect of disabling commit_delay, and could have caused a crash in the
rare race condition the patch was intended to fix.
Bug report and diagnosis by Jeff Janes, in bug #4952.
Tom Lane [Tue, 28 Jul 2009 02:56:31 +0000 (02:56 +0000)]
Add system catalog columns pg_constraint.conindid and pg_trigger.tgconstrindid.
conindid is the index supporting a constraint. We can use this not only for
unique/primary-key constraints, but also foreign-key constraints, which
depend on the unique index that constrains the referenced columns.
tgconstrindid is just copied from the constraint's conindid field, or is
zero for triggers not associated with constraints.
This is mainly intended as infrastructure for upcoming patches, but it has
some virtue in itself, since it exposes a relationship that you formerly
had to grovel in pg_depend to determine. I simplified one information_schema
view accordingly. (There is a pg_dump query that could also use conindid,
but I left it alone because it wasn't clear it'd get any faster.)
Tom Lane [Mon, 27 Jul 2009 05:31:05 +0000 (05:31 +0000)]
Add s_lock support for SuperH architecture.
After a patch originally submitted by Nobuhiro Iwamatsu, but corrected
(I think) to match our guidelines for safe use of asm fragments.
This should be considered untested ...
Tom Lane [Mon, 27 Jul 2009 00:26:03 +0000 (00:26 +0000)]
Experiment with using EXPLAIN COSTS OFF in regression tests.
This is a simple test to see whether COSTS OFF will help much with getting
EXPLAIN output that's sufficiently platform-independent for use in the
regression tests. The planner does have some freedom of choice in these
examples (plain via bitmap indexscan), so I'm not sure what will happen.
Tom Lane [Sun, 26 Jul 2009 23:34:18 +0000 (23:34 +0000)]
Extend EXPLAIN to allow generic options to be specified.
The original syntax made it difficult to add options without making them
into reserved words. This change parenthesizes the options to avoid that
problem, and makes provision for an explicit (and perhaps non-Boolean)
value for each option. The original syntax is still supported, but only
for the two original options ANALYZE and VERBOSE.
As a test case, add a COSTS option that can suppress the planner cost
estimates. This may be useful for including EXPLAIN output in the regression
tests, which are otherwise unable to cope with cross-platform variations in
cost estimates.
Tom Lane [Sat, 25 Jul 2009 17:04:19 +0000 (17:04 +0000)]
Code review for FORCE QUOTE * patch: fix error checking to consider FORCE
QUOTE * as a variety of FORCE QUOTE, and update psql documentation to include
the option. (The actual psql code doesn't seem to need any changes.)
Tom Lane [Fri, 24 Jul 2009 21:08:42 +0000 (21:08 +0000)]
Assorted minor refactoring in EXPLAIN.
This is believed to not change the output at all, with one known exception:
"Subquery Scan foo" becomes "Subquery Scan on foo". (We can fix that if
anyone complains, but it would be a wart, because the old code was clearly
inconsistent.) The main intention is to remove duplicate coding and
provide a cleaner base for subsequent EXPLAIN patching.
Magnus Hagander [Fri, 24 Jul 2009 20:12:42 +0000 (20:12 +0000)]
Reserve the shared memory region during backend startup on Windows, so
that memory allocated by starting third party DLLs doesn't end up
conflicting with it.
Hopefully this solves the long-time issue with "could not reattach
to shared memory" errors on Win32.
Patch from Tsutomu Yamada and me, based on idea from Trevor Talbot.
Tom Lane [Fri, 24 Jul 2009 17:58:31 +0000 (17:58 +0000)]
Avoid extra system calls to block SIGPIPE if the platform provides either
sockopt(SO_NOSIGPIPE) or the MSG_NOSIGNAL flag to send().
We assume these features are available if (1) the symbol is defined at
compile time and (2) the kernel doesn't reject the call at runtime.
It might turn out that there are some platforms where (1) and (2) are
true and yet the signal isn't really blocked, in which case applications
would die on server crash. If that sort of thing gets reported, then
we'll have to add additional defenses of some kind.
Tom Lane [Thu, 23 Jul 2009 21:27:10 +0000 (21:27 +0000)]
Save a few cycles in EXPLAIN and related commands by not bothering to form
a physical tuple in do_tup_output(). A virtual tuple is easier to set up
and also easier for most tuple receivers to process. Per my comment on
Robert Haas' recent patch in this code.
Tom Lane [Thu, 23 Jul 2009 20:45:27 +0000 (20:45 +0000)]
In a non-hashed Agg node, reset the "aggcontext" at group boundaries, instead
of individually pfree'ing pass-by-reference transition values. This should
be at least as fast as the prior coding, and it has the major advantage of
clearing out any working data an aggregate function may have stored in or
underneath the aggcontext. This avoids memory leakage when an aggregate
such as array_agg() is used in GROUP BY mode. Per report from Chris Spotts.
Back-patch to 8.4. In principle the problem could arise in prior versions,
but since they didn't have array_agg the issue seems not critical.
Tom Lane [Thu, 23 Jul 2009 17:42:06 +0000 (17:42 +0000)]
Fix another thinko in join_is_legal's handling of semijoins: we have to test
for the case that the semijoin was implemented within either input by
unique-ifying its RHS before we test to see if it appears to match the current
join situation. The previous coding would select semijoin logic in situations
where we'd already unique-ified the RHS and joined it to some unrelated
relation(s), and then came to join it to the semijoin's LHS. That still gave
the right answer as far as the semijoin itself was concerned, but would lead
to incorrectly examining only an arbitrary one of the matchable rows from the
unrelated relation(s). The cause of this thinko was incorrect unification of
the pre-8.4 logic for IN joins and OUTER joins --- the comparable case for
outer joins can be handled after making the match test, but that's because
there is nothing like the unique-ification escape hatch for outer joins.
Per bug #4934 from Benjamin Reed.
Tom Lane [Wed, 22 Jul 2009 17:00:23 +0000 (17:00 +0000)]
Change do_tup_output() to take Datum/isnull arrays instead of a char * array,
so it doesn't go through BuildTupleFromCStrings. This is more or less a
wash for current uses, but will avoid inefficiency for planned changes to
EXPLAIN.
Tom Lane [Wed, 22 Jul 2009 01:21:22 +0000 (01:21 +0000)]
Tweak TOAST code so that columns marked with MAIN storage strategy are
not forced out-of-line unless that is necessary to make the row fit on a
page. Previously, they were forced out-of-line if needed to get the row
down to the default target size (1/4th page).
Peter Eisentraut [Tue, 21 Jul 2009 20:24:51 +0000 (20:24 +0000)]
Change pg_listener attribute number constants to match the usual pattern
It appears that, for no particularly good reason, pg_listener.h deviates from
the usual convention for declaring attribute number constants. Normally, it's
Tom Lane [Tue, 21 Jul 2009 19:53:12 +0000 (19:53 +0000)]
Speed up AllocSetFreeIndex, which is a significant cost in palloc and pfree,
by using a lookup table instead of a naive shift-and-count loop. Based on
code originally posted by Sean Eron Anderson at
http://graphics.stanford.edu/%7eseander/bithacks.html.
Greg Stark did the research and benchmarking to show that this is what
we should use. Jeremy Kerr first noticed that this is a hotspot that
could be optimized, though we ended up not using his suggestion of
platform-specific bit-searching code.
Tom Lane [Tue, 21 Jul 2009 02:02:44 +0000 (02:02 +0000)]
Fix another semijoin-ordering bug. We already knew that we couldn't
reorder a semijoin into or out of the righthand side of another semijoin,
but actually it doesn't work to reorder it into or out of the righthand
side of a left or antijoin, either. Per bug #4906 from Mathieu Fenniak.
This was sloppy thinking on my part. This identity does work:
( A left join B on (Pab) ) semijoin C on (Pac)
==
( A semijoin C on (Pac) ) left join B on (Pab)
but I failed to see that that doesn't mean this does:
( A left join B on (Pab) ) semijoin C on (Pbc)
!=
A left join ( B semijoin C on (Pbc) ) on (Pab)
Install src/include/utils/fmgroids.h on VPATH builds too.
The original coding was not dealing specially with this file being a symlink,
with the end result that it was not installed in VPATH builds. Oddly enough,
the clean target does know about it ...
Peter Eisentraut [Mon, 20 Jul 2009 08:01:07 +0000 (08:01 +0000)]
Use errcontext mechanism in PL/Python
Error messages from PL/Python now always mention the function name in the
CONTEXT: field. This also obsoletes the few places that tried to do the
same manually.
Regression test files are updated to work with Python 2.4-2.6. I don't have
access to older versions right now.
Tom Lane [Mon, 20 Jul 2009 00:24:30 +0000 (00:24 +0000)]
Teach simplify_boolean_equality to simplify the forms foo <> true and
foo <> false, along with its previous duties of simplifying foo = true
and foo = false. (All of these are equivalent to just foo or NOT foo
as the case may be.) It's not clear how often this is really useful;
but it costs almost nothing to do, and it seems some people think we
should be smart about such cases. Per recent bug report.
Tom Lane [Sun, 19 Jul 2009 21:00:43 +0000 (21:00 +0000)]
Rewrite GEQO's gimme_tree function so that it always finds a legal join
sequence, even when the input "tour" doesn't lead directly to such a sequence.
The stack logic that was added in 2004 only supported cases where relations
that had to be joined to each other (due to join order restrictions) were
adjacent in the tour. However, relying on a random search to figure that out
is tremendously inefficient in large join problems, and could even fail
completely (leading to "failed to make a valid plan" errors) if
random_init_pool ran out of patience. It seems better to make the
tour-to-plan transformation a little bit fuzzier so that every tour can form
a legal plan, even though this means that apparently different tours will
sometimes yield the same plan.
In the same vein, get rid of the logic that knew that tours (a,b,c,d,...)
are the same as tours (b,a,c,d,...), and therefore insisted the latter
are invalid. The chance of generating two tours that differ only in
this way isn't that high, and throwing out 50% of possible tours to
avoid such duplication seems more likely to waste valuable genetic-
refinement generations than to do anything useful.
This leaves us with no cases in which geqo_eval will deem a tour invalid,
so get rid of assorted kluges that tried to deal with such cases, in
particular the undocumented assumption that DBL_MAX is an impossible
plan cost.
This is all per testing of Robert Haas' lets-remove-the-collapse-limits
patch. That idea has crashed and burned, at least for now, but we still
got something useful out of it.
It's possible we should back-patch this change, since the "failed to make a
valid plan" error can happen in existing releases; but I'd rather not until
it has gotten more testing.
Tom Lane [Sun, 19 Jul 2009 20:32:48 +0000 (20:32 +0000)]
Fix a thinko in join_is_legal: when we decide we can implement a semijoin
by unique-ifying the RHS and then inner-joining to some other relation,
that is not grounds for violating the RHS of some other outer join.
Noticed while regression-testing new GEQO code, which will blindly follow
any path that join_is_legal says is legal, and then complain later if that
leads to a dead end.
I'm not certain that this can result in any visible failure in 8.4: the
mistake may always be masked by the fact that subsequent attempts to join
the rest of the RHS of the other join will fail. But I'm not certain it
can't, either, and it's definitely not operating as intended. So back-patch.
The added regression test depends on the new no-failures-allowed logic
that I'm about to commit in GEQO, so no point back-patching that.
Tom Lane [Sat, 18 Jul 2009 19:15:42 +0000 (19:15 +0000)]
Fix error cleanup failure caused by 8.4 changes in plpgsql to try to avoid
memory leakage in error recovery. We were calling FreeExprContext, and
therefore invoking ExprContextCallback callbacks, in both normal and error
exits from subtransactions. However this isn't very safe, as shown in
recent trouble report from Frank van Vugt, in which releasing a tupledesc
refcount failed. It's also unnecessary, since the resources that callbacks
might wish to release should be cleaned up by other error recovery mechanisms
(ie the resource owners). We only really want FreeExprContext to release
memory attached to the exprcontext in the error-exit case. So, add a bool
parameter to FreeExprContext to tell it not to call the callbacks.
A more general solution would be to pass the isCommit bool parameter on to
the callbacks, so they could do only safe things during error exit. But
that would make the patch significantly more invasive and possibly break
third-party code that registers ExprContextCallback callbacks. We might want
to do that later in HEAD, but for now I'll just do what seems reasonable to
back-patch.
Tom Lane [Fri, 17 Jul 2009 23:19:34 +0000 (23:19 +0000)]
Repair bug #4926 "too few pathkeys for mergeclauses". This example shows
that the sanity checking I added to create_mergejoin_plan() in 8.3 was a
few bricks shy of a load: the mergeclauses could reference pathkeys in a
noncanonical order such as x,y,x, not only cases like x,x,y which is all
that the code had allowed for. The odd cases only turn up when using
redundant clauses in an outer join condition, which is why no one had
noticed before.