Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".
Add fork number to some error messages in bufmgr.c, that still lacked it.
Peter Eisentraut [Fri, 31 Oct 2008 14:35:30 +0000 (14:35 +0000)]
The conversion rule from postgres.sgml to postgres.xml didn't work with
BSD sed. So write it in Perl, which is more portable and a bit faster, too.
We already use Perl for standard documentation builds, so this imposes no
additional requirement.
Tom Lane [Thu, 30 Oct 2008 04:06:16 +0000 (04:06 +0000)]
Fix recoveryLastXTime logic so that it actually does what one would expect.
Per gripe from Kevin Grittner. Backpatch to 8.3, where the bug was introduced.
Peter Eisentraut [Wed, 29 Oct 2008 09:27:24 +0000 (09:27 +0000)]
Use Autoconf provided AS_HELP_STRING macro to automatically format and
align strings in the --help output. Do this through our abstraction layer
to eliminate redundancy and randomness in configure.in.
Tom Lane [Wed, 29 Oct 2008 00:00:39 +0000 (00:00 +0000)]
Be more tense about not creating tuplestores with randomAccess = true unless
backwards scan could actually happen. In particular, pass a flag to
materialize-mode SRFs that tells them whether they need to require random
access. In passing, also suppress unneeded backward-scan overhead for a
Portal's holdStore tuplestore. Per my proposal about reducing I/O costs for
tuplestores.
Tom Lane [Tue, 28 Oct 2008 22:02:06 +0000 (22:02 +0000)]
Extend ExecMakeFunctionResult() to support set-returning functions that return
via a tuplestore instead of value-per-call. Refactor a few things to reduce
ensuing code duplication with nodeFunctionscan.c. This represents the
reasonably noncontroversial part of my proposed patch to switch SQL functions
over to returning tuplestores. For the moment, SQL functions still do things
the old way. However, this change enables PL SRFs to be called in targetlists
(observe changes in plperl regression results).
Tom Lane [Tue, 28 Oct 2008 17:13:51 +0000 (17:13 +0000)]
Change WorkTableScan to not support backward scan. The apparent support
didn't actually work, because nodeRecursiveunion.c creates the underlying
tuplestore with backward scan disabled; which is a decision that we shouldn't
reverse because of performance cost. We could imagine adding signaling from
WorkTableScan to RecursiveUnion about whether backward scan is needed ...
but in practice it'd be a waste of effort, because there simply isn't any
current or plausible future scenario where WorkTableScan would be called on
to scan backward. So just dike out the code that claims to support it.
Tom Lane [Tue, 28 Oct 2008 15:51:03 +0000 (15:51 +0000)]
Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation
written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per
row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems
worth the slight additional uglification of the tuple read/write routines.
Tom Lane [Mon, 27 Oct 2008 19:37:22 +0000 (19:37 +0000)]
Install a more robust solution for the problem of infinite error-processing
recursion when we are unable to convert a localized error message to the
client's encoding. We've been over this ground before, but as reported by
Ibrar Ahmed, it still didn't work in the case of conversion failures for
the conversion-failure message itself :-(. Fix by installing a "circuit
breaker" that disables attempts to localize this message once we get into
recursion trouble.
Patch all supported branches, because it is in fact broken in all of them;
though I had to add some missing translations to the older branches in
order to expose the failure in the particular test case I was using.
Magnus Hagander [Mon, 27 Oct 2008 09:42:31 +0000 (09:42 +0000)]
Add support for multiple error messages from libpq, by simply appending them
after each other (since we already add a newline on each, this makes them
multiline).
Previously a new error would just overwrite the old one, so for example any
error caused when trying to connect with SSL enabled would be overwritten
by the error message form the non-SSL connection when using sslmode=prefer.
Tom Lane [Sun, 26 Oct 2008 02:46:25 +0000 (02:46 +0000)]
Better solution to the IN-list issue: instead of having an arbitrary cutoff,
treat Var and non-Var IN-list items differently. Only non-Var items are
candidates to go into an ANY(ARRAY) construct --- we put all Vars as separate
OR conditions on the grounds that that leaves more scope for optimization.
Per suggestion from Robert Haas.
Tom Lane [Sat, 25 Oct 2008 19:51:32 +0000 (19:51 +0000)]
Be a little smarter about qual handling for semi-joins: a qual that mentions
only the outer side can be pushed down rather than having to be evaluated
at the join.
Tom Lane [Sat, 25 Oct 2008 17:19:09 +0000 (17:19 +0000)]
Add a heuristic to transformAExprIn() to make it prefer expanding "x IN (list)"
into an OR of equality comparisons, rather than x = ANY(ARRAY[...]), when there
are Vars in the right-hand side. This avoids a performance regression compared
to pre-8.2 releases, in cases where the OR form can be optimized into scans
of multiple indexes. Limit the possible downside by preferring this form only
when the list isn't very long (I set the cutoff at 32 elements, which is a
bit arbitrary but in the right ballpark). Per discussion with Jim Nasby.
In passing, also make it try the OR form if it cannot select a common type
for the array elements; we've seen a complaint or two about how the OR form
worked for such cases and ARRAY doesn't.
Tom Lane [Fri, 24 Oct 2008 23:42:35 +0000 (23:42 +0000)]
Reduce the memory footprint of large pending-trigger-event lists, as per my
recent proposal. In typical cases, we now need 12 bytes per insert or delete
event and 16 bytes per update event; previously we needed 40 bytes per
event on 32-bit hardware and 80 bytes per event on 64-bit hardware. Even
in the worst case usage pattern with a large number of distinct triggers being
fired in one query, usage is at most 32 bytes per event. It seems to be a
bit faster than the old code as well, due to reduction of palloc overhead.
This commit doesn't address the TODO item of allowing the event list to spill
to disk; rather it's trying to stave off the need for that. However, it
probably makes that task a bit easier by reducing the data structure's
dependency on pointers. It would now be practical to dump an event list to
disk by "chunks" instead of individual events.
Magnus Hagander [Fri, 24 Oct 2008 12:24:35 +0000 (12:24 +0000)]
Remove a "TODO-list" structure at the top of the file, referring back
to the old set of SSL patches. Hasn't been updated since, and we keep
the TODOs in the "real" TODO list, really...
Magnus Hagander [Fri, 24 Oct 2008 11:48:29 +0000 (11:48 +0000)]
Remove large parts of the old SSL readme, that consisted of a couple
of copy/paste:d emails. Much of the contents had already been migrated
into the main documentation, some was out of date and some just plain
wrong.
Keep the "protocol-flowchart" which can still be useful.
Tom Lane [Thu, 23 Oct 2008 15:29:23 +0000 (15:29 +0000)]
Fix an oversight in two different recent patches: nodes that support SRFs
in their targetlists had better reset ps_TupFromTlist during ReScan calls.
There's no need to back-patch here since nodeAgg and nodeGroup didn't
even pretend to support SRFs in prior releases.
Tom Lane [Thu, 23 Oct 2008 14:34:34 +0000 (14:34 +0000)]
Remove useless ps_OuterTupleSlot field from PlanState. I suppose this was
used long ago, but in the current code the ecxt_outertuple field of
ExprContext is doing all the work. Spotted by Ran Tang.
Magnus Hagander [Thu, 23 Oct 2008 13:31:10 +0000 (13:31 +0000)]
* make pg_hba authoption be a set of 0 or more name=value pairs
* make LDAP use this instead of the hacky previous method to specify
the DN to bind as
* make all auth options behave the same when they are not compiled
into the server
* rename "ident maps" to "user name maps", and support them for all
auth methods that provide an external username
This makes a backwards incompatible change in the format of pg_hba.conf
for the ident, PAM and LDAP authentication methods.
Tom Lane [Thu, 23 Oct 2008 00:24:50 +0000 (00:24 +0000)]
When estimating without benefit of MCV lists (suggesting that one or both
inputs is unique or nearly so), make eqjoinsel() clamp the ndistinct estimates
to be not more than the estimated number of rows coming from the input
relations. This allows the estimate to change in response to the selectivity
of restriction conditions on the inputs.
This is a pretty narrow patch and maybe we should be more aggressive about
similarly clamping ndistinct in other cases; but I'm worried about
double-counting the effects of the restriction conditions. However, it seems
to help for the case exhibited by Grzegorz Jaskiewicz (antijoin against a
small subset of a relation), so let's try this for awhile.
Tom Lane [Wed, 22 Oct 2008 20:17:52 +0000 (20:17 +0000)]
Dept of better ideas: refrain from creating the planner's placeholder_list
until vars are distributed to rels during query_planner() startup. We don't
really need it before that, and not building it early has some advantages.
First, we don't need to put it through the various preprocessing steps, which
saves some cycles and eliminates the need for a number of routines to support
PlaceHolderInfo nodes at all. Second, this means one less unused plan for any
sub-SELECT appearing in a placeholder's expression, since we don't build
placeholder_list until after sublink expansion is complete.
Teodor Sigaev [Wed, 22 Oct 2008 12:53:56 +0000 (12:53 +0000)]
Fix GiST's killing tuple: GISTScanOpaque->curpos wasn't
correctly set. As result, killtuple() marks as dead
wrong tuple on page. Bug was introduced by me while fixing
possible duplicates during GiST index scan.
Tom Lane [Tue, 21 Oct 2008 20:42:53 +0000 (20:42 +0000)]
Add a concept of "placeholder" variables to the planner. These are variables
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).
The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join. Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join. When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results. This fixes a planner
limitation that's existed since 7.1.
In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
Alvaro Herrera [Mon, 20 Oct 2008 19:18:18 +0000 (19:18 +0000)]
Rework subtransaction commit protocol for hot standby.
This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog
during their commit; instead they remain in-progress until main transaction
commit. At main transaction commit, the commit protocol is atomic-by-page
instead of one transaction at a time. To avoid a race condition with some
subtransactions appearing committed before others in the case where they span
more than one pg_clog page, we conserve the logic that marks them subcommitted
before marking the parent committed.
Tom Lane [Fri, 17 Oct 2008 22:10:30 +0000 (22:10 +0000)]
Add a new column to pg_am to specify whether an index AM supports backward
scanning; GiST and GIN do not, and it seems like too much trouble to make
them do so. By teaching ExecSupportsBackwardScan() about this restriction,
we ensure that the planner will protect a scroll cursor from the problem
by adding a Materialize node.
In passing, fix another longstanding bug in the same area: backwards scan of
a plan with set-returning functions in the targetlist did not work either,
since the TupFromTlist expansion code pays no attention to direction (and
has no way to run a SRF backwards anyway). Again the fix is to make
ExecSupportsBackwardScan check this restriction.
Also adjust the index AM API specification to note that mark/restore support
is unnecessary if the AM can't produce ordered output.
Tom Lane [Fri, 17 Oct 2008 20:27:24 +0000 (20:27 +0000)]
Salvage a little bit of work from a failed patch: simplify and speed up
set_rel_width(). The code had been catering for the possibility of different
varnos in the relation targetlist, but this is impossible for a base relation
(and if it were possible, putting all the widths in the same RelOptInfo would
be wrong anyway).
Teodor Sigaev [Fri, 17 Oct 2008 17:27:46 +0000 (17:27 +0000)]
Fix small bug in headline generation.
Patch from Sushant Sinha <sushant354@gmail.com>
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00785.php
Teodor Sigaev [Fri, 17 Oct 2008 17:02:21 +0000 (17:02 +0000)]
During repeated rescan of GiST index it's possible that scan key
is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys
to set during key initialization. If key is NULL and SK_SEARCHNULL is not
set then nothnig can be satisfied.
With assert-enabled compilation that causes coredump.
Bug was introduced in 8.3 by support of IS NULL index scan.
Neil Conway [Thu, 16 Oct 2008 19:25:55 +0000 (19:25 +0000)]
Fix a small memory leak in ExecReScanAgg() in the hashed aggregation case.
In the previous coding, the list of columns that needed to be hashed on
was allocated in the per-query context, but we reallocated every time
the Agg node was rescanned. Since this information doesn't change over
a rescan, just construct the list of columns once during ExecInitAgg().
Tom Lane [Thu, 16 Oct 2008 13:23:21 +0000 (13:23 +0000)]
Fix SPI_getvalue and SPI_getbinval to range-check the given attribute number
according to the TupleDesc's natts, not the number of physical columns in the
tuple. The previous coding would do the wrong thing in cases where natts is
different from the tuple's column count: either incorrectly report error when
it should just treat the column as null, or actually crash due to indexing off
the end of the TupleDesc's attribute array. (The second case is probably not
possible in modern PG versions, due to more careful handling of inheritance
cases than we once had. But it's still a clear lack of robustness here.)
The incorrect error indication is ignored by all callers within the core PG
distribution, so this bug has no symptoms visible within the core code, but
it might well be an issue for add-on packages. So patch all the way back.
Tom Lane [Tue, 14 Oct 2008 23:27:40 +0000 (23:27 +0000)]
Make the system-attributes loop in AddNewAttributeTuples depend on
lengthof(SysAtt) not FirstLowInvalidHeapAttributeNumber, for consistency with
the other uses of the SysAtt array, and to make it clearer that it doesn't
walk off the end of that array.
Tom Lane [Tue, 14 Oct 2008 21:47:39 +0000 (21:47 +0000)]
Add a defense to prevent storing pseudo-type data into index columns.
Formerly, the lack of any opclasses that could accept such data was enough
of a defense, but now with a "record" opclass we need to check more carefully.
(You can still use that opclass for an index, but you have to store a named
composite type not an anonymous one.)
Tom Lane [Tue, 14 Oct 2008 21:39:41 +0000 (21:39 +0000)]
Update citext expected output for recent change in error message location
pointers. This is only a whitespace change, which ought to be ignored
by regression testing, but for some reason buildfarm member spoonbill
doesn't like it.
Tom Lane [Tue, 14 Oct 2008 17:12:33 +0000 (17:12 +0000)]
Extend the date type to support infinity and -infinity, analogously to
the timestamp types. Turns out this doesn't even reduce the available
range of dates, since the restriction to dates that work for Julian-date
arithmetic is much tighter than the int32 range anyway. Per a longstanding
TODO item.
Fix oversight in the relation forks patch: forgot to copy fork number to
fsync requests. This should fix the installcheck failure of the buildfarm
member "kudu".
Tom Lane [Tue, 14 Oct 2008 00:41:35 +0000 (00:41 +0000)]
Add docs and regression test about sorting the output of a recursive query in
depth-first search order. Upon close reading of SQL:2008, it seems that the
spec's SEARCH DEPTH FIRST and SEARCH BREADTH FIRST options do not actually
guarantee any particular result order: what they do is provide a constructed
column that the user can then sort on in the outer query. So this is actually
just as much functionality ...
Tom Lane [Mon, 13 Oct 2008 16:25:20 +0000 (16:25 +0000)]
Implement comparison of generic records (composite types), and invent a
pseudo-type record[] to represent arrays of possibly-anonymous composite
types. Since composite datums carry their own type identification, no
extra knowledge is needed at the array level.
The main reason for doing this right now is that it is necessary to support
the general case of detection of cycles in recursive queries: if you need to
compare more than one column to detect a cycle, you need to compare a ROW()
to an array built from ROW()s, at least if you want to do it as the spec
suggests. Add some documentation and regression tests concerning the cycle
detection issue.
Tom Lane [Mon, 13 Oct 2008 00:41:41 +0000 (00:41 +0000)]
Fix corner case wherein a WorkTableScan node could get initialized before the
RecursiveUnion to which it refers. It turns out that we can just postpone the
relevant initialization steps until the first exec call for the node, by which
time the ancestor node must surely be initialized. Per report from Greg Stark.
Tom Lane [Fri, 10 Oct 2008 13:48:05 +0000 (13:48 +0000)]
Fix omission of DiscardStmt in GetCommandLogLevel, per report from Hubert
Depesz Lubaczewski. In HEAD, also move a couple of other cases to make the
code ordering match up with ProcessUtility.
Tom Lane [Thu, 9 Oct 2008 19:27:40 +0000 (19:27 +0000)]
Improve the recently-added code for inlining set-returning functions so that
it can handle functions returning setof record. The case was left undone
originally, but it turns out to be simple to fix.