Bruce Momjian [Tue, 18 Apr 2006 00:52:23 +0000 (00:52 +0000)]
Document that errors are not output by log_statement (as they were in
8.0), and add a suggestion to use log_min_error_statement for this
purpose. I also fixed the code so the first EXECUTE has its prepare,
rather than the last, which is what was in the current code. Also remove
the "protocol" prefix for SQL EXECUTE output because it is not accurate.
Tom Lane [Mon, 17 Apr 2006 18:55:05 +0000 (18:55 +0000)]
Fix the torn-page hazard for PITR base backups by forcing full page writes
to occur between pg_start_backup() and pg_stop_backup(), even if the GUC
setting full_page_writes is OFF. Per discussion, doing this in combination
with the already-existing checkpoint during pg_start_backup() should ensure
safety against partial page updates being included in the backup. We do
not have to force full page writes to occur during normal PITR operation,
as I had first feared.
Tom Lane [Sat, 15 Apr 2006 17:45:46 +0000 (17:45 +0000)]
Support the syntax
CREATE AGGREGATE aggname (input_type) (parameter_list)
along with the old syntax where the input type was named in the parameter
list. This fits more naturally with the way that the aggregate is identified
in DROP AGGREGATE and other utility commands; furthermore it has a natural
extension to handle multiple-input aggregates, where the basetype-parameter
method would get ugly. In fact, this commit fixes the grammar and all the
utility commands to support multiple-input aggregates; but DefineAggregate
rejects it because the executor isn't fixed yet.
I didn't do anything about treating agg(*) as a zero-input aggregate instead
of artificially making it a one-input aggregate, but that should be considered
in combination with supporting multi-input aggregates.
Tom Lane [Fri, 14 Apr 2006 20:27:24 +0000 (20:27 +0000)]
Make the world safe for full_page_writes. Allow XLOG records that try to
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records. If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay. Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
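The bookkeeping described in this entry can be sketched roughly as follows. This is a minimal Python illustration, not the backend's C code; the class and method names (InvalidPageTracker, note_missing, forget_relation, forget_pages_from, check_at_end) are hypothetical.

```python
class InvalidPageTracker:
    """Remembers pages that replay wanted to update but could not find."""

    def __init__(self):
        # Set of (relation, block_number) pairs seen so far.
        self.missing = set()

    def note_missing(self, rel, blkno):
        # The record is treated as a no-op, but the page is remembered.
        self.missing.add((rel, blkno))

    def forget_relation(self, rel):
        # A later record dropped the table: its missing pages are explained.
        self.missing = {(r, b) for (r, b) in self.missing if r != rel}

    def forget_pages_from(self, rel, first_removed_blkno):
        # A later record truncated the table at first_removed_blkno.
        self.missing = {
            (r, b) for (r, b) in self.missing
            if not (r == rel and b >= first_removed_blkno)
        }

    def check_at_end(self):
        # Anything still listed at the end of replay was never explained.
        if self.missing:
            raise RuntimeError(
                "replay referenced missing pages: %r" % sorted(self.missing)
            )
```

A page noted as missing stops being a problem once a later drop or truncation accounts for it; only unexplained entries trigger the complaint at the end.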
Tom Lane [Fri, 14 Apr 2006 03:38:56 +0000 (03:38 +0000)]
Repair a low-probability race condition identified by Qingqing Zhou.
If a process abandons a wait in LockBufferForCleanup (in practice,
only happens if someone cancels a VACUUM) just before someone else
sends it a signal indicating the buffer is available, it was possible
for the wakeup to remain in the process' semaphore, causing misbehavior
next time the process waited for an lmgr lock. Rather than try to
prevent the race condition directly, it seems best to make the lock
manager robust against leftover wakeups, by having it repeat waiting
on the semaphore if the lock has not actually been granted or denied
yet.
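The "repeat waiting" approach can be illustrated with a small Python sketch. It assumes a per-process semaphore and a status field set by whoever grants or denies the lock; LockWaiter and its members are hypothetical names, and the real lock manager is considerably more involved.

```python
import threading


class LockWaiter:
    """Waits on a semaphore but trusts only the lock status, not the wakeup."""

    def __init__(self):
        self.sem = threading.Semaphore(0)
        self.status = "waiting"  # later set to "granted" or "denied"

    def wait_for_lock(self):
        while True:
            self.sem.acquire()
            # A wakeup counts only if the lock was actually decided.
            if self.status != "waiting":
                return self.status
            # Otherwise this was a leftover wakeup: go back to waiting.
```

If a stale wakeup is already sitting in the semaphore when wait_for_lock starts, the loop consumes it, sees that the status is still "waiting", and blocks again until the real grant or denial arrives.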
Tom Lane [Thu, 13 Apr 2006 18:01:31 +0000 (18:01 +0000)]
Fix similar_escape() so that SIMILAR TO works properly for patterns involving
alternatives ("|" symbol). The original coding allowed the added ^ and $
constraints to be absorbed into the first and last alternatives, producing
a pattern that would match more than it should. Per report from Eric Noriega.
I also changed the pattern to add an ARE director ("***:"), ensuring that
SIMILAR TO patterns do not change behavior if regex_flavor is changed. This
is necessary to make the non-capturing parentheses work, and seems like a
good idea on general principles.
Back-patched as far as 7.4. 7.3 also has the bug, but a fix seems impractical
because that version's regex engine doesn't have non-capturing parens.
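The anchoring problem can be reproduced with any regex engine that gives "|" the lowest precedence. The sketch below uses Python's re module rather than PostgreSQL's regex engine, so it leaves out the "***:" director; the helper names naive_anchor and grouped_anchor are hypothetical.

```python
import re


def naive_anchor(pattern):
    # The ^ and $ bind to the first and last alternatives only.
    return "^" + pattern + "$"


def grouped_anchor(pattern):
    # Non-capturing parentheses keep the anchors outside the alternation.
    return "^(?:" + pattern + ")$"


def matches(regex, text):
    return re.search(regex, text) is not None
```

With the pattern "abc|xyz", the naive form becomes "^abc|xyz$" and accepts "abcdef" because the first alternative needs only a prefix match. The grouped form "^(?:abc|xyz)$" rejects it.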
Bruce Momjian [Thu, 13 Apr 2006 11:41:02 +0000 (11:41 +0000)]
Update AIX FAQ:
At any rate, here's a revision to CVS HEAD to reflect some changes by
myself and by Seneca Cunningham for the AIX FAQ. It touches on the
following issues:
1. memcpy pointer patch for dynahash.c
2. AIX memory management, which can, for 32 bit cases, bite people
quite unexpectedly...
Bruce Momjian [Thu, 13 Apr 2006 10:50:13 +0000 (10:50 +0000)]
Update:
< multiple I/O channels simultaneously.
> multiple I/O channels simultaneously. One idea is to create a
> background reader that can pre-fetch sequential and index scan
> pages needed by other backends. This could be expanded to allow
> concurrent reads from multiple devices in a partitioned table.
Tom Lane [Thu, 13 Apr 2006 03:53:05 +0000 (03:53 +0000)]
Fix an ancient oversight in btree xlog replay. When trying to determine if an
upper-level insertion completes a previously-seen split, we cannot simply grab
the downlink block number out of the buffer, because the buffer could contain
a later state of the page --- or perhaps the page doesn't even exist at all
any more, due to relation truncation. These possibilities have been masked up
to now because the use of full_page_writes effectively ensured that no xlog
replay routine ever actually saw a page state newer than its own change.
Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this
in existing release branches, but we need a fix in HEAD if we want to have any
hope of re-allowing full_page_writes. Accordingly, adjust the contents of
btree WAL records so that we can always get the downlink block number from the
WAL record rather than having to depend on buffer contents. Per report from
Kevin Grittner and Peter Brant.
Improve a few comments in related code while at it.
Tom Lane [Wed, 12 Apr 2006 22:18:48 +0000 (22:18 +0000)]
Fix pg_restore -n option to do what the man page says it does. The
original coding only worked if one of the selTypes restriction options
was also given. Per report from Nick Johnson.
Bruce Momjian [Wed, 12 Apr 2006 18:56:16 +0000 (18:56 +0000)]
Add second sentence:
<P>The maximum table size, row size, and maximum number of columns
can be quadrupled by increasing the default block size to 32k. The
maximum table size can also be increased using table partitioning.</P>
Bruce Momjian [Mon, 10 Apr 2006 21:06:23 +0000 (21:06 +0000)]
Add:
> * Allow log_min_messages to be specified on a per-module basis
>
> This would allow administrators to see more detailed information from
> specific sections of the backend, e.g. checkpoints, autovacuum, etc.
Bruce Momjian [Sun, 9 Apr 2006 20:27:27 +0000 (20:27 +0000)]
Add comment for why we recompile pgport C files.
# Need to recompile any libpgport object files because we need these
# object files to use the same compile flags as libpq. If we used
# the object files from libpgport, this would not be true on all
# platforms.
Bruce Momjian [Sun, 9 Apr 2006 20:24:30 +0000 (20:24 +0000)]
Add:
< * Experiment with multi-threaded backend [thread]
> * Experiment with multi-threaded backend for backend creation [thread]
1003a1004,1008
>
> * Experiment with multi-threaded backend for better resource utilization
>
> This would allow a single query to make use of multiple CPU's or
> multiple I/O channels simultaneously.
Tom Lane [Sun, 9 Apr 2006 18:18:41 +0000 (18:18 +0000)]
Revert my best_inner_indexscan patch of yesterday, which turns out to have
had a bad side-effect: it stopped finding plans that involved BitmapAnd
combinations of indexscans using both join and non-join conditions. Instead,
make choose_bitmap_and more aggressive about detecting redundancies between
BitmapOr subplans.
Bruce Momjian [Sun, 9 Apr 2006 03:27:06 +0000 (03:27 +0000)]
Update:
> * Allow the creation of indexes with mixed ascending/descending
> specifiers
>
> This is possible now by creating an operator class with reversed sort
> operators. One complexity is that NULLs would then appear at the start
> of the result set, and this might affect certain sort types, like
> merge join.
>
Tom Lane [Sat, 8 Apr 2006 21:32:17 +0000 (21:32 +0000)]
Fix best_inner_indexscan to actually enforce that an "inner indexscan" use
at least one join condition as an indexqual. Before bitmap indexscans, this
oversight didn't really cost much except for redundantly considering the
same join paths twice; but as of 8.1 it could result in silly bitmap scans
that would do the same BitmapOr twice and then BitmapAnd these together :-(
Tom Lane [Sat, 8 Apr 2006 18:49:52 +0000 (18:49 +0000)]
Fix EXPLAIN so that it can drill down through multiple levels of subplan
when trying to locate the referent of a RECORD variable. This fixes the
'record type has not been registered' failure reported by Stefan
Kaltenbrunner about a month ago. A side effect of the way I chose to
fix it is that most variable references in join conditions will now be
properly labeled with the variable's source table name, instead of the
not-too-helpful 'outer' or 'inner' we used to use.
Tom Lane [Fri, 7 Apr 2006 21:26:29 +0000 (21:26 +0000)]
Fix pg_dumpall to do something sane when a pre-8.1 installation has
identically named user and group: we merge these into a single entity
with LOGIN permission. Also, add ORDER BY commands to ensure consistent
dump ordering, for ease of comparing outputs from different installations.
Tom Lane [Fri, 7 Apr 2006 17:05:39 +0000 (17:05 +0000)]
Fix make_restrictinfo_from_bitmapqual() to preserve AND/OR flatness of its
output, ie, no OR immediately below an OR. Otherwise we get Asserts or
wrong answers for cases such as
select * from tenk1 a, tenk1 b
where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101))
or (a.hundred = b.hundred and a.unique1 = 42);
Per report from Rafael Martinez Guerrero.
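"AND/OR flatness" means that no OR sits directly below an OR, and no AND directly below an AND. A minimal Python sketch of restoring that property on a toy expression tree follows. The tuple representation and the flatten function are assumptions for illustration and do not mirror the planner's node types.

```python
def flatten(node):
    """Return an equivalent tree with no OR under OR and no AND under AND.

    Leaves are strings; inner nodes are (op, [children]) with op "and"/"or".
    """
    if isinstance(node, str):
        return node
    op, args = node
    flat = []
    for arg in args:
        arg = flatten(arg)
        if not isinstance(arg, str) and arg[0] == op:
            # Same operator as the parent: pull the children up one level.
            flat.extend(arg[1])
        else:
            flat.append(arg)
    return (op, flat)
```

For example, an OR whose first child is itself an OR of "a" and "b" becomes a single three-way OR once its second child is appended.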
Tom Lane [Thu, 6 Apr 2006 20:38:00 +0000 (20:38 +0000)]
Remove the pgstats logic for delaying destruction of stats table entries.
Per recent discussion, this seems to be making the stats less accurate
rather than more so, particularly on Windows where PID values may be
reused very quickly. Patch by Peter Brant.
Tom Lane [Wed, 5 Apr 2006 22:11:58 +0000 (22:11 +0000)]
Fix a bunch of problems with domains by making them use special input functions
that apply the necessary domain constraint checks immediately. This fixes
cases where domain constraints went unchecked for statement parameters,
PL function local variables and results, etc. We can also eliminate existing
special cases for domains in places that had gotten it right, eg COPY.
Also, allow domains over domains (base of a domain is another domain type).
This almost worked before, but was disallowed because the original patch
hadn't gotten it quite right.
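The idea of a domain input function that checks constraints at input time, including for a domain built over another domain, can be sketched in Python as below. The make_domain_input helper and its parameters are hypothetical; the sketch assumes that NULL is represented by None and that CHECK constraints are skipped for NULL values.

```python
def make_domain_input(base_input, checks, not_null=False):
    """Build an input function that applies domain constraints immediately."""

    def domain_input(text):
        # Called even for NULL input so that NOT NULL can be enforced here.
        value = None if text is None else base_input(text)
        if value is None:
            if not_null:
                raise ValueError("domain does not allow null values")
            return None
        for check in checks:
            if not check(value):
                raise ValueError("value violates a domain check constraint")
        return value

    return domain_input


# A domain over int, and a second domain whose base is the first domain.
posint_input = make_domain_input(int, [lambda v: v > 0])
small_posint_input = make_domain_input(
    posint_input, [lambda v: v < 100], not_null=True
)
```

Because the outer domain calls the inner domain's input function, both sets of constraints are applied to every value, no matter which code path supplied the input.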
When merging PO files, take into consideration translations in other PO
files of the same languages. That way, similar or equal translations in
different programs are automatically propagated and the life of translators
becomes a little bit easier.
Tom Lane [Wed, 5 Apr 2006 03:34:05 +0000 (03:34 +0000)]
Add a field to the first page of each WAL file to indicate the
XLOG_BLCKSZ. This ought to help in preventing configuration mismatch
problems if anyone tries to ship PITR files between servers compiled
with different XLOG_BLCKSZ settings. Simon Riggs
Tom Lane [Tue, 4 Apr 2006 22:39:59 +0000 (22:39 +0000)]
Don't use BLCKSZ for the physical length of the pg_control file, but
instead a dedicated symbol. This probably makes no functional difference
for likely values of BLCKSZ, but it makes the intent clearer.
Simon Riggs, minor editorialization by Tom Lane.
Tom Lane [Tue, 4 Apr 2006 19:35:37 +0000 (19:35 +0000)]
Modify all callers of datatype input and receive functions so that if these
functions are not strict, they will be called (passing a NULL first parameter)
during any attempt to input a NULL value of their datatype. Currently, all
our input functions are strict and so this commit does not change any
behavior. However, this will make it possible to build domain input functions
that centralize checking of domain constraints, thereby closing numerous holes
in our domain support, as per previous discussion.
While at it, I took the opportunity to introduce convenience functions
InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O
functions. This eliminates a lot of grotty-looking casts, but the main
motivation is to make it easier to grep for these places if we ever need
to touch them again.
Tom Lane [Mon, 3 Apr 2006 23:35:05 +0000 (23:35 +0000)]
Define a separately configurable XLOG_BLCKSZ symbol for the page size
used within WAL files. Historically this was the same as the data file
BLCKSZ, but there's no necessary connection, and it's possible that
performance gains might ensue from reducing XLOG_BLCKSZ. In any case
distinguishing two symbols should improve code clarity. This commit
does not actually change the page size, only provide the infrastructure
to make it possible to do so. initdb forced because of addition of a
field to pg_control.
Mark Wong, with some help from Simon Riggs and Tom Lane.
Tom Lane [Mon, 3 Apr 2006 16:45:50 +0000 (16:45 +0000)]
Fix thinko in gistRedoPageUpdateRecord: if XLR_BKP_BLOCK_1 is set, we
don't have anything to do to the page, but we still have to adjust the
incomplete_inserts list that we're maintaining in memory.
Neil Conway [Sun, 2 Apr 2006 20:08:22 +0000 (20:08 +0000)]
Rewrite much of psql's \connect code, for the sake of code clarity and
to fix regressions introduced in the recent patch adding additional
\connect options. This is based on work by Volkan YAZICI, although
this version of the patch doesn't bear much resemblance to Volkan's
version.
\connect takes 4 optional arguments: database name, user name, host
name, and port number. If any of those parameters are omitted or
specified as "-", the value of that parameter from the previous
connection is used instead; if there is no previous connection,
the libpq default is used. Note that this behavior makes it
impossible to reuse the libpq defaults without quitting psql and
restarting it; I don't really see the use case for needing to do
that.
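The parameter resolution rule described for \connect can be written out as a short Python sketch. The function name, the dictionary keys, and the default values used in any call are hypothetical stand-ins; they are not psql's internals or real libpq defaults.

```python
def resolve_connect_args(args, previous, defaults):
    """Resolve up to four \\connect arguments to concrete settings.

    args:     list of up to 4 strings (dbname, user, host, port)
    previous: settings of the previous connection, or None if there is none
    defaults: fallback settings standing in for the libpq defaults
    """
    keys = ("dbname", "user", "host", "port")
    resolved = {}
    for i, key in enumerate(keys):
        arg = args[i] if i < len(args) else None
        if arg is not None and arg != "-":
            resolved[key] = arg            # given explicitly
        elif previous is not None:
            resolved[key] = previous[key]  # reuse the previous connection
        else:
            resolved[key] = defaults[key]  # no previous connection
    return resolved
```

Once a previous connection exists, an omitted or "-" argument always resolves to its value, which is why the defaults cannot be reached again without restarting.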
Add tab-completion for REASSIGN OWNED BY and DROP OWNED BY. Also fix some
whitespace issues nearby.
DROP OWNED BY is actually a bit kludgy, but it seems better to do it this way
rather than duplicating the words_after_create list just to add a single
element.
Tom Lane [Sat, 1 Apr 2006 03:03:37 +0000 (03:03 +0000)]
Remove the 'slow' path for btree index build, which built the btree
incrementally by successive inserts rather than by sorting the data.
We were only using the slow path during bootstrap, apparently because
when first written it failed during bootstrap --- but it works fine now
AFAICT. Removing it saves a hundred or so lines of code and produces
noticeably (~10%) smaller initial states of the system catalog indexes.
While that won't make much difference for heavily-modified catalogs,
for the more static ones there may be a useful long-term performance
improvement.
Tom Lane [Fri, 31 Mar 2006 23:32:07 +0000 (23:32 +0000)]
Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer. Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer. So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
Tom Lane [Thu, 30 Mar 2006 23:03:10 +0000 (23:03 +0000)]
Improve gist XLOG code to follow the coding rules needed to prevent
torn-page problems. This introduces some issues of its own, mainly
that there are now some critical sections of unreasonably broad scope,
but it's a step forward anyway. Further cleanup will require some
code refactoring that I'd prefer to get Oleg and Teodor involved in.
Tom Lane [Thu, 30 Mar 2006 22:11:55 +0000 (22:11 +0000)]
Suppress attempts to report dropped tables to the stats collector from a
startup or recovery process. Since such a process isn't a real backend,
pgstat.c gets confused. This accounts for recent reports of strange
"invalid server process ID -1" log messages during crash recovery.
There isn't any point in attempting to make the report, since we'll discard
stats in such scenarios anyhow.
Tom Lane [Wed, 29 Mar 2006 21:17:39 +0000 (21:17 +0000)]
Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
Tom Lane [Wed, 29 Mar 2006 15:15:43 +0000 (15:15 +0000)]
TablespaceCreateDbspace should function normally even on platforms that do not
have symlinks (ie, Windows). Although it'll never be called on to do anything
useful during normal operation on such a platform, it's still needed to
re-create dropped directories during WAL replay.
Tom Lane [Tue, 28 Mar 2006 22:01:16 +0000 (22:01 +0000)]
Disable full_page_writes, because turning it off risks causing crash-recovery
failures even when the hardware and OS did nothing wrong. Per recent analysis
of a problem report from Alex Bahdushka.
For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there. There seems no chance of it being
resurrected in the 8.1 branch though.
Tom Lane [Tue, 28 Mar 2006 21:17:23 +0000 (21:17 +0000)]
Repair longstanding error in btree xlog replay: XLogReadBuffer should be
passed extend = true whenever we are reading a page we intend to reinitialize
completely, even if we think the page "should exist". This is because it
might indeed not exist, if the relation got truncated sometime after the
current xlog record was made and before the crash we're trying to recover
from. These two thinkos appear to explain both of the old bug reports
discussed here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php
Tom Lane [Fri, 24 Mar 2006 23:02:17 +0000 (23:02 +0000)]
Comments in IndexBuildHeapScan describe the indexing of recently-dead
tuples as needed "to keep VACUUM from complaining", but actually there is
a more compelling reason to do it: failure to do so violates MVCC semantics.
This is because a pre-existing serializable transaction might try to use
the index after we finish (re)building it, and it might fail to find tuples
it should be able to see. We got this mostly right, but not in the case
of partial indexes: the code mistakenly discarded recently-dead tuples for
partial indexes. Fix that, and adjust the comments.
Tom Lane [Fri, 24 Mar 2006 04:32:13 +0000 (04:32 +0000)]
Arrange to emit a description of the current XLOG record as error context
when an error occurs during xlog replay. Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead. (The
latter accounts for most of the patch bulk.)
Tom Lane [Thu, 23 Mar 2006 04:22:37 +0000 (04:22 +0000)]
Fix plpgsql to pass only one copy of any given plpgsql variable into a SQL
command or expression, rather than one copy for each textual occurrence as
it did before. This might result in some small performance improvement,
but the compelling reason to do it is that not doing so can result in
unexpected grouping failures because the main SQL parser won't see different
parameter numbers as equivalent. Add a regression test for the failure case.
Per report from Robert Davidson.
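The numbering change can be illustrated with a small Python sketch. assign_params is a hypothetical helper that takes the variable names in order of textual occurrence and returns the parameter symbol substituted for each one.

```python
def assign_params(variable_refs):
    """Give each distinct variable one parameter number, reused on repeats."""
    numbers = {}
    symbols = []
    for name in variable_refs:
        if name not in numbers:
            numbers[name] = len(numbers) + 1
        symbols.append("$%d" % numbers[name])
    return symbols
```

With one parameter per occurrence, a variable used in both the SELECT list and GROUP BY would appear as two different parameter numbers, and the SQL parser would not treat the two expressions as the same. With one parameter per variable, both occurrences map to the same symbol.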
Tom Lane [Tue, 21 Mar 2006 19:49:15 +0000 (19:49 +0000)]
Improve performance of our private version of qsort. Per recent testing,
the logic it contained to switch to insertion sort for near-sorted input was
in fact a big loss, because it could fairly easily be fooled into applying
insertion sort to large subfiles that weren't all that well ordered. Remove
that, and instead add a simple check for already-perfectly-sorted input, as
per suggestion from Dann Corbit. This adds at worst O(N*lgN) overhead, and
usually far less, while sometimes allowing a subfile sort to finish in O(N)
time. Preliminary testing says this is an improvement over the basic
Bentley & McIlroy code for many nonrandom inputs, and it costs almost
nothing when the input is random.
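The presorted check can be sketched in Python as below. This is a plain quicksort with a middle pivot, not the Bentley & McIlroy code the entry refers to; only the early exit for an already-sorted subfile is meant to mirror the change described.

```python
def quicksort(items, lo=0, hi=None):
    """Sort items[lo:hi] in place, skipping subfiles that are already sorted."""
    if hi is None:
        hi = len(items)
    if hi - lo < 2:
        return
    # One linear pass: an already-sorted subfile needs no further work.
    if all(items[k] <= items[k + 1] for k in range(lo, hi - 1)):
        return
    pivot = items[(lo + hi) // 2]
    i, j = lo, hi - 1
    while i <= j:
        while items[i] < pivot:
            i += 1
        while items[j] > pivot:
            j -= 1
        if i <= j:
            items[i], items[j] = items[j], items[i]
            i += 1
            j -= 1
    quicksort(items, lo, j + 1)
    quicksort(items, i, hi)
```

The check runs once per subfile, so it adds at most one extra linear pass at each level of recursion, and it stops at the first out-of-order pair, which random input usually shows within the first few elements.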
Bruce Momjian [Tue, 21 Mar 2006 13:38:12 +0000 (13:38 +0000)]
Fix psql history handling:
> 1) Fix the problems with the \s command.
> When the saveHistory is executed by the \s command we must not do the
> conversion \n -> \x01 (per
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg00317.php )
>
> 2) Fix the handling of Ctrl+C
>
> Now when you do
> wsdb=# select 'your long query here '
> wsdb-#
> and afterwards press Ctrl+C, the line "select 'your long query here '"
> will be in the history
>
> (partly per
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg00297.php )
>
> 3) Fix the handling of commands with unclosed brackets, quotes, and
> double quotes. (Now those commands are not split into parts...)
>
> 4) Fix the behaviour when SINGLELINE mode is used. (before it was
> almost broken ;(
Neil Conway [Sun, 19 Mar 2006 22:22:56 +0000 (22:22 +0000)]
Fix a few places that were checking for the return value of palloc() to be
non-NULL: palloc() ereports on OOM, so we can safely assume it returns a
valid pointer.
Tom Lane [Sun, 19 Mar 2006 01:19:42 +0000 (01:19 +0000)]
Adjust join_1.out to match Windows behavior for new mergejoin regression
test, per Dave Page and buildfarm. Perhaps we will need a join_2 instead,
but for the moment assume that this test tracks the other diffs.
Tom Lane [Fri, 17 Mar 2006 19:38:12 +0000 (19:38 +0000)]
Fix bug introduced into mergejoin logic by performance improvement patch of
2005-05-13. When we find that a new inner tuple can't possibly match any
outer tuple (because it contains a NULL), we can't immediately skip the
tuple when we are in NEXTINNER state. Doing so can lead to emitting
multiple copies of the tuple in FillInner mode, because we may rescan the
tuple after returning to a previous marked tuple. Instead, proceed to
NEXTOUTER state the same as we used to do. After we've found that there's
no need to return to the marked position, we can go to SKIPINNER_ADVANCE
state instead of SKIP_TEST when the inner tuple is unmatchable; this
preserves the performance improvement. Per bug report from Bruce.
I also made a couple of cosmetic code rearrangements and added a regression
test for the problem.
Tom Lane [Thu, 16 Mar 2006 18:11:17 +0000 (18:11 +0000)]
Fix invalid use of #if within a macro, per Laurenz Albe. Also try to
make the LDAP code's error messages look like they were written by someone
who had heard of our style guidelines.
Tom Lane [Thu, 16 Mar 2006 00:31:55 +0000 (00:31 +0000)]
Clean up representation of function RTEs for functions returning RECORD.
The original coding stored the raw parser output (ColumnDef and TypeName
nodes) which was ugly, bulky, and wrong because it failed to create any
dependency on the referenced datatype --- and in fact would not track type
renamings and suchlike. Instead store a list of column type OIDs in the
RTE.
Also fix up general failure of recordDependencyOnExpr to do anything sane
about recording dependencies on datatypes. While there are many cases where
there will be an indirect dependency (eg if an operator returns a datatype,
the dependency on the operator is enough), we do have to record the datatype
as a separate dependency in examples like CoerceToDomain.
Tom Lane [Tue, 14 Mar 2006 22:48:25 +0000 (22:48 +0000)]
Improve parser so that we can show an error cursor position for errors
during parse analysis, not only errors detected in the flex/bison stages.
This is per my earlier proposal. This commit includes all the basic
infrastructure, but locations are only tracked and reported for errors
involving column references, function calls, and operators. More could
be done later but this seems like a good set to start with. I've also
moved the ReportSyntaxErrorPosition logic out of psql and into libpq,
which should make it available to more people --- even within psql this
is an improvement because warnings weren't handled by ReportSyntaxErrorPosition.