Alvaro Herrera [Fri, 29 Jun 2007 17:07:39 +0000 (17:07 +0000)]
Arrange for SIGINT in autovacuum workers to cancel the current table and
continue with the schedule. Change existing uses of SIGINT to abort a worker
so that they use SIGTERM instead, which keeps the old behaviour of terminating
the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
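A minimal C sketch of the two-signal split, assuming plain ANSI `signal()` handlers and hypothetical names (`cancel_requested`, `check_worker_signals`, and so on). PostgreSQL's own interrupt machinery is not reproduced here. SIGINT only sets a flag that the per-table loop consumes, while SIGTERM keeps the terminate-the-worker behaviour.

```c
#include <signal.h>

/* Hypothetical flags, written only by the signal handlers below. */
static volatile sig_atomic_t cancel_requested = 0;
static volatile sig_atomic_t terminate_requested = 0;

static void on_sigint(int signo)
{
    (void) signo;
    cancel_requested = 1;         /* give up on the current table only */
}

static void on_sigterm(int signo)
{
    (void) signo;
    terminate_requested = 1;      /* old SIGINT behaviour: end the worker */
}

typedef enum { WORKER_CONTINUE, WORKER_SKIP_TABLE, WORKER_EXIT } worker_action;

void install_worker_handlers(void)
{
    signal(SIGINT, on_sigint);
    signal(SIGTERM, on_sigterm);
}

/* Polled at safe points while vacuuming a table. */
worker_action check_worker_signals(void)
{
    if (terminate_requested)
        return WORKER_EXIT;
    if (cancel_requested)
    {
        cancel_requested = 0;     /* consumed: the next table starts clean */
        return WORKER_SKIP_TABLE;
    }
    return WORKER_CONTINUE;
}
```

A worker would poll `check_worker_signals()` at safe points. On `WORKER_SKIP_TABLE` it abandons the current table and moves on to the next one in its schedule. On `WORKER_EXIT` it leaves the loop.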
Tom Lane [Fri, 29 Jun 2007 16:18:43 +0000 (16:18 +0000)]
Fix computation of PG_VERSION_NUM by configure: remove unnecessary and
unportable backslashes in awk script (per Patrick Welche), and add
brackets to prevent autoconf from mangling sed's regexp (the sed call
here never did what was expected).
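The arithmetic behind PG_VERSION_NUM can be sketched in C. Configure itself does this with sed and awk. The sketch assumes the usual encoding major * 10000 + minor * 100 + micro, with a missing micro number, as in "8.3devel", treated as 0. `pg_version_num` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: "8.2.4" -> 80204, "8.3devel" -> 80300.
   Returns -1 if no "major.minor" prefix can be read. */
int pg_version_num(const char *version)
{
    int major = 0, minor = 0, micro = 0;   /* micro stays 0 if absent */

    /* sscanf stops at the first mismatch, so suffixes such as "devel" or
       "beta1" simply leave fewer fields converted */
    if (sscanf(version, "%d.%d.%d", &major, &minor, &micro) < 2)
        return -1;
    return major * 10000 + minor * 100 + micro;
}
```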
Tom Lane [Fri, 29 Jun 2007 15:46:21 +0000 (15:46 +0000)]
Add a note that pg_start_backup will take a while because of new
distributed checkpoint behavior. Explain how to work around this
by issuing a manual CHECKPOINT command. Per discussion with Heikki.
Tom Lane [Fri, 29 Jun 2007 01:51:35 +0000 (01:51 +0000)]
Fix a passel of ancient bugs in to_char(), including two distinct buffer
overruns (neither of which seems likely to be exploitable as security holes,
fortunately, since the provoker can't control the data written). One of
these is due to choosing to stomp on the output of a called function, which
is bad news in any case; make it treat the called functions' results as
read-only. Avoid some unnecessary palloc/pfree traffic too; it's not
really helpful to free small temporary objects, and again this is presuming
more than it ought to about the nature of the results of called functions.
Per report from Patrick Welche and additional code-reading by Imad.
Tom Lane [Thu, 28 Jun 2007 17:49:59 +0000 (17:49 +0000)]
Fix incorrect tests for undef Perl values in some places in plperl.c.
The correct test for defined-ness is SvOK(sv), not anything involving
SvTYPE. Per bug #3415 from Matt Taylor.
Back-patch as far as 8.0; no apparent problem in 7.x.
Tom Lane [Thu, 28 Jun 2007 00:02:40 +0000 (00:02 +0000)]
Implement "distributed" checkpoints in which the checkpoint I/O is spread
over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.
Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.
Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
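Here is a rough sketch of the throttling idea. After each write, the writer compares the fraction of buffers written with the fraction of the checkpoint interval that has elapsed, and naps while it is ahead. The `completion_target` knob, the function names, and the instantaneous-write simulation are assumptions for illustration. The actual bgwriter logic, such as any accounting of WAL consumption, is not reproduced.

```c
#include <stdbool.h>

/* Hypothetical pacing test: has the writer done at least as much as the
   clock says it should?  completion_target (assumed name) is the fraction
   of the checkpoint interval over which the writes are spread. */
bool ahead_of_schedule(double progress, double elapsed_secs,
                       double interval_secs, double completion_target)
{
    return progress * completion_target >= elapsed_secs / interval_secs;
}

/* Simulate a throttled checkpoint with instantaneous writes; between writes
   the writer naps in nap_secs steps (must be > 0) while ahead of schedule.
   Returns the simulated time at which the last buffer is written. */
double simulate_checkpoint(int nbuffers, double interval_secs,
                           double completion_target, double nap_secs)
{
    double elapsed = 0.0;

    for (int written = 1; written < nbuffers; written++)
    {
        double progress = (double) written / nbuffers;

        /* nap while ahead of schedule, before writing the next buffer */
        while (ahead_of_schedule(progress, elapsed, interval_secs, completion_target))
            elapsed += nap_secs;
    }
    return elapsed;               /* final buffer is written with no nap after */
}
```

With 100 buffers, a 300-second interval and a target of 0.5, the simulated writes finish at about 150 seconds. They are spread across that span instead of arriving in one burst.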
Tom Lane [Tue, 26 Jun 2007 22:05:04 +0000 (22:05 +0000)]
Fix PGXS conventions so that extensions can be built against Postgres
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.
Alvaro Herrera [Mon, 25 Jun 2007 16:09:03 +0000 (16:09 +0000)]
Improve autovacuum launcher's ability to detect a problem in worker startup,
by having the postmaster signal it when certain failures occur. This requires
the postmaster setting a flag in shared memory, but should be as safe as the
pmsignal.c code is.
Also make sure the launcher honors a postgresql.conf change turning it off
on SIGHUP.
Tom Lane [Sat, 23 Jun 2007 22:12:52 +0000 (22:12 +0000)]
Separate parse-analysis for utility commands out of parser/analyze.c
(which now deals only in optimizable statements), and put that code
into a new file parser/parse_utilcmd.c. This helps clarify and enforce
the design rule that utility statements shouldn't be processed during
the regular parse analysis phase; all interpretation of their meaning
should happen after they are given to ProcessUtility to execute.
(We need this because we don't retain any locks for a utility statement
that's in a plan cache, nor have any way to detect that it's stale.)
We are also able to simplify the API for parse_analyze() and related
routines, because they will now always return exactly one Query structure.
In passing, fix bug #3403 concerning trying to add a serial column to
an existing temp table (this is largely Heikki's work, but we needed
all that restructuring to make it safe).
Tom Lane [Fri, 22 Jun 2007 16:15:23 +0000 (16:15 +0000)]
Add a <tip> that log_line_prefix should usually end with a space to
provide visual separation from the rest of the log line; I've been
noticing lately that quite a few newbies fail to figure this out for
themselves. Also a little editorial cleanup of the log_line_prefix
description.
Neil Conway [Fri, 22 Jun 2007 01:09:28 +0000 (01:09 +0000)]
In psql, when running a SELECT query using a cursor, flush the query
output after each FETCH. This ensures that incremental results are
available to clients that are executing long-running SELECT queries
via the FETCH_COUNT feature.
Tom Lane [Thu, 21 Jun 2007 22:59:12 +0000 (22:59 +0000)]
Allow trailing whitespace in parse_real(), for consistency with
parse_int() and with itself (strtod allows leading whitespace, so it
seems odd not to allow trailing whitespace). parse_bool remains
not-whitespace-friendly, but this is generically true for non-numeric
GUC variables, so I'll desist from changing it.
Tom Lane [Thu, 21 Jun 2007 18:14:21 +0000 (18:14 +0000)]
Provide a HINT listing the allowed unit names when a GUC variable seems to
contain a wrong unit specification, per discussion.
In passing, fix the code to avoid unnecessary integer overflows when
converting units, and to detect overflows when they do occur.
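Here is a sketch of unit conversion with overflow detection. It assumes memory units only, an int-valued setting, and a base unit expressed in kilobytes, for example 8 for 8kB pages. A 64-bit intermediate is one way to avoid overflowing during the conversion, though the commit does not say which technique it used. The result is then range-checked. The hint wording and all names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unit table: size of each memory unit in kB. */
static const struct { const char *name; int64_t kb; } mem_units[] = {
    {"kB", 1}, {"MB", 1024}, {"GB", 1024 * 1024},
};

/* Illustrative HINT text listing the accepted unit names. */
static const char mem_unit_hint[] =
    "Valid units for this parameter are \"kB\", \"MB\", and \"GB\".";

/* Convert value+unit to a count of base units of base_kb kB each (base_kb > 0).
   On failure returns false; *hint is set only for an unrecognized unit. */
bool convert_memory_value(int value, const char *unit, int base_kb,
                          int *result, const char **hint)
{
    *hint = NULL;
    for (size_t i = 0; i < sizeof(mem_units) / sizeof(mem_units[0]); i++)
    {
        if (strcmp(unit, mem_units[i].name) != 0)
            continue;
        /* 64-bit intermediate: an int times 2^20 cannot overflow int64_t */
        int64_t converted = (int64_t) value * mem_units[i].kb / base_kb;
        if (converted > INT_MAX || converted < INT_MIN)
            return false;           /* overflow detected rather than wrapped */
        *result = (int) converted;
        return true;
    }
    *hint = mem_unit_hint;
    return false;
}
```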
Tom Lane [Wed, 20 Jun 2007 23:11:38 +0000 (23:11 +0000)]
Add a caveat pointing out that constraint exclusion doesn't work with
constraints the planner is unable to disprove, hence simple btree-compatible
conditions should be used. We've seen people try to get cute with stuff
like date_part(something) = something at least twice now. Even if we
wanted to try to teach predtest.c about the properties of date_part,
most of the useful variants aren't immutable so nothing could be proved.
Tom Lane [Wed, 20 Jun 2007 18:31:39 +0000 (18:31 +0000)]
Restrict deadlock_timeout to the range for which the implementation
actually works sanely, viz not 0 and not more than INT_MAX/1000
(else TimestampTzPlusMilliseconds can overflow). Per discussion with
Greg Stark. Since this is a superuser-only setting and there was not
previously any big reason to change it, not worth back-patching.
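The range check amounts to the following, assuming a 32-bit `int` (the names are hypothetical). With `INT_MAX` = 2147483647, the upper limit is 2147483 ms, roughly 35.8 minutes. That is the largest value whose count in microseconds still fits in an `int`.

```c
#include <limits.h>
#include <stdbool.h>

/* Largest timeout (ms) whose microsecond count still fits in an int. */
#define DEADLOCK_TIMEOUT_MAX_MS (INT_MAX / 1000)

bool deadlock_timeout_ok(int ms)
{
    /* 0 is also outside the range the implementation handles sanely */
    return ms > 0 && ms <= DEADLOCK_TIMEOUT_MAX_MS;
}

/* Only call with values accepted above, so ms * 1000 cannot overflow. */
int deadlock_timeout_us(int ms)
{
    return ms * 1000;
}
```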
Tom Lane [Wed, 20 Jun 2007 18:21:00 +0000 (18:21 +0000)]
transformColumnDefinition failed to complain about
create table foo (bar int default null default 3);
due to not thinking about the special-case handling of DEFAULT NULL.
Problem noticed while investigating bug #3396.
Tom Lane [Wed, 20 Jun 2007 18:15:49 +0000 (18:15 +0000)]
CREATE DOMAIN ... DEFAULT NULL failed because gram.y special-cases DEFAULT
NULL and DefineDomain didn't. Bug goes all the way back to original coding
of domains. Per bug #3396 from Sergey Burladyan.
Neil Conway [Wed, 20 Jun 2007 02:02:49 +0000 (02:02 +0000)]
Minor code cleanup: calling FreeFile() before ereport(ERROR) is not
necessary, since files opened via AllocateFile() are closed automatically
as part of error recovery.
Tom Lane [Tue, 19 Jun 2007 22:01:15 +0000 (22:01 +0000)]
Only log 'process acquired lock' if we actually did get the lock. This
test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.
Tom Lane [Tue, 19 Jun 2007 20:13:22 +0000 (20:13 +0000)]
Code review for log_lock_waits patch. Don't try to issue log messages from
within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup. Greg Stark and Tom Lane
Tom Lane [Mon, 18 Jun 2007 21:40:58 +0000 (21:40 +0000)]
Arrange for quote_identifier() and pg_dump to not quote keywords that are
unreserved according to the grammar. The list of unreserved words has gotten
extensive enough that the unnecessary quoting is becoming a bit of an eyesore.
To do this, add knowledge of the keyword category to keywords.c's table.
(Someday we might be able to generate keywords.c's table and the keyword lists
in gram.y from a common source.) For the moment, lie about WITH's status in
the table so it will still get quoted --- this is because of the expectation
that WITH will become reserved when the SQL recursive-queries patch gets done.
I didn't force initdb because this affects nothing on-disk; but note that a
few regression tests have changed expected output.
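Here is a small sketch of the quoting rule. It uses a hypothetical four-entry keyword table, not the real keywords.c list. An identifier is left bare only if it is a plain lowercase name that is either not a keyword or an unreserved one. As in the commit, `with` is deliberately marked as needing quotes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { KW_UNRESERVED, KW_NEEDS_QUOTES } kw_category;

/* Tiny hypothetical keyword table.  "with" is marked as needing quotes on
   purpose, mirroring the commit's plan to keep quoting WITH for now. */
static const struct { const char *word; kw_category cat; } keywords[] = {
    {"abort", KW_UNRESERVED},
    {"select", KW_NEEDS_QUOTES},
    {"table", KW_NEEDS_QUOTES},
    {"with", KW_NEEDS_QUOTES},
};

static bool needs_quoting(const char *ident)
{
    /* must start with a lowercase letter or underscore... */
    if (!((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'))
        return true;
    /* ...and contain only lowercase letters, digits and underscores */
    for (const char *p = ident; *p; p++)
        if (!((*p >= 'a' && *p <= 'z') || (*p >= '0' && *p <= '9') || *p == '_'))
            return true;
    /* keywords need quotes unless they are unreserved */
    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
        if (strcmp(ident, keywords[i].word) == 0)
            return keywords[i].cat != KW_UNRESERVED;
    return false;
}

/* Returns a malloc'd copy of ident, double-quoted if necessary. */
char *quote_ident_sketch(const char *ident)
{
    bool quote = needs_quoting(ident);
    size_t extra = 0;

    for (const char *p = ident; *p; p++)
        if (*p == '"')
            extra++;              /* each embedded quote is doubled */
    char *out = malloc(strlen(ident) + extra + (quote ? 2 : 0) + 1);
    if (out == NULL)
        return NULL;

    char *o = out;
    if (quote)
        *o++ = '"';
    for (const char *p = ident; *p; p++)
    {
        if (*p == '"')
            *o++ = '"';
        *o++ = *p;
    }
    if (quote)
        *o++ = '"';
    *o = '\0';
    return out;
}
```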
Tom Lane [Sun, 17 Jun 2007 23:39:28 +0000 (23:39 +0000)]
Marginal hacking to improve the speed of COPY OUT. I had found in a bit of
profiling that CopyAttributeOutText was taking an unreasonable fraction of
the backend run time (like 66%!) on the following trivial test case:
$ time psql -c "copy (select repeat('xyzzy',50) from generate_series(1,10000000)) to stdout" regression >/dev/null
The time is all being spent on scanning the string for characters to be
escaped, which most of the time there aren't any of. Some tweaking to take
as many tests as possible out of the inner loop reduced the runtime of this
example by more than 10%. In a real-world case it wouldn't be as useful
a speedup, but it still seems worth adding a few lines here.
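Here is a simplified sketch of the escaping loop. It assumes single-byte input and no encoding conversion, which the real CopyAttributeOutText must also handle. One combined test per character settles the common case. Runs of characters that need no escaping are copied with a single `memcpy` instead of byte by byte. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Escape one field for COPY text format into out, which must hold at least
   2 * strlen(src) + 1 bytes.  Returns the output length. */
size_t copy_escape_text(const char *src, char delim, char *out)
{
    char *o = out;
    const char *start = src;      /* start of the pending unescaped run */
    const char *p = src;

    for (; *p; p++)
    {
        unsigned char c = (unsigned char) *p;
        char esc;

        /* fast path: one combined test covers the common no-escape case */
        if (c >= 0x20 && c != '\\' && c != (unsigned char) delim)
            continue;

        switch (c)
        {
            case '\b': esc = 'b'; break;
            case '\f': esc = 'f'; break;
            case '\n': esc = 'n'; break;
            case '\r': esc = 'r'; break;
            case '\t': esc = 't'; break;
            case '\v': esc = 'v'; break;
            default:
                if (c == '\\' || c == (unsigned char) delim)
                    esc = (char) c;
                else
                    continue;     /* other control chars pass through */
        }
        memcpy(o, start, (size_t) (p - start));   /* flush the plain run */
        o += p - start;
        *o++ = '\\';
        *o++ = esc;
        start = p + 1;
    }
    memcpy(o, start, (size_t) (p - start));
    o += p - start;
    *o = '\0';
    return (size_t) (o - out);
}
```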
Tom Lane [Sun, 17 Jun 2007 18:57:29 +0000 (18:57 +0000)]
Revert an ill-considered portion of my patch of 12-Mar, which tried to save a
few lines in sql_exec_error_callback() by using the function source string
field that the patch added to SQL function cache entries. This doesn't work
because the fn_extra field isn't filled in yet during init_sql_fcache().
Probably it could be made to work, but it doesn't seem appropriate to contort
the main code paths to make an error-reporting path a tad faster. Per report
from Pavel Stehule.
Tom Lane [Fri, 15 Jun 2007 20:56:52 +0000 (20:56 +0000)]
Tweak the API for per-datatype typmodin functions so that they are passed
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table. We can get away with this
change since we have not yet released a version providing user-definable
typmods. Per discussion.
Andrew Dunstan [Thu, 14 Jun 2007 01:48:51 +0000 (01:48 +0000)]
Implement a chunking protocol for writes to the syslogger pipe, with messages
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.
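Here is a toy in-memory sketch of the chunking protocol. Each chunk carries the sender's pid, a length, and an is-last flag. The reader keeps one partial message per pid until the last chunk arrives. The struct layout, the tiny 16-byte payload, and all names are assumptions for illustration. The real protocol sizes chunks so that pipe writes stay atomic, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_PAYLOAD 16   /* tiny, for illustration only */
#define MAX_SENDERS   8
#define MAX_MESSAGE   256

typedef struct
{
    int    pid;            /* sending process; must be nonzero */
    bool   is_last;        /* final chunk of this message? */
    size_t len;            /* payload bytes in use */
    char   data[CHUNK_PAYLOAD];
} log_chunk;

/* Writer side: split msg into chunks.  Returns the chunk count, or 0 if
   max_chunks is too small to hold the whole message. */
size_t make_chunks(int pid, const char *msg, log_chunk *out, size_t max_chunks)
{
    size_t remaining = strlen(msg);
    size_t needed = remaining == 0 ? 1 : (remaining + CHUNK_PAYLOAD - 1) / CHUNK_PAYLOAD;
    size_t n = 0;

    if (needed > max_chunks)
        return 0;
    do
    {
        size_t take = remaining < CHUNK_PAYLOAD ? remaining : CHUNK_PAYLOAD;

        out[n].pid = pid;
        out[n].len = take;
        memcpy(out[n].data, msg, take);
        msg += take;
        remaining -= take;
        out[n].is_last = (remaining == 0);
        n++;
    } while (remaining > 0);
    return n;
}

/* Reader side: one partially assembled message per sender. */
typedef struct
{
    int    pid;            /* 0 means the slot is free */
    size_t len;
    char   buf[MAX_MESSAGE];
} pending_msg;

static pending_msg pending[MAX_SENDERS];

/* Feed one chunk.  Returns true, with the message copied into done
   (done_size must be at least 1), when this chunk completes a message. */
bool receive_chunk(const log_chunk *c, char *done, size_t done_size)
{
    pending_msg *slot = NULL;

    for (int i = 0; i < MAX_SENDERS && slot == NULL; i++)
        if (pending[i].pid == c->pid)
            slot = &pending[i];
    for (int i = 0; i < MAX_SENDERS && slot == NULL; i++)
        if (pending[i].pid == 0)
        {
            slot = &pending[i];
            slot->pid = c->pid;
            slot->len = 0;
        }
    if (slot == NULL)
        return false;      /* too many concurrent senders; dropped in this sketch */

    size_t room = MAX_MESSAGE - 1 - slot->len;
    size_t take = c->len < room ? c->len : room;   /* truncate overlong messages */
    memcpy(slot->buf + slot->len, c->data, take);
    slot->len += take;

    if (!c->is_last)
        return false;

    size_t out = slot->len < done_size - 1 ? slot->len : done_size - 1;
    memcpy(done, slot->buf, out);
    done[out] = '\0';
    slot->pid = 0;         /* message complete: free the slot */
    return true;
}
```

Even when chunks from two backends arrive interleaved, each message reaches the log whole and unmixed.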
Neil Conway [Wed, 13 Jun 2007 23:59:47 +0000 (23:59 +0000)]
Schema-qualify several references to the builtin function length(), to
avoid mistakenly calling a function of the same name that might happen
to appear earlier in the schema search path.
Tom Lane [Tue, 12 Jun 2007 19:46:24 +0000 (19:46 +0000)]
Add some simple defenses against null fields in pg_largeobject, and add
comments noting that there's an alignment assumption now that the data
field could be in 1-byte-header format. Per discussion with Greg Stark.
Tom Lane [Tue, 12 Jun 2007 15:58:32 +0000 (15:58 +0000)]
Fix DecodeDateTime to allow timezone to appear before year. This had
historically worked in some but not all cases, but as of 8.2 it failed for all
timezone formats. Fix, and add regression test cases to catch future
regressions in this area. Per gripe from Adam Witney.
Magnus Hagander [Tue, 12 Jun 2007 11:07:34 +0000 (11:07 +0000)]
Rewrite ECPG regression test driver in C, by splitting the standard
regression driver into two parts and reusing half of it. Required to
run ECPG tests without a shell on MSVC builds.
Fix ECPG thread tests for MSVC build (incl output files).
Tom Lane [Mon, 11 Jun 2007 22:22:42 +0000 (22:22 +0000)]
Improve UPDATE/DELETE WHERE CURRENT OF so that they can be used from plpgsql
with a plpgsql-defined cursor. The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter. Not sure if we should document that fact or consider it an
implementation detail. Per discussion with Pavel Stehule.
Bruce Momjian [Mon, 11 Jun 2007 01:51:50 +0000 (01:51 +0000)]
Done:
< o Allow UPDATE/DELETE WHERE CURRENT OF cursor
<
< This requires using the row ctid to map cursor rows back to the
< original heap row. This become more complicated if WITH HOLD cursors
< are to be supported because WITH HOLD cursors have a copy of the row
< and no FOR UPDATE lock.
< http://archives.postgresql.org/pgsql-hackers/2007-01/msg01014.php
<
> o -Allow UPDATE/DELETE WHERE CURRENT OF cursor
Tom Lane [Mon, 11 Jun 2007 01:16:30 +0000 (01:16 +0000)]
Support UPDATE/DELETE WHERE CURRENT OF cursor_name, per SQL standard.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.
Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
Tom Lane [Sat, 9 Jun 2007 18:49:55 +0000 (18:49 +0000)]
Teach heapam code to know the difference between a real seqscan and the
pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless
overhead during a bitmap scan startup, in particular invoking the syncscan
code. (We might someday want to do that, but right now it's merely useless
contention for shared memory, to say nothing of possibly pushing useful
entries out of syncscan's small LRU list.) This also allows elimination of
ugly pgstat_discount_heap_scan() kluge.
Tom Lane [Sat, 9 Jun 2007 17:24:46 +0000 (17:24 +0000)]
Insert ORDER BY into a few regression test queries that now have unstable
results due to syncscan patch, when shared_buffers is small enough. Per
buildfarm reports and some local testing with shared_buffers set to the
lowest value considered by initdb.
Tom Lane [Sat, 9 Jun 2007 15:52:30 +0000 (15:52 +0000)]
Allow numeric_fac() to be interrupted, since it can take quite a while for
large inputs. Also cause it to error out immediately if the result will
overflow, instead of grinding through a lot of calculation first.
Per gripe from Jim Nasby.
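Here is a sketch of both changes, using `uint64_t` as a stand-in for numeric. The real function works on arbitrary-precision numerics with a much larger limit. The overflow test happens before any multiplication. The loop also polls a hypothetical `interrupt_pending` flag, standing in for a cancel check, on every iteration.

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical cancel flag; a signal handler would set it. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Largest n whose factorial fits in uint64_t: 20! = 2432902008176640000. */
#define FACTORIAL_MAX_N 20

typedef enum { FAC_OK, FAC_OVERFLOW, FAC_INTERRUPTED } fac_status;

fac_status factorial_u64(int n, uint64_t *result)
{
    if (n > FACTORIAL_MAX_N)
        return FAC_OVERFLOW;      /* fail immediately, before any multiplying */

    uint64_t acc = 1;
    for (int i = 2; i <= n; i++)
    {
        if (interrupt_pending)    /* stand-in for a query-cancel check */
            return FAC_INTERRUPTED;
        acc *= (uint64_t) i;
    }
    *result = acc;
    return FAC_OK;
}
```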
Alvaro Herrera [Fri, 8 Jun 2007 21:21:28 +0000 (21:21 +0000)]
Prevent the cost-balancing code from producing a zero cost limit, which
causes a division-by-zero error in the vacuum code. This can happen when there
are more workers than cost limit units.
Per report from Galy Lee in
<200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>.
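Here is a sketch of the clamp, assuming a simple even split of the global limit. The real balancing may weight workers differently, and `balanced_cost_limit` is hypothetical. With a limit of 2 and five workers, integer division alone would give each worker 0.

```c
/* Hypothetical even split of a global vacuum cost limit among workers.
   Integer division yields 0 when workers outnumber limit units, and a
   zero limit would later be used as a divisor, so clamp to at least 1. */
int balanced_cost_limit(int total_limit, int nworkers)
{
    if (nworkers < 1)
        nworkers = 1;
    int share = total_limit / nworkers;
    return share < 1 ? 1 : share;
}
```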
Alvaro Herrera [Fri, 8 Jun 2007 21:09:49 +0000 (21:09 +0000)]
Avoid passing zero as a value for vacuum_cost_limit, because it's not a valid
value for the vacuum code. Instead, make zero signify getting the value from a
higher level configuration facility, just like -1 in the original coding. We
still document that -1 is the value that disables the feature, to avoid
confusing the user unnecessarily.
Reported by Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>;
per subsequent discussion.
Bruce Momjian [Fri, 8 Jun 2007 18:45:22 +0000 (18:45 +0000)]
Done:
< * Allow sequential scans to take advantage of other concurrent
> * -Allow sequential scans to take advantage of other concurrent
<
< One possible implementation is to start sequential scans from the lowest
< numbered buffer in the shared cache, and when reaching the end wrap
< around to the beginning, rather than always starting sequential scans
< at the start of the table.
<
< http://archives.postgresql.org/pgsql-patches/2006-12/msg00076.php
< http://archives.postgresql.org/pgsql-hackers/2006-12/msg00408.php
< http://archives.postgresql.org/pgsql-hackers/2006-12/msg00784.php
< http://archives.postgresql.org/pgsql-hackers/2007-03/msg00415.php
<
Tom Lane [Fri, 8 Jun 2007 18:23:53 +0000 (18:23 +0000)]
Arrange for large sequential scans to synchronize with each other, so that
when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.
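Here is a sketch of the synchronization idea, assuming a single shared hint per relation (all names are hypothetical). A new scan starts at the block most recently reported by another scan and reads to the end of the relation. It then wraps around to block 0 and stops on reaching its starting point, so it still sees every block exactly once.

```c
#include <stdbool.h>

/* Hypothetical per-relation hint shared by all scans of that relation. */
typedef struct
{
    unsigned nblocks;             /* relation size in blocks */
    unsigned location;            /* block most recently reported by a scan */
} shared_scan_hint;

typedef struct
{
    shared_scan_hint *shared;
    unsigned start;               /* where this scan began */
    unsigned next;                /* next block to return */
    bool     done;
} sync_scan;

void sync_scan_begin(sync_scan *s, shared_scan_hint *shared)
{
    s->shared = shared;
    s->start = shared->nblocks > 0 ? shared->location % shared->nblocks : 0;
    s->next = s->start;
    s->done = (shared->nblocks == 0);
}

/* Returns the next block to read, or -1 once every block was returned once. */
long sync_scan_next(sync_scan *s)
{
    if (s->done)
        return -1;
    unsigned blk = s->next;
    s->shared->location = blk;    /* report progress so later scans can join */
    s->next = (blk + 1) % s->shared->nblocks;   /* wrap from the end to 0 */
    if (s->next == s->start)
        s->done = true;           /* back where we began */
    return (long) blk;
}
```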
Tom Lane [Thu, 7 Jun 2007 21:45:59 +0000 (21:45 +0000)]
Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID?". Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
Tom Lane [Thu, 7 Jun 2007 19:19:57 +0000 (19:19 +0000)]
Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.) Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
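Here is a sketch of per-file assignment with a cached parse. It assumes the setting is parsed once into a hypothetical `temp_space_cache` and that quoting is ignored. Files are then assigned round-robin from a starting index the caller picks, for instance at random, as the TODO text quoted in this log suggests. The committed selection policy may differ in detail.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TEMP_SPACES 8
#define MAX_NAME        64

/* Hypothetical cache of the parsed setting; rebuilt only when it changes. */
typedef struct
{
    char names[MAX_TEMP_SPACES][MAX_NAME];
    int  count;
    int  next;                    /* round-robin position */
} temp_space_cache;

/* Parse a list such as "ts1, ts2,ts3" once; start picks the first entry. */
void temp_spaces_parse(temp_space_cache *c, const char *setting, unsigned start)
{
    const char *p = setting;

    c->count = 0;
    while (*p && c->count < MAX_TEMP_SPACES)
    {
        while (isspace((unsigned char) *p) || *p == ',')
            p++;                  /* skip separators */
        const char *b = p;
        while (*p && *p != ',')
            p++;
        const char *e = p;
        while (e > b && isspace((unsigned char) e[-1]))
            e--;                  /* trim trailing blanks */
        size_t len = (size_t) (e - b);
        if (len == 0)
            continue;
        if (len >= MAX_NAME)
            len = MAX_NAME - 1;   /* truncate overlong names in this sketch */
        memcpy(c->names[c->count], b, len);
        c->names[c->count][len] = '\0';
        c->count++;
    }
    c->next = c->count > 0 ? (int) (start % (unsigned) c->count) : 0;
}

/* Called once per temp file; NULL means "use the default tablespace". */
const char *temp_spaces_next(temp_space_cache *c)
{
    if (c->count == 0)
        return NULL;
    const char *name = c->names[c->next];
    c->next = (c->next + 1) % c->count;
    return name;
}
```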
Magnus Hagander [Thu, 7 Jun 2007 09:56:25 +0000 (09:56 +0000)]
The functions bt_metap, bt_page_stats and bt_page_items have moved
from contrib/pgstattuple to pageinspect. The English documentation was
already fixed, but the Japanese version had not caught up.
Tom Lane [Wed, 6 Jun 2007 23:00:50 +0000 (23:00 +0000)]
Fix up text concatenation so that it accepts all the reasonable cases that
were accepted by prior Postgres releases. This takes care of the loose end
left by the preceding patch to downgrade implicit casts-to-text. To avoid
breaking desirable behavior for array concatenation, introduce a new
polymorphic pseudo-type "anynonarray" --- the added concatenation operators
are actually text || anynonarray and anynonarray || text.
Tom Lane [Tue, 5 Jun 2007 21:31:09 +0000 (21:31 +0000)]
Downgrade implicit casts to text to be assignment-only, except for the ones
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.
Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions. These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.
The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible. This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.
This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation. Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.
Jan Wieck [Tue, 5 Jun 2007 20:00:41 +0000 (20:00 +0000)]
The session_replication_role actually can be changed at will during
a session regardless of the existence of cached plans. The plancache
only needs to be invalidated so that rules affected by the new setting
will be reflected in the new query plans.
Teodor Sigaev [Mon, 4 Jun 2007 15:56:28 +0000 (15:56 +0000)]
Fix a bundle of GIN bugs:
- Fix a possible deadlock between UPDATE and VACUUM queries. The bug was never
observed in 8.2, but it exists there too. HEAD is more sensitive to it after
the recent buffer "ring" improvements.
- Fix WAL creation: if the parent page is stored as-is after a split, the
incomplete split isn't removed during replay. This happens rather rarely, only
on large tables with a lot of updates/inserts.
- Fix WAL replay: there was a wrong test of XLR_BKP_BLOCK_* for the left
page after deletion of a page. That caused a wrong rightlink field: it pointed
to the deleted page.
- Add a check that the incomplete split being cleared actually matches.
- Clean up the incomplete split list after processing.
None of these changes affect on-disk storage, so back-patch...
But the second point may be an issue when replaying logs from a previous version.
Magnus Hagander [Mon, 4 Jun 2007 13:39:28 +0000 (13:39 +0000)]
On win32, retry reading when WSARecv returns WSAEWOULDBLOCK. There seem
to be cases when at least Windows 2000 can do this even though select
just indicated that the socket is readable.
Bruce Momjian [Sun, 3 Jun 2007 18:49:28 +0000 (18:49 +0000)]
Remove description for:
o -Add a GUC variable to control the tablespace for temporary objects
and sort files
<
< It could start with a random tablespace from a supplied list and
< cycle through the list.
<
Tom Lane [Sun, 3 Jun 2007 17:08:34 +0000 (17:08 +0000)]
Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
Bruce Momjian [Sat, 2 Jun 2007 11:28:01 +0000 (11:28 +0000)]
Re-add TODO and clarify it is for the kernel cache:
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
> * Allow free-behind capability for large sequential scans to avoid
> kernel cache spoiling
Bruce Momjian [Sat, 2 Jun 2007 02:46:38 +0000 (02:46 +0000)]
TODO item not needed anymore now that the buffer cache is
scan-resistant:
<
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
<
< Posix_fadvise() can control both sequential/random file caching and
< free-behind behavior, but it is unclear how the setting affects other
< backends that also have the file open, and the feature is not supported
< on all operating systems.
Andrew Dunstan [Sat, 2 Jun 2007 02:03:42 +0000 (02:03 +0000)]
Improve efficiency of LIKE/ILIKE code, especially for multi-byte charsets,
and most especially for UTF8. Remove unnecessary special cases for bytea
processing and single-byte charset ILIKE. a ILIKE b is now processed as
lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All
comparisons are now performed byte-wise, and the text and pattern are also
advanced byte-wise where it is safe to do so - essentially where a wildcard is
not being matched.
Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from
Tom Lane and Mark Mielke.
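Here is a simplified, recursive sketch of byte-wise LIKE matching for UTF-8. It omits ILIKE, which the commit handles as lower(a) LIKE lower(b). Literal pattern bytes are compared directly with text bytes. This is safe because UTF-8 is self-synchronizing: a lead byte can never equal a continuation byte. Only `_` and `%` have to step over whole characters. The sketch assumes valid UTF-8 and backslash as the escape character, treats a trailing backslash as a literal, and takes exponential time in the worst case. `like_utf8` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte length of the UTF-8 character whose first byte is c (valid input assumed). */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Advance past one whole character, never beyond the end of the text. */
static void skip_char(const char **t, size_t *tlen)
{
    size_t n = utf8_char_len((unsigned char) **t);
    if (n > *tlen) n = *tlen;
    *t += n;
    *tlen -= n;
}

/* Does text t[0..tlen) match LIKE pattern p[0..plen)?  Backslash escapes. */
bool like_utf8(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (plen > 0)
    {
        if (*p == '%')
        {
            while (plen > 0 && *p == '%') { p++; plen--; }   /* collapse %% */
            if (plen == 0)
                return true;
            for (;;)              /* try each character boundary in the text */
            {
                if (like_utf8(t, tlen, p, plen))
                    return true;
                if (tlen == 0)
                    return false;
                skip_char(&t, &tlen);
            }
        }
        if (tlen == 0)
            return false;
        if (*p == '_')
        {
            skip_char(&t, &tlen);  /* '_' consumes a whole character */
            p++; plen--;
            continue;
        }
        if (*p == '\\' && plen > 1)
        {
            p++; plen--;          /* next pattern byte is a literal */
        }
        if (*p != *t)             /* literal: plain byte comparison */
            return false;
        p++; plen--;
        t++; tlen--;
    }
    return tlen == 0;
}
```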