Tom Lane [Fri, 15 Jun 2007 20:56:52 +0000 (20:56 +0000)]
Tweak the API for per-datatype typmodin functions so that they are passed
an array of strings rather than an array of integers, and allow any simple
constant or identifier to be used in typmods; for example
create table foo (f1 widget(42,'23skidoo',point));
Of course the typmodin function has still got to pack this info into a
non-negative int32 for storage, but it's still a useful improvement in
flexibility, especially considering that you can do nearly anything if you
are willing to keep the info in a side table. We can get away with this
change since we have not yet released a version providing user-definable
typmods. Per discussion.
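A rough sketch of the packing constraint described above, written in Python rather than against the real C API. The names `pack_typmod` and `unpack_typmod`, the two-argument limit, and the 15/16-bit split are all assumptions made for this illustration; a real typmodin function is free to use any encoding that yields a non-negative int32.

```python
def pack_typmod(args):
    """Pack two string typmod arguments into one non-negative int32.

    Hypothetical layout: the high 15 bits hold the first value and the
    low 16 bits hold the second, so the sign bit is always clear.
    """
    if len(args) != 2:
        raise ValueError("expected exactly two typmod arguments")
    first, second = (int(a) for a in args)  # arguments arrive as strings
    if not (0 <= first < (1 << 15) and 0 <= second < (1 << 16)):
        raise ValueError("typmod argument out of range")
    return (first << 16) | second


def unpack_typmod(typmod):
    """Recover the two fields from a packed typmod."""
    return typmod >> 16, typmod & 0xFFFF
```

Anything that does not fit in 31 bits would have to live elsewhere, for instance in the side table the message mentions, with the typmod acting as a key.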
Andrew Dunstan [Thu, 14 Jun 2007 01:48:51 +0000 (01:48 +0000)]
Implement a chunking protocol for writes to the syslogger pipe, with messages
reassembled in the syslogger before writing to the log file. This prevents
partial messages from being written, which mucks up log rotation, and
messages from different backends being interleaved, which causes garbled
logs. Backport as far as 8.0, where the syslogger was introduced.
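The idea of chunking and reassembly can be sketched as follows. This is not the syslogger's actual wire format: the header layout (`pid`, last-chunk flag, payload length), the `MAX_PAYLOAD` value, and the class and function names are assumptions chosen for a small, self-contained demo.

```python
import struct

# Hypothetical header: sender pid, "last chunk" flag, payload length.
HEADER = struct.Struct("<IBH")
MAX_PAYLOAD = 16  # tiny on purpose, to force chunking in the demo


def make_chunks(pid, message):
    """Split one log message (bytes) into framed chunks."""
    pieces = [message[i:i + MAX_PAYLOAD]
              for i in range(0, len(message), MAX_PAYLOAD)] or [b""]
    last = len(pieces) - 1
    return [HEADER.pack(pid, int(i == last), len(p)) + p
            for i, p in enumerate(pieces)]


class Reassembler:
    """Collect chunks per sender and emit only whole messages."""

    def __init__(self):
        self.partial = {}   # pid -> list of payload pieces seen so far
        self.complete = []  # whole messages, in completion order

    def feed(self, chunk):
        pid, is_last, length = HEADER.unpack_from(chunk)
        payload = chunk[HEADER.size:HEADER.size + length]
        self.partial.setdefault(pid, []).append(payload)
        if is_last:
            self.complete.append(b"".join(self.partial.pop(pid)))
```

Because each chunk names its sender, chunks from different backends may arrive interleaved and the log file still receives each message in one piece.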
Neil Conway [Wed, 13 Jun 2007 23:59:47 +0000 (23:59 +0000)]
Schema-qualify several references to the builtin function length(), to
avoid mistakenly calling a function of the same name that might happen
to appear earlier in the schema search path.
Tom Lane [Tue, 12 Jun 2007 19:46:24 +0000 (19:46 +0000)]
Add some simple defenses against null fields in pg_largeobject, and add
comments noting that there's an alignment assumption now that the data
field could be in 1-byte-header format. Per discussion with Greg Stark.
Tom Lane [Tue, 12 Jun 2007 15:58:32 +0000 (15:58 +0000)]
Fix DecodeDateTime to allow timezone to appear before year. This had
historically worked in some but not all cases, but as of 8.2 it failed for all
timezone formats. Fix, and add regression test cases to catch future
regressions in this area. Per gripe from Adam Witney.
Magnus Hagander [Tue, 12 Jun 2007 11:07:34 +0000 (11:07 +0000)]
Rewrite ECPG regression test driver in C, by splitting the standard
regression driver into two parts and reusing half of it. Required to
run ECPG tests without a shell on MSVC builds.
Fix ECPG thread tests for MSVC build (incl output files).
Tom Lane [Mon, 11 Jun 2007 22:22:42 +0000 (22:22 +0000)]
Improve UPDATE/DELETE WHERE CURRENT OF so that they can be used from plpgsql
with a plpgsql-defined cursor. The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter. Not sure if we should document that fact or consider it an
implementation detail. Per discussion with Pavel Stehule.
Bruce Momjian [Mon, 11 Jun 2007 01:51:50 +0000 (01:51 +0000)]
Done:
< o Allow UPDATE/DELETE WHERE CURRENT OF cursor
<
< This requires using the row ctid to map cursor rows back to the
< original heap row. This becomes more complicated if WITH HOLD cursors
< are to be supported because WITH HOLD cursors have a copy of the row
< and no FOR UPDATE lock.
< http://archives.postgresql.org/pgsql-hackers/2007-01/msg01014.php
<
> o -Allow UPDATE/DELETE WHERE CURRENT OF cursor
Tom Lane [Mon, 11 Jun 2007 01:16:30 +0000 (01:16 +0000)]
Support UPDATE/DELETE WHERE CURRENT OF cursor_name, per SQL standard.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.
Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
Tom Lane [Sat, 9 Jun 2007 18:49:55 +0000 (18:49 +0000)]
Teach heapam code to know the difference between a real seqscan and the
pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless
overhead during a bitmap scan startup, in particular invoking the syncscan
code. (We might someday want to do that, but right now it's merely useless
contention for shared memory, to say nothing of possibly pushing useful
entries out of syncscan's small LRU list.) This also allows elimination of
the ugly pgstat_discount_heap_scan() kluge.
Tom Lane [Sat, 9 Jun 2007 17:24:46 +0000 (17:24 +0000)]
Insert ORDER BY into a few regression test queries that now have unstable
results due to syncscan patch, when shared_buffers is small enough. Per
buildfarm reports and some local testing with shared_buffers set to the
lowest value considered by initdb.
Tom Lane [Sat, 9 Jun 2007 15:52:30 +0000 (15:52 +0000)]
Allow numeric_fac() to be interrupted, since it can take quite a while for
large inputs. Also cause it to error out immediately if the result will
overflow, instead of grinding through a lot of calculation first.
Per gripe from Jim Nasby.
Alvaro Herrera [Fri, 8 Jun 2007 21:21:28 +0000 (21:21 +0000)]
Disallow the cost balancing code from resulting in a zero cost limit, which
causes a division-by-zero error in the vacuum code. This can happen when there
are more workers than cost limit units.
Per report from Galy Lee in
<200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>.
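The failure mode is easy to reproduce with integer arithmetic. The sketch below uses a plain even split, which is a simplification and not the weighting the real cost balancing code applies; `balance_cost_limit` is a hypothetical name.

```python
def balance_cost_limit(total_limit, n_workers):
    """Split a cost limit among workers, never handing out zero.

    Simplified even split (an assumption for this sketch). Without the
    clamp, integer division yields 0 whenever n_workers > total_limit,
    and a later division by that value would fail.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    return max(1, total_limit // n_workers)
```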
Alvaro Herrera [Fri, 8 Jun 2007 21:09:49 +0000 (21:09 +0000)]
Avoid passing zero as a value for vacuum_cost_limit, because it's not a valid
value for the vacuum code. Instead, make zero signify getting the value from a
higher level configuration facility, just like -1 in the original coding. We
still document that -1 is the value that disables the feature, to avoid
confusing the user unnecessarily.
Reported by Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>;
per subsequent discussion.
Bruce Momjian [Fri, 8 Jun 2007 18:45:22 +0000 (18:45 +0000)]
Done:
< * Allow sequential scans to take advantage of other concurrent
> * -Allow sequential scans to take advantage of other concurrent
<
< One possible implementation is to start sequential scans from the lowest
< numbered buffer in the shared cache, and when reaching the end wrap
< around to the beginning, rather than always starting sequential scans
< at the start of the table.
<
< http://archives.postgresql.org/pgsql-patches/2006-12/msg00076.php
< http://archives.postgresql.org/pgsql-hackers/2006-12/msg00408.php
< http://archives.postgresql.org/pgsql-hackers/2006-12/msg00784.php
< http://archives.postgresql.org/pgsql-hackers/2007-03/msg00415.php
<
Tom Lane [Fri, 8 Jun 2007 18:23:53 +0000 (18:23 +0000)]
Arrange for large sequential scans to synchronize with each other, so that
when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.
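A minimal sketch of the visiting order this implies, assuming a new scan joins at the block another scan has most recently reported and then wraps around to cover the rest of the table. The function name and the way the start block is supplied are illustrative only.

```python
def scan_order(n_blocks, start_block):
    """Block visit order for a scan that starts mid-table and wraps around."""
    return [(start_block + i) % n_blocks for i in range(n_blocks)]
```

Every block is still visited exactly once, but not in physical order, which is why queries without ORDER BY can return rows in a different order from one run to the next.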
Tom Lane [Thu, 7 Jun 2007 21:45:59 +0000 (21:45 +0000)]
Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID?" Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
Tom Lane [Thu, 7 Jun 2007 19:19:57 +0000 (19:19 +0000)]
Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.) Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
Magnus Hagander [Thu, 7 Jun 2007 09:56:25 +0000 (09:56 +0000)]
The functions bt_metap, bt_page_stats and bt_page_items were moved
from contrib/pgstattuple to pageinspect. The English documentation has
already been fixed, but the Japanese version had not caught up.
Tom Lane [Wed, 6 Jun 2007 23:00:50 +0000 (23:00 +0000)]
Fix up text concatenation so that it accepts all the reasonable cases that
were accepted by prior Postgres releases. This takes care of the loose end
left by the preceding patch to downgrade implicit casts-to-text. To avoid
breaking desirable behavior for array concatenation, introduce a new
polymorphic pseudo-type "anynonarray" --- the added concatenation operators
are actually text || anynonarray and anynonarray || text.
Tom Lane [Tue, 5 Jun 2007 21:31:09 +0000 (21:31 +0000)]
Downgrade implicit casts to text to be assignment-only, except for the ones
from the other string-category types; this eliminates a lot of surprising
interpretations that the parser could formerly make when there was no directly
applicable operator.
Create a general mechanism that supports casts to and from the standard string
types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's
I/O functions. These new casts are assignment-only in the to-string direction,
explicit-only in the other, and therefore should create no surprising behavior.
Remove a bunch of thereby-obsoleted datatype-specific casting functions.
The "general mechanism" is a new expression node type CoerceViaIO that can
actually convert between *any* two datatypes if their external text
representations are compatible. This is more general than needed for the
immediate feature, but might be useful in plpgsql or other places in future.
This commit does nothing about the issue that applying the concatenation
operator || to non-text types will now fail, often with strange error messages
due to misinterpreting the operator as array concatenation. Since it often
(not always) worked before, we should either make it succeed or at least give
a more user-friendly error; but details are still under debate.
Jan Wieck [Tue, 5 Jun 2007 20:00:41 +0000 (20:00 +0000)]
The session_replication_role actually can be changed at will during
a session regardless of the existence of cached plans. The plancache
only needs to be invalidated so that rules affected by the new setting
will be reflected in the new query plans.
Teodor Sigaev [Mon, 4 Jun 2007 15:56:28 +0000 (15:56 +0000)]
Fix a bundle of GIN bugs:
- Fix possible deadlock between UPDATE and VACUUM queries. The bug was never
observed in 8.2, but it still exists there. HEAD is more sensitive to the
bug after the recent "ring" of buffers improvements.
- Fix WAL creation: if the parent page is stored as is after a split, then the
incomplete split isn't removed during replay. This happens rather rarely, only
on large tables with a lot of updates/inserts.
- Fix WAL replay: there was a wrong test of XLR_BKP_BLOCK_* for the left
page after deletion of a page. That caused a wrong rightlink field: it pointed
to the deleted page.
- Add checking of match when clearing an incomplete split.
- Clean up the incomplete split list after processing.
None of these changes alter on-disk storage, so backpatch...
But the second point may be an issue for replaying logs from a previous version.
Magnus Hagander [Mon, 4 Jun 2007 13:39:28 +0000 (13:39 +0000)]
On win32, retry reading when WSARecv returns WSAEWOULDBLOCK. There seem
to be cases when at least Windows 2000 can do this even though select
just indicated that the socket is readable.
Bruce Momjian [Sun, 3 Jun 2007 18:49:28 +0000 (18:49 +0000)]
Remove description for:
o -Add a GUC variable to control the tablespace for temporary objects
and sort files
<
< It could start with a random tablespace from a supplied list and
< cycle through the list.
<
Tom Lane [Sun, 3 Jun 2007 17:08:34 +0000 (17:08 +0000)]
Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
Bruce Momjian [Sat, 2 Jun 2007 11:28:01 +0000 (11:28 +0000)]
Re-add TODO and clarify it is for the kernel cache:
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
> * Allow free-behind capability for large sequential scans to avoid
> kernel cache spoiling
Bruce Momjian [Sat, 2 Jun 2007 02:46:38 +0000 (02:46 +0000)]
TODO item not needed anymore now that the buffer cache is
scan-resistant:
<
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
<
< Posix_fadvise() can control both sequential/random file caching and
< free-behind behavior, but it is unclear how the setting affects other
< backends that also have the file open, and the feature is not supported
< on all operating systems.
Andrew Dunstan [Sat, 2 Jun 2007 02:03:42 +0000 (02:03 +0000)]
Improve efficiency of LIKE/ILIKE code, especially for multi-byte charsets,
and most especially for UTF8. Remove unnecessary special cases for bytea
processing and single-byte charset ILIKE. a ILIKE b is now processed as
lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All
comparisons are now performed byte-wise, and the text and pattern are also
advanced byte-wise where it is safe to do so - essentially where a wildcard is
not being matched.
Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from
Tom Lane and Mark Mielke.
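The reduction of ILIKE to LIKE on lowercased inputs can be sketched as below. This is a character-wise Python toy, not the byte-wise C matcher the commit describes; it handles only `%` and `_`, ignores escape characters, and uses Python's `str.lower()` as a stand-in for the database's lower().

```python
def like(text, pattern):
    """Match text against a SQL LIKE pattern ('%' and '_' only, no escapes)."""
    t = p = 0
    star_p = star_t = -1  # position of the last '%' and the text index it covers
    while t < len(text):
        if p < len(pattern) and pattern[p] == "%":
            star_p, star_t = p, t       # remember the wildcard, match nothing yet
            p += 1
        elif p < len(pattern) and pattern[p] in ("_", text[t]):
            t += 1                      # single character matched
            p += 1
        elif star_p >= 0:
            star_t += 1                 # let the last '%' absorb one more character
            t, p = star_t, star_p + 1
        else:
            return False
    # Text is exhausted: only trailing '%' may remain in the pattern.
    return all(c == "%" for c in pattern[p:])


def ilike(text, pattern):
    """Case-insensitive LIKE, expressed as LIKE on lowercased inputs."""
    return like(text.lower(), pattern.lower())
```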
Tom Lane [Fri, 1 Jun 2007 23:43:11 +0000 (23:43 +0000)]
Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build. Kudos to Kurt Harriman for spotting the bug.
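To show where a boundary-crossing write can go wrong, here is a sketch of dumping one buffer across fixed-size component files, modeled as a list of `bytearray` segments. The function and its parameters are hypothetical, and the comment marks one plausible mistake of this kind rather than a confirmed account of the original bug.

```python
def dump_buffer(segments, seg_size, offset, data):
    """Write data at a logical offset, spilling across fixed-size segments."""
    written = 0
    while written < len(data):
        seg_no, seg_off = divmod(offset + written, seg_size)
        while len(segments) <= seg_no:
            segments.append(bytearray())
        seg = segments[seg_no]
        if len(seg) < seg_off:
            seg.extend(b"\0" * (seg_off - len(seg)))  # pad a sparse segment
        room = seg_size - seg_off
        # The source slice must start at `written`; restarting from index 0
        # after crossing a boundary would write the wrong data.
        piece = data[written:written + room]
        seg[seg_off:seg_off + len(piece)] = piece
        written += len(piece)
```

Shrinking `seg_size` in a test, much like reducing RELSEG_SIZE in a test build, makes boundary crossings frequent enough to exercise.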
Neil Conway [Fri, 1 Jun 2007 23:40:19 +0000 (23:40 +0000)]
Allow leading and trailing whitespace in the input to the boolean
type. Also, add explicit casts between boolean and text/varchar. Both
of these changes are for conformance with SQL:2003.
Tom Lane [Fri, 1 Jun 2007 19:38:07 +0000 (19:38 +0000)]
Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
Bruce Momjian [Fri, 1 Jun 2007 18:41:55 +0000 (18:41 +0000)]
Add URL for:
o Research self-referential UPDATEs that see inconsistent row versions
in read-committed mode
<
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00016.php
Tom Lane [Fri, 1 Jun 2007 17:38:44 +0000 (17:38 +0000)]
Buy back some of the cycles spent in more-expensive hash functions by
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
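The saving comes from replacing a modulus with a bit mask, which only works when the bucket count is a power of two. A small sketch, with hypothetical helper names and a round-up rule that is an assumption rather than the planner's actual sizing logic:

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def bucket_for(hash_value, n_buckets):
    """Pick a bucket with a mask; n_buckets must be a power of two."""
    if n_buckets <= 0 or n_buckets & (n_buckets - 1):
        raise ValueError("n_buckets must be a power of two")
    return hash_value & (n_buckets - 1)
```

The mask gives the same answer as `hash_value % n_buckets`, but it uses only the low-order bits, so it is only as good as the randomness of those bits.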
Tom Lane [Fri, 1 Jun 2007 15:33:19 +0000 (15:33 +0000)]
Fix several hash functions that were taking chintzy shortcuts instead of
delivering a well-randomized hash value. I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for say
hashes of small integer values. It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.
initdb forced because this change invalidates existing hash indexes. For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
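The degradation can be illustrated with a toy model. It assumes the batch number is taken from the hash bits just above the bucket bits, and it uses the MurmurHash3 32-bit finalizer purely as a stand-in mixer; neither function below is PostgreSQL's hash_any or its actual integer hash.

```python
def identity_hash(x):
    """Shortcut hash for illustration: the value is its own hash."""
    return x & 0xFFFFFFFF


def mixed_hash(x):
    """Stand-in mixer (MurmurHash3 32-bit finalizer), not hash_any."""
    h = x & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def batch_number(hash_value, log2_nbuckets, nbatch):
    """Assumed scheme: batch comes from the bits just above the bucket bits."""
    return (hash_value >> log2_nbuckets) & (nbatch - 1)


def batches_used(hash_fn, keys, log2_nbuckets=10, nbatch=8):
    """Set of batch numbers that the given keys land in."""
    return {batch_number(hash_fn(k), log2_nbuckets, nbatch) for k in keys}
```

With the identity hash, keys 0 through 999 all fall into batch 0, so a multi-batch join gains nothing from batching; a mixing hash spreads them over several batches.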
Tom Lane [Thu, 31 May 2007 20:45:26 +0000 (20:45 +0000)]
The shortcut exit that I recently added to ExecInitIndexScan() for
EXPLAIN-only operation was a little too short; it skipped initializing the
node's result tuple type, which may be needed depending on what's above the
indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good
luck I moved up the ExecAssignScanProjectionInfo call as well, so that
everything except indexscan-specific initialization will still be done.)
Per example from Grant Finnemore.
Tom Lane [Thu, 31 May 2007 16:57:34 +0000 (16:57 +0000)]
Change build_index_pathkeys() so that the expressions it builds to represent
index key columns always have the type expected by the index's associated
operators, ie, we add RelabelType nodes when dealing with binary-compatible
index opclasses. This is needed to get varchar indexes to play nicely with
the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
CVS HEAD was failing to match a varchar index column to a constant restriction
in the query.
It seems likely that this change will allow removal of a lot of ugly ad-hoc
RelabelType-stripping that the planner has traditionally done while matching
expressions to other expressions, but I'll worry about that some other day.
Tom Lane [Wed, 30 May 2007 21:01:39 +0000 (21:01 +0000)]
Fix overly-strict sanity check in BeginInternalSubTransaction that made it
fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the
reason it hadn't been noticed is that we've been discouraging use of
user-defined constraint triggers. Per report from Frank van Vugt.
Bruce Momjian [Wed, 30 May 2007 20:26:06 +0000 (20:26 +0000)]
Update:
< * Consider allowing 64-bit integers to be passed by value on 64-bit
< platforms
> * Consider allowing 64-bit integers and floats to be passed by value on
> 64-bit platforms
>
> Also change 32-bit floats (float4) to be passed by value at the same
> time.
>
Tom Lane [Wed, 30 May 2007 20:12:03 +0000 (20:12 +0000)]
Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
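The usage-count change can be sketched with a toy clock sweep. The `Buffer` class, the `MAX_USAGE` cap, and the rule of bumping the count on every pin are simplifying assumptions; the real buffer manager tracks pins per backend and handles locking, none of which appears here.

```python
MAX_USAGE = 5  # assumed cap on usage_count for this sketch


class Buffer:
    def __init__(self):
        self.pins = 0
        self.usage_count = 0


def pin(buf):
    """Advance usage_count at pin time rather than at unpin time."""
    buf.pins += 1
    buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)


def unpin(buf):
    buf.pins -= 1


def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand); pinned buffers are left untouched."""
    for _ in range(len(buffers) * (MAX_USAGE + 1)):
        i = hand
        hand = (hand + 1) % len(buffers)
        buf = buffers[i]
        if buf.pins > 0:
            continue                 # do not decrement while pinned
        if buf.usage_count == 0:
            return i, hand           # found a replaceable buffer
        buf.usage_count -= 1
    raise RuntimeError("no unpinned buffer available")
```

Skipping the decrement for pinned buffers means a buffer's count only starts to decay once it has been released, which is the behavior the message says it wants to preserve.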
Bruce Momjian [Wed, 30 May 2007 19:07:20 +0000 (19:07 +0000)]
Add URL for:
* Improve speed with indexes
For large table adjustments during VACUUM FULL, it is faster to cluster
or reindex rather than update the index. Also, index updates can bloat
the index.
Neil Conway [Tue, 29 May 2007 04:58:43 +0000 (04:58 +0000)]
Fix a bug in input processing for the "interval" type. Previously,
"microsecond" and "millisecond" units were not considered valid input
by themselves, which caused inputs like "1 millisecond" to be rejected
erroneously.
Update the docs, add regression tests, and backport to 8.2 and 8.1.
Tom Lane [Mon, 28 May 2007 16:43:24 +0000 (16:43 +0000)]
Tweak the code in a couple of places to try to deliver more user-friendly
error messages when a single COPY line is too long for us to handle. Per
example from Johann Spies.
Tom Lane [Sun, 27 May 2007 17:28:36 +0000 (17:28 +0000)]
Ooops, I was too busy worrying about getting the transactional infrastructure
right to think carefully about how insert and delete counts map to
n_live_tuples. Of course a deletion should reduce n_live_tuples.