Bruce Momjian [Sun, 3 Jun 2007 18:49:28 +0000 (18:49 +0000)]
Remove description for:
o -Add a GUC variable to control the tablespace for temporary objects
and sort files
<
< It could start with a random tablespace from a supplied list and
< cycle through the list.
<
Tom Lane [Sun, 3 Jun 2007 17:08:34 +0000 (17:08 +0000)]
Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
Bruce Momjian [Sat, 2 Jun 2007 11:28:01 +0000 (11:28 +0000)]
Re-add TODO and clarify it is for the kernel cache:
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
> * Allow free-behind capability for large sequential scans to avoid
> kernel cache spoiling
Bruce Momjian [Sat, 2 Jun 2007 02:46:38 +0000 (02:46 +0000)]
TODO item not needed anymore now that the buffer cache is
scan-resistant:
<
< * Allow free-behind capability for large sequential scans, perhaps using
< posix_fadvise()
<
< Posix_fadvise() can control both sequential/random file caching and
< free-behind behavior, but it is unclear how the setting affects other
< backends that also have the file open, and the feature is not supported
< on all operating systems.
Andrew Dunstan [Sat, 2 Jun 2007 02:03:42 +0000 (02:03 +0000)]
Improve efficiency of LIKE/ILIKE code, especially for multi-byte charsets,
and most especially for UTF8. Remove unnecessary special cases for bytea
processing and single-byte charset ILIKE. a ILIKE b is now processed as
lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All
comparisons are now performed byte-wise, and the text and pattern are also
advanced byte-wise where it is safe to do so - essentially where a wildcard is
not being matched.
Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from
Tom Lane and Mark Mielke.
Tom Lane [Fri, 1 Jun 2007 23:43:11 +0000 (23:43 +0000)]
Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the
wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build. Kudos to Kurt Harriman for spotting the bug.
Neil Conway [Fri, 1 Jun 2007 23:40:19 +0000 (23:40 +0000)]
Allow leading and trailing whitespace in the input to the boolean
type. Also, add explicit casts between boolean and text/varchar. Both
of these changes are for conformance with SQL:2003.
Tom Lane [Fri, 1 Jun 2007 19:38:07 +0000 (19:38 +0000)]
Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
Bruce Momjian [Fri, 1 Jun 2007 18:41:55 +0000 (18:41 +0000)]
Add URL for:
o Research self-referential UPDATEs that see inconsistent row versions
in read-committed mode
<
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00016.php
Tom Lane [Fri, 1 Jun 2007 17:38:44 +0000 (17:38 +0000)]
Buy back some of the cycles spent in more-expensive hash functions by
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
Tom Lane [Fri, 1 Jun 2007 15:33:19 +0000 (15:33 +0000)]
Fix several hash functions that were taking chintzy shortcuts instead of
delivering a well-randomized hash value. I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for say
hashes of small integer values. It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.
initdb forced because this change invalidates existing hash indexes. For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
Tom Lane [Thu, 31 May 2007 20:45:26 +0000 (20:45 +0000)]
The shortcut exit that I recently added to ExecInitIndexScan() for
EXPLAIN-only operation was a little too short; it skipped initializing the
node's result tuple type, which may be needed depending on what's above the
indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good
luck I moved up the ExecAssignScanProjectionInfo call as well, so that
everything except indexscan-specific initialization will still be done.)
Per example from Grant Finnemore.
Tom Lane [Thu, 31 May 2007 16:57:34 +0000 (16:57 +0000)]
Change build_index_pathkeys() so that the expressions it builds to represent
index key columns always have the type expected by the index's associated
operators, ie, we add RelabelType nodes when dealing with binary-compatible
index opclasses. This is needed to get varchar indexes to play nicely with
the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
CVS HEAD was failing to match a varchar index column to a constant restriction
in the query.
It seems likely that this change will allow removal of a lot of ugly ad-hoc
RelabelType-stripping that the planner has traditionally done while matching
expressions to other expressions, but I'll worry about that some other day.
Tom Lane [Wed, 30 May 2007 21:01:39 +0000 (21:01 +0000)]
Fix overly-strict sanity check in BeginInternalSubTransaction that made it
fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the
reason it hadn't been noticed is that we've been discouraging use of
user-defined constraint triggers. Per report from Frank van Vugt.
Bruce Momjian [Wed, 30 May 2007 20:26:06 +0000 (20:26 +0000)]
Update:
< * Consider allowing 64-bit integers to be passed by value on 64-bit
< platforms
> * Consider allowing 64-bit integers and floats to be passed by value on
> 64-bit platforms
>
> Also change 32-bit floats (float4) to be passed by value at the same
> time.
>
Tom Lane [Wed, 30 May 2007 20:12:03 +0000 (20:12 +0000)]
Make large sequential scans and VACUUMs work in a limited-size "ring" of
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
Bruce Momjian [Wed, 30 May 2007 19:07:20 +0000 (19:07 +0000)]
Add URL for:
* Improve speed with indexes
For large table adjustments during VACUUM FULL, it is faster to cluster
or reindex rather than update the index. Also, index updates can bloat
the index.
Neil Conway [Tue, 29 May 2007 04:58:43 +0000 (04:58 +0000)]
Fix a bug in input processing for the "interval" type. Previously,
"microsecond" and "millisecond" units were not considered valid input
by themselves, which caused inputs like "1 millisecond" to be rejected
erroneously.
Update the docs, add regression tests, and backport to 8.2 and 8.1
Tom Lane [Mon, 28 May 2007 16:43:24 +0000 (16:43 +0000)]
Tweak the code in a couple of places to try to deliver more user-friendly
error messages when a single COPY line is too long for us to handle. Per
example from Johann Spies.
Tom Lane [Sun, 27 May 2007 17:28:36 +0000 (17:28 +0000)]
Ooops, I was too busy worrying about getting the transactional infrastructure
right to think carefully about how insert and delete counts map to
n_live_tuples. Of course a deletion should reduce n_live_tuples.
Tom Lane [Sun, 27 May 2007 05:37:50 +0000 (05:37 +0000)]
pgstat's on-proc-exit hook has to execute after the last transaction commit
or abort within a backend; rearrange InitPostgres processing to make it so.
Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
why the core regression tests didn't expose it?). This possibly is another
reason for missing stats updates ...
Tom Lane [Sun, 27 May 2007 03:50:39 +0000 (03:50 +0000)]
Fix up pgstats counting of live and dead tuples to recognize that committed
and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.
Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures. And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.
Tom Lane [Sat, 26 May 2007 18:23:02 +0000 (18:23 +0000)]
Repair two constraint-exclusion corner cases triggered by proving that an
inheritance child of an UPDATE/DELETE target relation can be excluded by
constraints. I had rearranged some code in set_append_rel_pathlist() to
avoid "useless" work when a child is excluded, but overdid it and left
the child with no cheapest_path entry, causing possible failure later
if the appendrel was involved in a join. Also, it seems that the dummy
plan generated by inheritance_planner() when all branches are excluded
has to be a bit less dummy now than was required in 8.2.
Per report from Jan Wieck. Add his test case to the regression tests.
Tom Lane [Fri, 25 May 2007 17:54:25 +0000 (17:54 +0000)]
Create hooks to let a loadable plugin monitor (or even replace) the planner
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes. This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project. In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.
The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes. With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
Tom Lane [Thu, 24 May 2007 18:58:42 +0000 (18:58 +0000)]
Remove ruleutils.c's use of varnoold/varoattno as a shortcut for determining
what a Var node refers to. This is no longer necessary because the new
flat-range-table representation of plan trees makes it relatively easy to dig
down through child plan levels to find the original reference; and to keep
doing it that way, we'd have to store joinaliasvars lists in flattened RTEs,
as demonstrated by bug report from Leszek Trenkner. This change makes
varnoold/varoattno truly just debug aids, which wasn't quite the case before.
Perhaps we should drop them, or only have them in assert-enabled builds?
Tom Lane [Thu, 24 May 2007 18:54:10 +0000 (18:54 +0000)]
Avoid assuming that the fields of struct timeval have exactly type long.
This is probably incorrect on some platforms, and definitely draws a
compiler warning on Darwin.
Tom Lane [Tue, 22 May 2007 23:23:58 +0000 (23:23 +0000)]
Repair planner bug introduced in 8.2 by ability to rearrange outer joins:
in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
that clause may prevent us from re-ordering the two outer joins. The code
was considering only the joins' own ON-conditions in determining reordering
safety, which is not good enough. Add a "delay_upper_joins" flag to
OuterJoinInfo to flag that we have detected such a clause and higher-level
outer joins shouldn't be permitted to commute with this one. (This might
seem overly coarse, but given the current rules for OJ reordering, it's
sufficient AFAICT.)
The failure case is actually pretty narrow: it needs a WHERE clause within
the RHS of a left join that checks the RHS of a lower left join, but is not
strict for that RHS (else we'd have simplified the lower join to a plain
join). Even then no failure will be manifest unless the planner chooses to
rearrange the join order.
Tom Lane [Tue, 22 May 2007 01:40:33 +0000 (01:40 +0000)]
Fix best_inner_indexscan to return both the cheapest-total-cost and
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join. The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans. This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from
Chuck D.
Tom Lane [Mon, 21 May 2007 17:57:35 +0000 (17:57 +0000)]
Teach tuplestore.c to throw away data before the "mark" point when the caller
is using mark/restore but not rewind or backward-scan capability. Insert a
materialize plan node between a mergejoin and its inner child if the inner
child is a sort that is expected to spill to disk. The materialize shields
the sort from the need to do mark/restore and thereby allows it to perform
its final merge pass on-the-fly; while the materialize itself is normally
cheap since it won't spill to disk unless the number of tuples with equal
key values exceeds work_mem.
Peter Eisentraut [Mon, 21 May 2007 17:10:29 +0000 (17:10 +0000)]
XPath fixes:
- Function renamed to "xpath".
- Function is now strict, per discussion.
- Return empty array in case when XPath expression detects nothing
(previously, NULL was returned in such case), per discussion.
- (bugfix) Work with fragments with prologue: select xpath('/a',
'<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
with dummy <x>...</x>, XML prologue simply goes away (if any).
- Some cleanup.
Nikolay Samokhvalov
Some code cleanup and documentation work by myself.
Tom Lane [Sun, 20 May 2007 21:08:19 +0000 (21:08 +0000)]
To support external compression of archived WAL data, add a flag bit to
WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made). Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.
This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database. The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry. Per discussion.
Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted). The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.
Alvaro Herrera [Fri, 18 May 2007 23:19:42 +0000 (23:19 +0000)]
Have CLUSTER advance the table's relfrozenxid. The new frozen point is the
FreezeXid introduced in a recent commit, so there isn't any data loss in this
approach.
Doing it causes ALTER TABLE (or rather, the forms of it that cause a full table
rewrite) to be affected as well. In this case, the frozen point is RecentXmin,
because after the rewrite all the tuples are relabeled with the rewriting
transaction's Xid.
TOAST tables are fixed automatically as well, as fallout of the way they were
already being handled in the respective code paths.
With this patch, there is no longer need to VACUUM tables for Xid wraparound
purposes that have been cleaned up via TRUNCATE or CLUSTER.
Tom Lane [Fri, 18 May 2007 01:20:16 +0000 (01:20 +0000)]
Remove redundant logging of send failures when SSL is in use. While pqcomm.c
had been taught not to do that ages ago, the SSL code was helpfully bleating
anyway. Resolves some recent reports such as bug #3266; however the
underlying cause of the related bug #2829 is still unclear.
Tom Lane [Thu, 17 May 2007 23:31:49 +0000 (23:31 +0000)]
Temporary fix for the problem that pg_stat_activity, inet_client_addr(),
and inet_server_addr() fail if the client connected over a "scoped" IPv6
address. In this case getnameinfo() will return a string ending with
a poorly-standardized "%something" zone specifier, which these functions
try to feed to network_in(), which won't take it. So that we don't lose
functionality altogether, suppress the zone specifier before giving the
string to network_in(). Per report from Brian Hirt.
TODO: probably someday the inet type should support scoped IPv6 addresses,
and then this patch should be reverted.
Bruce Momjian [Thu, 17 May 2007 22:53:23 +0000 (22:53 +0000)]
Add URL for:
* Implement the SQL standard mechanism whereby REVOKE ROLE revokes only
the privilege granted by the invoking role, and not those granted
by other roles
>
> http://archives.postgresql.org/pgsql-bugs/2007-05/msg00010.php
Bruce Momjian [Thu, 17 May 2007 22:44:11 +0000 (22:44 +0000)]
Add, per Alvaro:
>
> * Implement the SQL standard mechanism whereby REVOKE ROLE revokes only
> the privilege granted by the invoking role, and not those granted
> by other roles
Tom Lane [Thu, 17 May 2007 19:35:08 +0000 (19:35 +0000)]
Fix parameter recalculation for Limit nodes: during a ReScan call we must
recompute the limit/offset immediately, so that the updated values are
available when the child's ReScan function is invoked. Add a regression
test for this, too. Bug is new in HEAD (due to the bounded-sorting patch)
so no need for back-patch.
I did not do anything about merging this signaling with chgParam processing,
but if we were to do that we'd still need to compute the updated values
at this point rather than during the first ProcNode call.
Per observation and test case from Greg Stark, though I didn't use his patch.
Alvaro Herrera [Thu, 17 May 2007 15:28:29 +0000 (15:28 +0000)]
Move the tuple freezing point in CLUSTER to a point further back in the past,
to avoid losing useful Xid information in not-so-old tuples. This makes
CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes
(though CLUSTER does not yet advance the table's relfrozenxid).
While at it, move the actual freezing operation in rewriteheap.c to a more
appropriate place, and document it thoroughly. This part of the patch from
Tom Lane.
Alvaro Herrera [Wed, 16 May 2007 17:28:20 +0000 (17:28 +0000)]
Have TRUNCATE advance the affected table's relfrozenxid to RecentXmin, to
avoid a later needless VACUUM for Xid-wraparound purposes. We can do this
since the table is known to be left empty, so no Xid remains on it.
Alvaro Herrera [Wed, 16 May 2007 16:36:56 +0000 (16:36 +0000)]
Have the rewriteheap code freeze old tuples. This is safe because it is only
applied to live tuples older than a recent Xmin, not to tuples that may be part
of an update chain. Those still keep their original markings.
This patch makes it possible for CLUSTER to advance relfrozenxid, thus avoiding
the need of vacuuming the table for Xid wraparound purposes. That will be
patched separately.
Alvaro Herrera [Tue, 15 May 2007 20:20:21 +0000 (20:20 +0000)]
Avoid emitting empty role names in the GRANTED BY clause of GRANT ROLE
when the grantor has been dropped. This is a workaround for the fact
that we don't track the grantor as a shared dependency.
Andrew Dunstan [Tue, 15 May 2007 19:47:51 +0000 (19:47 +0000)]
Remove directory qualification in <ossp/uuid.h> because it's not always installed in ossp.
Workaround for when it is: include the ossp directory using --with-includes.
Neil Conway [Tue, 15 May 2007 19:13:55 +0000 (19:13 +0000)]
Various fixes for the SGML docs. Consistently use spaces before/after
parentheses in syntax descriptions. Consistently use the present tense
when describing the basic purpose of each "DROP" command. Add a few
more hyperlinks.
Neil Conway [Tue, 15 May 2007 15:35:46 +0000 (15:35 +0000)]
Add a note to the documentation to clarify that even when
"autovacuum = off", the system may still periodically start autovacuum
processes to prevent XID wraparound. Patch from David Fetter, with
editorializing.
Tom Lane [Mon, 14 May 2007 20:24:41 +0000 (20:24 +0000)]
Get rid of the pg_shdepend entry for a TOAST table; it's unnecessary since
there's an indirect dependency on the owner via the parent table. We were
already handling indexes that way, but not toast tables for some reason.
Saves a little catalog space and cuts down the verbosity of checkSharedDependencies
reports.
Tom Lane [Mon, 14 May 2007 18:13:21 +0000 (18:13 +0000)]
Prevent RevalidateCachedPlan from making any permanent change in
ActiveSnapshot. Having it affect ActiveSnapshot only in the unusual
case of needing to replan seems a bad idea, and there's also the problem
that the created snap might be in a relatively short-lived context, as
noted by Jan Wieck. Also, there's no need to force a new snap at all
unless we are called with no snap currently set, which is an unusual
case in itself.
Alvaro Herrera [Mon, 14 May 2007 16:50:36 +0000 (16:50 +0000)]
Report all dependent objects to the server log when a shared object is dropped,
and only a truncated log of the objects in the current database to the client.
Also, instead of reporting object counts for all databases on which the user
might own objects, report only as many as fit in the predefined line count.
This is to avoid flooding the client when the user owns too many objects,
which could cause problems.
Per report from Ed L. on April 4th and subsequent discussion.