Bruce Momjian [Tue, 7 Feb 2006 16:03:50 +0000 (16:03 +0000)]
I think that NUMERIC datatype has a problem in the performance that
the format on Tuple(Numeric) and the format to calculate(NumericVar)
are different. I understood that to reduce I/O. However, when many
comparisons or calculations of NUMERIC are executed, the conversion
of Numeric and NumericVar becomes a bottleneck.
It is profile result when "create index on NUMERIC column" is executed:
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
4.07 36.30 2.38 71101189 0.00 0.00 AllocSetFree
3.83 38.53 2.23 69084012 0.00 0.00 free_var
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
Bruce Momjian [Tue, 7 Feb 2006 02:08:08 +0000 (02:08 +0000)]
Split up wal-logging items:
< * Allow control over which tables are WAL-logged [walcontrol]
> * Allow WAL logging to be turned off for a table, but the table
> might be dropped or truncated during crash recovery [walcontrol]
< commit. To do this, only a single writer can modify the table, and
< writes must happen only on new pages. Readers can continue accessing
< the table. This would affect COPY, and perhaps INSERT/UPDATE too.
< Another option is to avoid transaction logging entirely and truncate
< or drop the table on crash recovery. These should be implemented
< using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE |
< STABLE | DEFAULT ]. Tables using non-default logging should not use
< referential integrity with default-logging tables, and tables using
< stable logging probably can not have indexes. One complexity is
< the handling of indexes on TOAST tables.
> commit. This should be implemented using ALTER TABLE, e.g. ALTER
> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
> non-default logging should not use referential integrity with
> default-logging tables. A table without dirty buffers during a
> crash could perhaps avoid the drop/truncate.
>
> * Allow WAL logging to be turned off for a table, but the table would
> avoid being truncated/dropped [walcontrol]
>
> To do this, only a single writer can modify the table, and writes
> must happen only on new pages so the new pages can be removed during
> crash recovery. Readers can continue accessing the table. Such
> tables probably cannot have indexes. One complexity is the handling
> of indexes on TOAST tables.
Tom Lane [Mon, 6 Feb 2006 22:21:12 +0000 (22:21 +0000)]
Improve the tests to see if ScalarArrayOpExpr is strict. Original coding
would basically punt in all cases for 'foo <> ALL (array)', which resulted
in a performance regression for NOT IN compared to what we were doing in
8.1 and before. Per report from Pavel Stehule.
Tom Lane [Sun, 5 Feb 2006 20:58:47 +0000 (20:58 +0000)]
Fix pg_restore to properly discard COPY data when trying to continue
after an error in a COPY statement. Formerly it thought the COPY data
was SQL commands, and got quite confused.
Tom Lane [Sun, 5 Feb 2006 02:59:17 +0000 (02:59 +0000)]
Improve my initial, rather hacky implementation of joins to append
relations: fix the executor so that we can have an Append plan on the
inside of a nestloop and still pass down outer index keys to index scans
within the Append, then generate such plans as if they were regular
inner indexscans. This avoids the need to evaluate the outer relation
multiple times.
Tom Lane [Sat, 4 Feb 2006 23:03:20 +0000 (23:03 +0000)]
Fix constraint exclusion to work in inherited UPDATE/DELETE queries
... in fact, it will be applied now in any query whatsoever. I'm still
a bit concerned about the cycles that might be expended in failed proof
attempts, but given that CE is turned off by default, it's the user's
choice whether to expend those cycles or not. (Possibly we should
change the simple bool constraint_exclusion parameter to something
more fine-grained?)
Bruce Momjian [Sat, 4 Feb 2006 03:23:21 +0000 (03:23 +0000)]
Update walcontrol item:
< * Allow control over which tables are WAL-logged
> * Allow control over which tables are WAL-logged [walcontrol] 1038c1038,1039
< stable logging probably can not have indexes. [walcontrol]
> stable logging probably can not have indexes. One complexity is
> the handling of indexes on TOAST tables.
Tom Lane [Fri, 3 Feb 2006 21:08:49 +0000 (21:08 +0000)]
Teach planner to convert simple UNION ALL subqueries into append relations,
thereby sharing code with the inheritance case. This puts the UNION-ALL-view
approach to partitioned tables on par with inheritance, so far as constraint
exclusion is concerned: it works either way. (Still need to update the docs
to say so.) The definition of "simple UNION ALL" is a little simpler than
I would like --- basically the union arms can only be SELECT * FROM foo
--- but it's good enough for partitioned-table cases.
Bruce Momjian [Wed, 1 Feb 2006 17:32:45 +0000 (17:32 +0000)]
Add:
> * Allow statistics collector information to be pulled from the collector
> process directly, rather than requiring the collector to write a
> filesystem file twice a second?
Bruce Momjian [Wed, 1 Feb 2006 00:31:59 +0000 (00:31 +0000)]
Set progname early in the postmaster/postgres binary, rather than doing
it later. This fixes a problem where EXEC_BACKEND didn't have progname
set, causing a segfault if log_min_messages was set below debug2 and our
own snprintf.c was being used.
Bruce Momjian [Wed, 1 Feb 2006 00:03:09 +0000 (00:03 +0000)]
Add:
>
> o Prevent tab completion of SET TRANSACTION from querying the
> database and therefore preventing the transaction isolation
> level from being set.
>
> Currently, SET <tab> causes a database lookup to check all
> supported session variables. This query causes problems
> because setting the transaction isolation level must be the
> first statement of a transaction.
Tom Lane [Tue, 31 Jan 2006 21:39:25 +0000 (21:39 +0000)]
Restructure planner's handling of inheritance. Rather than processing
inheritance trees on-the-fly, which pretty well constrained us to considering
only one way of planning inheritance, expand inheritance sets during the
planner prep phase, and build a side data structure that can be consulted
later to find which RTEs are members of which inheritance sets. As proof of
concept, use the data structure to plan joins against inheritance sets more
efficiently: we can now use indexes on the set members in inner-indexscan
joins. (The generated plans could be improved further, but it'll take some
executor changes.) This data structure will also support handling UNION ALL
subqueries in the same way as inheritance sets, but that aspect of it isn't
finished yet.
Tom Lane [Mon, 30 Jan 2006 16:18:58 +0000 (16:18 +0000)]
Fix ALTER COLUMN TYPE bug: it sometimes tried to drop UNIQUE or PRIMARY KEY
constraints before FOREIGN KEY constraints that depended on them. Originally
reported by Neil Conway on 29-Jun-2005. Patch by Nakano Yoshihisa.
Tom Lane [Sun, 29 Jan 2006 18:55:48 +0000 (18:55 +0000)]
When building a bitmap scan, must copy the bitmapqualorig expression tree
to avoid sharing substructure with the lower-level indexquals. This is
currently only an issue if there are SubPlans in the indexquals, which is
uncommon but not impossible --- see bug #2218 reported by Nicholas Vinen.
We use the same kluge for indexqual vs indexqualorig in the index scans
themselves ... would be nice to clean this up someday.
Tom Lane [Sun, 29 Jan 2006 17:27:42 +0000 (17:27 +0000)]
Fix code that checks to see if an index can be considered to match the query's
requested sort order. It was assuming that build_index_pathkeys always
generates a pathkey per index column, which was not true if implied equality
deduction had determined that two index columns were effectively equated to
each other. Simplest fix seems to be to install an option that causes
build_index_pathkeys to support this behavior as well as the original one.
Per report from Brian Hirt.
Andrew Dunstan [Sat, 28 Jan 2006 16:20:31 +0000 (16:20 +0000)]
Undo perl's nasty locale setting on Windows. Since we can't do that as
elsewhere by setting the environment appropriately, we make perl do it
right after interpreter startup by calling its POSIX::setlocale().
Neil Conway [Sat, 28 Jan 2006 03:28:15 +0000 (03:28 +0000)]
Per a bug report from Theo Schlossnagle, plperl_return_next() leaks
memory in the executor's per-query memory context. It also inefficient:
it invokes get_call_result_type() and TupleDescGetAttInMetadata() for
every call to return_next, rather than invoking them once (per PL/Perl
function call) and memoizing the result.
This patch makes the following changes:
- refactor the code to include all the "per PL/Perl function call" data
inside a single struct, "current_call_data". This means we don't need to
save and restore N pointers for every recursive call into PL/Perl, we
can just save and restore one.
- lookup the return type metadata needed by plperl_return_next() once,
and then stash it in "current_call_data", so as to avoid doing the
lookup for every call to return_next.
- create a temporary memory context in which to evaluate the return
type's input functions. This memory context is reset for each call to
return_next.
The patch appears to fix the memory leak, and substantially reduces
the overhead imposed by return_next.
Tom Lane [Fri, 27 Jan 2006 19:01:15 +0000 (19:01 +0000)]
Tweak initdb to reduce verbosity of progress messages, by printing just
one 'creating subdirectories' message instead of one per subdirectory.
The original decision to print something for each subdirectory was made
when there were only one or two of 'em; we have way too many now.
Per discussion.
Teodor Sigaev [Fri, 27 Jan 2006 16:32:31 +0000 (16:32 +0000)]
Snowball multibyte. It's a pity, but snowball sources is very diferent for multibyte and
singlebyte encodings, so we should have snowball for every encodings.
I hope that finalize multibyte support work in tsearch2, but testing is needed...
Tom Lane [Thu, 26 Jan 2006 17:08:19 +0000 (17:08 +0000)]
Fix display of whole-row Var appearing at the top level of a SELECT list.
While we normally prefer the notation "foo.*" for a whole-row Var, that does
not work at SELECT top level, because in that context the parser will assume
that what is wanted is to expand the "*" into a list of separate target
columns, yielding behavior different from a whole-row Var. We have to emit
just "foo" instead in that context. Per report from Sokolov Yura.
Bruce Momjian [Thu, 26 Jan 2006 02:50:11 +0000 (02:50 +0000)]
Done:
< * %Prevent INET cast to CIDR if the unmasked bits are not zero, or
< zero the bits
< * %Prevent INET cast to CIDR from dropping netmask, SELECT '1.1.1.1'::inet::cidr
> * -Zero umasked bits in conversion from INET cast to CIDR
> * -Prevent INET cast to CIDR from dropping netmask, SELECT '1.1.1.1'::inet::cidr
Tom Lane [Thu, 26 Jan 2006 02:35:51 +0000 (02:35 +0000)]
Clean up the INET-vs-CIDR situation. Get rid of the internal is_cidr flag
and rely exclusively on the SQL type system to tell the difference between
the types. Prevent creation of invalid CIDR values via casting from INET
or set_masklen() --- both of these operations now silently zero any bits
to the right of the netmask. Remove duplicate CIDR comparison operators,
letting the type rely on the INET operators instead.
Tom Lane [Wed, 25 Jan 2006 23:04:21 +0000 (23:04 +0000)]
Remove the no-longer-useful BTItem/BTItemData level of structure, and
just refer to btree index entries as plain IndexTuples, which is what
they have been for a very long time. This is mostly just an exercise
in removing extraneous notation, but it does save a palloc/pfree cycle
per index insertion.
Tom Lane [Wed, 25 Jan 2006 20:44:32 +0000 (20:44 +0000)]
Remove unnecessary PQconsumeInput call from PQputCopyData; it's redundant
because pqSendSome will absorb input data anytime it'd be forced to block.
Avoiding a kernel call per PQputCopyData call helps COPY speed materially.
Tom Lane [Mon, 23 Jan 2006 22:31:41 +0000 (22:31 +0000)]
Instead of using a numberOfRequiredKeys count to distinguish required
and non-required keys in a btree index scan, mark the required scankeys
with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD. This seems
at least marginally clearer to me, and it eliminates a wired-into-the-
data-structure assumption that required keys are consecutive. Even though
that assumption will remain true for the foreseeable future, having it
in there makes the code seem more complex than necessary.
Tom Lane [Mon, 23 Jan 2006 18:16:41 +0000 (18:16 +0000)]
Improve wording of descriptions of SIGHUP GUC parameters, as per my
suggestion a couple days ago. Fix some cases in which the documentation
neglected to mention any restriction on when a parameter can be set.
Try to be consistent about calling parameters parameters; use the term
option only for command-line switches.
Bruce Momjian [Mon, 23 Jan 2006 02:59:27 +0000 (02:59 +0000)]
Done:
< o Allow an alias to be provided for the target table in
< UPDATE/DELETE
<
< This is not SQL-spec but many DBMSs allow it.
<
> o -Allow an alias to be provided for the target table in
> UPDATE/DELETE (Neil)
Tom Lane [Sun, 22 Jan 2006 20:03:16 +0000 (20:03 +0000)]
Fix alias-for-target-table-of-UPDATE-or-DELETE patch so that alias can
be any ColId other than 'SET', rather than only IDENT as originally.
Per discussion.
Neil Conway [Sun, 22 Jan 2006 05:20:35 +0000 (05:20 +0000)]
Allow an optional alias for the target table to be specified for UPDATE
and DELETE. If specified, the alias must be used instead of the full
table name. Also, the alias currently cannot be used in the SET clause
of UPDATE.
Patch from Atsushi Ogawa, various editorialization by Neil Conway.
Along the way, make the rowtypes regression test pass if add_missing_from
is enabled, and add a new (skeletal) regression test for DELETE.
Tom Lane [Sat, 21 Jan 2006 19:34:42 +0000 (19:34 +0000)]
Marginal improvements in the wording of the autovacuum documentation:
be consistent about whether it's called a daemon or a subprocess, and
don't describe the autovacuum setting in exactly the same way as the
stats_start_collector setting, because that leaves people thinking (if
they aren't paying close attention) that autovacuum can't be changed
on the fly.
Tom Lane [Sat, 21 Jan 2006 04:38:21 +0000 (04:38 +0000)]
Repair longstanding bug in slru/clog logic: it is possible for two backends
to try to create a log segment file concurrently, but the code erroneously
specified O_EXCL to open(), resulting in a needless failure. Before 7.4,
it was even a PANIC condition :-(. Correct code is actually simpler than
what we had, because we can just say O_CREAT to start with and not need a
second open() call. I believe this accounts for several recent reports of
hard-to-reproduce "could not create file ...: File exists" errors in both
pg_clog and pg_subtrans.
Bruce Momjian [Sat, 21 Jan 2006 02:16:21 +0000 (02:16 +0000)]
Add GRANT ON SEQUENCE syntax to support sequence-only permissions.
Continue to support GRANT ON [TABLE] for sequences for backward
compatibility; issue warning for invalid sequence permissions.
[Backward compatibility warning message.]
Add USAGE permission for sequences that allows only currval() and
nextval(), not setval().
Mention object name in grant/revoke warnings because of possible
multi-object operations.
Tom Lane [Fri, 20 Jan 2006 22:46:16 +0000 (22:46 +0000)]
Replace bitwise looping with bytewise looping in hemdistsign and
sizebitvec of tsearch2, as well as identical code in several other
contrib modules. This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations. Thanks to Stephan Vollmer for providing a test case.
Tom Lane [Fri, 20 Jan 2006 15:16:56 +0000 (15:16 +0000)]
Fix thinko in autovacuum's test to skip temp tables: want to skip any
temp table not only our own process' tables. It's not real important
since vacuum.c will skip temp tables anyway, but might as well make the
code do what it claims to do.
Tom Lane [Thu, 19 Jan 2006 20:28:43 +0000 (20:28 +0000)]
Avoid crashing if relcache flush occurs while trying to load data into an
index's support-function cache (in index_getprocinfo). Since none of that
data can change for an index that's in active use, it seems sufficient to
treat all open indexes the same way we were treating "nailed" system indexes
--- that is, just re-read the pg_class row and leave the rest of the relcache
entry strictly alone. The pg_class re-read might not be strictly necessary
either, but since the reltablespace and relfilenode can change in normal
operation it seems safest to do it. (We don't support changing any of the
other info about an index at all, at the moment.)
Back-patch as far as 8.0. It might be possible to adapt the patch to 7.4,
but it would take more work than I care to expend for such a low-probability
problem. 7.3 is out of luck for sure.
Tom Lane [Thu, 19 Jan 2006 04:45:38 +0000 (04:45 +0000)]
It turns out that TablespaceCreateDbspace fails badly if a relcache flush
occurs when it tries to heap_open pg_tablespace. When control returns to
smgrcreate, that routine will be holding a dangling pointer to a closed
SMgrRelation, resulting in mayhem. This is of course a consequence of
the violation of proper module layering inherent in having smgr.c call
a tablespace command routine, but the simplest fix seems to be to change
the locking mechanism. There's no real need for TablespaceCreateDbspace
to touch pg_tablespace at all --- it's only opening it as a way of locking
against a parallel DROP TABLESPACE command. A much better answer is to
create a special-purpose LWLock to interlock these two operations.
This drops TablespaceCreateDbspace quite a few layers down the food chain
and makes it something reasonably safe for smgr to call.
Tom Lane [Thu, 19 Jan 2006 00:27:08 +0000 (00:27 +0000)]
Fix a tiny memory leak (one List header) in RelationCacheInvalidate().
This is utterly insignificant in normal operation, but it becomes a
problem during cache inval stress testing. The original coding in fact
had no leak --- the 8.0 List rewrite created the issue. I wonder whether
list_concat should pfree the discarded header?
Tom Lane [Wed, 18 Jan 2006 20:35:06 +0000 (20:35 +0000)]
Modify pgstats code to reduce performance penalties from oversized stats data
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding. Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all. This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0. It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
Neil Conway [Wed, 18 Jan 2006 06:49:30 +0000 (06:49 +0000)]
Add a new system view, pg_cursors, that displays the currently available
cursors. Patch from Joachim Wieland, review and ediorialization by Neil
Conway. The view lists cursors defined by DECLARE CURSOR, using SPI, or
via the Bind message of the frontend/backend protocol. This means the
view does not list the unnamed portal or the portal created to implement
EXECUTE. Because we do list SPI portals, there might be more rows in
this view than you might expect if you are using SPI implicitly (e.g.
via a procedural language).
Per recent discussion on -hackers, the query string included in the
view for cursors defined by DECLARE CURSOR is based on
debug_query_string. That means it is not accurate if multiple queries
separated by semicolons are submitted as one query string. However,
there doesn't seem a trivial fix for that: debug_query_string
is better than nothing. I also changed SPI_cursor_open() to include
the source text for the portal it creates: AFAICS there is no reason
not to do this.
Update the documentation and regression tests, bump the catversion.
Tom Lane [Tue, 17 Jan 2006 00:09:01 +0000 (00:09 +0000)]
Improve comments about btree's use of ScanKey data structures: there
are two basically different kinds of scankeys, and we ought to try harder
to indicate which is used in each place in the code. I've chosen the names
"search scankey" and "insertion scankey", though you could make about
as good an argument for "operator scankey" and "comparison function
scankey".