Tom Lane [Thu, 21 Apr 2005 19:18:13 +0000 (19:18 +0000)]
Rethink original decision to use AND/OR Expr nodes to represent bitmap
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
Bruce Momjian [Thu, 21 Apr 2005 15:20:39 +0000 (15:20 +0000)]
Updated text for bitmaps:
< Bitmap indexes index single columns that can be combined with other bitmap
< indexes to dynamically create a composite index to match a specific query.
< Each index is a bitmap, and the bitmaps are bitwise AND'ed or OR'ed to be
< combined. They can index by tid or can be lossy requiring a scan of the
< heap page to find matching rows, or perhaps use a mixed solution where
< tids are recorded for pages with only a few matches and per-page bitmaps
< are used for more dense pages. Another idea is to use a 32-bit bitmap
< for every page and set a bit based on the item number mod(32).
> This feature allows separate indexes to be ANDed or ORed together. This
> is particularly useful for data warehousing applications that need to
> query the database in many permutations. This feature scans an index
> and creates an in-memory bitmap, and allows that bitmap to be combined
> with other bitmaps created in a similar way. The bitmap can either index
> all TIDs, or be lossy, meaning it records just page numbers and each
> page tuple has to be checked for validity in a separate pass.
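To make the exact-versus-lossy distinction concrete, here is a minimal sketch of combining the entries two bitmaps hold for the same heap page. The PageEntry type, its fields, and the 32-item limit are assumptions made for illustration; they are not the data structure used in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry for one heap page; not PostgreSQL's actual layout. */
typedef struct PageEntry
{
    bool     lossy;    /* true: only the page number is known */
    bool     recheck;  /* true: candidate tuples must be rechecked */
    uint32_t items;    /* exact mode: bit i set means item i matches */
} PageEntry;

/* Record one matching item number on an exact page. */
void
page_add_item(PageEntry *p, unsigned item)
{
    assert(item < 32);
    if (!p->lossy)
        p->items |= (uint32_t) 1 << item;
}

/* OR: if either side is lossy, the whole page must be rechecked. */
PageEntry
page_or(PageEntry a, PageEntry b)
{
    PageEntry r = {false, a.recheck || b.recheck, a.items | b.items};

    if (a.lossy || b.lossy)
    {
        r.lossy = true;
        r.recheck = true;
        r.items = 0;
    }
    return r;
}

/* AND: a lossy side cannot narrow the exact side, so keep the exact
 * bits as candidates and mark them for recheck. */
PageEntry
page_and(PageEntry a, PageEntry b)
{
    PageEntry r = {false, a.recheck || b.recheck, a.items & b.items};

    if (a.lossy && b.lossy)
    {
        r.lossy = true;
        r.recheck = true;
        r.items = 0;
    }
    else if (a.lossy || b.lossy)
    {
        r.recheck = true;
        r.items = a.lossy ? b.items : a.items;
    }
    return r;
}
```

In this sketch a lossy entry never loses candidate tuples; it only defers the decision to the later pass over the heap page.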
Tom Lane [Wed, 20 Apr 2005 15:48:36 +0000 (15:48 +0000)]
Minor performance improvement: avoid unnecessary creation/unioning of
bitmaps for multiple indexscans. Instead just let each indexscan add
TIDs directly into the BitmapOr node's result bitmap.
Tom Lane [Tue, 19 Apr 2005 22:35:18 +0000 (22:35 +0000)]
Create executor and planner-backend support for decoupled heap and index
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
Bruce Momjian [Tue, 19 Apr 2005 03:55:43 +0000 (03:55 +0000)]
>>>>Luckily, PG 8 is available for this. Do you have a short example?
>>>
>>>No, and I think it should be in the manual as an example.
>>>
>>>You will need to enter a loop that uses exception handling to detect
>>>unique_violation.
>>
>>Pursuant to an IRC discussion to which Dennis Bjorklund and
>>Christopher Kings-Lynne made most of the contributions, please find
>>enclosed an example patch demonstrating an UPSERT-like capability.
>>
Bruce Momjian [Tue, 19 Apr 2005 03:37:20 +0000 (03:37 +0000)]
> >Luckily, PG 8 is available for this. Do you have a short example?
>
> No, and I think it should be in the manual as an example.
>
> You will need to enter a loop that uses exception handling to detect
> unique_violation.
Pursuant to an IRC discussion to which Dennis Bjorklund and
Christopher Kings-Lynne made most of the contributions, please find
enclosed an example patch demonstrating an UPSERT-like capability.
Bruce Momjian [Tue, 19 Apr 2005 03:35:15 +0000 (03:35 +0000)]
The following patch should allow UPDATE_INTERVAL to be specified on the
command line. We find this useful because we frequently deal with
thousands of tables in an environment where neither the databases nor
the tables are updated frequently. This helps allow us to cut down on
the overhead of updating the list for every other primary loop of
pg_autovacuum.
I chose -i as the command-line argument and documented it briefly in
the README.
The patch was applied to the 7.4.7 version of pg_autovacuum in contrib.
Bruce Momjian [Tue, 19 Apr 2005 03:13:59 +0000 (03:13 +0000)]
Attached patch gets rid of the global timezone in the following steps:
* Changes the APIs to the timezone functions to take a pg_tz pointer as
an argument, representing the timezone to use for the selected
operation.
* Adds a global_timezone variable that represents the current timezone
in the backend as set by SET TIMEZONE (or guc, or env, etc).
* Implements a hash-table cache of loaded tables, so we don't have to
read and parse the TZ file every time we change a timezone. While not
necessary now (we don't change timezones very often), I believe this
will be necessary (or at least good) when "multiple timezones in the
same query" is eventually implemented. And code-wise, this was the time
to do it.
There are no user-visible changes at this time. Implementing the
"multiple zones in one query" is a later step...
This also gets rid of some of the cruft needed to "back out a timezone
change", since we previously couldn't check a timezone unless it was
activated first.
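The caching idea can be sketched roughly as follows. Everything here (demo_tz, tz_lookup, the table size, the hash) is a hypothetical stand-in rather than the patch's code; the point is only that a second lookup of the same zone name returns the cached entry without a reload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TZ_CACHE_SIZE 64
#define TZ_NAME_MAX   64

/* Hypothetical stand-in for a loaded timezone. */
typedef struct demo_tz
{
    char name[TZ_NAME_MAX];
    int  loaded;              /* nonzero once the slot is in use */
} demo_tz;

static demo_tz tz_cache[TZ_CACHE_SIZE];
static int     tz_loads = 0;  /* counts simulated TZ file reads */

/* Simple string hash, chosen only for brevity. */
unsigned
tz_hash(const char *name)
{
    unsigned h = 5381;

    while (*name)
        h = h * 33 + (unsigned char) *name++;
    return h % TZ_CACHE_SIZE;
}

/* Return the cached entry for name, loading it on first use.
 * Returns NULL if the name is too long or the table is full. */
demo_tz *
tz_lookup(const char *name)
{
    unsigned start, i;

    assert(name != NULL);
    if (strlen(name) >= TZ_NAME_MAX)
        return NULL;
    start = tz_hash(name);
    for (i = 0; i < TZ_CACHE_SIZE; i++)
    {
        demo_tz *slot = &tz_cache[(start + i) % TZ_CACHE_SIZE];

        if (!slot->loaded)
        {
            /* Miss: here the TZ file would be read and parsed. */
            strcpy(slot->name, name);
            slot->loaded = 1;
            tz_loads++;
            return slot;
        }
        if (strcmp(slot->name, name) == 0)
            return slot;      /* hit: no reload needed */
    }
    return NULL;
}
```

Functions that operate on a zone would then take the returned pointer as an argument instead of consulting a global.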
Passes regression tests on win32, linux (slackware 10) and solaris x86.
Tom Lane [Mon, 18 Apr 2005 23:47:52 +0000 (23:47 +0000)]
pg_dumpall should enforce the server version check for itself, rather
than simply passing it down to pg_dump. Else, version-related failures
in pg_dumpall itself generate unhelpful error messages.
Tom Lane [Mon, 18 Apr 2005 17:11:05 +0000 (17:11 +0000)]
record_in and record_recv must be careful to return a separately
pfree'able result, since some callers expect to be able to pfree
the result of a pass-by-reference function. Per report from Chris Trawick.
Bruce Momjian [Mon, 18 Apr 2005 15:03:21 +0000 (15:03 +0000)]
Update PITR TODO items:
< failure.
> failure. This could be triggered by a user command or a timer.
< * Force archiving of partially-full WAL files when pg_stop_backup() is
< called or the server is stopped
> * Automatically force archiving of partially-filled WAL files when
> pg_stop_backup() is called or the server is stopped
Tom Lane [Sat, 16 Apr 2005 20:07:35 +0000 (20:07 +0000)]
Create a new 'MultiExecProcNode' call API for plan nodes that don't
return just a single tuple at a time. Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps. A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
Tom Lane [Fri, 15 Apr 2005 22:19:48 +0000 (22:19 +0000)]
Reduce PANIC to ERROR in several xlog routines that are used in both
critical and noncritical contexts (an example of noncritical being
post-checkpoint removal of dead xlog segments). In the critical cases
the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC
anyway, and in the noncritical cases we shouldn't let an error take
down the entire database. Arguably there should be *no* explicit
PANIC errors in this module, only more START/END_CRIT_SECTION calls,
but I didn't go that far. (Yet.)
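The promotion rule described above can be sketched as below. The names (Level, crit_section_depth, effective_level) are invented for this illustration and are not the backend's identifiers.

```c
#include <assert.h>

/* Hypothetical severity levels for this sketch. */
typedef enum { LEVEL_ERROR, LEVEL_PANIC } Level;

static int crit_section_depth = 0;

void
start_crit_section(void)
{
    crit_section_depth++;
}

void
end_crit_section(void)
{
    assert(crit_section_depth > 0);
    crit_section_depth--;
}

/* Severity actually applied to a report made at the requested level:
 * inside a critical section, ERROR is promoted to PANIC. */
Level
effective_level(Level requested)
{
    if (requested == LEVEL_ERROR && crit_section_depth > 0)
        return LEVEL_PANIC;
    return requested;
}
```

A routine shared by both kinds of caller can therefore report plain ERROR and leave the escalation to the caller's context.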
Tom Lane [Fri, 15 Apr 2005 18:48:10 +0000 (18:48 +0000)]
Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments. Instead keep track of where we stopped last time through.
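As a rough illustration of the fix, the sketch below claims free slots from an array while remembering where the previous search stopped. The slot array, claim_next_free, and the probes counter (standing in for stat() calls) are assumptions of this example, not the xlog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static long probes = 0;   /* stands in for the number of stat() calls */

/* Claim the first free slot at or after *cursor and advance the cursor
 * past it. Returns the slot index, or -1 if none is left. */
long
claim_next_free(bool *used, size_t nslots, size_t *cursor)
{
    size_t i;

    assert(*cursor <= nslots);
    for (i = *cursor; i < nslots; i++)
    {
        probes++;
        if (!used[i])
        {
            used[i] = true;
            *cursor = i + 1;
            return (long) i;
        }
    }
    *cursor = nslots;
    return -1;
}
```

Claiming 100 slots this way costs 100 probes in total. Restarting every search from slot 0 would cost 1 + 2 + ... + 100 = 5050 probes for the same work.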
Tom Lane [Fri, 15 Apr 2005 16:40:36 +0000 (16:40 +0000)]
Revert addition of poorly-thought-out DUMP TIMESTAMP archive entry,
which induced bug #1597 in addition to having several other misbehaviors
(like labeling the dump with a completion time having nothing to do with
reality). Instead just print out the desired strings where RestoreArchive
was already emitting the 'PostgreSQL database dump' and
'PostgreSQL database dump complete' strings.
Tom Lane [Thu, 14 Apr 2005 22:34:48 +0000 (22:34 +0000)]
Make equalTupleDescs() compare attlen/attbyval/attalign rather than
assuming comparison of atttypid is sufficient. In a dropped column
atttypid will be 0, and we'd better check the physical-storage data
to make sure the tupdescs are physically compatible.
I do not believe there is a real risk before 8.0, since before that
we only used this routine to compare successive states of the tupdesc
for a particular relation. But 8.0's typcache.c might be comparing
arbitrary tupdescs so we'd better play it safer.
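A minimal sketch of the stricter comparison, using a trimmed-down, hypothetical column descriptor (DemoAttr) rather than the real attribute structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down column descriptor. */
typedef struct DemoAttr
{
    uint32_t atttypid;   /* 0 for a dropped column */
    int16_t  attlen;
    bool     attbyval;
    char     attalign;
} DemoAttr;

/* Compare the type OID and also the physical-storage properties, so
 * two dropped columns (both atttypid 0) are not assumed compatible. */
bool
attrs_physically_equal(const DemoAttr *a, const DemoAttr *b)
{
    assert(a != NULL && b != NULL);
    return a->atttypid == b->atttypid &&
           a->attlen == b->attlen &&
           a->attbyval == b->attbyval &&
           a->attalign == b->attalign;
}
```

With only the atttypid test, a dropped 4-byte column and a dropped 8-byte column would compare equal even though their row layouts differ.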
Tom Lane [Thu, 14 Apr 2005 21:44:09 +0000 (21:44 +0000)]
Don't try to constant-fold functions returning RECORD, since the optimizer
isn't presently set up to pass them an expected tuple descriptor. Bug has
been there since 7.3 but was just recently reported by Thomas Hallgren.
Tom Lane [Thu, 14 Apr 2005 20:32:43 +0000 (20:32 +0000)]
Marginal hack to use a specialized hash function for dynahash hashtables
whose keys are OIDs. The only one that looks particularly performance
critical is the relcache hashtable, but as long as we've got the function
we may as well use it wherever it's applicable.
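To illustrate why a specialized function can be cheaper, the sketch below contrasts a generic byte-at-a-time hash with one that knows the key is a single 32-bit value. Neither is the function the commit added; the names and the mixing constants are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DemoOid;   /* hypothetical stand-in for an OID */

/* Generic path: hash an arbitrary key one byte at a time (FNV-1a). */
uint32_t
hash_bytes_generic(const void *key, size_t keysize)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < keysize; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Specialized path: the key is known to be one 32-bit value, so load
 * it as a word and mix it in a few steps, with no per-byte loop. */
uint32_t
hash_oid_key(const void *key, size_t keysize)
{
    DemoOid oid;

    assert(keysize == sizeof(DemoOid));
    memcpy(&oid, key, sizeof(oid));
    oid ^= oid >> 16;
    oid *= 0x45d9f3bu;
    oid ^= oid >> 16;
    return oid;
}
```

Both functions share one signature, so a hash table could select either through a function pointer when it is created.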
Tom Lane [Thu, 14 Apr 2005 20:03:27 +0000 (20:03 +0000)]
Completion of project to use fixed OIDs for all system catalogs and
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
Tom Lane [Thu, 14 Apr 2005 01:38:22 +0000 (01:38 +0000)]
First phase of project to use fixed OIDs for all system catalogs and
indexes. Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs. Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros. Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future. I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.
Tom Lane [Wed, 13 Apr 2005 18:54:57 +0000 (18:54 +0000)]
Simplify initdb-time assignment of OIDs as I proposed yesterday, and
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.
initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
Tom Lane [Wed, 13 Apr 2005 16:50:55 +0000 (16:50 +0000)]
Change addRangeTableEntryForRelation() to take a Relation pointer instead
of just a relation OID, thereby not having to open the relation for itself.
This actually saves code rather than adding it for most of the existing
callers, which had the rel open already. The main point though is to be
able to use this rather than plain addRangeTableEntry in setTargetTable,
thus saving one relation_openrv/relation_close cycle for every INSERT,
UPDATE, or DELETE. Seems to provide a several percent win on simple
INSERTs.
Tom Lane [Tue, 12 Apr 2005 19:45:43 +0000 (19:45 +0000)]
Adjust pg_cast.h so that the OIDs assigned to built-in casts come from
genbki.sh's pool (10000-16383) instead of being run-time assigned by
heap_insert. Might as well use the pool as long as it's there ...
I was a bit bemused to realize that it hadn't been in use at all since 7.2.
initdb not forced since this doesn't really affect anything. The OIDs
of casts and system indexes will change next time you do one, though.
Tom Lane [Tue, 12 Apr 2005 19:29:24 +0000 (19:29 +0000)]
Remove unnecessary UPDATE commands to assign explicit ACLs to functions
and PL languages during initdb. The default permissions for these objects
are the same as what we were assigning anyway, so there is no need to
expend space in the catalogs on them. The space cost is particularly
significant in pg_proc's indexes, which are bloated by about a factor of 2
by the full-table update, and can never really recover the space.
initdb not forced, since the change has no actual impact on behavior.
Tom Lane [Tue, 12 Apr 2005 04:26:34 +0000 (04:26 +0000)]
Add aggsortop column to pg_aggregate, so that MIN/MAX optimization can
be supported for all datatypes. Add CREATE AGGREGATE and pg_dump support
too. Add specialized min/max aggregates for bpchar, instead of depending
on text's min/max, because otherwise the possible use of bpchar indexes
cannot be recognized.
initdb forced because of catalog changes.
Tom Lane [Mon, 11 Apr 2005 23:06:57 +0000 (23:06 +0000)]
Create the planner mechanism for optimizing simple MIN and MAX queries
into indexscans on matching indexes. For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled. Per my recent proposal.
Tom Lane [Mon, 11 Apr 2005 19:51:16 +0000 (19:51 +0000)]
Fix interaction between materializing holdable cursors and firing
deferred triggers: either one can create more work for the other,
so we have to loop till it's all gone. Per example from andrew@supernews.
Add a regression test to help spot trouble in this area in future.
Tom Lane [Mon, 11 Apr 2005 15:59:34 +0000 (15:59 +0000)]
PersistHoldablePortal must establish the correct value for ActiveSnapshot
while completing execution of the cursor's query. Otherwise we get wrong
answers or even crashes from non-volatile functions called by the query.
Per report from andrew@supernews.
Tom Lane [Sun, 10 Apr 2005 20:57:32 +0000 (20:57 +0000)]
Make constant-folding produce sane output for COALESCE(NULL,NULL),
that is a plain NULL and not a COALESCE with no inputs. Fixes crash
reported by Michael Williamson.
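The intended folding result can be sketched as follows, with a deliberately simplified argument model. DemoArg and fold_coalesce are hypothetical; the real code works on expression trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument: a constant NULL or some other expression. */
typedef struct DemoArg
{
    bool is_null_const;
    int  id;              /* identifies a non-NULL expression */
} DemoArg;

typedef enum { FOLD_TO_NULL, FOLD_KEEP_COALESCE } FoldResult;

/* Drop constant-NULL arguments in place. If nothing survives, the
 * whole expression folds to a plain NULL instead of an empty COALESCE. */
FoldResult
fold_coalesce(DemoArg *args, size_t *nargs)
{
    size_t in, out = 0;

    assert(nargs != NULL);
    for (in = 0; in < *nargs; in++)
    {
        if (!args[in].is_null_const)
            args[out++] = args[in];
    }
    *nargs = out;
    return (out == 0) ? FOLD_TO_NULL : FOLD_KEEP_COALESCE;
}
```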
Tom Lane [Sun, 10 Apr 2005 19:50:08 +0000 (19:50 +0000)]
Split out into a separate function the code in grouping_planner() that
decides whether to use hashed grouping instead of sort-plus-uniq
grouping. The function needs an annoyingly large number of parameters,
but this still seems like a win for legibility, since it removes over
a hundred lines from grouping_planner (which is still too big :-().
Tom Lane [Sun, 10 Apr 2005 18:04:20 +0000 (18:04 +0000)]
SQL functions returning pass-by-reference types were copying the results
into the wrong memory context, resulting in a query-lifespan memory leak.
Bug is new in 8.0, I believe. Per report from Rae Stiening.
Tom Lane [Fri, 8 Apr 2005 14:18:35 +0000 (14:18 +0000)]
If we're going to have a non-panic check for held_lwlocks[] overrun,
it must occur *before* we get into the critical state of holding a
lock we have no place to record. Per discussion with Qingqing Zhou.
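The ordering matters more than the check itself, as this small sketch tries to show. demo_acquire, held_ids, and the tiny limit are invented for illustration and do not mirror the lwlock code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_HELD 4   /* deliberately small for illustration */

static int held_ids[DEMO_MAX_HELD];
static int num_held = 0;

/* Returns false (a recoverable failure) when there is no room to
 * record another lock. The check runs before anything is acquired. */
bool
demo_acquire(int lock_id)
{
    assert(num_held >= 0 && num_held <= DEMO_MAX_HELD);
    if (num_held >= DEMO_MAX_HELD)
        return false;

    /* ... the actual lock acquisition would happen here ... */

    held_ids[num_held++] = lock_id;
    return true;
}
```

If the test came after the acquisition, the failure path would run while holding a lock that is recorded nowhere, and so could never be released during cleanup.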
Neil Conway [Fri, 8 Apr 2005 00:59:59 +0000 (00:59 +0000)]
Change the default setting of "add_missing_from" to false. This has been
the long-term plan for this behavior for quite some time, but it is only
possible now that DELETE has a USING clause so that the user can join
other tables in a DELETE statement without relying on this behavior.
Tom Lane [Thu, 7 Apr 2005 14:53:04 +0000 (14:53 +0000)]
Allow plpgsql functions to omit RETURN command when the function returns
output parameters or VOID or a set. There seems no particular reason to
insist on a RETURN in these cases, since the function return value is
determined by other elements anyway. Per recent discussion.
Neil Conway [Thu, 7 Apr 2005 01:51:41 +0000 (01:51 +0000)]
Add a "USING" clause to DELETE, which is equivalent to the FROM clause
in UPDATE. We also now issue a NOTICE if a query has _any_ implicit
range table entries -- in the past, we would only warn about implicit
RTEs in SELECTs with at least one explicit RTE.
As a result of the warning change, 25 of the regression tests had to
be updated. I also took the opportunity to remove some bogus whitespace
differences between some of the float4 and float8 variants. I believe
I have correctly updated all the platform-specific variants, but let
me know if that's not the case.
Original patch for DELETE ... USING from Euler Taveira de Oliveira,
reworked by Neil Conway.
Neil Conway [Wed, 6 Apr 2005 23:56:07 +0000 (23:56 +0000)]
Apply the "nodeAgg" optimization to more of the builtin transition
functions. This patch optimizes int2_sum(), int4_sum(), float4_accum()
and float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single column),
it improves performance by about 20%. For more complex queries or tables
with wider rows, the relative performance improvement will not be as
significant.
Tom Lane [Wed, 6 Apr 2005 20:13:49 +0000 (20:13 +0000)]
Remove test for NULL node in ExecProcNode(). No place ever calls
ExecProcNode() with a NULL value, so the test couldn't do anything
for us except maybe mask bugs. Removing it probably doesn't save
anything much either, but then again this is a hot-spot routine.
Tom Lane [Wed, 6 Apr 2005 16:34:07 +0000 (16:34 +0000)]
Merge Resdom nodes into TargetEntry nodes to simplify code and save a
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
Bruce Momjian [Wed, 6 Apr 2005 15:19:03 +0000 (15:19 +0000)]
Attached patch cleans up the HTML code in tools/backend. This is
required for us to pull it into the main website. Same kind of fixes as
last time, just make sure things aren't violating the HTML standard. No
context changes at all.
Neil Conway [Wed, 6 Apr 2005 04:34:22 +0000 (04:34 +0000)]
This file was whacked by pgindent before it knew it shouldn't remove
braces around single statements (for PG_TRY macros). This patch fixes
it. Alvaro Herrera.
Tom Lane [Tue, 5 Apr 2005 18:05:46 +0000 (18:05 +0000)]
Adjust grammar for plpgsql's OPEN command so that a cursor can be
OPENed on non-SELECT commands such as EXPLAIN or SHOW (anything that
returns tuples is allowed). This flexibility already existed for
bound cursors, but OPEN was artificially restricting what it would
take. Per a gripe some months back.
Neil Conway [Mon, 4 Apr 2005 23:50:27 +0000 (23:50 +0000)]
This patch changes int2_avg_accum() and int4_avg_accum() to use the nodeAgg
performance hack Tom introduced recently. This means we can avoid
copying the transition array for each input tuple if these functions
are invoked as aggregate transition functions.
To test the performance improvement, I created a 1 million row table
with a single int4 column. Without the patch, SELECT avg(col) FROM
table took about 4.2 seconds (after the data was cached); with the
patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.
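The copy-avoidance idea can be sketched like this. AvgState, avg_accum, and the in_aggregate flag are hypothetical; the sketch assumes the caller can tell the function whether it owns the state and may modify it in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical running state for an average. */
typedef struct AvgState
{
    int64_t sum;
    int64_t count;
} AvgState;

/* Accumulate one value. When called as an aggregate transition
 * function (in_aggregate is true), update the state in place;
 * otherwise leave the input untouched and return a fresh copy,
 * which the caller must free. Returns NULL if allocation fails. */
AvgState *
avg_accum(AvgState *state, int32_t value, bool in_aggregate)
{
    AvgState *result = state;

    assert(state != NULL);
    if (!in_aggregate)
    {
        result = malloc(sizeof(AvgState));
        if (result == NULL)
            return NULL;
        *result = *state;
    }
    result->sum += value;
    result->count += 1;
    return result;
}
```

The in-place path skips one allocation and one copy per input row, which is where a saving of this kind would come from on a narrow table.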
Neil Conway [Mon, 4 Apr 2005 07:19:44 +0000 (07:19 +0000)]
Minor fixes for psql tab completion. Spell "absolute" like the English word,
not the brand of vodka. Complete FETCH <sth> <sth> with FROM and IN, not
FROM and TO (which is still pretty incomplete, but at least it's the right
syntax).