Tom Lane [Tue, 14 Aug 2007 17:35:18 +0000 (17:35 +0000)]
Fix oversight in async-commit patch: there were some places in heapam.c
that still thought they could set HEAP_XMAX_COMMITTED immediately after
seeing the other transaction commit. Make them use the same logic as
tqual.c does to determine if the hint bit can be set yet.
Michael Meskes [Tue, 14 Aug 2007 10:01:54 +0000 (10:01 +0000)]
- Finished major rewrite to use new protocol version
- Really prepare statements
- Added more regression tests
- Added auto-prepare mode
- Use '$n' for positional variables, '?' is still possible via ecpg option
- Cleaned up the sources a little bit
Tom Lane [Mon, 13 Aug 2007 19:27:12 +0000 (19:27 +0000)]
TEMPORARILY make synchronous_commit default to OFF, so that we can get more
thorough testing of async-commit mode from the buildfarm. This patch MUST
get reverted before 8.3 release!
Tom Lane [Mon, 13 Aug 2007 19:08:26 +0000 (19:08 +0000)]
Fix two bugs induced in VACUUM FULL by async-commit patch.
First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction. But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum. Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.
Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag
examines the tuple it might be possible to set the bit. We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit. So I'm not going to back-patch it.
In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level. The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea. Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.
Tom Lane [Mon, 13 Aug 2007 01:18:47 +0000 (01:18 +0000)]
Document that the regexp split functions ignore zero-length matches in
certain corner cases. Per discussion, the code does what we want, but
it really needs to be documented that these functions act differently
from regexp_matches.
Tom Lane [Sun, 12 Aug 2007 20:39:14 +0000 (20:39 +0000)]
Remove an "optimization" I installed in 2001, to make repalloc() attempt to
enlarge the memory chunk in-place when it was feasible to do so. This turns
out to not work well at all for scenarios involving repeated cycles of
palloc/repalloc/pfree: the eventually freed chunks go into the wrong freelist
for the next initial palloc request, and so we consume memory indefinitely.
While that could be defended against, the number of cases where the
optimization can still be applied drops significantly, and adjusting the
initial sizes of StringInfo buffers makes it drop to almost nothing.
Seems better to just remove the extra complexity.
Per recent discussion and testing.
Tom Lane [Sun, 12 Aug 2007 20:18:06 +0000 (20:18 +0000)]
Increase the initial size of StringInfo buffers to 1024 bytes (from 256);
likewise increase the initial size of the scanner's literal buffer to 1024
(from 128). Instrumentation of the regression tests suggests that this
saves a useful amount of repalloc() traffic --- the number of calls occurring
during one set of tests drops from about 6900 to about 3900. The old sizes
were chosen in the late 90's with an eye to machines much smaller than
are common today.
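A rough illustration of why a larger initial size saves repalloc() calls. This is only a sketch: it assumes a buffer that doubles each time it runs out of room, and the helper name `enlargements_needed` is made up for the example, not taken from the patch.

```python
def enlargements_needed(initial_size: int, final_length: int) -> int:
    """Count how many enlargements a buffer needs to hold final_length bytes."""
    size = initial_size
    count = 0
    while size < final_length:
        size *= 2      # assumed growth policy: double on each enlargement
        count += 1
    return count


# A 2000-byte string, starting from the old and the new StringInfo sizes.
old_calls = enlargements_needed(256, 2000)    # 256 -> 512 -> 1024 -> 2048
new_calls = enlargements_needed(1024, 2000)   # 1024 -> 2048
```

Under this assumption, any string of up to 1024 bytes needs no enlargement at all with the new initial size.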
Tom Lane [Sat, 11 Aug 2007 19:16:41 +0000 (19:16 +0000)]
Avoid memory leakage across successive calls of regexp_matches() or
regexp_split_to_table() within a single query. This is only a partial
solution, as it turns out that with enough matches per string these
functions can also tickle a repalloc() misbehavior. But fixing that
is a topic for a separate patch.
Tom Lane [Sat, 11 Aug 2007 03:56:24 +0000 (03:56 +0000)]
Code review for regexp_matches/regexp_split patch. Refactor to avoid assuming
that cached compiled patterns will still be there when the function is next
called. Clean up looping logic, thereby fixing bug identified by Pavel
Stehule. Share setup code between the two functions, add some comments, and
avoid risky mixing of int and size_t variables. Clean up the documentation a
tad, and accept all the flag characters mentioned in table 9-19 rather than
just a subset.
Tom Lane [Fri, 10 Aug 2007 00:39:31 +0000 (00:39 +0000)]
Fix unintended change of output format for createlang/droplang -l. Missed
these uses of printQuery() in FETCH_COUNT patch a year ago :-(. Per report
from Tomoaki Sato.
Tom Lane [Thu, 9 Aug 2007 01:18:43 +0000 (01:18 +0000)]
Revise postmaster startup/shutdown logic to eliminate the problem that a
constant flow of new connection requests could prevent the postmaster from
completing a shutdown or crash restart. This is done by labeling child
processes that are "dead ends", that is, we know that they were launched only
to tell a client that it can't connect. These processes are managed
separately so that they don't confuse us into thinking that we can't advance
to the next stage of a shutdown or restart sequence, until the very end
where we must wait for them to drain out so we can delete the shmem segment.
Per discussion of a misbehavior reported by Keaton Adams.
Since this code was baroque already, and my first attempt at fixing the
problem made it entirely impenetrable, I took the opportunity to rewrite it
in a state-machine style. That eliminates some duplicated code sections and
hopefully makes everything a bit clearer.
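The idea of tracking "dead end" children separately can be sketched as a small state machine. This is a toy model, not the postmaster code: the state names, the `Supervisor` class and its methods are hypothetical, and the real logic has more states (checkpoint, crash restart) than shown here.

```python
from enum import Enum, auto


class State(Enum):
    RUN = auto()
    WAIT_BACKENDS = auto()   # shutdown requested, waiting for real backends
    WAIT_DEAD_END = auto()   # waiting for dead-end children to drain out
    NO_CHILDREN = auto()     # nothing left; shared memory could be deleted


class Supervisor:
    def __init__(self):
        self.state = State.RUN
        self.backends = set()    # children doing real work
        self.dead_ends = set()   # children launched only to reject a client

    def launch(self, pid):
        if self.state is State.NO_CHILDREN:
            raise RuntimeError("no children may be launched any more")
        if self.state is State.RUN:
            self.backends.add(pid)
        else:
            # After shutdown has begun, a new connection only gets a
            # dead-end child that tells the client it cannot connect.
            self.dead_ends.add(pid)

    def request_shutdown(self):
        if self.state is State.RUN:
            self.state = State.WAIT_BACKENDS
        self.advance()

    def child_exited(self, pid):
        self.backends.discard(pid)
        self.dead_ends.discard(pid)
        self.advance()

    def advance(self):
        # Dead-end children never hold up this transition.
        if self.state is State.WAIT_BACKENDS and not self.backends:
            self.state = State.WAIT_DEAD_END
        # Only at the very end do we wait for dead-end children to drain.
        if self.state is State.WAIT_DEAD_END and not self.dead_ends:
            self.state = State.NO_CHILDREN
```

In this model a steady stream of `launch()` calls during shutdown only grows `dead_ends`, so it cannot keep the supervisor stuck in `WAIT_BACKENDS`.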
Neil Conway [Wed, 8 Aug 2007 18:07:05 +0000 (18:07 +0000)]
Fix a gradual memory leak in ExecReScanAgg(). Because the aggregation
hash table is allocated in a child context of the agg node's memory
context, MemoryContextReset() will reset but *not* delete the child
context. Since ExecReScanAgg() proceeds to build a new hash table
from scratch (in a new sub-context), this results in leaking the
header for the previous memory context. Therefore, use
MemoryContextResetAndDeleteChildren() instead.
Credit: My colleague Sailesh Krishnamurthy at Truviso for isolating
the cause of the leak.
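A toy model of the leak, for illustration only. The `Context` class and `rescan` helper are hypothetical stand-ins for memory contexts and ExecReScanAgg(); they mimic only the reset-versus-delete distinction described above.

```python
class Context:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.chunks = []
        if parent is not None:
            parent.children.append(self)

    def reset(self):
        """Free allocations here and in children, but keep the child contexts."""
        self.chunks.clear()
        for child in self.children:
            child.reset()

    def reset_and_delete_children(self):
        """Free allocations here and drop the child contexts entirely."""
        self.chunks.clear()
        self.children.clear()


def rescan(agg, reset):
    reset(agg)                          # clean up after the previous scan
    table = Context("hash table", agg)  # build a new table in a new sub-context
    table.chunks.append("entries")


leaky = Context("agg")
fixed = Context("agg")
for _ in range(5):
    rescan(leaky, Context.reset)
    rescan(fixed, Context.reset_and_delete_children)

leaked_headers = len(leaky.children) - 1   # empty child contexts left behind
```

With plain reset, each rescan leaves one more empty child context behind; with reset-and-delete, only the current hash table context remains.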
Neil Conway [Tue, 7 Aug 2007 06:25:14 +0000 (06:25 +0000)]
Adjust the output of MemoryContextStats() so that the stats for a
child memory context are indented two spaces to the right of its
parent context. This should make it easier to deduce the memory
context hierarchy from the output of MemoryContextStats().
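A sketch of the indentation scheme. The line format, the byte counts and the `format_stats` helper are invented for the example and do not reproduce the actual MemoryContextStats() output; only the two-spaces-per-level rule comes from the description above.

```python
def format_stats(name, children, sizes, level=0):
    """Return one line per context, indented two spaces per nesting level."""
    lines = ["  " * level + f"{name}: {sizes[name]} total"]
    for child in children.get(name, []):
        lines.extend(format_stats(child, children, sizes, level + 1))
    return lines


# Hypothetical hierarchy and sizes, for illustration only.
children = {
    "TopMemoryContext": ["MessageContext", "CacheMemoryContext"],
    "CacheMemoryContext": ["pg_class_oid_index"],
}
sizes = {
    "TopMemoryContext": 65536,
    "MessageContext": 8192,
    "CacheMemoryContext": 524288,
    "pg_class_oid_index": 1024,
}
report = format_stats("TopMemoryContext", children, sizes)
```

Each context's parent is then the nearest preceding line with less indentation.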
Tom Lane [Sun, 5 Aug 2007 15:43:00 +0000 (15:43 +0000)]
Adjust configure so that it sets CFLAGS properly for Intel's icc
even if the compiler is not defining __GNUC__. Per report from
Dirk Tilger that it is possible for icc to not do that.
Tom Lane [Sun, 5 Aug 2007 15:11:40 +0000 (15:11 +0000)]
Apparently icc doesn't always define __ICC, and it's more correct to
check for __INTEL_COMPILER. Per report from Dirk Tilger.
Not back-patched since I don't fully trust it yet ...
Neil Conway [Sat, 4 Aug 2007 21:01:09 +0000 (21:01 +0000)]
Tweak for initdb: if more command-line arguments were specified than
expected, exit with an error, rather than complaining about the error
on stderr but continuing onward.
Tom Lane [Sat, 4 Aug 2007 19:29:25 +0000 (19:29 +0000)]
Fix crash caused by log_timezone patch if we attempt to emit any elog messages
between the setting of log_line_prefix and the setting of log_timezone. We
can't realistically set log_timezone any earlier than we do now, so the best
behavior seems to be to use GMT zone if any timestamps are to be logged during
early startup. Create a dummy zone variable with a minimal definition of GMT
(in particular it will never know about leap seconds), so that we can set it
up without reference to any external files.
Tom Lane [Sat, 4 Aug 2007 03:15:49 +0000 (03:15 +0000)]
Fix a problem in my recent patch to initialize cancel_key for autovac workers
as well as regular backends: if no regular backend launches before the autovac
launcher tries to start an autovac worker, the postmaster would get an Assert
fault due to calling PostmasterRandom before random_seed was initialized.
Cleanest solution seems to be to take the initialization of random_seed out
of ServerLoop and let PostmasterRandom do it for itself.
Tom Lane [Sat, 4 Aug 2007 01:26:54 +0000 (01:26 +0000)]
Switch over to using the src/timezone functions for formatting timestamps
displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.
This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.
Tom Lane [Fri, 3 Aug 2007 20:06:50 +0000 (20:06 +0000)]
Fix some sloppiness in the recent multiple-autovacuum-worker patch. It was
not bothering to initialize is_autovacuum for regular backends, meaning there
was a significant chance of the postmaster prematurely sending them SIGTERM
during database shutdown. Also, leaving the cancel key unset for an autovac
worker meant that any client could send it SIGINT, which doesn't sound
especially good either.
Andrew Dunstan [Thu, 2 Aug 2007 23:39:45 +0000 (23:39 +0000)]
Move session_start out of MyProcPort structure and make it a global called MyStartTime,
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
Tom Lane [Wed, 1 Aug 2007 22:45:09 +0000 (22:45 +0000)]
Support an optional asynchronous commit mode, in which we don't flush WAL
before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.
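A toy model of the trade-off, for illustration only. `ToyWAL` and its methods are hypothetical; the model captures just one property from the description above: a commit may be reported before its WAL record is flushed, and a crash then loses only the unflushed tail.

```python
class ToyWAL:
    def __init__(self):
        self.records = []       # WAL records in commit order
        self.flushed_upto = 0   # how many records are safely on disk

    def commit(self, xid, synchronous):
        self.records.append(xid)
        if synchronous:
            self.flush()        # wait for the flush before reporting success
        return "COMMIT"         # async: reported without waiting

    def flush(self):
        # WAL is sequential, so a flush covers every earlier record too.
        self.flushed_upto = len(self.records)

    def crash_and_recover(self):
        # Only the flushed prefix survives; the result is still a
        # consistent history, just a shorter one.
        self.records = self.records[:self.flushed_upto]
        return list(self.records)


wal = ToyWAL()
wal.commit(1, synchronous=True)
wal.commit(2, synchronous=False)
wal.commit(3, synchronous=False)
survivors = wal.crash_and_recover()
```

Here transactions 2 and 3 were reported committed but are gone after the crash, while transaction 1 survives.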
Tom Lane [Tue, 31 Jul 2007 19:53:37 +0000 (19:53 +0000)]
Fix a bug in the original implementation of redundant-join-clause removal:
clauses in which one side or the other references both sides of the join
cannot be removed as redundant, because that expression won't have been
constrained below the join. Per report from Sergey Burladyan.
CVS HEAD does not contain this bug due to EquivalenceClass rewrite, but it
seems wise to include the regression test for it anyway.
Tom Lane [Tue, 31 Jul 2007 15:49:49 +0000 (15:49 +0000)]
Fix security definer functions with polymorphic arguments. This case has
never worked because fmgr_security_definer() neglected to pass the fn_expr
information through. Per report from Viatcheslav Kalinin.
Neil Conway [Fri, 27 Jul 2007 19:09:04 +0000 (19:09 +0000)]
Slight refactor for ExecOpenScanRelation(): we can use
ExecRelationIsTargetRelation() to check if the relation is a target
rel, rather than scanning through the result relation array ourselves.
Tom Lane [Thu, 26 Jul 2007 15:15:18 +0000 (15:15 +0000)]
Remove FileUnlink(), which wasn't being used anywhere and interacted poorly
with the recent patch to log temp file sizes at removal time. Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
Tom Lane [Wed, 25 Jul 2007 22:16:18 +0000 (22:16 +0000)]
Arrange to put TOAST tables belonging to temporary tables into special schemas
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves. This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access. Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables. The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.
initdb forced because of changes in system view definitions.
Tom Lane [Wed, 25 Jul 2007 17:22:37 +0000 (17:22 +0000)]
Adjust horology test to avoid join-plan-dependent result ordering in
a few queries. Should fix buildfarm failures arising from new,
more aggressive autovac settings.
Neil Conway [Wed, 25 Jul 2007 04:19:09 +0000 (04:19 +0000)]
Implement RETURN QUERY for PL/pgSQL. This provides some convenient syntax
sugar for PL/pgSQL set-returning functions that want to return the result
of evaluating a query; it should also be more efficient than repeated
RETURN NEXT statements. Based on an earlier patch from Pavel Stehule.
Tom Lane [Tue, 24 Jul 2007 17:22:07 +0000 (17:22 +0000)]
Fix predicate-proving logic to cope with binary-compatibility cases when
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by
a strict function. Per example from Dawid Kuroczko.
Backpatch to 8.2 since this is arguably a performance bug.
Magnus Hagander [Tue, 24 Jul 2007 09:00:27 +0000 (09:00 +0000)]
Make it possible, and default, for MinGW to build with SSPI support
by dynamically loading the function that's missing from the MinGW
headers and library.
Tom Lane [Tue, 24 Jul 2007 04:54:09 +0000 (04:54 +0000)]
Create a new dedicated Postgres process, "wal writer", which exists to write
and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.
This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
Set a default autovacuum vacuum_cost_delay value of 20ms, to avoid excessive
I/O utilization, per discussion.
While at it, lower the autovacuum vacuum and analyze threshold values to 50
tuples. It is a bit higher (i.e. more conservative) than what I originally
proposed but much better than the old values for small tables.
Tom Lane [Mon, 23 Jul 2007 18:59:50 +0000 (18:59 +0000)]
Just noticed that libpq thinks the maximum command tag length is 40,
whereas in the backend it's been 64 for some time. Hasn't mattered
because no actual tags exceed 40 bytes, but for consistency they should
be alike.
Magnus Hagander [Mon, 23 Jul 2007 10:16:54 +0000 (10:16 +0000)]
SSPI authentication on Windows. GSSAPI compatible client when doing Kerberos
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).
Tom Lane [Sat, 21 Jul 2007 22:12:04 +0000 (22:12 +0000)]
Fix elog.c to avoid infinite recursion (leading to backend crash) when
log_min_error_statement is active and there is some problem in logging the
current query string; for example, that it's too long to include in the log
message without running out of memory. This problem has existed since the
log_min_error_statement feature was introduced. No doubt the reason it
wasn't detected long ago is that 8.2 is the first release that defaults
log_min_error_statement to less than PANIC level.
Per report from Bill Moran.
Tom Lane [Fri, 20 Jul 2007 16:29:53 +0000 (16:29 +0000)]
Fix WAL replay of truncate operations to cope with the possibility that the
truncated relation was deleted later in the WAL sequence. Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible. Fix by making truncate entries auto-create like
the other ones do. Per report and test case from Dharmendra Goyal.
Tom Lane [Thu, 19 Jul 2007 21:58:12 +0000 (21:58 +0000)]
On second thought, the tests for what to do with stderr output are a
lot more sensible if we check the chunk-output case first. Not
back-patched since it's just a cosmetic improvement.
Tom Lane [Thu, 19 Jul 2007 20:34:20 +0000 (20:34 +0000)]
Make replace(), split_part(), and string_to_array() behave somewhat sanely
when handed an invalidly-encoded pattern. The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform. Per report from Wiktor Wodecki.
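The failure mode can be reproduced in miniature. This sketch is not the backend code: `to_wide` is a hypothetical stand-in for the multibyte-to-wide conversion (here, UTF-8 decoding that drops incomplete characters), and re-testing the length after conversion is one plausible guard, not necessarily the one the patch uses.

```python
def to_wide(raw: bytes) -> str:
    # Stand-in conversion: an incomplete multibyte character yields nothing.
    return raw.decode("utf-8", errors="ignore")


def count_matches(text: bytes, pattern: bytes) -> int:
    if len(pattern) == 0:          # nonempty test made before conversion
        return 0
    wide_text = to_wide(text)
    wide_pattern = to_wide(pattern)
    if len(wide_pattern) == 0:     # re-test after conversion
        return 0                   # otherwise pos below would never advance
    count = 0
    pos = wide_text.find(wide_pattern)
    while pos != -1:
        count += 1
        pos = wide_text.find(wide_pattern, pos + len(wide_pattern))
    return count
```

Without the second length test, a pattern such as `b"\xe3\x81"` (the first two bytes of a three-byte character) passes the first test, converts to an empty string, and the loop matches at the same position forever.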
Andrew Dunstan [Thu, 19 Jul 2007 19:13:43 +0000 (19:13 +0000)]
Only use the pipe chunking protocol if we know the syslogger should
be catching stderr output, and we are not ourselves the
syslogger. Otherwise, go directly to stderr.
Bug noticed by Tom Lane.
Backpatch as far as 8.0.
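The rule stated above reduces to a two-input decision. In this sketch the flag names `redirection_done` and `am_syslogger` and the returned labels are assumptions chosen for readability.

```python
def stderr_destination(redirection_done: bool, am_syslogger: bool) -> str:
    """Decide how a process should emit its stderr log output."""
    # Chunk only when the syslogger is known to be catching stderr
    # and we are not the syslogger ourselves.
    if redirection_done and not am_syslogger:
        return "pipe-chunks"
    return "stderr"
```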
Tom Lane [Wed, 18 Jul 2007 21:40:57 +0000 (21:40 +0000)]
Fix an old thinko in SS_make_initplan_from_plan, which is used when optimizing
a MIN or MAX aggregate call into an indexscan: the initplan is being made at
the current query nesting level and so we shouldn't increment query_level.
Though usually harmless, this mistake could lead to bogus "plan should not
reference subplan's variable" failures on complex queries. Per bug report
from David Sanchez i Gregori.
Cast NULL to a pointer type in the execl() call, to avoid a compiler warning on
some platforms and possibly a bug. Per report from Stefan and subsequent
discussion.
Bruce Momjian [Wed, 18 Jul 2007 00:16:21 +0000 (00:16 +0000)]
Add:
>
> o Allow GLOBAL temporary tables to exist as empty by default in
> all sessions
>
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00006.php
>
Tom Lane [Tue, 17 Jul 2007 17:45:28 +0000 (17:45 +0000)]
Fix incorrect optimization of foreign-key checks. When an UPDATE on the
referencing table does not change the tuple's FK column(s), we don't bother
to check the PK table since the constraint was presumably already valid.
However, the check is still necessary if the tuple was inserted by our own
transaction, since in that case the INSERT trigger will conclude it need not
make the check (since its version of the tuple has been deleted). We got this
right for simple cases, but not when the insert and update are in different
subtransactions of the current top-level transaction; in such cases the FK
check would never be made at all. (Hence, problem dates back to 8.0 when
subtransactions were added --- it's actually the subtransaction version of a
bug fixed in 7.3.5.) Fix, and add regression test cases. Report and fix by
Affan Salman.
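The corrected rule can be written as a small predicate. This is a sketch under assumptions: `fk_check_needed` and its parameters are hypothetical, and `my_xids` stands for the IDs of the current top-level transaction together with all of its subtransactions.

```python
def fk_check_needed(key_changed: bool, inserted_by: int, my_xids: set) -> bool:
    """Decide whether an UPDATE on the referencing table must check the PK table."""
    if key_changed:
        return True
    # Unchanged key: the check may be skipped only if the row was inserted
    # by some other transaction. If any part of our own transaction inserted
    # it, the INSERT trigger will have skipped the check, so we must do it.
    return inserted_by in my_xids


# Top-level transaction 200 with subtransactions 201 and 202.
mine = {200, 201, 202}
```

Comparing `inserted_by` only against the current subtransaction's ID instead of the whole set would miss the case where the insert and the update happen in different subtransactions.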
Neil Conway [Tue, 17 Jul 2007 05:02:03 +0000 (05:02 +0000)]
Implement CREATE TABLE LIKE ... INCLUDING INDEXES. Patch from NikhilS,
based in part on an earlier patch from Trevor Hardcastle, and reviewed
by myself.
Tom Lane [Tue, 17 Jul 2007 01:21:43 +0000 (01:21 +0000)]
Fix outfuncs.c to dump A_Const nodes representing NULLs correctly. This has
been broken since forever, but was not noticed because people seldom look
at raw parse trees. AFAIK, no impact on users except that debug_print_parse
might fail; but patch it all the way back anyway. Per report from Jeff Ross.
Bruce Momjian [Tue, 17 Jul 2007 00:07:54 +0000 (00:07 +0000)]
Add:
> * Allow multiple indexes to be created concurrently, ideally via a
> single heap scan, and have a restore of a pg_dump somehow use it
>
> http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php
Tom Lane [Mon, 16 Jul 2007 21:20:36 +0000 (21:20 +0000)]
Fix pg_buffercache to release buffer partition locks in reverse order,
and add a note about why. This is not tremendously important right now,
probably, but it will get more urgent if NUM_BUFFER_PARTITIONS is increased
as much as proposed.
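The ordering itself is simple to show. In this sketch the partition count, the lock objects and the `events` list are illustrative only; it demonstrates acquiring all partition locks in ascending order and releasing them in the reverse order, without claiming anything about why the reverse order matters.

```python
import threading

NUM_PARTITIONS = 4   # illustrative value, not the real setting
locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
events = []          # records the order of lock operations


def scan_all_partitions():
    for i in range(NUM_PARTITIONS):
        locks[i].acquire()
        events.append(("acquire", i))
    try:
        pass  # a consistent snapshot of all partitions would be read here
    finally:
        # Release in the reverse of the acquisition order.
        for i in reversed(range(NUM_PARTITIONS)):
            locks[i].release()
            events.append(("release", i))
```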
Neil Conway [Mon, 16 Jul 2007 17:38:48 +0000 (17:38 +0000)]
With the native compiler on Unixware, disable optimization if
--enable-debug is used, to avoid complaints about debugging and
optimization being mutually exclusive. Patch from Stefan Kaltenbrunner.
Tom Lane [Mon, 16 Jul 2007 17:01:11 +0000 (17:01 +0000)]
Allow plpgsql function parameter names to be qualified with the function's
name. With this patch, it is always possible for the user to qualify a
plpgsql variable name if needed to avoid ambiguity. While there is much more
work to be done in this area, this simple change removes one unnecessary
incompatibility with Oracle. Per discussion.