Tom Lane [Sat, 21 Jan 2006 04:38:21 +0000 (04:38 +0000)]
Repair longstanding bug in slru/clog logic: it is possible for two backends
to try to create a log segment file concurrently, but the code erroneously
specified O_EXCL to open(), resulting in a needless failure. Before 7.4,
it was even a PANIC condition :-(. Correct code is actually simpler than
what we had, because we can just say O_CREAT to start with and not need a
second open() call. I believe this accounts for several recent reports of
hard-to-reproduce "could not create file ...: File exists" errors in both
pg_clog and pg_subtrans.
Bruce Momjian [Sat, 21 Jan 2006 02:16:21 +0000 (02:16 +0000)]
Add GRANT ON SEQUENCE syntax to support sequence-only permissions.
Continue to support GRANT ON [TABLE] for sequences for backward
compatibility; issue warning for invalid sequence permissions.
[Backward compatibility warning message.]
Add USAGE permission for sequences that allows only currval() and
nextval(), not setval().
Mention object name in grant/revoke warnings because of possible
multi-object operations.
Tom Lane [Fri, 20 Jan 2006 22:46:16 +0000 (22:46 +0000)]
Replace bitwise looping with bytewise looping in hemdistsign and
sizebitvec of tsearch2, as well as identical code in several other
contrib modules. This provided about a 20X speedup in building a
large tsearch2 index ... didn't try to measure its effects for other
operations. Thanks to Stephan Vollmer for providing a test case.
Tom Lane [Fri, 20 Jan 2006 15:16:56 +0000 (15:16 +0000)]
Fix thinko in autovacuum's test to skip temp tables: want to skip any
temp table not only our own process' tables. It's not real important
since vacuum.c will skip temp tables anyway, but might as well make the
code do what it claims to do.
Tom Lane [Thu, 19 Jan 2006 20:28:43 +0000 (20:28 +0000)]
Avoid crashing if relcache flush occurs while trying to load data into an
index's support-function cache (in index_getprocinfo). Since none of that
data can change for an index that's in active use, it seems sufficient to
treat all open indexes the same way we were treating "nailed" system indexes
--- that is, just re-read the pg_class row and leave the rest of the relcache
entry strictly alone. The pg_class re-read might not be strictly necessary
either, but since the reltablespace and relfilenode can change in normal
operation it seems safest to do it. (We don't support changing any of the
other info about an index at all, at the moment.)
Back-patch as far as 8.0. It might be possible to adapt the patch to 7.4,
but it would take more work than I care to expend for such a low-probability
problem. 7.3 is out of luck for sure.
Tom Lane [Thu, 19 Jan 2006 04:45:38 +0000 (04:45 +0000)]
It turns out that TablespaceCreateDbspace fails badly if a relcache flush
occurs when it tries to heap_open pg_tablespace. When control returns to
smgrcreate, that routine will be holding a dangling pointer to a closed
SMgrRelation, resulting in mayhem. This is of course a consequence of
the violation of proper module layering inherent in having smgr.c call
a tablespace command routine, but the simplest fix seems to be to change
the locking mechanism. There's no real need for TablespaceCreateDbspace
to touch pg_tablespace at all --- it's only opening it as a way of locking
against a parallel DROP TABLESPACE command. A much better answer is to
create a special-purpose LWLock to interlock these two operations.
This drops TablespaceCreateDbspace quite a few layers down the food chain
and makes it something reasonably safe for smgr to call.
Tom Lane [Thu, 19 Jan 2006 00:27:08 +0000 (00:27 +0000)]
Fix a tiny memory leak (one List header) in RelationCacheInvalidate().
This is utterly insignificant in normal operation, but it becomes a
problem during cache inval stress testing. The original coding in fact
had no leak --- the 8.0 List rewrite created the issue. I wonder whether
list_concat should pfree the discarded header?
Tom Lane [Wed, 18 Jan 2006 20:35:06 +0000 (20:35 +0000)]
Modify pgstats code to reduce performance penalties from oversized stats data
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding. Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all. This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0. It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
Neil Conway [Wed, 18 Jan 2006 06:49:30 +0000 (06:49 +0000)]
Add a new system view, pg_cursors, that displays the currently available
cursors. Patch from Joachim Wieland, review and ediorialization by Neil
Conway. The view lists cursors defined by DECLARE CURSOR, using SPI, or
via the Bind message of the frontend/backend protocol. This means the
view does not list the unnamed portal or the portal created to implement
EXECUTE. Because we do list SPI portals, there might be more rows in
this view than you might expect if you are using SPI implicitly (e.g.
via a procedural language).
Per recent discussion on -hackers, the query string included in the
view for cursors defined by DECLARE CURSOR is based on
debug_query_string. That means it is not accurate if multiple queries
separated by semicolons are submitted as one query string. However,
there doesn't seem a trivial fix for that: debug_query_string
is better than nothing. I also changed SPI_cursor_open() to include
the source text for the portal it creates: AFAICS there is no reason
not to do this.
Update the documentation and regression tests, bump the catversion.
Tom Lane [Tue, 17 Jan 2006 00:09:01 +0000 (00:09 +0000)]
Improve comments about btree's use of ScanKey data structures: there
are two basically different kinds of scankeys, and we ought to try harder
to indicate which is used in each place in the code. I've chosen the names
"search scankey" and "insertion scankey", though you could make about
as good an argument for "operator scankey" and "comparison function
scankey".
Neil Conway [Mon, 16 Jan 2006 18:15:31 +0000 (18:15 +0000)]
Change the parameter_types column of the pg_prepared_statements to be
an array of regtype, rather than an array of OIDs. This is likely to
be more useful to user, and the type OID can easily be obtained by
casting a regtype value to OID. Per suggestion from Tom.
Update the documentation and regression tests, and bump the catversion.
Neil Conway [Sun, 15 Jan 2006 22:46:53 +0000 (22:46 +0000)]
When using GCC on AMD64 and PPC, ECPGget_variable() takes a va_list *, not
a va_list. Christof Petig's previous patch made this change, but neglected
to update ecpglib/descriptor.c, resulting in a compiler warning (and a
likely runtime crash) on AMD64 and PPC.
Neil Conway [Sun, 15 Jan 2006 22:34:49 +0000 (22:34 +0000)]
Add regression tests to verify that domain constraints on parameters
to prepared statements with unknown type are correctly enforced, per
recent bug report.
Neil Conway [Sun, 15 Jan 2006 22:18:47 +0000 (22:18 +0000)]
Allow the types of parameters to PREPARE to be inferred. If a parameter's
data type is unspecified or is declared to be "unknown", the type will
be inferred from the context in which the parameter is used. This was
already possible for protocol-level prepared statements.
Tom Lane [Sat, 14 Jan 2006 22:03:35 +0000 (22:03 +0000)]
Some minor code cleanup, falling out from the removal of rtree. SK_NEGATE
isn't being used anywhere anymore, and there seems no point in a generic
index_keytest() routine when two out of three remaining access methods
aren't using it. Also, add a comment documenting a convention for
letting access methods define private flag bits in ScanKey sk_flags.
There are no such flags at the moment but I'm thinking about changing
btree's handling of "required keys" to use flag bits in the keys
rather than a count of required key positions. Also, if some AM did
still want SK_NEGATE then it would be reasonable to treat it as a private
flag bit.
Tom Lane [Fri, 13 Jan 2006 21:32:12 +0000 (21:32 +0000)]
Remove logic in XactLockTableWait() that attempted to mark a crashed
transaction as aborted. Since we only call XactLockTableWait on XIDs
that we believe to be currently running, the odds of this code ever
actually firing are minimal. It's certainly unnecessary, since a
transaction that's not either running or committed will be presumed
aborted anyway. What's more, it's not hard to imagine scenarios where
this could result in corrupting pg_clog: for instance, if a bogus XID
somehow got passed to XactLockTableWait. I think the code probably
dates from the ancient era when we didn't have TransactionIdIsInProgress;
back then it may have been necessary, but now I think it's a waste of
cycles and potentially dangerous. Per discussion with Qingqing Zhou
and Karsten Hilbert.
Tom Lane [Fri, 13 Jan 2006 18:10:25 +0000 (18:10 +0000)]
Document that CREATE OPERATOR CLASS amounts to granting public execute
permissions on the functions and operators contained in the opclass.
Since we already require superuser privilege to create an operator class,
there's no expansion-of-privilege hazard here, but if someone were to get
the idea of building an opclass containing functions that need security
restrictions, we'd better warn them off. Also, change the permission
checks from have-execute-privilege to have-ownership, and then comment
them all out since they're dead code anyway under the superuser restriction.
Tom Lane [Fri, 13 Jan 2006 18:06:45 +0000 (18:06 +0000)]
Require the issuer of CREATE TYPE to own the functions mentioned in the
type definition. Because use of a type's I/O conversion functions isn't
access-checked, CREATE TYPE amounts to granting public execute permissions
on the functions, and so allowing it to anybody means that someone could
theoretically gain access to a function he's not supposed to be able to
execute. The parameter-type restrictions already enforced by CREATE TYPE
make it fairly unlikely that this oversight is meaningful in practice,
but still it seems like a good idea to plug the hole going forward.
Also, document the implicit grant just in case anybody gets the idea of
building I/O functions that might need security restrictions.
Neil Conway [Thu, 12 Jan 2006 22:04:02 +0000 (22:04 +0000)]
mbutils was previously doing some allocations, including invoking
fmgr_info(), in the TopMemoryContext. I couldn't see that the code
actually leaked, but in general I think it's fragile to assume that
pfree'ing an FmgrInfo along with its fn_extra field is enough to
reclaim all the resources allocated by fmgr_info(). I changed the
code to do its allocations in a new child context of
TopMemoryContext, MbProcContext. When we want to release the
allocations we can just reset the context, which is cleaner.
Tom Lane [Thu, 12 Jan 2006 21:48:53 +0000 (21:48 +0000)]
Repair "Halloween problem" in EvalPlanQual: a tuple that's been inserted by
our own command (or more generally, xmin = our xact and cmin >= current
command ID) should not be seen as good. Else we may try to update rows
we already updated. This error was inserted last August while fixing the
even bigger problem that the old coding wouldn't see *any* tuples inserted
by our own transaction as good. Per report from Euler Taveira de Oliveira.
Tom Lane [Thu, 12 Jan 2006 19:23:22 +0000 (19:23 +0000)]
Use a more bulletproof test for whether finite() and isinf() are present.
It seems that recent gcc versions can optimize away calls to these functions
even when the functions do not exist on the platform, resulting in a bogus
positive result. Avoid this by using a non-constant argument and ensuring
that the function result is not simply discarded. Per report from
François Laupretre.
Neil Conway [Wed, 11 Jan 2006 22:16:39 +0000 (22:16 +0000)]
Documentation tweak: add spaces around the brackets in the description
of the CREATE CONVERSION syntax, for consistency with the other SQL
reference pages.
Tom Lane [Wed, 11 Jan 2006 20:12:43 +0000 (20:12 +0000)]
Create a standard function pg_sleep() to sleep for a specified amount of time.
Replace the former ad-hoc implementation used in the regression tests.
Joachim Wieland
Neil Conway [Wed, 11 Jan 2006 08:43:13 +0000 (08:43 +0000)]
Cosmetic code cleanup: fix a bunch of places that used "return (expr);"
rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.
Neil Conway [Tue, 10 Jan 2006 18:50:43 +0000 (18:50 +0000)]
Minor code clarity improvement: AFAICS, estate.eval_econtext must be
non-NULL during the guts of plpgsql_exec_trigger() and
plpgsql_exec_function(). Therefore, we can remove the NULL check,
per discussion on -patches.
Tom Lane [Tue, 10 Jan 2006 17:35:52 +0000 (17:35 +0000)]
Improve patternsel() by applying the operator itself to each value
listed in the column's most-common-values statistics entry. This gives
us an exact selectivity result for the portion of the column population
represented by the MCV list, which can be a big leg up in accuracy if
that's a large fraction of the population. The heuristics involving
pattern contents and prefix are applied only to the part of the population
not included in the MCV list.
Neil Conway [Tue, 10 Jan 2006 00:33:12 +0000 (00:33 +0000)]
In PLy_function_build_args(), the code loops repeatedly, constructing
one argument at a time and then inserting the argument into a Python
list via PyList_SetItem(). This "steals" the reference to the argument:
that is, the reference to the new list member is now held by the Python
list itself. This works fine, except if an elog occurs. This causes the
function's PG_CATCH() block to be invoked, which decrements the
reference counts on both the current argument and the list of arguments.
If the elog happens to occur during the second or subsequent iteration
of the loop, the reference count on the current argument will be
decremented twice.
The fix is simple: set the local pointer to the current argument to NULL
immediately after adding it to the argument list. This ensures that the
Py_XDECREF() in the PG_CATCH() block doesn't double-decrement.
Tom Lane [Mon, 9 Jan 2006 21:16:17 +0000 (21:16 +0000)]
Fix pg_dump to add the required OPERATOR() decoration to schema-qualified
operator names. This is needed when dumping operator definitions that have
COMMUTATOR (or similar) links to operators in other schemas.
Apparently Daniel Whitter is the first person ever to try this :-(
Neil Conway [Mon, 9 Jan 2006 02:47:09 +0000 (02:47 +0000)]
Minor code cleanup for PL/Python: fixup some strangely formatted comments,
and change two elogs into ereports because they could actually occur
in practice.
Tom Lane [Sun, 8 Jan 2006 21:24:37 +0000 (21:24 +0000)]
Fix the assert_enabled issue properly. This eliminates the former ABI
difference between USE_ASSERT_CHECKING and not: the assert_enabled
variable is always there.
Tom Lane [Sun, 8 Jan 2006 20:04:41 +0000 (20:04 +0000)]
Avoid leaking memory while reading toasted entries from pg_rewrite,
and nail a couple more system indexes into cache. This doesn't make
any difference in normal system operation, but when forcing constant
cache resets it's difficult to get through the rules regression test
without these changes.
Neil Conway [Sun, 8 Jan 2006 07:00:27 +0000 (07:00 +0000)]
Add a new system view, pg_prepared_statements, that can be used to
access information about the prepared statements that are available
in the current session. Original patch from Joachim Wieland, various
improvements by Neil Conway.
The "statement" column of the view contains the literal query string
sent by the client, without any rewriting or pretty printing. This
means that prepared statements created via SQL will be prefixed with
"PREPARE ... AS ", whereas those prepared via the FE/BE protocol will
not. That is unfortunate, but discussion on -patches did not yield an
efficient way to improve this, and there is some merit in returning
exactly what the client sent to the backend.
Tom Lane [Sat, 7 Jan 2006 22:45:41 +0000 (22:45 +0000)]
Add RelationOpenSmgr() calls to ensure rd_smgr is valid when we try to
use it. While it normally has been opened earlier during btree index
build, testing shows that it's possible for the link to be closed again
if an sinval reset occurs while the index is being built.
Tom Lane [Sat, 7 Jan 2006 21:16:10 +0000 (21:16 +0000)]
During CatCacheRemoveCList, we must now remove any members that are
dead and have become unreferenced. Before 8.1, such members were left
for AtEOXact_CatCache() to clean up, but now AtEOXact_CatCache isn't
supposed to have anything to do. In an assert-enabled build this bug
leads to an assertion failure at transaction end, but in a non-assert
build the dead member is effectively just a small memory leak.
Per report from Jeremy Drake.
Tom Lane [Fri, 6 Jan 2006 02:58:25 +0000 (02:58 +0000)]
Fix Windows-only postmaster code to reject a connection request and continue,
rather than elog(FATAL), when there is no more room in ShmemBackendArray.
This is a security issue since too many connection requests arriving close
together could cause the postmaster to shut down, resulting in denial of
service. Reported by Yoshiyuki Asaba, fixed by Magnus Hagander.
Tom Lane [Fri, 6 Jan 2006 00:15:50 +0000 (00:15 +0000)]
Convert Assert checking for empty page into a regular test and elog.
The consequences of overwriting a non-empty page are bad enough that
we should not omit this test in production builds.
Tom Lane [Fri, 6 Jan 2006 00:04:20 +0000 (00:04 +0000)]
Fix ReadBuffer() to correctly handle the case where it's trying to extend
the relation but it finds a pre-existing valid buffer. The buffer does not
correspond to any page known to the kernel, so we *must* do smgrextend to
ensure that the space becomes allocated. The 7.x branches all do this
correctly, but the corner case got lost somewhere during 8.0 bufmgr rewrites.
(My fault no doubt :-( ... I think I assumed that such a buffer must be
not-BM_VALID, which is not so.)
Bruce Momjian [Thu, 5 Jan 2006 16:35:19 +0000 (16:35 +0000)]
Update wording:
< STABLE | DEFAULT ]. [wallog]
> STABLE | DEFAULT ]. Tables using non-default logging should not use
> referential integrity with default-logging tables, and tables using
> stable logging probably can not have indexes. [wallog]
Bruce Momjian [Thu, 5 Jan 2006 16:26:49 +0000 (16:26 +0000)]
Update wording:
< the table. Another option is to avoid transaction logging entirely
< and truncate or drop the table on crash recovery. These should be
< implemented using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP |
< TRUNCATE | STABLE | DEFAULT ]. [wallog]
> the table. This would affect COPY, and perhaps INSERT/UPDATE too.
> Another option is to avoid transaction logging entirely and truncate
> or drop the table on crash recovery. These should be implemented
> using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE |
> STABLE | DEFAULT ]. [wallog]
Bruce Momjian [Thu, 5 Jan 2006 16:23:48 +0000 (16:23 +0000)]
Add:
>
> * Allow control over which tables are WAL-logged
>
> Allow tables to bypass WAL writes and just fsync() dirty pages on
> commit. To do this, only a single writer can modify the table, and
> writes must happen only on new pages. Readers can continue accessing
> the table. Another option is to avoid transaction logging entirely
> and truncate or drop the table on crash recovery. These should be
> implemented using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP |
> TRUNCATE | STABLE | DEFAULT ]. [wallog]
Make all command-line options of postmaster and postgres the same. See
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00151.php for the
complete plan.
Tom Lane [Wed, 4 Jan 2006 21:06:32 +0000 (21:06 +0000)]
Rearrange backend startup sequence so that ShmemIndexLock can become
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
Tom Lane [Tue, 3 Jan 2006 23:46:24 +0000 (23:46 +0000)]
There is a signedness bug in Openwall gen_salt code that pgcrypto uses.
This makes the salt space for md5 and xdes algorithms a lot smaller than
it should be.
Joe Conway [Tue, 3 Jan 2006 23:45:52 +0000 (23:45 +0000)]
When the remote query result has a different number of columns
than the local query specifies (e.g. in the FROM clause),
throw an ERROR (instead of crashing). Fix for bug #2129 reported
by Akio Iwaasa.
Tom Lane [Tue, 3 Jan 2006 22:48:10 +0000 (22:48 +0000)]
Add checks to verify that a plpgsql function returning a rowtype is actually
returning the rowtype it's supposed to return. Per reports from David Niblett
and Michael Fuhr.
Bruce Momjian [Tue, 3 Jan 2006 16:42:17 +0000 (16:42 +0000)]
Use setitimer() for stats file write, rather than do a gettimeofday()
call for every stats packet read to adjust select() timeout. Other
stylistic improvements.