Tom Lane [Tue, 11 Sep 2007 17:15:33 +0000 (17:15 +0000)]
Make sure that open hash table scans are cleaned up when bgwriter tries to
recover from elog(ERROR). Problem was created by introduction of hash seq
search tracking awhile back, and affects all branches that have bgwriter;
in HEAD the disease has snuck into autovacuum and walwriter too. (Not sure
that the latter two use hash_seq_search at the moment, but surely they might
someday.) Per report from Sergey Koposov.
Tom Lane [Tue, 11 Sep 2007 16:17:46 +0000 (16:17 +0000)]
Include hash table name in all the internal-error elog messages in
dynahash.c. Sergey Koposov's current open problem shows the possible
usefulness of this, and it doesn't add much code.
Add regression tests for ispell, synonym and thesaurus dictionaries.
Rename synonym.syn.sample and thesaurs.ths.sample to
synonym_sample.syn and thesaurs_sample.ths accordingly to be able to use they
in regression test.
Fix ts_debug function to prevent unneeded calls of ts_lexize().
It will be mush better to reimplement ts_debug in C (instead of SQL as now),
but it's planned for the future.
Refactor from Heikki Linnakangas <heikki@enterprisedb.com>:
* Defined new struct WordEntryPosVector that holds a uint16 length and a
variable size array of WordEntries. This replaces the previous
convention of a variable size uint16 array, with the first element
implying the length. WordEntryPosVector has the same layout in memory,
but is more readable in source code. The POSDATAPTR and POSDATALEN
macros are still used, though it would now be more readable to access
the fields in WordEntryPosVector directly.
* Removed needfree field from DocRepresentation. It was always set to false.
Tom Lane [Tue, 11 Sep 2007 00:06:42 +0000 (00:06 +0000)]
Arrange for SET LOCAL's effects to persist until the end of the current top
transaction, unless rolled back or overridden by a SET clause for the same
variable attached to a surrounding function call. Per discussion, these
seem the best semantics. Note that this is an INCOMPATIBLE CHANGE: in 8.0
through 8.2, SET LOCAL's effects disappeared at subtransaction commit
(leading to behavior that made little sense at the SQL level).
I took advantage of the opportunity to rewrite and simplify the GUC variable
save/restore logic a little bit. The old idea of a "tentative" value is gone;
it was a hangover from before we had a stack. Also, we no longer need a stack
entry for every nesting level, but only for those in which a variable's value
actually changed.
Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold
an exclusive lock on the table at this point, which we want to release as soon
as possible. This is called in the phase of lazy vacuum where we truncate the
empty pages at the end of the table.
An alternative solution would be to lower the vacuum delay settings before
starting the truncating phase, but this doesn't work very well in autovacuum
due to the autobalancing code (which can cause other processes to change our
cost delay settings). This case could be considered in the balancing code, but
it is simpler this way.
Fixes from Heikki Linnakangas <heikki@enterprisedb.com>:
Apparently it's a bug I introduced when I refactored spell.c to use the
readline function for reading and recoding the input file. I didn't
notice that some calls to STRNCMP used the non-lowercased version of the
input line.
Tom Lane [Mon, 10 Sep 2007 00:57:22 +0000 (00:57 +0000)]
Code review for GUC revert-values-if-removed-from-postgresql.conf patch;
and in passing, fix some bogosities dating from the custom_variable_classes
patch. Fix guc-file.l to correctly check changes in custom_variable_classes
that are attempted concurrently with additions/removals of custom variables,
and don't allow the new setting to be applied in advance of checking it.
Clean up messy and undocumented situation for string variables with NULL
boot_val. Fix DefineCustomVariable functions to initialize boot_val
correctly. Prevent find_option from inserting bogus placeholders for custom
variables that are simply inquired about rather than being set.
Andrew Dunstan [Sun, 9 Sep 2007 20:40:54 +0000 (20:40 +0000)]
Provide for a file specifying non-standard config options for temp install
for pg_regress, via --temp-config option. Pick this up in the make file
via TEMP_CONFIG setting.
Tom Lane [Sat, 8 Sep 2007 20:31:15 +0000 (20:31 +0000)]
Replace the former method of determining snapshot xmax --- to wit, calling
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort. Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide. Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time. Since XidGenLock is sometimes held across I/O this can be a
significant win. Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench. Concept by Florian Pflug, implementation
by Tom Lane.
Tom Lane [Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)]
Don't take ProcArrayLock while exiting a transaction that has no XID; there is
no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway. Per discussion. Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.
Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.
Improve page split in rtree emulation. Now if splitted result has
big misalignement, then it tries to split page basing on distribution
of boxe's centers.
Per report from Dolafi, Tom <dolafit@janelia.hhmi.org>
Backpatch is needed, change doesn't affect on-disk storage.
Improvements from Heikki Linnakangas <heikki@enterprisedb.com>
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.
- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().
- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
Improving various checks by Heikki Linnakangas <heikki@enterprisedb.com>
- add code to check that the query tree is well-formed. It was indeed
possible to send malformed queries in binary mode, which produced all
kinds of strange results.
- make the left-field a uint32. There's no reason to
arbitrarily limit it to 16-bits, and it won't increase the disk/memory
footprint either now that QueryOperator and QueryOperand are separate
structs.
- add check_stack_depth() call to all recursive functions I found.
Some of them might have a natural limit so that you can't force
arbitrarily deep recursions, but check_stack_depth() is cheap enough
that seems best to just stick it into anything that might be a problem.
Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> with
small editorization by me
- Brake the QueryItem struct into QueryOperator and QueryOperand.
Type was really the only common field between them. QueryItem still
exists, and is used in the TSQuery struct as before, but it's now a
union of the two. Many other changes fell from that, like separation
of pushval_asis function into pushValue, pushOperator and pushStop.
- Moved some structs that were for internal use only from header files
to the right .c-files.
- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
about half of the size of tsvector.c, it's also used from tsquery.c, and
it has some data structures of its own, so it seems better to separate
it. Cleaned up the API so that TSVectorParserState is not accessed from
outside tsvector_parser.c.
- Separated enumerations (#defines, really) used for QueryItem.type
field and as return codes from gettoken_query. It was just accidental
code sharing.
- Removed ParseQueryNode struct used internally by makepol and friends.
push*-functions now construct QueryItems directly.
- Changed int4 variables to just ints for variables like "i" or "array
size", where the storage-size was not significant.
Tom Lane [Thu, 6 Sep 2007 17:31:58 +0000 (17:31 +0000)]
Make eval_const_expressions() preserve typmod when simplifying something like
null::char(3) to a simple Const node. (It already worked for non-null values,
but not when we skipped evaluation of a strict coercion function.) This
prevents loss of typmod knowledge in situations such as exhibited in bug
#3598. Unfortunately there seems no good way to fix that bug in 8.1 and 8.2,
because they simply don't carry a typmod for a plain Const node.
In passing I made all the other callers of makeNullConst supply "real" typmod
values too, though I think it probably doesn't matter anywhere else.
Tom Lane [Wed, 5 Sep 2007 21:11:19 +0000 (21:11 +0000)]
Volatile-qualify the ProcArray PGPROC pointer in a bunch of routines
that examine fields that could change under them. This is just to make
really sure that when we are fetching a value 'only once', that's what
actually happens. Possibly this is a bug that should be back-patched,
but in the absence of solid evidence that it's needed, I won't bother.
Tom Lane [Wed, 5 Sep 2007 20:53:17 +0000 (20:53 +0000)]
Quick hack to make the VXID of a prepared transaction be -1/XID,
so that different prepared xacts can be told apart in the pg_locks
view. Per suggestion from Florian.
Tom Lane [Wed, 5 Sep 2007 18:10:48 +0000 (18:10 +0000)]
Implement lazy XID allocation: transactions that do not modify any database
rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.
Andrew Dunstan [Tue, 4 Sep 2007 16:41:43 +0000 (16:41 +0000)]
Provide for binary input/output of enums, to fix complaint from Merlin Moncure.
This just provides text values, we're not exposing the underlying Oid representation.
Catalog version bumped.
Tom Lane [Tue, 4 Sep 2007 02:16:56 +0000 (02:16 +0000)]
Restrict tsearch config file base names to contain a-z, 0-9, and underscore,
instead of the initial policy of whatever isalpha() likes. Per discussion.
Tom Lane [Mon, 3 Sep 2007 18:46:30 +0000 (18:46 +0000)]
Support SET FROM CURRENT in CREATE/ALTER FUNCTION, ALTER DATABASE, ALTER ROLE.
(Actually, it works as a plain statement too, but I didn't document that
because it seems a bit useless.) Unify VariableResetStmt with
VariableSetStmt, and clean up some ancient cruft in the representation of
same.
Tom Lane [Mon, 3 Sep 2007 02:30:45 +0000 (02:30 +0000)]
Improve stylistic consistency of descriptions of built-in objects by avoiding
initcap style --- the vast majority of the existing descriptions do not use
an initial cap. I didn't change places where the first word was all-cap.
initdb not forced because this doesn't change any regression test results.
Tom Lane [Mon, 3 Sep 2007 01:18:33 +0000 (01:18 +0000)]
Fix breakage of GIN support for varchar[] and cidr[] that I introduced in the
operator-family rewrite. I had mistakenly supposed that these could use the
pg_amproc entries for text[] and inet[] respectively. However, binary
compatibility of the underlying types does not make two array types binary
compatible (since they must differ in the header field that gives the element
type OID), and so the index support code doesn't consider those entries
applicable. Add back the missing pg_amproc entries, and add an opr_sanity
query to try to catch such mistakes in future. Per report from Gregory
Maxwell.
Tom Lane [Mon, 3 Sep 2007 00:39:26 +0000 (00:39 +0000)]
Implement function-local GUC parameter settings, as per recent discussion.
There are still some loose ends: I didn't do anything about the SET FROM
CURRENT idea yet, and it's not real clear whether we are happy with the
interaction of SET LOCAL with function-local settings. The documentation
is a bit spartan, too.
Tom Lane [Sat, 1 Sep 2007 18:47:39 +0000 (18:47 +0000)]
Since sort_bounded_heap makes state changes that should be made
regardless of the number of tuples involved, it's incorrect to skip it
when memtupcount = 1; the number of cycles saved is minuscule anyway.
An alternative solution would be to pull the state changes out to the
call site in tuplesort_performsort, but keeping them near the corresponding
changes in make_bounded_heap seems marginally cleaner. Noticed by
Greg Stark.
Tom Lane [Fri, 31 Aug 2007 23:35:22 +0000 (23:35 +0000)]
Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
Tom Lane [Fri, 31 Aug 2007 18:33:40 +0000 (18:33 +0000)]
Extend whole-row Var evaluation to cope with the case that the sub-plan
generating the tuples has resjunk output columns. This is not possible for
simple table scans but can happen when evaluating a whole-row Var for a view.
Per example from Patryk Kordylewski. The problem exists back to 8.0 but
I'm not going to risk back-patching further than 8.2 because of the many
changes in this area.
Tom Lane [Fri, 31 Aug 2007 01:44:06 +0000 (01:44 +0000)]
Rewrite make_outerjoininfo's construction of min_lefthand and min_righthand
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation. The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it. This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one. If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't. Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.
To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with. This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.
Thanks to Richard Harris for the bug report and test case.
Tom Lane [Thu, 30 Aug 2007 05:27:29 +0000 (05:27 +0000)]
Fix int8mul so that overflow check is applied correctly for INT64_IS_BUSTED
case, per Florian Pflug.
Not back-patched since it's unclear that anyone but me still cares ...
Tom Lane [Wed, 29 Aug 2007 17:24:29 +0000 (17:24 +0000)]
Relax permissions checks on dbsize functions, per discussion. Revert out all
checks for individual-table-size functions, since anyone in the database could
get approximate values from pg_class.relpages anyway. Allow database-size to
users with CONNECT privilege for the target database (note that this is
granted by default). Allow tablespace-size if the user has CREATE privilege
on the tablespace (which is *not* granted by default), or if the tablespace is
the default tablespace for the current database (since we treat that as
implicitly allowing use of the tablespace).
Tom Lane [Wed, 29 Aug 2007 16:31:36 +0000 (16:31 +0000)]
Fix aboriginal bug in _tarAddFile(): when complaining that the amount of data
read from the temp file didn't match the file length reported by ftello(),
the wrong variable's value was printed, and so the message made no sense.
Clean up a couple other coding infelicities while at it.
Tom Lane [Tue, 28 Aug 2007 03:23:44 +0000 (03:23 +0000)]
Improve behavior of log_lock_waits patch. Ensure that something gets logged
even if the "deadlock detected" ERROR message is suppressed by an exception
catcher. Be clearer about the event sequence when a soft deadlock is fixed:
the fixing process might or might not still have to wait, so log that
separately. Fix race condition when someone releases us from the lock partway
through printing all this junk --- we'd not get confused about our state, but
the log message sequence could have been misleading, ie, a "still waiting"
message with no subsequent "acquired" message. Greg Stark and Tom Lane.
Tom Lane [Mon, 27 Aug 2007 03:36:08 +0000 (03:36 +0000)]
Fix a couple of misbehaviors rooted in the fact that the default creation
namespace isn't necessarily first in the search path (there could be implicit
schemas ahead of it). Examples are
test=# set search_path TO s1;
test=# create view pg_timezone_names as select * from pg_timezone_names();
ERROR: "pg_timezone_names" is already a view
test=# create table pg_class (f1 int primary key);
ERROR: permission denied: "pg_class" is a system catalog
You'd expect these commands to create the requested objects in s1, since
names beginning with pg_ aren't supposed to be reserved anymore. What is
happening is that we create the requested base table and then execute
additional commands (here, CREATE RULE or CREATE INDEX), and that code is
passed the same RangeVar that was in the original command. Since that
RangeVar has schemaname = NULL, the secondary commands think they should do a
path search, and that means they find system catalogs that are implicitly in
front of s1 in the search path.
This is perilously close to being a security hole: if the secondary command
failed to apply a permission check then it'd be possible for unprivileged
users to make schema modifications to system catalogs. But as far as I can
find, there is no code path in which a check doesn't occur. Which makes it
just a weird corner-case bug for people who are silly enough to want to
name their tables the same as a system catalog.
The relevant code has changed quite a bit since 8.2, which means this patch
wouldn't work as-is in the back branches. Since it's a corner case no one
has reported from the field, I'm not going to bother trying to back-patch.
Tom Lane [Mon, 27 Aug 2007 01:39:25 +0000 (01:39 +0000)]
Remove the 'not in' operator (!!=). This was a hangover from Berkeley
days that was obsolete the moment we had IN (SELECT ...) capability.
It's arguably a security hole since it applied no permissions check to
the table it searched, and since it was never documented anywhere,
removing it seems more appropriate than fixing it.
Tom Lane [Mon, 27 Aug 2007 01:24:50 +0000 (01:24 +0000)]
Require SELECT privilege on a table to do dblink_get_pkey(). This is
not all that exciting when the system catalogs are readable by all,
but some people try to lock them down, and would not like this sort of
end run ...
Tom Lane [Mon, 27 Aug 2007 01:19:14 +0000 (01:19 +0000)]
Restrict pg_relation_size to relation owner, pg_database_size to DB owner,
and pg_tablespace_size to superusers. Perhaps we could weaken the first
case to just require SELECT privilege, but that doesn't work for the
other cases, so use ownership as the common concept.
Tom Lane [Mon, 27 Aug 2007 00:57:36 +0000 (00:57 +0000)]
Make currtid() functions require SELECT privileges on the target table.
While it's not clear that TID linkage info is of any great use to a
nefarious user, it's certainly unexpected that these functions wouldn't
insist on read privileges.
Tom Lane [Mon, 27 Aug 2007 00:13:51 +0000 (00:13 +0000)]
Restrict pgrowlocks function to superusers. (This might be too strict,
but no permissions check at all is certainly no good.) Clean up usage
of some deprecated APIs.
Tom Lane [Sun, 26 Aug 2007 23:59:50 +0000 (23:59 +0000)]
Restrict pgstattuple functions to superusers. (This might be too strict,
but no permissions check at all is certainly no good.) Clean up usage
of some deprecated APIs.
Tom Lane [Sun, 26 Aug 2007 21:44:25 +0000 (21:44 +0000)]
Make ARRAY(SELECT ...) return an empty array, rather than a NULL, when the
sub-select returns zero rows. Per complaint from Jens Schicke. Since this
is more in the nature of a definition change than a bug, not back-patched.
Tom Lane [Sat, 25 Aug 2007 20:29:25 +0000 (20:29 +0000)]
Adjust with-system-tzdata patch to not attempt to install a symlink,
but just hardwire the specified timezone database path into the executable.
Per discussion, this avoids some packaging disadvantages of using a
symlink.
Tom Lane [Sat, 25 Aug 2007 19:08:19 +0000 (19:08 +0000)]
Fix brain fade in DefineIndex(): it was continuing to access the table's
relcache entry after having heap_close'd it. This could lead to misbehavior
if a relcache flush wiped out the cache entry meanwhile. In 8.2 there is a
very real risk of CREATE INDEX CONCURRENTLY using the wrong relid for locking
and waiting purposes. I think the bug is only cosmetic in 8.0 and 8.1,
because their transgression is limited to using RelationGetRelationName(rel)
in an ereport message immediately after heap_close, and there's no way (except
with special debugging options) for a cache flush to occur in that interval.
Not quite sure that it's cosmetic in 7.4, but seems best to patch anyway.
Found by trying to run the regression tests with CLOBBER_CACHE_ALWAYS enabled.
Maybe we should try to do that on a regular basis --- it's awfully slow,
but perhaps some fast buildfarm machine could do it once in awhile.