Tom Lane [Sun, 29 Jan 2012 23:37:14 +0000 (18:37 -0500)]
Tweak index costing for problems with partial indexes.
btcostestimate() makes an estimate of the number of index tuples that will
be visited based on knowledge of which index clauses can actually bound the
scan within nbtree. However, it forgot to account for partial indexes in
this calculation, with the result that the cost of the index scan could be
significantly overestimated for a partial index. Fix that by merging the
predicate with the abbreviated indexclause list, in the same way as we do
with the full list to estimate how many heap tuples will be visited.
Also, slightly increase the "fudge factor" that's meant to give preference
to smaller indexes over larger ones. While this is applied to all indexes,
it's most important for partial indexes since it can be the only factor
that makes a partial index look cheaper than a similar full index.
Experimentation shows that the existing value is so small as to easily get
swamped by noise such as page-boundary-roundoff behavior. I'm tempted to
kick it up more than this, but will refrain for now.
Per report from Ruben Blanco. These are long-standing issues, but given
the lack of prior complaints I'm not going to risk changing planner
behavior in back branches by back-patching.
Tom Lane [Sun, 29 Jan 2012 21:31:23 +0000 (16:31 -0500)]
Fix pushing of index-expression qualifications through UNION ALL.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c, I made the planner
wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs
of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some
issues with EquivalenceClass processing. However, this means that any
upper-level WHERE clauses mentioning such outputs will now contain
PlaceHolderVars after they're pushed down into the appendrel child,
and that prevents indxpath.c from recognizing that they could be matched
to index expressions. To fix, add explicit stripping of PlaceHolderVars
from index operands, same as we have long done for RelabelType nodes.
Add a regression test covering both this and the plain-UNION case (which
is a totally different code path, but should also be able to do it).
Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the
previous change.
Tom Lane [Sun, 29 Jan 2012 01:24:42 +0000 (20:24 -0500)]
Fix handling of init_plans list in inheritance_planner().
Formerly we passed an empty list to each per-child-table invocation of
grouping_planner, and then merged the results into the global list.
However, that fails if there's a CTE attached to the statement, because
create_ctescan_plan uses the list to find the plan referenced by a CTE
reference; so it was unable to find any CTEs attached to the outer UPDATE
or DELETE. But there's no real reason not to use the same list throughout
the process, and doing so is simpler and faster anyway.
Per report from Josh Berkus of "could not find plan for CTE" failures.
Back-patch to 9.1 where we added support for WITH attached to UPDATE or
DELETE. Add some regression test cases, too.
Tom Lane [Sat, 28 Jan 2012 22:55:08 +0000 (17:55 -0500)]
Add simple tests of EvalPlanQual using the isolationtester infrastructure.
Much more could be done here, but at least now we have *some* automated
test coverage of that mechanism. In particular this tests the writable-CTE
case reported by Phil Sorber.
In passing, remove isolationtester's arbitrary restriction on the number of
steps in a permutation list. I used this so that a single spec file could
be used to run several related test scenarios, but there are other possible
reasons to want a step series that's not exactly a permutation. Improve
documentation and fix a couple other nits as well.
Tom Lane [Sat, 28 Jan 2012 22:43:57 +0000 (17:43 -0500)]
Fix handling of data-modifying CTE subplans in EvalPlanQual.
We can't just skip initializing such subplans, because the referencing CTE
node will expect to find the subplan available when it initializes. That
in turn means that ExecInitModifyTable must allow the case (which actually
it needed to do anyway, since there's no guarantee that ModifyTable is
exactly at the top of the CTE plan tree). So move the complaint about not
being allowed in EvalPlanQual mode to execution instead of initialization.
Testing turned up yet another problem, which is that we'd try to
re-initialize the result relation's index list, leading to leaks and
dangling pointers.
Per report from Phil Sorber. Back-patch to 9.1 where data-modifying CTEs
were introduced.
Tom Lane [Sat, 28 Jan 2012 04:09:16 +0000 (23:09 -0500)]
Fix error detection in contrib/pgcrypto's encrypt_iv() and decrypt_iv().
Due to oversights, the encrypt_iv() and decrypt_iv() functions failed to
report certain types of invalid-input errors, and would instead return
random garbage values.
Tom Lane [Sat, 28 Jan 2012 00:46:41 +0000 (19:46 -0500)]
Undo 8.4-era lobotomization of subquery pullup rules.
After the planner was fixed to convert some IN/EXISTS subqueries into
semijoins or antijoins, we had to prevent it from doing that in some
cases where the plans risked getting much worse. The reason the plans
got worse was that in the unoptimized implementation, subqueries could
reference parameters from the outer query at any join level, and so
full table scans could be avoided even if they were one or more levels
of join below where the semi/anti join would be. Now that we have
sufficient mechanism in the planner to handle such cases properly,
it should no longer be necessary to play dumb here.
This reverts commits 07b9936a0f10d746e5076239813a5e938f2f16be and cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3. The latter was a stopgap
fix that wasn't really sufficiently analyzed at the time. Rather
than just restricting ourselves to cases where the new join can be
stacked on the right-hand input, we should also consider whether it
can be stacked on the left-hand input.
Tom Lane [Sat, 28 Jan 2012 00:26:38 +0000 (19:26 -0500)]
Use parameterized paths to generate inner indexscans more flexibly.
This patch fixes the planner so that it can generate nestloop-with-
inner-indexscan plans even with one or more levels of joining between
the indexscan and the nestloop join that is supplying the parameter.
The executor was fixed to handle such cases some time ago, but the
planner was not ready. This should improve our plans in many situations
where join ordering restrictions formerly forced complete table scans.
There is probably a fair amount of tuning work yet to be done, because
of various heuristics that have been added to limit the number of
parameterized paths considered. However, we are not going to find out
what needs to be adjusted until the code gets some real-world use, so
it's time to get it in there where it can be tested easily.
Note API change for index AM amcostestimate functions. I'm not aware of
any non-core index AMs, but if there are any, they will need minor
adjustments.
Peter Eisentraut [Fri, 27 Jan 2012 19:58:51 +0000 (21:58 +0200)]
Show default privileges in information schema
Hitherto, the information schema only showed explicitly granted
privileges that were visible in the *acl catalog columns. If no
privileges had been granted, the implicit privileges were not shown.
To fix that, add an SQL-accessible version of the acldefault()
function, and use that inside the aclexplode() calls to substitute the
catalog-specific default privilege set for null values.
Peter Eisentraut [Fri, 27 Jan 2012 19:39:38 +0000 (21:39 +0200)]
Revert unfortunate whitespace change
In e5e2fc842c418432756d8b5825ff107c6c5fc4c3, blank lines were removed
after a comment block, which now looks as though the comment refers to
the immediately following code, but it actually refers to the
preceding code. So put the blank lines back.
Peter Eisentraut [Fri, 27 Jan 2012 19:20:34 +0000 (21:20 +0200)]
Disallow ALTER DOMAIN on non-domain type everywhere
This has been the behavior already in most cases, but through
omission, ALTER DOMAIN / OWNER TO and ALTER DOMAIN / SET SCHEMA would
silently work on non-domain types as well.
Peter Eisentraut [Fri, 27 Jan 2012 18:16:17 +0000 (20:16 +0200)]
Hide most variable-length fields from Form_pg_* structs
Those fields only appear in the structs so that genbki.pl can create
the BKI bootstrap files for the catalogs. But they are not actually
usable from C. So hiding them can prevent coding mistakes, saves
stack space, and can help the compiler.
In certain catalogs, the first variable-length field has been kept
visible after manual inspection. These exceptions are noted in C
comments.
Peter Eisentraut [Fri, 27 Jan 2012 18:08:34 +0000 (20:08 +0200)]
Do not access indclass through Form_pg_index
Normally, accessing variable-length members of catalog structures past
the first one doesn't work at all. Here, it happened to work because
indnatts was checked to be 1, and so the defined FormData_pg_index
layout, using int2vector[1] and oidvector[1] for variable-length
arrays, happened to match the actual memory layout. But it's a very
fragile assumption, and it's not in a performance-critical path, so
code it properly using heap_getattr() instead.
Robert Haas [Thu, 26 Jan 2012 19:43:28 +0000 (14:43 -0500)]
Adjust tuplesort.c based on the fact that we never use the OS's qsort().
Our own qsort_arg() implementation doesn't have the defect previously
observed to affect only QNX 4, so it seems sufficiently to assert that
it isn't broken rather than retesting. Also, update a few comments to
clarify why it's valuable to retain a tie-break rule based on CTID
during index builds.
Robert Haas [Thu, 26 Jan 2012 17:44:30 +0000 (12:44 -0500)]
Be more clear when a new column name collides with a system column name.
We now use the same error message for ALTER TABLE .. ADD COLUMN or
ALTER TABLE .. RENAME COLUMN that we do for CREATE TABLE. The old
message was accurate, but might be confusing to users not aware of our
system columns.
Vik Reykja, with some changes by me, and further proofreading by Tom Lane
Make bgwriter sleep longer when it has no work to do, to save electricity.
To make it wake up promptly when activity starts again, backends nudge it
by setting a latch in MarkBufferDirty(). The latch is kept set while
bgwriter is active, so there is very little overhead from that when the
system is busy. It is only armed before going into longer sleep.
Robert Haas [Thu, 26 Jan 2012 13:21:31 +0000 (08:21 -0500)]
Damage control for yesterday's CheckIndexCompatible changes.
Rip out a regression test that doesn't play well with settings put in
place by the build farm, and rewrite the code in CheckIndexCompatible
in a hopefully more transparent style.
Alvaro Herrera [Wed, 25 Jan 2012 21:06:00 +0000 (18:06 -0300)]
Have \copy go through SendQuery
This enables a bunch of features, notably ON_ERROR_ROLLBACK. It also
makes COPY failure (either in the server or psql) as a whole behave more
sanely in psql.
Additionally, having more commands in the same command line as COPY
works better (though since psql splits lines at semicolons, this doesn't
matter much unless you're using -c).
Also tighten a couple of switches on PQresultStatus() to add
PGRES_COPY_BOTH support and stop assuming that unknown statuses received
are errors; have those print diagnostics where warranted.
Robert Haas [Wed, 25 Jan 2012 20:28:07 +0000 (15:28 -0500)]
Make CheckIndexCompatible simpler and more bullet-proof.
This gives up the "don't rewrite the index" behavior in a couple of
relatively unimportant cases, such as changing between an array type
and an unconstrained domain over that array type, in return for
making this code more future-proof.
Simon Riggs [Wed, 25 Jan 2012 18:02:04 +0000 (18:02 +0000)]
Allow pg_basebackup from standby node with safety checking.
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.
Jun Ishizuka and Fujii Masao, with minor modifications
Bruce Momjian [Wed, 25 Jan 2012 14:35:17 +0000 (09:35 -0500)]
Now that the shared library name can be adjusted in the library test,
have pg_upgrade allocate a maximum fixed size buffer for testing the
library file name, rather than base the allocation on the library name.
Simon Riggs [Tue, 24 Jan 2012 20:22:37 +0000 (20:22 +0000)]
Add new replication mode synchronous_commit = 'write'.
Replication occurs only to memory on standby, not to disk,
so provides additional performance if user wishes to
reduce durability level slightly. Adds concept of multiple
independent sync rep queues.
Robert Haas [Tue, 24 Jan 2012 13:38:20 +0000 (08:38 -0500)]
Adjustments to regression tests for security_barrier views.
Drop the role we create, so regression tests pass even when run more
than once against the same cluster, a problem noted by Tom Lane and
Jeff Janes. Also, rename the temporary role so that it starts with
"regress_", to make it unlikely that we'll collide with an existing
role name while running "make installcheck", per further gripe from
Tom Lane.
Simon Riggs [Mon, 23 Jan 2012 23:37:32 +0000 (23:37 +0000)]
Resolve timing issue with logging locks for Hot Standby.
We log AccessExclusiveLocks for replay onto standby nodes,
but because of timing issues on ProcArray it is possible to
log a lock that is still held by a just committed transaction
that is very soon to be removed. To avoid any timing issue we
avoid applying locks made by transactions with InvalidXid.
Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee
Magnus Hagander [Thu, 19 Jan 2012 13:19:20 +0000 (14:19 +0100)]
Separate state from query string in pg_stat_activity
This separates the state (running/idle/idleintransaction etc) into
it's own field ("state"), and leaves the query field containing just
query text.
The query text will now mean "current query" when a query is running
and "last query" in other states. Accordingly,the field has been
renamed from current_query to query.
Since backwards compatibility was broken anyway to make that, the procpid
field has also been renamed to pid - along with the same field in
pg_stat_replication for consistency.
Scott Mead and Magnus Hagander, review work from Greg Smith
Fix corner case in cleanup of transactions using SSI.
When the only remaining active transactions are READ ONLY, we do a "partial
cleanup" of committed transactions because certain types of conflicts
aren't possible anymore. For committed r/w transactions, we release the
SIREAD locks but keep the SERIALIZABLEXACT. However, for committed r/o
transactions, we can go further and release the SERIALIZABLEXACT too. The
problem was with the latter case: we were returning the SERIALIZABLEXACT to
the free list without removing it from the finished list.
The only real change in the patch is the SHMQueueDelete line, but I also
reworked some of the surrounding code to make it obvious that r/o and r/w
transactions are handled differently -- the existing code felt a bit too
clever.
Alvaro Herrera [Mon, 16 Jan 2012 22:19:42 +0000 (19:19 -0300)]
Disallow merging ONLY constraints in children tables
When creating a child table, or when attaching an existing table as
child of another, we must not allow inheritable constraints to be
merged with non-inheritable ones, because then grandchildren would not
properly get the constraint. This would violate the grandparent's
expectations.
Robert Haas [Mon, 16 Jan 2012 14:34:21 +0000 (09:34 -0500)]
Prevent adding relations to a concurrently dropped schema.
In the previous coding, it was possible for a relation to be created
via CREATE TABLE, CREATE VIEW, CREATE SEQUENCE, CREATE FOREIGN TABLE,
etc. in a schema while that schema was meanwhile being concurrently
dropped. This led to a pg_class entry with an invalid relnamespace
value. The same problem could occur if a relation was moved using
ALTER .. SET SCHEMA while the target schema was being concurrently
dropped. This patch prevents both of those scenarios by locking the
schema to which the relation is being added using AccessShareLock,
which conflicts with the AccessExclusiveLock taken by DROP.
As a desirable side effect, this also prevents the use of CREATE OR
REPLACE VIEW to queue for an AccessExclusiveLock on a relation on which
you have no rights: that will now fail immediately with a permissions
error, before trying to obtain a lock.
We need similar protection for all other object types, but as everything
other than relations uses a slightly different set of code paths, I'm
leaving that for a separate commit.
Original complaint (as far as I could find) about CREATE by Nikhil
Sontakke; risk for ALTER .. SET SCHEMA pointed out by Tom Lane;
further details by Dan Farina; patch by me; review by Hitoshi Harada.
Magnus Hagander [Sun, 15 Jan 2012 14:34:40 +0000 (15:34 +0100)]
Allow a user to kill his own queries using pg_cancel_backend()
Allows a user to use pg_cancel_queries() to cancel queries in
other backends if they are running under the same role.
pg_terminate_backend() still requires superuser permissoins.
Short patch, many authors working on the bikeshed: Magnus Hagander,
Josh Kupershmidt, Edward Muller, Greg Smith.
Alvaro Herrera [Sat, 14 Jan 2012 22:36:39 +0000 (19:36 -0300)]
Detect invalid permutations in isolationtester
isolationtester is now able to continue running other permutations when
it detects that one of them is invalid, which is useful during initial
development of spec files.
Make superuser imply replication privilege. The idea of a privilege that
superuser doesn't have doesn't make much sense, as a superuser can do
whatever he wants through other means, anyway. So instead of granting
replication privilege to superusers in CREATE USER time by default, allow
replication connection from superusers whether or not they have the
replication privilege.
Patch by Noah Misch, per discussion on bug report #6264
Robert Haas [Fri, 13 Jan 2012 13:22:31 +0000 (08:22 -0500)]
Fix broken logic in lazy_vacuum_heap.
As noted by Tom Lane, the previous coding in this area, which I
introduced in commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e, was
poorly tested and caused the vacuum's second heap to go into what would
have been an infinite loop but for the fact that it eventually caused a
memory allocation failure. This version seems to work better.
Simon Riggs [Fri, 13 Jan 2012 13:02:44 +0000 (13:02 +0000)]
Correctly initialise shared recoveryLastRecPtr in recovery.
Previously we used ReadRecPtr rather than EndRecPtr, which was
not a serious error but caused pg_stat_replication to report
incorrect replay_location until at least one WAL record is replayed.
Tom Lane [Thu, 12 Jan 2012 21:40:14 +0000 (16:40 -0500)]
Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.
In commit 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one. However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID. The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID. (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live not deleted.)
To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value. This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway. We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.
Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous
patch.
Tom Lane [Thu, 12 Jan 2012 19:18:08 +0000 (14:18 -0500)]
Tweak duplicate-index-column regression test to avoid locale sensitivity.
The originally-chosen test case gives different results in es_EC locale
because of unusual rule for sorting strings beginning with "LL". Adjust
the comparison value to avoid that, while hopefully not introducing new
locale dependencies elsewhere. Per report from Jaime Casanova.
Alvaro Herrera [Wed, 11 Jan 2012 21:46:18 +0000 (18:46 -0300)]
Validate number of steps specified in permutation
A permutation that specifies more steps than defined causes
isolationtester to crash, so avoid that. Using less steps than defined
should probably not be a problem, but no spec currently does that.
Refactor XLogInsert a bit. The rdata entries for backup blocks are now
constructed before acquiring WALInsertLock, which slightly reduces the time
the lock is held. Although I could not measure any benefit in benchmarks,
the code is more readable this way.
Tom Lane [Tue, 10 Jan 2012 00:56:27 +0000 (19:56 -0500)]
Fix one-byte buffer overrun in contrib/test_parser.
The original coding examined the next character before verifying that
there *is* a next character. In the worst case with the input buffer
right up against the end of memory, this would result in a segfault.
Problem spotted by Paul Guyot; this commit extends his patch to fix an
additional case. In addition, make the code a tad more readable by not
overloading the usage of *tlen.
Add compatibility note about grant options on GRANT reference page
Point out in the compatibility section that granting grant options to
PUBLIC is not supported by PostgreSQL. This is already mentioned
earlier, but since it concerns the information schema, it might be
worth pointing out explicitly as a compatibility issue.
Magnus Hagander [Mon, 9 Jan 2012 10:53:38 +0000 (11:53 +0100)]
Fix pg_basebackup for keepalive messages
Teach pg_basebackup in streaming mode to deal with keepalive messages.
Also change the order of checks to complain at the message rather than
block size when a new message is introduced.
In passing, switch to using sizeof() instead of hardcoded sizes for
WAL protocol structs.
Rename the internal structures of the CREATE TABLE (LIKE ...) facility
The original implementation of this interpreted it as a kind of
"inheritance" facility and named all the internal structures
accordingly. This turned out to be very confusing, because it has
nothing to do with the INHERITS feature. So rename all the internal
parser infrastructure, update the comments, adjust the error messages,
and split up the regression tests.
Tom Lane [Sat, 7 Jan 2012 20:38:52 +0000 (15:38 -0500)]
Use __sync_lock_test_and_set() for spinlocks on ARM, if available.
Historically we've used the SWPB instruction for TAS() on ARM, but this
is deprecated and not available on ARMv6 and later. Instead, make use
of a GCC builtin if available. We'll still fall back to SWPB if not,
so as not to break existing ports using older GCC versions.
Eventually we might want to try using __sync_lock_test_and_set() on some
other architectures too, but for now that seems to present only risk and
not reward.
Back-patch to all supported versions, since people might want to use any
of them on more recent ARM chips.