Tom Lane [Mon, 17 Jan 2011 17:38:52 +0000 (12:38 -0500)]
Fix miscalculation of itemsafter in array_set_slice().
If the slice to be assigned to was before the existing array lower bound
(requiring at least one null element to spring into existence to fill the
gap), the code miscalculated how many entries needed to be copied from
the old array's null bitmap. This could result in trashing the array's
data area (as seen in bug #5840 from Karsten Loesing), or worse.
This has been broken since we first allowed the behavior of assigning to
non-adjacent slices, in 8.2. Back-patch to all affected versions.
Before exiting walreceiver, fsync() all the WAL received.
Otherwise WAL recovery will replay the un-flushed WAL after walreceiver has
exited, which can lead to a non-recoverable standby if the system crashes hard
at that point.
Magnus Hagander [Sat, 15 Jan 2011 18:18:14 +0000 (19:18 +0100)]
Enumerate available tablespaces after starting the backup
This closes a race condition where if a tablespace was created
after the enumeration happened but before the do_pg_start_backup()
was called, the backup would be incomplete. Now that it's done
while we are in backup mode, WAL replay will recreate it during
restore.
Treat a WAL sender process that hasn't started streaming yet as a regular
backend, as far as the postmaster shutdown logic is concerned. That means,
fast shutdown will wait for WAL sender processes to exit before signaling
bgwriter to finish. This avoids race conditions between a base backup stopping
or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't
want e.g the end-of-backup WAL record to be written after the shutdown
checkpoint.
Magnus Hagander [Fri, 14 Jan 2011 15:30:33 +0000 (16:30 +0100)]
Use a lexer and grammar for parsing walsender commands
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.
In passing, refactor the case statement in the walsender
to put each command in it's own function.
Tom Lane [Fri, 14 Jan 2011 00:01:28 +0000 (19:01 -0500)]
Code review for postmaster.pid contents changes.
Fix broken test for pre-existing postmaster, caused by wrong code for
appending lines to the lockfile; don't write a failed listen_address
setting into the lockfile; don't arbitrarily change the location of the
data directory in the lockfile compared to previous releases; provide more
consistent and useful definitions of the socket path and listen_address
entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as
the postmaster; assorted code style improvements.
Tom Lane [Thu, 13 Jan 2011 19:33:19 +0000 (14:33 -0500)]
Revert incorrect memory-conservation hack in inheritance_planner().
This reverts commit d1001a78ce612a16ea622b558f5fc2b68c45ab4c of 2010-12-05,
which was broken as reported by Jeff Davis. The problem is that the
individual planning steps may have side-effects on substructures of
PlannerGlobal, not only the current PlannerInfo root. Arranging to keep
all such side effects in the main planning context is probably possible,
but it would change this from a quick local hack into a wide-ranging and
rather fragile endeavor. Which it's not worth.
Fix the logic in libpqrcv_receive() to determine if there's any incoming data
that can be read without blocking. It used to conclude that there isn't, even
though there was data in the socket receive buffer. That lead walreceiver to
flush the WAL after every received chunk, potentially causing big performance
issues.
Backpatch to 9.0, because the performance impact can be very significant.
Peter Eisentraut [Thu, 13 Jan 2011 07:32:06 +0000 (09:32 +0200)]
Workaround for recursive make breakage
Changing a file two directory levels deep under src/backend/ would not
cause the postgres binary to be rebuilt. This change fixes it, but no
one knows why.
Tom Lane [Thu, 13 Jan 2011 01:47:02 +0000 (20:47 -0500)]
Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets. This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan. The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon. Revise the data structure
so that we track resjunk column numbers separately for each subplan.
I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan. That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
Tom Lane [Tue, 11 Jan 2011 18:41:13 +0000 (13:41 -0500)]
Adjust basebackup.c to suppress compiler warnings.
Some versions of gcc complain about "variable `tablespaces' might be
clobbered by `longjmp' or `vfork'" with the original coding. Fix by
moving the PG_TRY block into a separate subroutine.
Tom Lane [Tue, 11 Jan 2011 17:12:04 +0000 (12:12 -0500)]
Tweak create_index_paths()'s test for whether to consider a bitmap scan.
Per my note of a couple days ago, create_index_paths would refuse to
consider any path at all for GIN indexes if the selectivity estimate came
out as 1.0; not even if you tried to force it with enable_seqscan. While
this isn't really a bad outcome in practice, it could be annoying for
testing purposes. Adjust the test for "is this path only useful for
sorting" so that it doesn't fire on paths with nil pathkeys, which will
include all GIN paths.
Magnus Hagander [Tue, 11 Jan 2011 09:04:54 +0000 (10:04 +0100)]
Reset walsender ps title in the main loop
When in streaming mode we can never get out, so it will never
be required, but after a base backup (or other operations)
we can get back to the loop, so the title needs to be cleared.
Magnus Hagander [Mon, 10 Jan 2011 13:03:55 +0000 (14:03 +0100)]
Backend support for streaming base backups
Add BASE_BACKUP command to walsender, allowing it to stream a
base backup to the client (in tar format). The syntax is still
far from ideal, that will be fixed in the switch to use a proper
grammar for walsender.
No client included yet, will come as a separate commit.
Tom Lane [Sun, 9 Jan 2011 21:43:56 +0000 (16:43 -0500)]
Update contrib/hstore for new GIN extractQuery API.
In particular, make hstore @> '' succeed for all hstores, likewise
hstore ?& '{}'. Previously the results were inconsistent and could
depend on whether you were using a GiST index, GIN index, or seqscan.
Magnus Hagander [Sun, 9 Jan 2011 20:00:28 +0000 (21:00 +0100)]
Split pg_start_backup() and pg_stop_backup() into two pieces
Move the actual functionality into a separate function that's
easier to call internally, and change the SQL-callable function
to be a wrapper calling this.
Also create a pg_abort_backup() function, only callable internally,
that does only the most vital parts of pg_stop_backup(), making it
safe(r) to call from error handlers.
Tom Lane [Sun, 9 Jan 2011 18:09:07 +0000 (13:09 -0500)]
Use array_contains_nulls instead of ARR_HASNULL on user-supplied arrays.
This applies the fix for bug #5784 to remaining places where we wish
to reject nulls in user-supplied arrays. In all these places, there's
no reason not to allow a null bitmap to be present, so long as none of
the current elements are actually null.
I did not change some other places where we are looking at system catalog
entries or aggregate transition values, as the presence of a null bitmap
in such an array would be suspicious.
Magnus Hagander [Sun, 9 Jan 2011 14:06:55 +0000 (15:06 +0100)]
Add pgreadlink() on Windows to read junction points
Add support for reading back information about the symbolic
links we've created with pgsymlink(), which are actually
Junction Points. Just like pgsymlink() can only create directory
symlinks, pgreadlink() can only read directory symlinks.
Tom Lane [Sun, 9 Jan 2011 05:39:21 +0000 (00:39 -0500)]
Fix assorted corner-case bugs in contrib/intarray.
The array containment operators now behave per mathematical expectation
for empty arrays (ie, an empty array is contained in anything).
Both these operators and the query_int operators now work as expected in
GiST and GIN index searches, rather than having corner cases where the
index searches gave different answers.
Also, fix unexpected failures where the operators would claim that an array
contained nulls, when in fact there was no longer any null present (similar
to bug #5784). The restriction to not have nulls is still there, as
removing it would take a lot of added code complexity and probably slow
things down significantly.
Also, remove the arbitrary restriction to 1-D arrays; unlike the other
restriction, this was buying us nothing performance-wise.
Assorted cosmetic improvements and marginal performance improvements, too.
Tom Lane [Sun, 9 Jan 2011 01:24:08 +0000 (20:24 -0500)]
Add array_contains_nulls() function in arrayfuncs.c.
This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
Tom Lane [Sun, 9 Jan 2011 01:13:33 +0000 (20:13 -0500)]
Fix up gincostestimate for new extractQuery API.
The only reason this wasn't crashing while testing the core anyarray
operators was that it was disabled for those cases because of passing the
wrong type information to get_opfamily_proc :-(. So fix that too, and make
it insist on finding the support proc --- in hindsight, silently doing
nothing is not as sane a coping mechanism as all that.
Tom Lane [Sat, 8 Jan 2011 21:08:05 +0000 (16:08 -0500)]
Remove pg_am.amindexnulls.
The only use we have had for amindexnulls is in determining whether an
index is safe to cluster on; but since the addition of the amclusterable
flag, that usage is pretty redundant.
In passing, clean up assorted sloppiness from the last patch that touched
pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.
Tom Lane [Sat, 8 Jan 2011 19:47:13 +0000 (14:47 -0500)]
Refactor GIN's handling of duplicate search entries.
The original coding could combine duplicate entries only when they
originated from the same qual condition. In particular it could not
combine cases where multiple qual conditions all give rise to full-index
scan requests, which is an expensive case well worth optimizing. Refactor
so that duplicates are recognized across all the quals.
Bruce Momjian [Sat, 8 Jan 2011 18:44:44 +0000 (13:44 -0500)]
In pg_upgrade, remove functions that did sequential array scans looking
up relations, but rather order old/new relations and use the same array
index value for both. This should speed up pg_upgrade for databases
with many relations.
Bruce Momjian [Sat, 8 Jan 2011 13:01:12 +0000 (08:01 -0500)]
In pg_upgrade, remove unnecessary separate handling of toast tables now
that we restore by oid; they can be handled like regular tables when
creating the file mapping structure.
Bruce Momjian [Sat, 8 Jan 2011 02:59:29 +0000 (21:59 -0500)]
Fix pg_upgrade of large object permissions by preserving pg_auth.oid,
which is stored in pg_largeobject_metadata.
No backpatch to 9.0 because you can't migrate from 9.0 to 9.0 with the
same catversion (because of tablespace conflict), and a pre-9.0
migration to 9.0 has not large object permissions to migrate.
Bruce Momjian [Sat, 8 Jan 2011 02:25:34 +0000 (21:25 -0500)]
Force pg_upgrade's to preserve pg_class.oid, not pg_class.relfilenode.
Toast tables have identical pg_class.oid and pg_class.relfilenode, but
for clarity it is good to preserve the pg_class.oid.
Update comments regarding what is preserved, and do some
variable/function renaming for clarity.
Tom Lane [Sat, 8 Jan 2011 01:40:48 +0000 (20:40 -0500)]
Fix the built-in GIN support procedure declarations in pg_proc.h.
Add more "internal" arguments so that these pg_proc entries reflect the
current preferred API. This is purely a cosmetic change, since GIN doesn't
actually consult the pg_proc entry when calling a support function.
Accordingly, no catversion bump.
Tom Lane [Sat, 8 Jan 2011 00:16:24 +0000 (19:16 -0500)]
Fix GIN to support null keys, empty and null items, and full index scans.
Per my recent proposal(s). Null key datums can now be returned by
extractValue and extractQuery functions, and will be stored in the index.
Also, placeholder entries are made for indexable items that are NULL or
contain no keys according to extractValue. This means that the index is
now always complete, having at least one entry for every indexed heap TID,
and so we can get rid of the prohibition on full-index scans. A full-index
scan is implemented much the same way as partial-match scans were already:
we build a bitmap representing all the TIDs found in the index, and then
drive the results off that.
Also, introduce a concept of a "search mode" that can be requested by
extractQuery when the operator requires matching to empty items (this is
just as cheap as matching to a single key) or requires a full index scan
(which is not so cheap, but it sure beats failing or giving wrong answers).
The behavior remains backward compatible for opclasses that don't return
any null keys or request a non-default search mode.
Using these features, we can now make the GIN index opclass for anyarray
behave in a way that matches the actual anyarray operators for &&, <@, @>,
and = ... which it failed to do before in assorted corner cases.
This commit fixes the core GIN code and ginarrayprocs.c, updates the
documentation, and adds some simple regression test cases for the new
behaviors using the array operators. The tsearch and contrib GIN opclass
support functions still need to be looked over and probably fixed.
Another thing I intend to fix separately is that this is pretty inefficient
for cases where more than one scan condition needs a full-index search:
we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.
There is some existing logic to merge duplicate GinScanEntrys but it needs
refactoring to make it work for entries belonging to different scan keys.
Note that most of gin.h has been split out into a new file gin_private.h,
so that gin.h doesn't export anything that's not supposed to be used by GIN
opclasses or the rest of the backend. I did quite a bit of other code
beautification work as well, mostly fixing comments and choosing more
appropriate names for things.
Magnus Hagander [Wed, 5 Jan 2011 13:24:17 +0000 (14:24 +0100)]
Give superusers REPLIACTION permission by default
This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
Bruce Momjian [Wed, 5 Jan 2011 04:35:49 +0000 (23:35 -0500)]
In pg_upgrade, copy pg_largeobject_metadata and its index for 9.0+
servers because, like pg_largeobject, it is a system table whose
contents are not dumped by pg_dump --schema-only.
Implement remaining fields of information_schema.sequences view
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)
Also slightly adjust the view's column order and permissions after review of
SQL standard.