SSI has a race condition, where the order of commit sequence numbers of
transactions might not match the order in which the work done in those
transactions becomes visible to others. The logic in SSI, however, assumed
that it does.
Fix that by having two sequence numbers for each serializable transaction,
one taken before a transaction becomes visible to others, and one after it.
This is easier than trying to make the transition totally atomic, which
would require holding ProcArrayLock and SerializableXactHashLock at the same
time. By using prepareSeqNo instead of commitSeqNo in a few places where
commit sequence numbers are compared, we can make those comparisons err on
the safe side when we don't know for sure which committed first.
Per analysis by Kevin Grittner and Dan Ports, but this approach to fixing it
is different from the original patch.
Robert Haas [Thu, 7 Jul 2011 19:05:21 +0000 (15:05 -0400)]
Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.
The value when BLCKSZ = 8192 is unchanged, but with larger-than-normal
block sizes we might need to crank things back a bit, as we'll have
more entries per page than normal in that case.
If there's a dangerous structure T0 ---> T1 ---> T2, and T2 commits first,
we need to abort something. If T2 commits before both conflicts appear,
then it should be caught by OnConflict_CheckForSerializationFailure. If
both conflicts appear before T2 commits, it should be caught by
PreCommit_CheckForSerializationFailure. But that is actually run when
T2 *prepares*, so if both conflicts only appear after T2 has prepared but
before it commits, neither check catches the case. Fix that in
OnConflict_CheckForSerializationFailure, by treating a prepared T2 as if it
had committed already.
This is mostly a problem for prepared transactions, which stay in the
prepared state for some time, but it also affects regular transactions,
because in the SSI code they too pass through the prepared state for a
short moment when they commit.
Andrew Dunstan [Wed, 6 Jul 2011 22:45:29 +0000 (18:45 -0400)]
Reimplement pgbison and pgflex as perl scripts instead of bat files.
In the process, remove almost all knowledge of individual .y and .l files,
and instead get invocation settings from the relevant make files.
The exception is plpgsql's gram.y, which has a target with a different
name. It is hoped that this will make the scripts more future-proof,
so that they won't require adjustment every time we add a new .l or .y
file.
The logic is also notably less tortured than that forced on us
by the idiosyncrasies of the Windows command processor.
The .bat files are kept as thin wrappers for the perl scripts.
Tom Lane [Wed, 6 Jul 2011 18:53:16 +0000 (14:53 -0400)]
Remove assumptions that not-equals operators cannot be in any opclass.
get_op_btree_interpretation assumed this in order to save some duplication
of code, but it's not true in general anymore because we added <> support
to btree_gist. (We still assume it for btree opclasses, though.)
Also, essentially the same logic was baked into predtest.c. Get rid of
that duplication by generalizing get_op_btree_interpretation so that it
can be used by predtest.c.
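For context, here is the sort of schema that now puts <> into a GiST
opclass; a minimal sketch assuming the btree_gist extension, with
hypothetical table and column names:

    CREATE EXTENSION btree_gist;

    -- Exclusion constraint using <>: all rows sharing a cage must hold
    -- the same kind of animal.
    CREATE TABLE zoo (
        cage   integer,
        animal text,
        EXCLUDE USING gist (cage WITH =, animal WITH <>)
    );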
Per bug report from Denis de Bernardy and investigation by Jeff Davis,
though I didn't use Jeff's patch exactly as-is.
Back-patch to 9.1; we do not support this usage before that.
Robert Haas [Wed, 6 Jul 2011 15:45:13 +0000 (11:45 -0400)]
Add \ir command to psql.
\ir is short for "include relative"; when used from a script, the
supplied pathname will be interpreted relative to the input file,
rather than to the current working directory.
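As a hedged illustration (file names are hypothetical), a script at
/app/sql/setup.sql can pull in a file that sits next to it, no matter
where psql was started from:

    -- in /app/sql/setup.sql, run with: psql -f /app/sql/setup.sql
    -- \ir reads /app/sql/tables.sql, relative to this script;
    -- \i would resolve the same name against the current working directory.
    \ir tables.sql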
Gurjeet Singh, reviewed by Josh Kupershmidt, with substantial further
cleanup by me.
Tom Lane [Tue, 5 Jul 2011 22:46:03 +0000 (18:46 -0400)]
Make the file_fdw validator check that a filename option has been provided.
This was already a runtime failure condition, but it's better to check
at validation time if possible. Lightly modified version of a patch
by Shigeru Hanada.
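For illustration, a sketch of what the validator now checks (server and
table names are made up); omitting the filename option fails at CREATE
time instead of when the table is first scanned:

    CREATE EXTENSION file_fdw;
    CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;

    -- OK: filename supplied
    CREATE FOREIGN TABLE words (word text)
        SERVER csv_server
        OPTIONS (format 'csv', filename '/tmp/words.csv');

    -- Now rejected by the validator rather than failing at query time
    CREATE FOREIGN TABLE broken (word text)
        SERVER csv_server
        OPTIONS (format 'csv');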
Tom Lane [Tue, 5 Jul 2011 19:54:00 +0000 (15:54 -0400)]
Restructure foreign data wrapper chapter so it has more than one section.
As noted by Laurenz Albe, our SGML tools deal rather oddly with chapters
having just one <sect1>. Perhaps the tooling could be fixed, but really
the design of this chapter's introduction is pretty bogus anyhow. Split
it into a true introduction and a <sect1> about the FDW functions, so
that it reads better and dodges the lack-of-a-chapter-TOC problem.
Tom Lane [Tue, 5 Jul 2011 16:04:40 +0000 (12:04 -0400)]
Fix psql's counting of script file line numbers during COPY.
handleCopyIn incremented pset.lineno for each line of COPY data read from
a file. This is correct when reading from the current script file (i.e.,
we are doing COPY FROM STDIN followed by in-line data), but it's wrong if
the data is coming from some other file. Per bug #6083 from Steve Haslam.
Back-patch to all supported versions.
The bug that caused this to be discovered is that the code was trying to
dereference a NULL or ill-defined pointer, as reported by Michael Mueller;
but what it was doing was wrong anyway, per Heikki.
Remove silent_mode. You get the same functionality with "pg_ctl -l
postmaster.log", or nohup.
There was a small issue with LINUX_OOM_ADJ and silent_mode, namely that with
silent_mode the postmaster process incorrectly used the OOM settings meant
for backend processes. We certainly could've fixed that directly, but since
silent_mode was redundant anyway, we might as well just remove it.
Simon Riggs [Mon, 4 Jul 2011 08:31:40 +0000 (09:31 +0100)]
Reset ALTER TABLE lock levels to AccessExclusiveLock in all cases.
Locks on the inheritance parent remain at a lower level, as they were before.
Remove entry from 9.1 release notes.
Tom Lane [Mon, 4 Jul 2011 02:12:14 +0000 (22:12 -0400)]
Fix omissions in documentation of the pg_roles view.
Somehow, column rolconfig got removed from the documentation of the
pg_roles view in the 9.0 cycle, although the column is actually still
there. In 9.1, we'd also forgotten to document the rolreplication column.
Spotted by Sakamoto Masahiko.
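The columns in question are visible with a simple query, for example:

    SELECT rolname, rolreplication, rolconfig
      FROM pg_roles;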
Robert Haas [Sun, 3 Jul 2011 21:34:47 +0000 (17:34 -0400)]
Fix bugs in relpersistence handling during table creation.
Unlike the relistemp field which it replaced, relpersistence must be
set correctly quite early during the table creation process, as we
rely on it for a number of purposes, including security
checks. Normally, this is set based on whether the user enters CREATE
TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a
relation may also be made implicitly temporary by creating it in
pg_temp. This patch fixes the handling of that case, and also
disables creation of unlogged tables in a temporary tablespace (such
tables indeed skip WAL-logging, but we reject an explicit
specification) and creation of relations in the temporary schemas of
other sessions (which is not very sensible, and didn't work right
anyway).
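A rough sketch of the cases involved (object names are hypothetical):

    -- Explicit persistence choices:
    CREATE TEMPORARY TABLE t1 (id int);
    CREATE UNLOGGED  TABLE t2 (id int);

    -- Implicitly temporary: creating the relation in pg_temp makes it
    -- temporary even without the TEMPORARY keyword; the fix makes
    -- relpersistence reflect that early enough for the security checks.
    CREATE TABLE pg_temp.t3 (id int);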
Tom Lane [Sun, 3 Jul 2011 17:55:02 +0000 (13:55 -0400)]
Make distprep and *clean build targets recurse into all subdirectories.
Certain subdirectories do not get built if corresponding options are not
selected at configure time. However, "make distprep" should visit such
directories anyway, so that constructing derived files to be included in
the tarball happens without requiring all configure options to be given
in the tarball build script. Likewise, it's better if cleanup actions
unconditionally visit all directories (for example, this ensures proper
cleanup if someone has done a manual make in such a subdirectory).
To handle this, set up a convention that subdirectories that are
conditionally included in SUBDIRS should be added to ALWAYS_SUBDIRS
instead when they are excluded.
Back-patch to 9.1, so that plpython's spiexceptions.h will get provided
in 9.1 tarballs. There don't appear to be any instances where distprep
actions got missed in previous releases, and anyway this fix requires
gmake 3.80 so we don't want to apply it before 9.1.
Bruce Momjian [Fri, 1 Jul 2011 22:17:12 +0000 (18:17 -0400)]
Change pg_upgrade to use port 50432 by default to avoid unintended
client connections during the upgrade. Also rename the data/bin/port
environment variables to begin with 'PG', and no longer honor PGPORT.
Alvaro Herrera [Wed, 1 Jun 2011 22:43:50 +0000 (18:43 -0400)]
Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply with
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.
A non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.
This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.
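A minimal sketch of the new syntax (table, domain, and constraint names
are hypothetical):

    -- Add the constraint without scanning existing rows; new rows must satisfy it.
    ALTER TABLE orders
        ADD CONSTRAINT orders_qty_positive CHECK (qty > 0) NOT VALID;

    -- Later, scan the table and mark the constraint validated.
    ALTER TABLE orders VALIDATE CONSTRAINT orders_qty_positive;

    -- The same pattern for domains:
    ALTER DOMAIN percentage
        ADD CONSTRAINT percentage_range CHECK (VALUE BETWEEN 0 AND 100) NOT VALID;
    ALTER DOMAIN percentage VALIDATE CONSTRAINT percentage_range;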
Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Haas for documentation wording improvement
suggestions.
Tom Lane [Wed, 29 Jun 2011 23:46:47 +0000 (19:46 -0400)]
Restore correct btree preprocessing of "indexedcol IS NULL" conditions.
Such a condition is unsatisfiable in combination with any other type of
btree-indexable condition (since we assume btree operators are always
strict). 8.3 and 8.4 had an explicit test for this, which I removed in
commit 29c4ad98293e3c5cb3fcdd413a3f4904efff8762, mistakenly thinking that
the case would be subsumed by the more general handling of IS (NOT) NULL
added in that patch. Put it back, and improve the comments about it, and
add a regression test case.
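For example (hypothetical table with a btree index on x), the following
query combines IS NULL with a strict btree-indexable condition, so the two
can never both hold; the restored preprocessing lets the btree code
recognize that immediately:

    -- x < 10 cannot be true when x IS NULL, so the scan can be skipped.
    SELECT * FROM t WHERE x IS NULL AND x < 10;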
Per bug #6079 from Renat Nasyrov, and analysis by Dean Rasheed.
Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's
more consistent that way, since all the other PredicateLock* calls are
made in various heapam.c and index AM functions. The call in nodeSeqscan.c
was unnecessarily aggressive anyway: there's no need to try to lock the
relation every time a tuple is fetched; it's enough to do it once.
This has the user-visible effect that if a seq scan is initialized in the
executor, but never executed, we now acquire the predicate lock on the heap
relation anyway. We could avoid that by taking the lock on the first
heap_getnext() call instead, but it doesn't seem worth the trouble given
that it feels more natural to do it in heap_beginscan().
Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In
a seqscan, started with heap_beginscan(), we're holding a whole-relation
predicate lock on the heap so there's no need to lock the tuples
individually.
Simon Riggs [Tue, 28 Jun 2011 21:58:17 +0000 (22:58 +0100)]
Introduce compact WAL record for the common case of commit (non-DDL).
XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and relfilenodes,
saving considerable space for the vast majority of transaction commits.
XLOG_XACT_COMMIT keeps the same definition it had in XLOG_PAGE_MAGIC 0xD067 and earlier.
Alvaro Herrera [Mon, 20 Jun 2011 21:20:14 +0000 (17:20 -0400)]
Modernise pg_hba.conf token processing
The previous coding was ugly, as it marked special tokens as such in the
wrong stage, relying on workarounds to figure out if they had been
quoted in the original or not. This made it impossible to have specific
keywords be recognized as such only in certain positions in HBA lines,
for example. Fix by restructuring the parser code so that it remembers
whether tokens were quoted or not. This eliminates widespread knowledge
of possible known keywords for all fields.
Also improve memory management in this area, to use memory contexts that
are reset as a whole instead of using retail pfrees; this removes a
whole lotta crufty (and probably slow) code.
Instead of calling strlen() three times in next_field_expand on the
returned token to find out whether there was a comma (and strip it),
pass back the info directly from the callee, which is simpler.
In passing, update historical artifacts in hba.c API.
Authors: Brendan Jurd, Alvaro Herrera
Reviewed by Pavel Stehule
Peter Eisentraut [Tue, 28 Jun 2011 14:49:28 +0000 (17:49 +0300)]
Implement the collation columns of various information schema views
Fill in the collation columns of the views attributes, columns,
domains, and element_types. Also update collation information in
sql_implementation_info.
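For instance, column collations are now exposed through the standard views:

    SELECT table_name, column_name,
           collation_catalog, collation_schema, collation_name
      FROM information_schema.columns
     WHERE table_schema = 'public';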
Simon Riggs [Mon, 27 Jun 2011 21:12:09 +0000 (22:12 +0100)]
Reduce impact of btree page reuse on Hot Standby by fixing off-by-1 error.
WAL records of type XLOG_BTREE_REUSE_PAGE were generated using a
latestRemovedXid one higher than actually needed, because the xid used was
the page's opaque->btpo.xact rather than an actually removed xid.
Noticed on an otherwise quiet system by Noah Misch.
Robert Haas [Mon, 27 Jun 2011 19:06:32 +0000 (15:06 -0400)]
Allow callers to pass a missing_ok flag when opening a relation.
Since the names try_relation_openrv() and try_heap_openrv() don't seem
quite appropriate, rename the functions to relation_openrv_extended()
and heap_openrv_extended(). This is also more general, if we have a
future need for additional parameters that are of interest to only a
few callers.
This is infrastructure for a forthcoming patch to allow
get_object_address() to take a missing_ok argument as well.
Robert Haas [Mon, 27 Jun 2011 14:27:17 +0000 (10:27 -0400)]
Avoid having two copies of the HOT-chain search logic.
It's been like this since HOT was originally introduced, but the logic
is complex enough that this is a recipe for bugs, as we've already
found out with SSI. So refactor heap_hot_search_buffer() so that it
can satisfy the needs of index_getnext(), and make index_getnext() use
that rather than duplicating the logic.
This change was originally proposed by Heikki Linnakangas as part of a
larger refactoring oriented towards allowing index-only scans. I
extracted and adjusted this part, since it seems to have independent
merit. Review by Jeff Davis.
Peter Eisentraut [Mon, 27 Jun 2011 12:40:55 +0000 (15:40 +0300)]
Remove redundant DEF_PGPORT handling
DEF_PGPORT already comes in from pg_config.h, so we don't need to pass
it in again with a -D option. Apparently a leftover from the shell
script conversion.
Peter Eisentraut [Sun, 26 Jun 2011 21:13:10 +0000 (00:13 +0300)]
Add the possibility to pass --flag arguments to xgettext calls
The --flag argument can be used to tell xgettext the arguments of
which functions should be flagged with c-format in the PO files,
instead of guessing based on the presence of format specifiers, which
fails if no format specifiers are present but the translation
accidentally introduces one.
Appropriate flag settings have been added for each message catalog.
Peter Eisentraut [Sun, 26 Jun 2011 20:50:21 +0000 (23:50 +0300)]
Refactor common gettext triggers
Put gettext trigger words that are common to the backend and backend
modules into a makefile variable to include everywhere, to avoid
error-prone repetitions.
Peter Eisentraut [Sun, 26 Jun 2011 17:08:38 +0000 (20:08 +0300)]
Replace := by = in nls.mk files
It currently doesn't make a difference, but it's inconsistent with
most other usage, and it might interfere with a future patch, so I'll
change it all in a separate commit.
Joe Conway [Sat, 25 Jun 2011 22:58:07 +0000 (15:58 -0700)]
Async dblink functions require a named connection, and therefore should
use DBLINK_GET_NAMED_CONN rather than DBLINK_GET_CONN.
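As a reminder of the affected usage pattern (connection string and query
are placeholders), the async functions operate only on a connection opened
under an explicit name:

    -- Open a named connection; the async functions look it up by this name.
    SELECT dblink_connect('myconn', 'dbname=postgres');

    -- Send a query without waiting, then collect the result later.
    SELECT dblink_send_query('myconn', 'SELECT count(*) FROM pg_class');
    SELECT * FROM dblink_get_result('myconn') AS t(n bigint);

    SELECT dblink_disconnect('myconn');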
Problem found by Peter Eisentraut and patch by Fujii Masao.
Tom Lane [Thu, 23 Jun 2011 03:03:11 +0000 (23:03 -0400)]
Undo overly enthusiastic de-const-ification.
s/const//g wasn't exactly what I was suggesting here ... parameter
declarations of the form "const structtype *param" are good and useful,
so put those occurrences back. Likewise, avoid casting away the const
in a "const void *" parameter.
Tom Lane [Wed, 22 Jun 2011 23:21:29 +0000 (19:21 -0400)]
Revert "Don't select log_cnt in sequence regression tests."
This reverts commit addf11f9a264417aa467d4e135b9a8afc59f172a.
The right fix for the problem is to update the alternative expected
file, not to lobotomize the test case.
Tom Lane [Wed, 22 Jun 2011 17:08:08 +0000 (13:08 -0400)]
Fix symlink for errcodes.h so it works in VPATH builds from tarballs.
backend/Makefile was treating errcodes.h as a header always generated
during build, but actually it's a header provided in tarballs. Hence,
must use the absolute-symlink recipe, not the relative-symlink one.
Per bug #6072 from Hartmut Raschick.
Remove pointless const qualifiers from function arguments in the SSI code.
As Tom Lane pointed out, "const Relation foo" doesn't guarantee that you
can't modify the data the "foo" pointer points to. It just means that you
can't change the pointer to point to something else within the function,
which is not very useful.
Robert Haas [Wed, 22 Jun 2011 03:04:40 +0000 (23:04 -0400)]
Make the visibility map crash-safe.
This involves two main changes from the previous behavior. First,
when we set a bit in the visibility map, emit a new WAL record of type
XLOG_HEAP2_VISIBLE. Replay sets the page-level PD_ALL_VISIBLE bit and
the visibility map bit. Second, when inserting, updating, or deleting
a tuple, we can no longer get away with clearing the visibility map
bit after releasing the lock on the corresponding heap page, because
an intervening crash might leave the visibility map bit set and the
page-level bit clear. Making this work requires a bit of interface
refactoring.
In passing, a few minor but related cleanups: change the test in
visibilitymap_set and visibilitymap_clear to throw an error if the
wrong page (or no page) is pinned, rather than silently doing nothing;
this case should never occur. Also, remove duplicate definitions of
InvalidXLogRecPtr.
Robert Haas [Wed, 22 Jun 2011 02:32:30 +0000 (22:32 -0400)]
Make deadlock_timeout PGC_SUSET rather than PGC_SIGHUP.
This allows deadlock_timeout to be reduced for transactions that are
particularly likely to be involved in a deadlock, thus detecting it
more quickly. It is also potentially useful as a poor-man's deadlock
priority mechanism: a transaction with a high deadlock_timeout is less
likely to be chosen as the victim than one with a low
deadlock_timeout. Since that could be used to game the system, we
make this PGC_SUSET rather than PGC_USERSET.
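For example, a superuser can now lower the timeout just for a transaction
that is expected to be deadlock-prone (sketch; the value is arbitrary):

    BEGIN;
    SET LOCAL deadlock_timeout = '100ms';  -- requires superuser, since the parameter is PGC_SUSET
    -- ... statements likely to participate in a deadlock ...
    COMMIT;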
At some point, it might be worth thinking about a more explicit
priority mechanism, since using this is far from fool-proof. But
let's see whether there's enough use case to justify the additional
work before we go down that route.
Robert Haas [Wed, 22 Jun 2011 02:15:24 +0000 (22:15 -0400)]
Add notion of a "transform function" that can simplify function calls.
Initially, we use this only to eliminate calls to the varchar()
function in cases where the length is not being reduced and, therefore,
the function call is equivalent to a RelabelType operation. The most
significant effect of this is that we can avoid a table rewrite when
changing a varchar(X) column to a varchar(Y) column, where Y > X.
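Concretely, with the transform function in place, widening a varchar column
becomes a catalog-only change (hypothetical table):

    CREATE TABLE customers (name varchar(50));

    -- Length only increases, so no table rewrite is needed.
    ALTER TABLE customers ALTER COLUMN name TYPE varchar(100);

    -- Shrinking the length still has to check (and possibly rewrite) the data.
    ALTER TABLE customers ALTER COLUMN name TYPE varchar(20);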
Tom Lane [Tue, 21 Jun 2011 18:41:05 +0000 (14:41 -0400)]
Apply upstream fix for blowfish signed-character bug (CVE-2011-2483).
A password containing a character with the high bit set was misprocessed
on machines where char is signed (which is most). This could cause the
preceding one to three characters to fail to affect the hashed result,
thus weakening the password. The result was also unportable, and failed
to match some other blowfish implementations such as OpenBSD's.
Since the fix changes the output for such passwords, upstream chose
to provide a compatibility hack: password salts beginning with $2x$
(instead of the usual $2a$ for blowfish) are intentionally processed
"wrong" to give the same hash as before. Stored password hashes can
thus be modified if necessary to still match, though it'd be better
to change any affected passwords.
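In pgcrypto terms (password, table, and column names are placeholders),
affected hashes are those created with gen_salt('bf'); a stored hash
produced by the old buggy code can have its $2a$ prefix relabeled $2x$ so
it still matches:

    -- New, corrected behavior; generated salts start with $2a$.
    SELECT crypt('s3cret', gen_salt('bf'));

    -- Standard verification idiom against a stored hash.
    SELECT crypt('s3cret', stored_hash) = stored_hash AS password_ok
      FROM accounts;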
In passing, sync a couple other upstream changes that marginally improve
performance and/or tighten error checking.
Back-patch to all supported branches. Since this issue is already
public, no reason not to commit the fix ASAP.
Adjust the alternative expected output file for the prepared_xacts test case
(used when max_prepared_transactions=0) to match the recent changes in the
test case.
Fix bug in PreCommit_CheckForSerializationFailure. A transaction that has
already been marked as PREPARED cannot be killed. Kill the current
transaction instead.
One of the prepared_xacts regression tests actually hits this bug. I
removed the anomaly from the duplicate-gids test so that it fails in the
intended way, and added a new test to check serialization failures with
a prepared transaction.
Fix bug introduced by recent SSI patch to merge ROLLED_BACK and
MARKED_FOR_DEATH flags into one. We still need the ROLLED_BACK flag to
mark transactions that are in the process of being rolled back. To be
precise, ROLLED_BACK now means that a transaction has already been
discounted from the count of transactions with the oldest xmin, but not
yet removed from the list of active transactions.
Tom Lane [Mon, 20 Jun 2011 18:33:20 +0000 (14:33 -0400)]
Fix thinko in previous patch for optimizing EXISTS-within-EXISTS.
When recursing after an optimization in pull_up_sublinks_qual_recurse, the
available_rels value passed down must include only the relations that are
in the righthand side of the new SEMI or ANTI join; it's incorrect to pull
up a sub-select that refers to other relations, as seen in the added test
case. Per report from BangarRaju Vadapalli.
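Queries with one EXISTS (or NOT EXISTS) nested inside another, roughly like
the following hypothetical sketch, are the kind that exercise this pull-up
logic:

    SELECT *
      FROM a
     WHERE EXISTS (SELECT 1 FROM b
                    WHERE b.a_id = a.id
                      AND NOT EXISTS (SELECT 1 FROM c
                                       WHERE c.b_id = b.id));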
While at it, rethink the idea of recursing below a NOT EXISTS. That is
essentially the same situation as pulling up ANY/EXISTS sub-selects that
are in the ON clause of an outer join, and it has the same disadvantage:
we'd force the two joins to be evaluated according to the syntactic nesting
order, because the lower join will most likely not be able to commute with
the ANTI join. That could result in having to form a rather large join
product, whereas the handling of a correlated subselect is not quite that
dumb. So until we can handle those cases better, #ifdef NOT_USED that
case. (I think it's okay to pull up in the EXISTS/ANY cases, because SEMI
joins aren't so inflexible about ordering.)
Back-patch to 8.4, same as for previous patch in this area. Fortunately
that patch hadn't made it into any shipped releases yet.
Alvaro Herrera [Fri, 17 Jun 2011 13:43:32 +0000 (09:43 -0400)]
Remove extra copying of TupleDescs for heap_create_with_catalog
Some callers were creating copies of tuple descriptors to pass to that
function, stating in code comments that it was necessary because it
modified the passed descriptor. Code inspection reveals this not to be
true, and indeed not all callers are passing copies in the first place.
So remove the extra ones and the misleading comments about this behavior
as well.
Peter Eisentraut [Sun, 19 Jun 2011 20:27:56 +0000 (23:27 +0300)]
Produce HISTORY file consistently as ASCII
The release notes may contain non-ASCII characters (for contributor
names), which lynx converts to the encoding determined by the current
locale. To get output that is deterministic and easily readable by
everyone, we make lynx produce LATIN1 and then convert that to ASCII
with transliteration for the non-ASCII characters.