Magnus Hagander [Wed, 5 Jan 2011 13:24:17 +0000 (14:24 +0100)]
Give superusers REPLICATION permission by default
This can be overridden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
Bruce Momjian [Wed, 5 Jan 2011 04:35:49 +0000 (23:35 -0500)]
In pg_upgrade, copy pg_largeobject_metadata and its index for 9.0+
servers because, like pg_largeobject, it is a system table whose
contents are not dumped by pg_dump --schema-only.
Implement remaining fields of information_schema.sequences view
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)
Also slightly adjust the view's column order and permissions after review of
SQL standard.
Robert Haas [Sun, 2 Jan 2011 04:48:11 +0000 (23:48 -0500)]
Basic foreign table support.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Bruce Momjian [Sat, 1 Jan 2011 17:06:36 +0000 (12:06 -0500)]
In pg_upgrade, remove use of whichCluster, and just pass old/new cluster
pointers, which simplifies the code. This was not possible in 9.0 because
everything was in a single nested struct, but is possible now.
Bruce Momjian [Fri, 31 Dec 2010 22:24:26 +0000 (17:24 -0500)]
Include the first valid listen address in pg_ctl to improve server start
"wait" detection and add postmaster start time to help determine if the
postmaster is actually using the specified data directory.
Tom Lane [Fri, 31 Dec 2010 01:24:55 +0000 (20:24 -0500)]
Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join. For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition. (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)
To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple. The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple. So we aren't increasing the size of the hashtable at all for
the feature.
To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin. Other than that decision, it was pretty
straightforward.
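The per-tuple match flag described above can be illustrated with a toy sketch in C. This is not the executor code: the ToyTuple struct, the TOY_MATCHED bit, and all function names are hypothetical, and a plain int key stands in for real tuples. It only shows the idea of reusing a spare header bit so that unmatched inner tuples can be found after the probe phase without enlarging the hashtable.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MATCHED  0x0001u  /* hypothetical spare header bit used as match flag */
#define TOY_NBUCKETS 8

/* Hypothetical stand-in for a hashed inner tuple (not PostgreSQL's MinimalTuple). */
typedef struct ToyTuple {
    struct ToyTuple *next;     /* next tuple in the same bucket */
    uint16_t         infomask; /* header bits; TOY_MATCHED lives here */
    int              key;      /* join key */
} ToyTuple;

typedef struct ToyHashTable {
    ToyTuple *buckets[TOY_NBUCKETS];
} ToyHashTable;

unsigned toy_bucket(int key)
{
    return (unsigned) key % TOY_NBUCKETS;
}

/* Build phase: insert an inner tuple with its match flag cleared. */
void toy_insert(ToyHashTable *ht, ToyTuple *tup)
{
    unsigned b = toy_bucket(tup->key);

    tup->infomask = (uint16_t) (tup->infomask & ~TOY_MATCHED);
    tup->next = ht->buckets[b];
    ht->buckets[b] = tup;
}

/* Probe phase: flag every inner tuple that joins to outer_key; return the count. */
int toy_probe(ToyHashTable *ht, int outer_key)
{
    int nmatches = 0;

    for (ToyTuple *t = ht->buckets[toy_bucket(outer_key)]; t != NULL; t = t->next) {
        if (t->key == outer_key) {
            t->infomask |= TOY_MATCHED;
            nmatches++;
        }
    }
    return nmatches;
}

/*
 * Final phase for RIGHT/FULL joins: walk the whole table and count the inner
 * tuples that never matched. A real join would emit them null-extended.
 */
int toy_count_unmatched(const ToyHashTable *ht)
{
    int n = 0;

    for (int b = 0; b < TOY_NBUCKETS; b++)
        for (const ToyTuple *t = ht->buckets[b]; t != NULL; t = t->next)
            if ((t->infomask & TOY_MATCHED) == 0)
                n++;
    return n;
}
```

A probe that returns zero corresponds to an outer tuple that a LEFT or FULL join would emit null-extended; the final walk covers the inner side.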
Tom Lane [Wed, 29 Dec 2010 18:43:53 +0000 (13:43 -0500)]
Improve pg_upgrade's checks for required executables.
Don't insist on pg_dumpall and psql being present in the old cluster,
since they are not needed. Do insist on pg_resetxlog being present
(in both old and new), since we need it. Also check for pg_config,
but only in the new cluster. Remove the useless attempt to call
pg_config in the old cluster; we don't need to know the old value of
--pkglibdir. (In the case of a stripped-down migration installation
there might be nothing there to look at anyway, so any future change
that might reintroduce that need would have to be considered carefully.)
Per my attempts to build a minimal previous-version installation to support
pg_upgrade.
Robert Haas [Wed, 29 Dec 2010 12:19:21 +0000 (07:19 -0500)]
Bump XLOG_PAGE_MAGIC.
The unlogged tables patch (commit 53dbc27c62d8e1b6c5253feba04a5094cb8fe046,
2010-12-29) should have done this, since it changes the format of an
XLOG_SMGR_CREATE record.
Robert Haas [Wed, 29 Dec 2010 11:48:53 +0000 (06:48 -0500)]
Support unlogged tables.
The contents of an unlogged table are not WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
Magnus Hagander [Wed, 29 Dec 2010 10:05:03 +0000 (11:05 +0100)]
Add REPLICATION privilege for ROLEs
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.
Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permission. This is a
backwards-incompatible change, in the interest of higher default security.
Tom Lane [Wed, 29 Dec 2010 03:49:57 +0000 (22:49 -0500)]
Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity. Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer. All we need is a code path that doesn't try to
force the result into int64. Per trouble report from David Rericha.
Back-patch to all supported versions. Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
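A rough sketch in C of why a double-valued code path sidesteps the overflow. The function names are hypothetical and this is not the planner's actual code; it assumes a date stored as a signed 32-bit day count and a timestamp stored as a 64-bit count of microseconds from the same epoch.

```c
#include <stdint.h>

/* Microseconds per day, in integer and floating-point form. */
#define TOY_USECS_PER_DAY_I INT64_C(86400000000)
#define TOY_USECS_PER_DAY_D 86400000000.0

/*
 * Integer path: returns 1 and stores the result if the day count fits in an
 * int64 microsecond count, or 0 if the multiplication would overflow.
 */
int toy_date_to_int64_usecs(int32_t days, int64_t *result)
{
    if (days > INT64_MAX / TOY_USECS_PER_DAY_I ||
        days < INT64_MIN / TOY_USECS_PER_DAY_I)
        return 0;
    *result = (int64_t) days * TOY_USECS_PER_DAY_I;
    return 1;
}

/*
 * Floating-point path: never forces the value into int64, so even a date far
 * outside the timestamp range yields a sane (if approximate) number.
 */
double toy_date_to_double_usecs(int32_t days)
{
    return (double) days * TOY_USECS_PER_DAY_D;
}
```

For selectivity estimation an approximate double is enough, which is why the out-of-range case does not need to raise an error.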
Tom Lane [Wed, 29 Dec 2010 02:38:05 +0000 (21:38 -0500)]
Reclassify DEFAULT as a column_constraint item in the CREATE TABLE syntax.
This is how it was documented originally, but several years ago somebody
decided that DEFAULT isn't a type of constraint. Well, the grammar thinks
it is. The documentation was wrong in two ways: it alleged that DEFAULT
had to appear before any other kind of constraint, and it alleged that you
can't prefix a DEFAULT clause with a "CONSTRAINT name" clause, when in fact
you can. (The latter behavior probably isn't SQL-standard, but our grammar
has always allowed it.)
This patch responds to Fujii Masao's observation that the ALTER TABLE
documentation mistakenly implied that you couldn't include DEFAULT in
ALTER TABLE ADD COLUMN; though this isn't the way he proposed fixing it.
Bruce Momjian [Tue, 28 Dec 2010 04:11:33 +0000 (23:11 -0500)]
Fix code to properly pull out shared memory key now that the
postmaster.pid file is larger than in previous major versions.
This is a bug introduced when I added lines to the file recently.
Tom Lane [Mon, 27 Dec 2010 19:57:41 +0000 (14:57 -0500)]
Rename the C functions bitand(), bitor() to bit_and(), bit_or().
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h. Note the functions' SQL-level
names are not changed, only their C-level names.
In passing, make some comments in varbit.c conform to project-standard
layout.
Tom Lane [Mon, 27 Dec 2010 17:51:44 +0000 (12:51 -0500)]
Rearrange cpluspluscheck to check just one .h file at a time.
This is slower than the original coding but avoids the problem of
including files in an unpredictable order. Aside from being more
trustworthy, we can get rid of some exclusions that were formerly
made for what turn out to be ordering or re-inclusion problems.
I also modified it to include libpq's exported files in the check.
ecpg should be included as well, but I'm unclear on which ecpg .h
files are meant to be included by clients.
Tom Lane [Mon, 27 Dec 2010 16:26:19 +0000 (11:26 -0500)]
Fix ill-chosen use of "private" as an argument and struct field name.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code. Per report from
Craig Ringer and some investigation with cpluspluscheck.
Robert Haas [Mon, 27 Dec 2010 02:32:07 +0000 (21:32 -0500)]
Corrections to patch adding SQL/MED error codes.
My previous commit, 85cff3ce7f360d139d87aee836d75a6202fee066 on
2010-12-25, failed to update errcodes.sgml or plerrcodes.h. This patch
corrects that oversight, per a gripe from Tom Lane, and also corrects
a typographical error.
Andrew Dunstan [Fri, 24 Dec 2010 18:31:28 +0000 (13:31 -0500)]
Allow vpath builds and regression tests to succeed on Mingw. Backpatch to release 8.4 - earlier releases would require more changes and it's not worth the trouble.
Bruce Momjian [Fri, 24 Dec 2010 16:51:51 +0000 (11:51 -0500)]
Remove quotes from boolean recovery.conf.sample parameters, now that the
quotes are not required. This now matches postgresql.conf's
specification of booleans.
Bruce Momjian [Fri, 24 Dec 2010 14:45:15 +0000 (09:45 -0500)]
Improve "pg_ctl -w start" server detection by writing the postmaster
port and socket directory into postmaster.pid, and have pg_ctl read from
that file, for use by PQping().
Michael Meskes [Thu, 23 Dec 2010 11:41:12 +0000 (12:41 +0100)]
Added rule to ecpg lexer to accept "Unicode surrogate pair in extended quoted
string". This is not really needed because the string gets copied to the output
untranslated anyway, but by adding this rule the lexer stays in sync with the
backend lexer.
Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There were two
reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (i.e., to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with it in previous versions, but this refactoring
fixes it going forward.
With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
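The F_FOLLOW_RIGHT rule from point 2 above can be sketched as a small decision function in C. The ToyGistPage struct and function names are hypothetical, and the NSN comparison against the parent's LSN is an assumption about how the concurrent-split check works; only the "flag set means always follow the rightlink" rule comes directly from the description above.

```c
#include <stdint.h>

#define TOY_F_FOLLOW_RIGHT 0x01u  /* stands in for the F_FOLLOW_RIGHT page flag */
#define TOY_NO_RIGHTLINK   (-1)   /* hypothetical "no right sibling" marker */

/* Hypothetical, heavily simplified view of a GiST page header. */
typedef struct ToyGistPage {
    unsigned flags;      /* page flag bits */
    uint64_t nsn;        /* node sequence number, set when the page is split */
    int      rightlink;  /* block number of the right sibling, or TOY_NO_RIGHTLINK */
} ToyGistPage;

/*
 * Should a scan that reached this page via a downlink read at parent_lsn
 * also visit the right sibling?
 */
int toy_must_follow_right(const ToyGistPage *page, uint64_t parent_lsn)
{
    if (page->rightlink == TOY_NO_RIGHTLINK)
        return 0;                     /* nothing to follow */
    if (page->flags & TOY_F_FOLLOW_RIGHT)
        return 1;                     /* incomplete split: downlink still missing */
    return page->nsn > parent_lsn;    /* assumed NSN test for a concurrent split */
}

/* Once the downlink has been inserted in the parent, the flag is cleared. */
void toy_finish_split(ToyGistPage *left)
{
    left->flags &= ~TOY_F_FOLLOW_RIGHT;
}
```

An insertion that finds the flag set would first complete the split (insert the downlink, then clear the flag) before doing its own work.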
Magnus Hagander [Wed, 22 Dec 2010 13:23:56 +0000 (14:23 +0100)]
Add PQlibVersion() function to libpq
This function is like the PQserverVersion() function except
it returns the version of libpq, making it possible for a client
program or driver to determine which version of libpq is in
use at runtime, and not just at link time.
Suggested by Harald Armin Massa and several others.
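A small sketch in C of how a client might interpret the returned integer. It assumes the value uses the two-digits-per-component encoding documented for PQserverVersion() at the time (for example, 80105 for version 8.1.5); the ToyPgVersion struct and toy_decode_version() are hypothetical helpers, not part of libpq.

```c
/* Hypothetical holder for a decoded version number. */
typedef struct ToyPgVersion {
    int major;
    int minor;
    int revision;
} ToyPgVersion;

/*
 * Split an integer such as 90100 into its components, assuming the
 * major * 10000 + minor * 100 + revision layout.
 * In a real client, compiled against libpq-fe.h, the argument would come
 * from PQlibVersion().
 */
ToyPgVersion toy_decode_version(int v)
{
    ToyPgVersion r;

    r.major = v / 10000;
    r.minor = (v / 100) % 100;
    r.revision = v % 100;
    return r;
}
```

A driver could compare the decoded value against the version it was built with to detect a mismatched libpq at runtime.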
Robert Haas [Tue, 21 Dec 2010 11:30:32 +0000 (06:30 -0500)]
Work around unfortunate getppid() behavior on BSD-ish systems.
On MacOS X, and apparently also on other BSD-derived systems, attaching
a debugger causes getppid() to return the pid of the debugging process
rather than the actual parent PID. As a result, debugging the
autovacuum launcher, startup process, or WAL sender on such systems
causes it to exit, because the previous coding of PostmasterIsAlive()
detects postmaster death by testing whether getppid() == PostmasterPid.
Work around that behavior by checking the return value of getppid()
more carefully. If it's PostmasterPid, the postmaster must be alive;
if it's 1, assume the postmaster is dead. If it's any other value,
assume we've been debugged and fall through to the less-reliable
kill() test.
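The three-way check described above might look roughly like this in C. The enum and function names are hypothetical, and the real PostmasterIsAlive() may differ in detail; the sketch only mirrors the logic stated in this message.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result of inspecting getppid(). */
typedef enum {
    TOY_PM_ALIVE,    /* parent is still the postmaster */
    TOY_PM_DEAD,     /* reparented to init (pid 1) */
    TOY_PM_UNKNOWN   /* some other parent, e.g. an attached debugger */
} ToyPmStatus;

ToyPmStatus toy_classify_parent(pid_t ppid, pid_t postmaster_pid)
{
    if (ppid == postmaster_pid)
        return TOY_PM_ALIVE;
    if (ppid == 1)
        return TOY_PM_DEAD;
    return TOY_PM_UNKNOWN;
}

/* Returns 1 if the postmaster appears to be alive, 0 otherwise. */
int toy_postmaster_is_alive(pid_t postmaster_pid)
{
    switch (toy_classify_parent(getppid(), postmaster_pid)) {
        case TOY_PM_ALIVE:
            return 1;
        case TOY_PM_DEAD:
            return 0;
        default:
            /* Less reliable fallback: signal 0 only tests that the pid exists. */
            return kill(postmaster_pid, 0) == 0;
    }
}
```

The kill() fallback is less reliable because it only confirms that some process with that pid exists and can be signalled by the caller.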
Robert Haas [Mon, 20 Dec 2010 17:59:33 +0000 (12:59 -0500)]
Allow transactions that don't write WAL to commit asynchronously.
This case can arise if a transaction has written data, but only to
temporary tables. Loss of the commit record in case of a crash won't
matter, because the temporary tables will be lost anyway.
Magnus Hagander [Sun, 19 Dec 2010 20:31:23 +0000 (21:31 +0100)]
Remove thread dumping constant that requires newer Platform SDK
Since we're not multithreaded it only provides marginally useful
information, and it does require a newer version of the Platform SDK
than we target. We may want to reconsider this in the future along
with a fix for MinGW.
Tom Lane [Sun, 19 Dec 2010 20:30:44 +0000 (15:30 -0500)]
Fix up handling of simple-form CASE with constant test expression.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant. This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors. I had put in
a hack solution for that years ago (see commit 514ce7a331c5bea8e55b106d624e55732a002295 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.
Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct. This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned. In any case it beats kluging
ruleutils.c still further. So this patch improves const-simplification
and reverts the previous change in ruleutils.c.
Back-patch to all supported branches. The bug exists in 8.1 too, but it's
out of warranty.
After parsing a parenthesized subexpression, we must pop all pending
ANDs and NOTs off the stack, just like the case for a simple operand.
Per bug #5793.
Also fix clones of this routine in contrib/intarray and contrib/ltree,
where input of types query_int and ltxtquery had the same problem.
Magnus Hagander [Sun, 19 Dec 2010 15:45:28 +0000 (16:45 +0100)]
Support for collecting crash dumps on Windows
Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existence of the directory is used as an on/off
switch for the generation of the dumps).
Robert Haas [Fri, 17 Dec 2010 13:30:57 +0000 (08:30 -0500)]
Reset 'ps' display just once when resolving VXID conflicts.
This prevents the word "waiting" from briefly disappearing from the ps
status line when ResolveRecoveryConflictWithVirtualXIDs begins a new
iteration of the outer loop.
Along the way, remove some useless pgstat_report_waiting() calls;
the startup process doesn't appear in pg_stat_activity.
Tom Lane [Thu, 16 Dec 2010 21:22:05 +0000 (16:22 -0500)]
Remove optreset from src/port/ implementations of getopt and getopt_long.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in. Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.
Back-patch to 8.4. Before that the getopt code was a bit different anyway.
Bruce Momjian [Thu, 16 Dec 2010 15:13:43 +0000 (10:13 -0500)]
Fix crash caused by NULL lookup when reporting IP address of failed
libpq connection, per report from Magnus. This happens only on GIT
master and only on Win32 because that is the platform where "" maps to
an IP address (localhost).
Tom Lane [Thu, 16 Dec 2010 04:50:41 +0000 (23:50 -0500)]
Fix up getopt() reset management so it works on recent mingw.
The mingw people don't appear to care about compatibility with non-GNU
versions of getopt, so force use of our own copy of getopt on Windows.
Also, ensure that we make use of optreset when using our own copy.
Per report from Andrew Dunstan. Back-patch to all versions supported
on Windows.
Tom Lane [Thu, 16 Dec 2010 02:14:24 +0000 (21:14 -0500)]
Fix contrib/seg's GiST picksplit method.
This patch replaces Guttman's generalized split method with a simple
sort-by-center-points algorithm. Since the data is only one-dimensional
we don't really need the slow and none-too-stable Guttman method.
This is in part a bug fix, since seg has the same size_alpha versus
size_beta typo that was recently fixed in contrib/cube. It seems
prudent to apply this rather aggressive fix only in HEAD, though.
Back branches will just get the typo fix.
Itagaki Takahiro [Wed, 15 Dec 2010 21:56:28 +0000 (06:56 +0900)]
Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file().
One of the usages of the binary version is to read files in a different
encoding from the server encoding.
Robert Haas [Tue, 14 Dec 2010 03:37:55 +0000 (22:37 -0500)]
Improved tab completion for views with triggers.
Allow INSERT INTO, UPDATE, and DELETE FROM to be completed with
either the name of a table (as before) or the name of a view with
an appropriate INSTEAD OF rule.
Along the way, allow CREATE TRIGGER to be completed with INSTEAD OF,
as well as BEFORE and AFTER.