Robert Haas [Tue, 2 Jul 2013 17:35:14 +0000 (13:35 -0400)]
Add support for multiple kinds of external toast datums.
To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support or "indirect"
tuples which point to some externally allocated memory containing
a toast tuple. Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.
Andres Freund, reviewed by Hitoshi Harada and myself
Silence compiler warning in assertion-enabled builds.
With -Wtype-limits, gcc correctly points out that size_t can never be < 0.
Backpatch to 9.3 and 9.2. It's been like this forever, but in <= 9.1 you got
a lot other warnings with -Wtype-limits anyway (at least with my version of
gcc).
Bruce Momjian [Tue, 2 Jul 2013 14:29:27 +0000 (10:29 -0400)]
pg_upgrade: revert changing '' to ""
On the command line, GUC option strings are handled by the guc parser,
not by the shell parser, so '' is the proper way to represent a
zero-length string. This reverts commit 3132a9b7ab3d76c15f88cfa29792fd888e7a959e.
Robert Haas [Tue, 2 Jul 2013 13:47:01 +0000 (09:47 -0400)]
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
The dependencies on the spi and dummy_seclabel contrib modules were
incomplete, because they did not pick up automatically generated
dependencies on header files. This will manifest itself especially when
switching major versions, where the contrib modules would not be
recompiled to contain the new version number, leading to regression test
failures.
To fix this, use the submake approach already in use elsewhere, so that
the contrib modules are built using their full rules.
Bruce Momjian [Mon, 1 Jul 2013 18:52:56 +0000 (14:52 -0400)]
pg_dump docs: use escaped double-quotes, for Windows
On Unix, you can embed double-quotes in single-quotes, and via versa.
However, on Windows, you can only escape double-quotes in double-quotes,
so use that in the pg_dump -t/table example.
Backpatch to 9.3.
Report from Mike Toews
Bruce Momjian [Mon, 1 Jul 2013 18:45:45 +0000 (14:45 -0400)]
pg_upgrade: use "" rather than '', for Windows
If we ever support unix sockets on Windows, we should use "" rather than
'' for zero-length strings on the command-line, so use that.
Bruce Momjian [Mon, 1 Jul 2013 17:40:18 +0000 (13:40 -0400)]
Add timezone offset output option to to_char()
Add ability for to_char() to output the timezone's UTC offset (OF). We
already have the ability to return the timezone abbeviation (TZ/tz).
Per request from Andrew Dunstan
Andrew Dunstan [Mon, 1 Jul 2013 16:53:05 +0000 (12:53 -0400)]
Improve support for building PGXS modules with VPATH.
A VPATH build will be performed when the module's make file path is not
the current directory or when USE_VPATH is set.
This will assist packagers and others who prefer to build without
polluting the source directories.
There is still a bit of work to do here, notably documentation, but it's
probably a good idea to commit what we have so far and let people test
it out on their modules.
Bruce Momjian [Mon, 1 Jul 2013 16:40:02 +0000 (12:40 -0400)]
Remove undocumented -h (help) option
The -h option was not supported by many tools, and not documented, so
remove them for consistency from pg_upgrade, pg_test_fsync, and
pg_test_timing.
The pglz compressor has a significant startup cost, because it has to
initialize to zeros the history-tracking hash table. On a 64-bit system, the
hash table was 64kB in size. While clearing memory is pretty fast, for very
short inputs the relative cost of that was quite large.
This patch alleviates that in two ways. First, instead of storing pointers
in the hash table, store 16-bit indexes into the hist_entries array. That
slashes the size of the hash table to 1/2 or 1/4 of the original, depending
on the pointer width. Secondly, adjust the size of the hash table based on
input size. For very small inputs, you don't need a large hash table to
avoid collisions.
We don't normally bother retrying when the number of bytes written by
write() is short of what was requested. It is generally assumed that a
write() to disk doesn't return short, unless you run out of disk space.
While writing the WAL, however, it seems prudent to try a bit harder,
because a failure leads to PANIC. The write() is also much larger than most
write()s in the backend (up to wal_buffers), so there's more room for
surprises.
Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.
Bruce Momjian [Fri, 28 Jun 2013 22:01:46 +0000 (18:01 -0400)]
pg_upgrade: trim down --help and doc option descriptions
Previous code had old/new prefixes on option values, e.g.
--old-datadir=OLDDATADIR. Remove them, for simplicity; now:
--old-datadir=DATADIR. Also update docs to do the same.
Alvaro Herrera [Fri, 28 Jun 2013 21:20:53 +0000 (17:20 -0400)]
Send SIGKILL to children if they don't die quickly in immediate shutdown
On immediate shutdown, or during a restart-after-crash sequence,
postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
this is not a good strategy if backends don't die because of that
signal. (This might happen, for example, if a backend gets tangled
trying to malloc() due to gettext(), as in an example illustrated by
MauMau.) This causes problems when later trying to restart the server,
because some processes are still attached to the shared memory segment.
Instead of just abandoning such backends to their fates, we now have
postmaster hang around for a little while longer, send a SIGKILL after
some reasonable waiting period, and then exit. This makes immediate
shutdown more reliable.
There is disagreement on whether it's best for postmaster to exit after
sending SIGKILL, or to stick around until all children have reported
death. If this controversy is resolved differently than what this patch
implements, it's an easy change to make.
Bruce Momjian [Fri, 28 Jun 2013 21:27:02 +0000 (17:27 -0400)]
pg_upgrade: change -u to -U, for consistency
Change -u (user) option to -U, for consistency with other tools like
pg_dump and psql. Also expand --user to --username, again for
consistency.
BACKWARD INCOMPATIBILITY
Robert Haas [Fri, 28 Jun 2013 14:18:00 +0000 (10:18 -0400)]
Make the OVER keyword unreserved.
This results in a slightly less specific error message when OVER
is used in a context where we don't accept window functions, but
per discussion, it's worth it to get the benefit of not needing
to reserve this keyword any more. This same refactoring will
also let us avoid reserving some other keywords that we expect
to add in upcoming patches (specifically, IGNORE, RESPECT, and
FILTER).
On many platforms the OS will round the sleep time to millisecond
resolution, but there is no reason for us to pre-emptively round the
argument to pg_usleep.
When the delay was measured in milliseconds and started from 1 ms, it
sometimes took many attempts until the logic that increases the delay by
multiplying with a random value between 1 and 2 actually managed to bump it
from 1 ms to 2 ms. That lead to a sequence of 1 ms waits until the delay
started to increase. This wasn't really a problem but it looked odd if you
observed the waits. There is no measurable difference in performance, but
it's more readable this way.
Noah Misch [Thu, 27 Jun 2013 18:53:57 +0000 (14:53 -0400)]
Permit super-MaxAllocSize allocations with MemoryContextAllocHuge().
The MaxAllocSize guard is convenient for most callers, because it
reduces the need for careful attention to overflow, data type selection,
and the SET_VARSIZE() limit. A handful of callers are happy to navigate
those hazards in exchange for the ability to allocate a larger chunk.
Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in
tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX
tuples, a factor-of-48 increase. In particular, B-tree index builds can
now benefit from much-larger maintenance_work_mem settings.
Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.
Tom Lane [Thu, 27 Jun 2013 17:54:50 +0000 (13:54 -0400)]
Mark index-constraint comments with correct dependency in pg_dump.
When there's a comment on an index that was created with UNIQUE or PRIMARY
KEY constraint syntax, we need to label the comment as depending on the
constraint not the index, since only the constraint object actually appears
in the dump. This incorrect dependency can lead to parallel pg_restore
trying to restore the comment before the index has been created, per bug
#8257 from Lloyd Albin.
This patch fixes pg_dump to produce the right dependency in dumps made
in the future. Usually we also try to hack pg_restore to work around
bogus dependencies, so that existing (wrong) dumps can still be restored in
parallel mode; but that doesn't seem practical here since there's no easy
way to relate the constraint dump entry to the comment after the fact.
Tom Lane [Thu, 27 Jun 2013 16:36:44 +0000 (12:36 -0400)]
Expect EWOULDBLOCK from a non-blocking connect() call only on Windows.
On Unix-ish platforms, EWOULDBLOCK may be the same as EAGAIN, which is
*not* a success return, at least not on Linux. We need to treat it as a
failure to avoid giving a misleading error message. Per the Single Unix
Spec, only EINPROGRESS and EINTR returns indicate that the connection
attempt is in progress.
On Windows, on the other hand, EWOULDBLOCK (WSAEWOULDBLOCK) is the expected
case. We must accept EINPROGRESS as well because Cygwin will return that,
and it doesn't seem worth distinguishing Cygwin from native Windows here.
It's not very clear whether EINTR can occur on Windows, but let's leave
that part of the logic alone in the absence of concrete trouble reports.
Also, remove the test for errno == 0, effectively reverting commit da9501bddb42222dc33c031b1db6ce2133bcee7b, which AFAICS was just a thinko;
or at best it might have been a workaround for a platform-specific bug,
which we can hope is gone now thirteen years later. In any case, since
libpq makes no effort to reset errno to zero before calling connect(),
it seems unlikely that that test has ever reliably done anything useful.
Tom Lane [Thu, 27 Jun 2013 04:23:37 +0000 (00:23 -0400)]
Tweak wording in sequence-function docs to avoid PDF build failures.
Adjust the wording in the first para of "Sequence Manipulation Functions"
so that neither of the link phrases in it break across line boundaries,
in either A4- or US-page-size PDF output. This fixes a reported build
failure for the 9.3beta2 A4 PDF docs, and future-proofs this particular
para against causing similar problems in future. (Perhaps somebody will
fix this issue in the SGML/TeX documentation tool chain someday, but I'm
not holding my breath.)
Back-patch to all supported branches, since the same problem could rise up
to bite us in future updates if anyone changes anything earlier than this
in func.sgml.
Noah Misch [Thu, 27 Jun 2013 00:00:08 +0000 (20:00 -0400)]
Cooperate with the Valgrind instrumentation framework.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator. This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks. Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions. An included "suppression file" silences nominal
errors we don't plan to fix.
Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
Noah Misch [Wed, 26 Jun 2013 23:56:03 +0000 (19:56 -0400)]
Refactor aset.c and mcxt.c in preparation for Valgrind cooperation.
Move some repeated debugging code into functions and store intermediates
in variables where not presently necessary. No code-generation changes
in a production build, and no functional changes. This simplifies and
focuses the main patch.
Noah Misch [Wed, 26 Jun 2013 23:55:15 +0000 (19:55 -0400)]
Initialize pad bytes in GinFormTuple().
Every other core buffer page consumer initializes the bytes it furnishes
to PageAddItem(). For consistency, do the same here. No back-patch;
regardless, we couldn't count on the fix so long as binary upgrade can
carry forward affected index builds.
Noah Misch [Wed, 26 Jun 2013 15:17:33 +0000 (11:17 -0400)]
Renovate display of non-ASCII messages on Windows.
GNU gettext selects a default encoding for the messages it emits in a
platform-specific manner; it uses the Windows ANSI code page on Windows
and follows LC_CTYPE on other platforms. This is inconvenient for
PostgreSQL server processes, so realize consistent cross-platform
behavior by calling bind_textdomain_codeset() on Windows each time we
permanently change LC_CTYPE. This primarily affects SQL_ASCII databases
and processes like the postmaster that do not attach to a database,
making their behavior consistent with PostgreSQL on non-Windows
platforms. Messages from SQL_ASCII databases use the encoding implied
by the database LC_CTYPE, and messages from non-database processes use
LC_CTYPE from the postmaster system environment. PlatformEncoding
becomes unused, so remove it.
Make write_console() prefer WriteConsoleW() to write() regardless of the
encodings in use. In this situation, write() will invariably mishandle
non-ASCII characters.
elog.c has assumed that messages conform to the database encoding.
While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL.
Introduce MessageEncoding to track the actual encoding of message text.
The present consumers are Windows-specific code for converting messages
to UTF16 for use in system interfaces. This fixes the appearance in
Windows event logs and consoles of translated messages from SQL_ASCII
processes like the postmaster. Note that SQL_ASCII inherently disclaims
a strong notion of encoding, so non-ASCII byte sequences interpolated
into messages by %s may yet yield a nonsensical message. MULE_INTERNAL
has similar problems at present, albeit for a different reason: its lack
of libiconv support or a conversion to UTF8.
Consequently, one need no longer restart Windows with a different
Windows ANSI code page to broadly test backend logging under a given
language. Changing the user's locale ("Format") is enough. Several
accounts can simultaneously run postmasters under different locales, all
correctly logging localized messages to Windows event logs and consoles.
Alvaro Herrera [Tue, 25 Jun 2013 20:36:29 +0000 (16:36 -0400)]
Avoid inconsistent type declaration
Clang 3.3 correctly complains that a variable of type enum
MultiXactStatus cannot hold a value of -1, which makes sense. Change
the declared type of the variable to int instead, and apply casting as
necessary to avoid the warning.
Andrew Dunstan [Tue, 25 Jun 2013 17:46:34 +0000 (13:46 -0400)]
Properly dump dropped foreign table cols in binary-upgrade mode.
In binary upgrade mode, we need to recreate and then drop dropped
columns so that all the columns get the right attribute number. This is
true for foreign tables as well as for native tables. For foreign
tables we have been getting the first part right but not the second,
leading to bogus columns in the upgraded database. Fix this all the way
back to 9.1, where foreign tables were introduced.
Fujii Masao [Tue, 25 Jun 2013 17:14:37 +0000 (02:14 +0900)]
Support clean switchover.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.
But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.
This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.
Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.
Tom Lane [Sun, 23 Jun 2013 18:43:10 +0000 (14:43 -0400)]
Add a comment warning against use of pg_usleep() for long sleeps.
Follow-up to commit 873ab97219caabeb2f7b390268a4fe01e2b7518c, in which
I noted that WaitLatch was a better solution in the commit log message,
but neglected to add any documentation in the code.
Peter Eisentraut [Sat, 22 Jun 2013 02:48:06 +0000 (22:48 -0400)]
doc: Fix date in EPUB manifest
If there is no <date> element, the publication date for the EPUB
manifest is taken from the copyright year. But something like
"1996-2013" is not a legal date specification. So the EPUB output
currently fails epubcheck.
Put in a separate <date> element with the current year. Put it in
legal.sgml, because copyright.pl already instructs to update that
manually, so it hopefully won't be missed.
Peter Eisentraut [Fri, 21 Jun 2013 03:03:18 +0000 (23:03 -0400)]
Clarify terminology standalone backend vs. single-user mode
Most of the documentation uses "single-user mode", so use that in the
code as well. Adjust the documentation to match the new error message
wording. Also add a documentation index entry for "single-user mode".
Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>
Kevin Grittner [Wed, 19 Jun 2013 15:36:45 +0000 (10:36 -0500)]
Fix the create_index regression test for Danish collation.
In Danish collations, there are letter combinations which sort
higher than 'Z'. A test for values > 'WA' was picking up rows
where the value started with 'AA', causing the test to fail.
Backpatch to 9.2, where the failing test was added.
Per report from Svenne Krap and analysis by Jeff Janes
Jeff Davis [Mon, 17 Jun 2013 15:02:12 +0000 (08:02 -0700)]
Add buffer_std flag to MarkBufferDirtyHint().
MarkBufferDirtyHint() writes WAL, and should know if it's got a
standard buffer or not. Currently, the only callers where buffer_std
is false are related to the FSM.
In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive.
Tom Lane [Sat, 15 Jun 2013 20:22:29 +0000 (16:22 -0400)]
Use WaitLatch, not pg_usleep, for delaying in pg_sleep().
This avoids platform-dependent behavior wherein pg_sleep() might fail to be
interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since
there's no reason to wake up once a second any more, we can reduce the
power consumption of a sleeping backend a tad.
Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger
issue than it used to be.
Tom Lane [Sat, 15 Jun 2013 19:39:51 +0000 (15:39 -0400)]
Use SA_RESTART for all signals, including SIGALRM.
The exclusion of SIGALRM dates back to Berkeley days, when Postgres used
SIGALRM in only one very short stretch of code. Nowadays, allowing it to
interrupt kernel calls doesn't seem like a very good idea, since its use
for statement_timeout means SIGALRM could occur anyplace in the code, and
there are far too many call sites where we aren't prepared to deal with
EINTR failures. When third-party code is taken into consideration, it
seems impossible that we ever could be fully EINTR-proof, so better to
use SA_RESTART always and deal with the implications of that. One such
implication is that we should not assume pg_usleep() will be terminated
early by a signal. Therefore, long sleeps should probably be replaced
by WaitLatch operations where practical.
Back-patch to 9.3 so we can get some beta testing on this change.
Tom Lane [Sat, 15 Jun 2013 18:11:43 +0000 (14:11 -0400)]
Be consistent about #define'ing configure symbols as "1" not empty.
This is just neatnik-ism, since all the tests in the code are #ifdefs,
but we shouldn't specify symbols as "Define to 1 ..." and then not
actually define them that way.
Tom Lane [Fri, 14 Jun 2013 18:26:43 +0000 (14:26 -0400)]
Avoid deadlocks during insertion into SP-GiST indexes.
SP-GiST's original scheme for avoiding deadlocks during concurrent index
insertions doesn't work, as per report from Hailong Li, and there isn't any
evident way to make it work completely. We could possibly lock individual
inner tuples instead of their whole pages, but preliminary experimentation
suggests that the performance penalty would be huge. Instead, if we fail
to get a buffer lock while descending the tree, just restart the tree
descent altogether. We keep the old tuple positioning rules, though, in
hopes of reducing the number of cases where this can happen.
Tom Lane [Fri, 14 Jun 2013 03:15:15 +0000 (23:15 -0400)]
Remove special-case treatment of LOG severity level in standalone mode.
elog.c has historically treated LOG messages as low-priority during
bootstrap and standalone operation. This has led to confusion and even
masked a bug, because the normal expectation of code authors is that
elog(LOG) will put something into the postmaster log, and that wasn't
happening during initdb. So get rid of the special-case rule and make
the priority order the same as it is in normal operation. To keep from
cluttering initdb's output and the behavior of a standalone backend,
tweak the severity level of three messages routinely issued by xlog.c
during startup and shutdown so that they won't appear in these cases.
Per my proposal back in December.
Tom Lane [Fri, 14 Jun 2013 02:35:56 +0000 (22:35 -0400)]
Refactor checksumming code to make it easier to use externally.
pg_filedump and other external utility programs are likely to want to be
able to check Postgres page checksums. To avoid messy duplication of code,
move the checksumming functionality into an exported header file, much as
we did awhile back for the CRC code.
In passing, get rid of an unportable assumption that a static char[] array
will be word-aligned, and do some other minor code beautification.
Peter Eisentraut [Fri, 14 Jun 2013 01:42:42 +0000 (21:42 -0400)]
PL/Python: Fix type mixup
Memory was allocated based on the sizeof a type that was not the type of
the pointer that the result was being assigned to. The types happen to
be of the same size, but it's still wrong.
Tom Lane [Thu, 13 Jun 2013 17:11:29 +0000 (13:11 -0400)]
Only install a portal's ResourceOwner if it actually has one.
In most scenarios a portal without a ResourceOwner is dead and not subject
to any further execution, but a portal for a cursor WITH HOLD remains in
existence with no ResourceOwner after the creating transaction is over.
In this situation, if we attempt to "execute" the portal directly to fetch
data from it, we were setting CurrentResourceOwner to NULL, leading to a
segfault if the datatype output code did anything that required a resource
owner (such as trying to fetch system catalog entries that weren't already
cached). The case appears to be impossible to provoke with stock libpq,
but psqlODBC at least is able to cause it when working with held cursors.
Simplest fix is to just skip the assignment to CurrentResourceOwner, so
that any resources used by the data output operations will be managed by
the transaction-level resource owner instead. For consistency I changed
all the places that install a portal's resowner as current, even though
some of them are probably not reachable with a held cursor's portal.
Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing
a self-contained test case). Back-patch to all supported versions.
Noah Misch [Wed, 12 Jun 2013 23:51:12 +0000 (19:51 -0400)]
Avoid reading past datum end when parsing JSON.
Several loops in the JSON parser examined a byte in memory just before
checking whether its address was in-bounds, so they could read one byte
beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so
no back-patch.
Noah Misch [Wed, 12 Jun 2013 23:50:14 +0000 (19:50 -0400)]
Observe array length in HaveVirtualXIDsDelayingChkpt().
Since commit f21bb9cfb5646e1793dcc9c0ea697bab99afa523, this function
ignores the caller-provided length and loops until it finds a
terminator, which GetVirtualXIDsDelayingChkpt() never adds. Restore the
previous loop control logic. In passing, revert the addition of an
unused variable by the same commit, presumably a debugging relic.
Noah Misch [Wed, 12 Jun 2013 23:49:50 +0000 (19:49 -0400)]
Don't use ordinary NULL-terminated strings as Name datums.
Consumers are entitled to read the full 64 bytes pertaining to a Name;
using a shorter NULL-terminated string leads to reading beyond the end
its allocation; a SIGSEGV is possible. Use the frequent idiom of
copying to a NameData on the stack. New in 9.3, so no back-patch.
Tom Lane [Wed, 12 Jun 2013 21:52:54 +0000 (17:52 -0400)]
Improve updatability checking for views and foreign tables.
Extend the FDW API (which we already changed for 9.3) so that an FDW can
report whether specific foreign tables are insertable/updatable/deletable.
The default assumption continues to be that they're updatable if the
relevant executor callback function is supplied by the FDW, but finer
granularity is now possible. As a test case, add an "updatable" option to
contrib/postgres_fdw.
This patch also fixes the information_schema views, which previously did
not think that foreign tables were ever updatable, and fixes
view_is_auto_updatable() so that a view on a foreign table can be
auto-updatable.
initdb forced due to changes in information_schema views and the functions
they rely on. This is a bit unfortunate to do post-beta1, but if we don't
change this now then we'll have another API break for FDWs when we do
change it.
Dean Rasheed, somewhat editorialized on by Tom Lane
Andrew Dunstan [Wed, 12 Jun 2013 17:35:24 +0000 (13:35 -0400)]
Fix unescaping of JSON Unicode escapes, especially for non-UTF8.
Per discussion on -hackers. We treat Unicode escapes when unescaping
them similarly to the way we treat them in PostgreSQL string literals.
Escapes in the ASCII range are always accepted, no matter what the
database encoding. Escapes for higher code points are only processed in
UTF8 databases, and attempts to process them in other databases will
result in an error. \u0000 is never unescaped, since it would result in
an impermissible null byte.
Tom Lane [Tue, 11 Jun 2013 21:26:42 +0000 (17:26 -0400)]
Fix cache flush hazard in cache_record_field_properties().
We need to increment the refcount on the composite type's cached tuple
descriptor while we do lookups of its column types. Otherwise a cache
flush could occur and release the tuple descriptor before we're done with
it. This fails reliably with -DCLOBBER_CACHE_ALWAYS, but the odds of a
failure in a production build seem rather low (since the pfree'd descriptor
typically wouldn't get scribbled on immediately). That may explain the
lack of any previous reports. Buildfarm issue noted by Christian Ullrich.
Fujii Masao [Mon, 10 Jun 2013 18:03:16 +0000 (03:03 +0900)]
Fix pg_isready to handle conninfo properly.
pg_isready displays the host name and the port number that it uses to connect
to the server. So far, pg_isready didn't use the conninfo specified in -d option
for calculating those host name and port number. This can lead to wrong display
to a user. This commit changes pg_isready so that it uses the conninfo for that
calculation.
Joe Conway [Mon, 10 Jun 2013 00:30:39 +0000 (17:30 -0700)]
Fix ordering of obj id for Rules and EventTriggers in pg_dump.
getSchemaData() must identify extension member objects and mark them
as not to be dumped. This must happen after reading all objects that can be
direct members of extensions, but before we begin to process table subsidiary
objects. Both rules and event triggers were wrong in this regard.
Backport rules portion of patch to 9.1 -- event triggers do not exist prior to 9.3.
Suggested fix by Tom Lane, initial complaint and patch by me.
Tom Lane [Sun, 9 Jun 2013 23:41:52 +0000 (19:41 -0400)]
Tweak postgres_fdw regression test so autovacuum doesn't change results.
Autovacuum occurring while the test runs could allow some of the inserts to
go into recycled space, thus changing the output ordering of later queries.
While we could complicate those queries to force sorting of their output
rows, it doesn't seem like that would make the test better in any
meaningful way, and conceivably it could hide unexpected diffs. Instead,
tweak the affected queries so that the inserted rows aren't updated by the
following UPDATE. Per buildfarm.
Tom Lane [Sun, 9 Jun 2013 22:39:20 +0000 (18:39 -0400)]
Remove unnecessary restrictions about RowExprs in transformAExprIn().
When the existing code here was written, it made sense to special-case
RowExprs because that was the only way that we could handle row comparisons
at all. Now that we have record_eq() and arrays of composites, the generic
logic for "scalar" types will in fact work on RowExprs too, so there's no
reason to throw error for combinations of RowExprs and other ways of
forming composite values, nor to ignore the possibility of using a
ScalarArrayOpExpr. But keep using the old logic when comparing two
RowExprs, for consistency with the main transformAExprOp() logic. (This
allows some cases with not-quite-identical rowtypes to succeed, so we might
get push-back if we removed it.) Per bug #8198 from Rafal Rzepecki.
Back-patch to all supported branches, since this works fine as far back as
8.4.
Tom Lane [Sun, 9 Jun 2013 19:26:40 +0000 (15:26 -0400)]
Remove ALTER DEFAULT PRIVILEGES' requirement of schema CREATE permissions.
Per discussion, this restriction isn't needed for any real security reason,
and it seems to confuse people more often than it helps them. It could
also result in some database states being unrestorable. So just drop it.
Back-patch to 9.0, where ALTER DEFAULT PRIVILEGES was introduced.
Tom Lane [Sun, 9 Jun 2013 17:46:54 +0000 (13:46 -0400)]
Remove fixed limit on the number of concurrent AllocateFile() requests.
AllocateFile(), AllocateDir(), and some sister routines share a small array
for remembering requests, so that the files can be closed on transaction
failure. Previously that array had a fixed size, MAX_ALLOCATED_DESCS (32).
While historically that had seemed sufficient, Steve Toutant pointed out
that this meant you couldn't scan more than 32 file_fdw foreign tables in
one query, because file_fdw depends on the COPY code which uses
AllocateFile(). There are probably other cases, or will be in the future,
where this nonconfigurable limit impedes users.
We can't completely remove any such limit, at least not without a lot of
work, since each such request requires a kernel file descriptor and most
platforms limit the number we can have. (In principle we could
"virtualize" these descriptors, as fd.c already does for the main VFD pool,
but not without an additional layer of overhead and a lot of notational
impact on the calling code.) But we can at least let the array size be
configurable. Hence, change the code to allow up to max_safe_fds/2
allocated file requests. On modern platforms this should allow several
hundred concurrent file_fdw scans, or more if one increases the value of
max_files_per_process. To go much further than that, we'd need to do some
more work on the data structure, since the current code for closing
requests has potentially O(N^2) runtime; but it should still be all right
for request counts in this range.
Back-patch to 9.1 where contrib/file_fdw was introduced.
Andrew Dunstan [Sat, 8 Jun 2013 14:00:09 +0000 (10:00 -0400)]
Don't downcase non-ascii identifier chars in multi-byte encodings.
Long-standing code has called tolower() on identifier character bytes
with the high bit set. This is clearly an error and produces junk output
when the encoding is multi-byte. This patch therefore restricts this
activity to cases where there is a character with the high bit set AND
the encoding is single-byte.
There have been numerous gripes about this, most recently from Martin
Schäfer.
Andrew Dunstan [Sat, 8 Jun 2013 13:12:48 +0000 (09:12 -0400)]
Handle Unicode surrogate pairs correctly when processing JSON.
In 9.2, Unicode escape sequences are not analysed at all other than
to make sure that they are in the form \uXXXX. But in 9.3 many of the
new operators and functions try to turn JSON text values into text in
the server encoding, and this includes de-escaping Unicode escape
sequences. This processing had not taken into account the possibility
that this might contain a surrogate pair to designate a character
outside the BMP. That is now handled correctly.
This also enforces correct use of surrogate pairs, something that is not
done by the type's input routines. This fact is noted in the docs.
Although the DTD technically allows this, the resulting HTML is invalid
because it puts block elements inside inline elements. DocBook 5.0 also
doesn't allow it anymore, so it's fair to assume that this was never
really intended to work. Replace <synopsis> with <literal>, which is
the markup used elsewhere in the documentation in similar cases.