Fix error handling of XLogReaderAllocate in case of OOM
Similarly to previous fix 9b8d478, commit 2c03216 has switched
XLogReaderAllocate() to use a set of palloc calls instead of malloc,
causing any callers of this function to fail with an error instead of
receiving a NULL pointer in case of out-of-memory error. Fix this by
using palloc_extended with MCXT_ALLOC_NO_OOM that will safely return
NULL in case of an OOM.
Robert Haas [Fri, 3 Apr 2015 12:32:05 +0000 (08:32 -0400)]
Change the way we decide whether to give up on abbreviated text keys.
Be more aggressive about aborting early on if it looks like it's not
helping, but be less aggressive about aborting later on, since it's
more expensive at that point, and also since we're currently aborting
in some cases where abbreviation can still deliver a substantial win.
Peter Geoghegan. Extensive testing by Tomas Vondra.
Rework handling of OOM when allocating record buffer in XLOG reader.
Commit 2c03216 changed allocate_recordbuf() so that it uses a palloc to
allocate the read buffer and fails immediately when an out-of-memory error
shows up, even though its callers still expect that NULL is returned in that
case. This bug is fixed making allocate_recordbuf() use a palloc_extended
with MCXT_ALLOC_NO_OOM flag and return NULL in OOM case.
This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.
Tom Lane [Fri, 3 Apr 2015 04:07:29 +0000 (00:07 -0400)]
Fix rare startup failure induced by MVCC-catalog-scans patch.
While a new backend nominally participates in sinval signaling starting
from the SharedInvalBackendInit call near the top of InitPostgres, it
cannot recognize sinval messages for unshared catalogs of its database
until it has set up MyDatabaseId. This is not problematic for the catcache
or relcache, which by definition won't have loaded any data from or about
such catalogs before that point. However, commit 568d4138c646cd7c
introduced a mechanism for re-using MVCC snapshots for catalog scans, and
made invalidation of those depend on recognizing relevant sinval messages.
So it's possible to establish a catalog snapshot to read pg_authid and
pg_database, then before we set MyDatabaseId, receive sinval messages that
should result in invalidating that snapshot --- but do not, because we
don't realize they are for our database. This mechanism explains the
intermittent buildfarm failures we've seen since commit 31eae6028eca4365.
That commit was not itself at fault, but it introduced a new regression
test that does reconnections concurrently with the "vacuum full pg_am"
command in vacuum.sql. This allowed the pre-existing error to be exposed,
given just the right timing, because we'd fail to update our information
about how to access pg_am. In principle any VACUUM FULL on a system
catalog could have created a similar hazard for concurrent incoming
connections. Perhaps there are more subtle failure cases as well.
To fix, force invalidation of the catalog snapshot as soon as we've
set MyDatabaseId.
Robert Haas [Thu, 2 Apr 2015 20:26:49 +0000 (16:26 -0400)]
Improve pgbench error reporting.
This would have been worth doing on general principle anyway, but the
recent addition of an expression syntax to pgbench makes it an even
better idea than it would have been otherwise.
Commit 0d831389749a3 inadvertently reversed the meaning of the
wraparound variable. This causes vacuums which are not required for
wraparound to wait for locks to be acquired, and what is worse, it
allows wraparound vacuums to skip locked pages.
Bug reported by Jeff Janes in
http://www.postgresql.org/message-id/CAMkU=1xmTEiaY=5oMHsSQo5vd9V1Ze4kNLL0qN2eH0P_GXOaYw@mail.gmail.com
Analysis and patch by Kyotaro HORIGUCHI
Andres Freund [Thu, 2 Apr 2015 15:43:35 +0000 (17:43 +0200)]
Define integer limits independently from the system definitions.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.
One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.
Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.
This is the second try at this, after fcef1617295 failed miserably and
had to be reverted: as it turns out, libpq cannot depend on libpgcommon
after all. Instead of shuffling code in the master branch, make that one
just like 9.4 and accept the duplication. (This was all my own mistake,
not the patch submitter's).
psql was already accepting conninfo strings as the first parameter in
\connect, but the way it worked wasn't sane; some of the other
parameters would get the previous connection's values, causing it to
connect to a completely unexpected server or, more likely, not finding
any server at all because of completely wrong combinations of
parameters.
Fix by explicitely checking for a conninfo-looking parameter in the
dbname position; if one is found, use its complete specification rather
than mix with the other arguments. Also, change tab-completion to not
try to complete conninfo/URI-looking "dbnames" and document that
conninfos are accepted as first argument.
There was a weak consensus to backpatch this, because while the behavior
of using the dbname as a conninfo is nowhere documented for \connect, it
is reasonable to expect that it works because it does work in many other
contexts. Therefore this is backpatched all the way back to 9.0.
Author: David Fetter, Andrew Dunstan. Some editorialization by me
(probably earning a Gierth's "Sloppy" badge in the process.)
Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost,
Robert Haas, Andrew Dunstan.
psql was already accepting conninfo strings as the first parameter in
\connect, but the way it worked wasn't sane; some of the other
parameters would get the previous connection's values, causing it to
connect to a completely unexpected server or, more likely, not finding
any server at all because of completely wrong combinations of
parameters.
Fix by explicitely checking for a conninfo-looking parameter in the
dbname position; if one is found, use its complete specification rather
than mix with the other arguments. Also, change tab-completion to not
try to complete conninfo/URI-looking "dbnames" and document that
conninfos are accepted as first argument.
There was a weak consensus to backpatch this, because while the behavior
of using the dbname as a conninfo is nowhere documented for \connect, it
is reasonable to expect that it works because it does work in many other
contexts. Therefore this is backpatched all the way back to 9.0.
To implement this, routines previously private to libpq have been
duplicated so that psql can decide what looks like a conninfo/URI
string. In back branches, just duplicate the same code all the way back
to 9.2, where URIs where introduced; 9.0 and 9.1 have a simpler version.
In master, the routines are moved to src/common and renamed.
Author: David Fetter, Andrew Dunstan. Some editorialization by me
(probably earning a Gierth's "Sloppy" badge in the process.)
Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost,
Robert Haas, Andrew Dunstan.
Tom Lane [Wed, 1 Apr 2015 21:11:21 +0000 (17:11 -0400)]
Provide real selectivity estimators for inet/cidr operators.
This patch fills in the formerly-stub networksel() and networkjoinsel()
estimation functions. Those are used for << <<= >> >>= and && operators
on inet/cidr types. The estimation is not perfect, certainly, because
we rely on the existing statistics collected for the inet btree operators.
But it's a long way better than nothing, and it's not clear that asking
ANALYZE to collect separate stats for these operators would be a win.
Emre Hasegeli, with reviews from Dilip Kumar and Heikki Linnakangas,
and some further hacking by me
Tom Lane [Wed, 1 Apr 2015 00:02:40 +0000 (20:02 -0400)]
Fix incorrect markup in documentation of window frame clauses.
You're required to write either RANGE or ROWS to start a frame clause,
but the documentation incorrectly implied this is optional. Noted by
David Johnston.
Andrew Dunstan [Mon, 30 Mar 2015 21:07:52 +0000 (17:07 -0400)]
Run pg_upgrade and pg_resetxlog with restricted token on Windows
As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.
Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable bramches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.
Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
Tom Lane [Mon, 30 Mar 2015 20:40:05 +0000 (16:40 -0400)]
Fix bogus concurrent use of _hash_getnewbuf() in bucket split code.
_hash_splitbucket() obtained the base page of the new bucket by calling
_hash_getnewbuf(), but it held no exclusive lock that would prevent some
other process from calling _hash_getnewbuf() at the same time. This is
contrary to _hash_getnewbuf()'s API spec and could in fact cause failures.
In practice, we must only call that function while holding write lock on
the hash index's metapage.
An additional problem was that we'd already modified the metapage's bucket
mapping data, meaning that failure to extend the index would leave us with
a corrupt index.
Fix both issues by moving the _hash_getnewbuf() call to just before we
modify the metapage in _hash_expandtable().
Unfortunately there's still a large problem here, which is that we could
also incur ENOSPC while trying to get an overflow page for the new bucket.
That would leave the index corrupt in a more subtle way, namely that some
index tuples that should be in the new bucket might still be in the old
one. Fixing that seems substantially more difficult; even preallocating as
many pages as we could possibly need wouldn't entirely guarantee that the
bucket split would complete successfully. So for today let's just deal
with the base case.
Per report from Antonin Houska. Back-patch to all active branches.
Alvaro Herrera [Mon, 30 Mar 2015 19:13:21 +0000 (16:13 -0300)]
Change array_offset to return subscripts, not offsets
... and rename it and its sibling array_offsets to array_position and
array_positions, to account for the changed behavior.
Having the functions return subscripts better matches existing practice,
and is better suited to using the result value as a subscript into the
array directly. For one-based arrays, the new definition is identical
to what was originally committed.
(We use the term "subscript" in the documentation, which is what we use
whenever we talk about arrays; but the functions themselves are named
using the word "position" to match the standard-defined POSITION()
functions.)
Author: Pavel Stěhule
Behavioral problem noted by Dean Rasheed.
Alvaro Herrera [Mon, 30 Mar 2015 19:01:44 +0000 (16:01 -0300)]
Fix lost persistence setting during REINDEX INDEX
ReindexIndex() trusts a parser-built RangeVar with the persistence to
use for the new copy of the index; but the parser naturally does not
know what's the persistence of the original index. To find out the
correct persistence, grab it from relcache.
This bug was introduced by commit 85b506bbfc2937c9, and therefore no
backpatch is necessary.
Bug reported by Thom Brown, analysis and patch by Michael Paquier; test
case provided by Fabrízio de Royes Mello.
Tom Lane [Mon, 30 Mar 2015 18:59:49 +0000 (14:59 -0400)]
Be more careful about printing constants in ruleutils.c.
The previous coding in get_const_expr() tried to avoid quoting integer,
float, and numeric literals if at all possible. While that looks nice,
it means that dumped expressions might re-parse to something that's
semantically equivalent but not the exact same parsetree; for example
a FLOAT8 constant would re-parse as a NUMERIC constant with a cast to
FLOAT8. Though the result would be the same after constant-folding,
this is problematic in certain contexts. In particular, Jeff Davis
pointed out that this could cause unexpected failures in ALTER INHERIT
operations because of child tables having not-exactly-equivalent CHECK
expressions. Therefore, favor correctness over legibility and dump
such constants in quotes except in the limited cases where they'll
be interpreted as the same type even without any casting.
This results in assorted small changes in the regression test outputs,
and will affect display of user-defined views and rules similarly.
The odds of that causing problems in the field seem non-negligible;
given the lack of previous complaints, it seems best not to change
this in the back branches.
Tom Lane [Mon, 30 Mar 2015 17:05:27 +0000 (13:05 -0400)]
Fix rare core dump in BackendIdGetTransactionIds().
BackendIdGetTransactionIds() neglected the possibility that the PROC
pointer in a ProcState array entry is null. In current usage, this could
only crash if the other backend had exited since pgstat_read_current_status
saw it as active, which is a pretty narrow window. But it's reachable in
the field, per bug #12918 from Vladimir Borodin.
Back-patch to 9.4 where the faulty code was introduced.
Tom Lane [Mon, 30 Mar 2015 00:02:14 +0000 (20:02 -0400)]
Fix multiple bugs and infelicities in pg_rewind.
Bugs all spotted by Coverity, including wrong realloc() size request
and memory leaks. Cosmetic improvements by me.
The usage of the global variable "filemap" here is still pretty awful,
but at least I got rid of the gratuitous aliasing in several routines
(which was helping to annoy Coverity, as well as being a bug risk).
Tom Lane [Sun, 29 Mar 2015 19:04:09 +0000 (15:04 -0400)]
Add vacuum_delay_point call in compute_index_stats's per-sample-row loop.
Slow functions in index expressions might cause this loop to take long
enough to make it worth being cancellable. Probably it would be enough
to call CHECK_FOR_INTERRUPTS here, but for consistency with other
per-sample-row loops in this file, let's use vacuum_delay_point.
Report and patch by Jeff Janes. Back-patch to all supported branches.
Tom Lane [Sun, 29 Mar 2015 18:02:58 +0000 (14:02 -0400)]
Make ginbuild's funcCtx be independent of its tmpCtx.
Previously the funcCtx was a child of the tmpCtx, but that was broken
by commit eaa5808e8ec4e82ce1a87103a6b6f687666e4e4c, which made
MemoryContextReset() delete, not reset, child contexts. The behavior of
having a tmpCtx reset also clear the other context seems rather dubious
anyway, so let's just disentangle them. Per report from Erik Rijkers.
In passing, fix badly-inaccurate comments about these contexts.
Tom Lane [Sun, 29 Mar 2015 17:06:59 +0000 (13:06 -0400)]
Minor code cleanups in pgbench expression support.
Get rid of unnecessary expr_yylex declaration (we haven't supported
flex 2.5.4 in a long time, and even if we still did, the declaration in
pgbench.h makes this one unnecessary and inappropriate). Fix copyright
dates, improve some layout choices, etc.
Tom Lane [Sat, 28 Mar 2015 17:56:37 +0000 (13:56 -0400)]
Better fix for misuse of Float8GetDatumFast().
We can use that macro as long as we put the value into a local variable.
Commit 735cd6128 was not wrong on its own terms, but I think this way
looks nicer, and it should save a few cycles on 32-bit machines.
Andrew Dunstan [Sat, 28 Mar 2015 15:07:41 +0000 (11:07 -0400)]
Add a pager_min_lines setting to psql
If set, the pager will not be used unless this many lines are to be
displayed, even if that is more than the screen depth. Default is zero,
meaning it's disabled.
There is probably more work to be done in giving the user control over
when the pager is used, particularly when wide output forces use of the
pager regardless of how many lines there are, but this is a start.
Andrew Dunstan [Sat, 28 Mar 2015 13:22:51 +0000 (09:22 -0400)]
Use standard librart sqrt function in pg_stat_statements
The stddev calculation included a faster but unportable sqrt function.
This is not worth the extra effort, and won't work everywhere. If the
standard library function is good enough for the SQL function it
should be good enough here too.
inet, cidr, and timetz indexes still cannot support index-only scans,
because they don't store the original unmodified value in the index, but a
derived approximate value.
Andrew Dunstan [Fri, 27 Mar 2015 21:29:59 +0000 (17:29 -0400)]
Fix portability issues with stddev in pg_stat_statements
Stddev is calculated on the fly, and the code in commit 717f70953264 was
using Float8GetDatumFast() inappropriately to convert the result to a
Datum. Mea culpa. It now uses Float8GetDatum().
Fix GiST index-only scans for opclasses with different storage type.
We cannot use the index's tuple descriptor directly to describe the index
tuples returned in an index-only scan. That's because the index might use
a different datatype for the values stored on disk than the type originally
indexed. As long as they were both pass-by-ref, it worked, but will not work
for pass-by-value types of different sizes. I noticed this as a crash when I
started hacking a patch to add fetch methods to btree_gist.
* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces
pgindent's tendency to misformat declarations involving them.
* attributes are now always attached to function declarations, not
definitions. Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway. (It does little good to put a
noreturn annotation where callers can't see it.) In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.
I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.
This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).
Anastasia Lubennikova, with additional fixes and modifications by me.
Andres Freund [Wed, 25 Mar 2015 21:39:42 +0000 (22:39 +0100)]
Centralize definition of integer limits.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.
Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.
This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.
Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
Alvaro Herrera [Wed, 25 Mar 2015 20:17:56 +0000 (17:17 -0300)]
Return ObjectAddress in many ALTER TABLE sub-routines
Since commit a2e35b53c39b2a, most CREATE and ALTER commands return the
ObjectAddress of the affected object. This is useful for event triggers
to try to figure out exactly what happened. This patch extends this
idea a bit further to cover ALTER TABLE as well: an auxiliary
ObjectAddress is returned for each of several subcommands of ALTER
TABLE. This makes it possible to decode with precision what happened
during execution of any ALTER TABLE command; for instance, which
constraint was added by ALTER TABLE ADD CONSTRAINT, or which parent got
dropped from the parents list by ALTER TABLE NO INHERIT.
As with the previous patch, there is no immediate user-visible change
here.
This is all really just continuing what c504513f83a9ee8 started.
Tom Lane [Wed, 25 Mar 2015 19:54:08 +0000 (15:54 -0400)]
Upgrade src/port/rint.c to be POSIX-compliant.
The POSIX spec says that rint() rounds halfway cases to nearest even.
Our substitute implementation failed to do that, rather rounding halfway
cases away from zero; and it also got some other cases (such as minus
zero) wrong. This led to observable cross-platform differences, as
reported in bug #12885 from Rich Schaaf; in particular, casting from
float to int didn't honor round-to-nearest-even on builds using rint.c.
Implement something that attempts to cover all cases per spec, and add
some simple regression tests so that we'll notice if any platforms still
get this wrong.
Although this is a bug fix, no back-patch, as a behavioral change in
the back branches was agreed not to be a good idea.
Pedro Gimeno Fortea, reviewed by Michael Paquier and myself
Kevin Grittner [Wed, 25 Mar 2015 19:24:43 +0000 (14:24 -0500)]
Reduce pinning and buffer content locking for btree scans.
Even though the main benefit of the Lehman and Yao algorithm for
btrees is that no locks need be held between page reads in an
index search, we were holding a buffer pin on each leaf page after
it was read until we were ready to read the next one. The reason
was so that we could treat this as a weak lock to create an
"interlock" with vacuum's deletion of heap line pointers, even
though our README file pointed out that this was not necessary for
a scan using an MVCC snapshot.
The main goal of this patch is to reduce the blocking of vacuum
processes by in-progress btree index scans (including a cursor
which is idle), but the code rearrangement also allows for one
less buffer content lock to be taken when a forward scan steps from
one page to the next, which results in a small but consistent
performance improvement in many workloads.
This patch leaves behavior unchanged for some cases, which can be
addressed separately so that each case can be evaluated on its own
merits. These unchanged cases are when a scan uses a non-MVCC
snapshot, an index-only scan, and a scan of a btree index for which
modifications are not WAL-logged. If later patches allow all of
these cases to drop the buffer pin after reading a leaf page, then
the btree vacuum process can be simplified; it will no longer need
the "super-exclusive" lock to delete tuples from a page.
Reviewed by Heikki Linnakangas and Kyotaro Horiguchi
Alvaro Herrera [Wed, 25 Mar 2015 17:28:34 +0000 (14:28 -0300)]
Fix bug for array-formatted identities of user mappings
I failed to realize that server names reported in the object args array
would get quoted, which is wrong; remove that, making sure that it's
only quoted in the string-formatted identity.
This bug was introduced by my commit cf34e373, which was backpatched,
but since object name/args arrays are new in commit a676201490c8, there
is no need to backpatch this any further.
Tom Lane [Tue, 24 Mar 2015 19:53:06 +0000 (15:53 -0400)]
Fix ExecOpenScanRelation to take a lock on a ROW_MARK_COPY relation.
ExecOpenScanRelation assumed that any relation listed in the ExecRowMark
list has been locked by InitPlan; but this is not true if the rel's
markType is ROW_MARK_COPY, which is possible if it's a foreign table.
In most (possibly all) cases, failure to acquire a lock here isn't really
problematic because the parser, planner, or plancache would have taken the
appropriate lock already. In principle though it might leave us vulnerable
to working with a relation that we hold no lock on, and in any case if the
executor isn't depending on previously-taken locks otherwise then it should
not do so for ROW_MARK_COPY relations.
Noted by Etsuro Fujita. Back-patch to all active versions, since the
inconsistency has been there a long time. (It's almost certainly
irrelevant in 9.0, since that predates foreign tables, but the code's
still wrong on its own terms.)
Tom Lane [Mon, 23 Mar 2015 20:59:29 +0000 (16:59 -0400)]
Apply table and domain CHECK constraints in name order.
Previously, CHECK constraints of the same scope were checked in whatever
order they happened to be read from pg_constraint. (Usually, but not
reliably, this would be creation order for domain constraints and reverse
creation order for table constraints, because of differing implementation
details.) Nondeterministic results of this sort are problematic at least
for testing purposes, and in discussion it was agreed to be a violation of
the principle of least astonishment. Therefore, borrow the principle
already established for triggers, and apply such checks in name order
(using strcmp() sort rules). This lets users control the check order
if they have a mind to.
Domain CHECK constraints still follow the rule of checking lower nested
domains' constraints first; the name sort only applies to multiple
constraints attached to the same domain.
In passing, I failed to resist the temptation to wordsmith a bit in
create_domain.sgml.
Apply to HEAD only, since this could result in a behavioral change in
existing applications, and the potential regression test failures have
not actually been observed in our buildfarm.
It worked in my Windows VM with VS2013, but buildfarm animal mastodon,
running MSVC 2005, was not happy. Amit Kapila also reported a similar error
earlier in his environment. Let's see if this helps.
Andres Freund [Mon, 23 Mar 2015 15:40:10 +0000 (16:40 +0100)]
Don't delay replication for less than recovery_min_apply_delay's resolution.
Recovery delays are implemented by waiting on a latch, and latches take
milliseconds as a parameter. The required amount of waiting was computed
using microsecond resolution though and the wait loop's abort condition
was checking the delay in microseconds as well. This could lead to
short spurts of busy looping when the overall wait time was below a
millisecond, but above 0 microseconds.
Instead just formulate the wait loop's abort condition in millisecond
granularity as well. Given that that's recovery_min_apply_delay
resolution, it seems harmless to not wait for less than a millisecond.
Backpatch to 9.4 where recovery_min_apply_delay was introduced.
Robert Haas [Mon, 23 Mar 2015 13:58:56 +0000 (09:58 -0400)]
Remove ill-advised pre-check for DSM segment exhaustion.
dsm_control->nitems never decreases, so this is testing whether the
server has *ever* run out of DSM segments, not whether it is
*currently* out of DSM segments.
Revert "to_char(float4/8): zero pad to specified length". There are
too many platform-specific problems, and the proper rounding is missing.
Also revert companion patch 9d61b9953c1489cbb458ca70013cf5fca1bb7710.
Tom Lane [Sun, 22 Mar 2015 17:53:11 +0000 (13:53 -0400)]
Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents. Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.
As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior. Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it. And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.
An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:
Update on pt1 (cost=0.00..321.05 rows=3541 width=46)
Update on pt1
Foreign Update on ft1
Foreign Update on ft2
Update on child3
-> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46)
-> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46)
-> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46)
-> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46)
This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.
Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
Bruce Momjian [Sun, 22 Mar 2015 01:43:15 +0000 (21:43 -0400)]
to_char(float4/8): zero pad to specified length
Previously, zero padding was limited to the internal length, rather than
the specified length. This allows it to match to_char(int/numeric), which
always padded to the specified length.
Make pg_xlogdump MSVC build work more like others.
Instead of copying xlogreader.c and *desc.c files into the source directory,
build them where they are. That's what we do for other binaries that need to
compile and link in files from elsewhere in the source tree.
The commit history suggests that it was done this way because of issues with
older versions of MSVC. I think this should work, but we'll see if the
buildfarm complains.
Andres Freund [Fri, 20 Mar 2015 09:26:17 +0000 (10:26 +0100)]
Use 128-bit math to accelerate some aggregation functions.
On platforms where we support 128bit integers, use them to implement
faster transition functions for sum(int8), avg(int8),
var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
numeric as a transition type.
In some synthetic benchmarks this has been shown to provide significant
speedups.
Bumps catversion.
Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
Oskari Saarenmaa, David Rowley
Andres Freund [Fri, 20 Mar 2015 09:26:17 +0000 (10:26 +0100)]
Add, optional, support for 128bit integers.
We will, for the foreseeable future, not expose 128 bit datatypes to
SQL. But being able to use 128bit math will allow us, in a later patch,
to use 128bit accumulators for some aggregates; leading to noticeable
speedups over using numeric.
So far we only detect a gcc/clang extension that supports 128bit math,
but no 128bit literals, and no *printf support. We might want to expand
this in the future to further compilers; if there are any that that
provide similar support.
Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson, with significant editorializing by me Reviewed-By: Peter Geoghegan, Oskari Saarenmaa