Peter Eisentraut [Thu, 19 Jan 2017 17:00:00 +0000 (12:00 -0500)]
Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers
From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Tom Lane [Fri, 20 Jan 2017 00:52:13 +0000 (19:52 -0500)]
Avoid core dump for empty prepared statement in an aborted transaction.
Brown-paper-bag bug in commit ab1f0c822: the old code here coped with
null CachedPlanSource.raw_parse_tree, the new code not so much.
Per report from Dave Cramer.
No regression test, because our core testing infrastructure doesn't
provide any easy way to exercise this path. Fortunately, the JDBC
crew test it regularly.
I'd somehow talked myself into believing that set_append_rel_size
doesn't need to worry about getting back an AND clause when it applies
eval_const_expressions to the result of adjust_appendrel_attrs (that is,
transposing the appendrel parent's restriction clauses for one child).
But that is nonsense, and Andreas Seltenreich's fuzz tester soon
turned up a counterexample. Put back the make_ands_implicit step
that was there before, and add a regression test covering the case.
Andres Freund [Thu, 19 Jan 2017 22:21:26 +0000 (14:21 -0800)]
Fix platform dependant regression output triggered by 69f4b9c85f16.
Due to the changed costing in that commit hash-aggregates started to
be used, which results in big-endian vs. little-endian output
differences. Disable hash-aggs for those tests.
Author: Andres Freund, with input from Tom Lane
Discussion: https://postgr.es/m/22891.1484791792@sss.pgh.pa.us
Andres Freund [Thu, 19 Jan 2017 22:12:38 +0000 (14:12 -0800)]
Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.
This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.
Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
Alvaro Herrera [Thu, 19 Jan 2017 21:23:09 +0000 (18:23 -0300)]
Fix race condition in reading commit timestamps
If a user requests the commit timestamp for a transaction old enough
that its data is concurrently being truncated away by vacuum at just the
right time, they would receive an ugly internal file-not-found error
message from slru.c rather than the expected NULL return value.
In a primary server, the window for the race is very small: the lookup
has to occur exactly between the two calls by vacuum, and there's not a
lot that happens between them (mostly just a multixact truncate). In a
standby server, however, the window is larger because the truncation is
executed as soon as the WAL record for it is replayed, but the advance
of the oldest-Xid is not executed until the next checkpoint record.
To fix in the primary, simply reverse the order of operations in
vac_truncate_clog. To fix in the standby, augment the WAL truncation
record so that the standby is aware of the new oldest-XID value and can
apply the update immediately. WAL version bumped because of this.
No backpatch, because of the low importance of the bug and its rarity.
Author: Craig Ringer Reviewed-By: Petr Jelínek, Peter Eisentraut
Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com
Peter Eisentraut [Thu, 19 Jan 2017 17:00:00 +0000 (12:00 -0500)]
initdb: Fix for mixed-case superuser names
The previous coding did not properly quote the user name before casting
it to regrole. To avoid all that, just pass in BOOTSTRAP_SUPERUSERID
numerically.
Also fix one place where the BOOTSTRAP_SUPERUSERID was hardcoded as 10.
Robert Haas [Thu, 19 Jan 2017 18:56:13 +0000 (13:56 -0500)]
Fix some problems in check_new_partition_bound().
Account for the fact that the highest bound less than or equal to the
upper bound might be either the lower or the upper bound of the
overlapping partition, depending on whether the proposed partition
completely contains the existing partition or merely overlaps it.
Also, we need not continue searching for even greater bound in
partition_bound_bsearch() once we find the first bound that is *equal*
to the probe, because we don't have duplicate datums. That spends
cycles needlessly.
Amit Langote, per a report from Amul Sul. Cosmetic changes by me.
Robert Haas [Thu, 19 Jan 2017 18:20:11 +0000 (13:20 -0500)]
Fix RETURNING to work correctly with partition tuple routing.
In ExecInsert(), do not switch back to the root partitioned table
ResultRelInfo until after we finish ExecProcessReturning(), so that
RETURNING projection is done using the partition's descriptor. For
the projection to work correctly, we must initialize the same for each
leaf partition during ModifyTableState initialization.
Robert Haas [Thu, 19 Jan 2017 17:30:27 +0000 (12:30 -0500)]
Fix failure to enforce partitioning contraint for internal partitions.
When a tuple is inherited into a partitioning root, no partition
constraints need to be enforced; when it is inserted into a leaf, the
parent's partitioning quals needed to be enforced. The previous
coding got both of those cases right. When a tuple is inserted into
an intermediate level of the partitioning hierarchy (i.e. a table
which is both a partition itself and in turn partitioned), it must
enforce the partitioning qual inherited from its parent. That case
got overlooked; repair.
Stephen Frost [Thu, 19 Jan 2017 17:06:21 +0000 (12:06 -0500)]
Dump sequence data based on the TableDataInfo flag
When considering a sequence's Data entry in dumpSequenceData, we were
actually looking at the sequence definition's dump flag to decide if we
should dump the data or not. That's generally fine, except for when the
sequence data entry was created by processExtensionTables() because it's
a config sequence. In that case, the sequence itself won't be marked as
dumping data because it's part of an extension, leading to the need for
processExtensionTables() to create the sequence data entry.
This leads to extension config sequence data not being included in the
dump when it should be. Fix this by looking at the sequence data's dump
flag instead, just as dumpTableData() was doing for tables (which is why
config tables were correctly being handled), and add a regression test
to make sure we don't break it moving forward.
All of this is a bit round-about since we can now represent which
components of a given dump item should be dumped out through the dump
flag. A future improvement might be to change checkExtensionMembership()
to check for config sequences/tables and set the dump flag based on that
directly, possibly removing the need for processExtensionTables().
Tom Lane [Wed, 18 Jan 2017 23:10:23 +0000 (18:10 -0500)]
Doc: improve documentation of new SRF-in-tlist behavior.
Correct a misstatement about how things used to work: we did allow nested
SRFs before, as long as no function had more than one set-returning input.
Also, attempt to document the fact that the new implementation changes the
behavior for SRFs within conditional constructs (eg CASE): the conditional
construct no longer gates whether the SRF is run, and thus cannot affect
the number of rows emitted. We might want to change this behavior, but
first it behooves us to see if we can explain it.
Minor other wordsmithing on what I wrote yesterday, too.
Andres Freund [Wed, 18 Jan 2017 20:46:50 +0000 (12:46 -0800)]
Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.
Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
Tom Lane [Wed, 18 Jan 2017 21:33:18 +0000 (16:33 -0500)]
Reset the proper GUC in create_index test.
Thinko in commit a4523c5aa. It doesn't really affect anything at
present, but it would be a problem if any tests added later in this
file ought to get index-only-scan plans. Back-patch, like the previous
commit, just to avoid surprises in case we add such a test and then
back-patch it.
Alvaro Herrera [Wed, 18 Jan 2017 21:06:13 +0000 (18:06 -0300)]
Change some test macros to return true booleans
These macros work fine when they are used directly in an "if" test or
similar, but as soon as the return values are assigned to boolean
variables (or passed as boolean arguments to some function), they become
bugs, hopefully caught by compiler warnings. To avoid future problems,
fix the definitions so that they return actual booleans.
To further minimize the risk that somebody uses them in back-patched
fixes that only work correctly in branches starting from the current
master and not in old ones, back-patch the change to supported branches
as appropriate.
Magnus Hagander [Wed, 18 Jan 2017 20:37:59 +0000 (21:37 +0100)]
Implement array version of jsonb_delete and operator
This makes it possible to delete multiple keys from a jsonb value by
passing in an array of text values, which makes the operaiton much
faster than individually deleting the keys (which would require copying
the jsonb structure over and over again.
Tom Lane [Wed, 18 Jan 2017 20:21:52 +0000 (15:21 -0500)]
Disable transforms that replaced AT TIME ZONE with RelabelType.
These resulted in wrong answers if the relabeled argument could be matched
to an index column, as shown in bug #14504 from Evgeniy Kozlov. We might
be able to resurrect these optimizations by adjusting the planner's
treatment of RelabelType, or by adjusting btree's rules for selecting
comparison functions, but either solution will take careful analysis
and does not sound like a fit candidate for backpatching.
I left the catalog infrastructure in place and just reduced the transform
functions to always-return-NULL. This would be necessary anyway in the
back branches, and it doesn't seem important to be more invasive in HEAD.
Bug introduced by commit b8a18ad48. Back-patch to 9.5 where that came in.
Robert Haas [Wed, 18 Jan 2017 19:43:14 +0000 (14:43 -0500)]
Add some more tests for tuple routing.
Commit a25665088d812d08bb888e961f208eaebf522050 fixed some issues with
how PartitionDispatch related code handled multi-level partitioned
tables, but didn't add any tests.
Alvaro Herrera [Wed, 18 Jan 2017 19:08:20 +0000 (16:08 -0300)]
Make messages mentioning type names more uniform
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names). This continues what started
as f402b9950120.
Tom Lane [Wed, 18 Jan 2017 18:44:19 +0000 (13:44 -0500)]
Avoid conflicts with collation aliases generated by stripping.
This resulted in failures depending on the order of "locale -a" output.
The original coding in initdb sorted the results, but that should be
unnecessary as long as "locale -a" doesn't print duplicate names. The
original entries will then all be non-dups, and while we might generate
duplicate aliases by stripping, they should be for different encodings and
thus not conflict. Even if the latter assumption fails somehow, it won't
be fatal because we're using if_not_exists mode for the aliases.
Tom Lane [Wed, 18 Jan 2017 17:58:20 +0000 (12:58 -0500)]
Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Peter Eisentraut [Wed, 18 Jan 2017 17:00:00 +0000 (12:00 -0500)]
Add function to import operating system collations
Move this logic out of initdb into a user-callable function. This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Euler Taveira <euler@timbira.com.br>
Peter Eisentraut [Wed, 28 Dec 2016 17:00:00 +0000 (12:00 -0500)]
Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h. This avoids
having to manually maintain these prototypes across a random variety of
header files. It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Fujii Masao [Tue, 17 Jan 2017 08:27:32 +0000 (17:27 +0900)]
Fix an assertion failure related to an exclusive backup.
Previously multiple sessions could execute pg_start_backup() and
pg_stop_backup() to start and stop an exclusive backup at the same time.
This could trigger the assertion failure of
"FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
This happend because, even while pg_start_backup() was starting
an exclusive backup, other session could run pg_stop_backup()
concurrently and mark the backup as not-in-progress unconditionally.
This patch introduces ExclusiveBackupState indicating the state of
an exclusive backup. This state is used to ensure that there is only
one session running pg_start_backup() or pg_stop_backup() at
the same time, to avoid the assertion failure.
Back-patch to all supported versions.
Author: Michael Paquier Reviewed-By: Kyotaro Horiguchi and me Reported-By: Andreas Seltenreich
Discussion: <87mvktojme.fsf@credativ.de>
Tom Lane [Mon, 16 Jan 2017 20:23:11 +0000 (15:23 -0500)]
Fix check_srf_call_placement() to handle VALUES cases correctly.
INSERT ... VALUES with a single VALUES row is implemented quite differently
from the general VALUES case. A user-visible implication of that is that
we accept SRFs in the single-row case, but not in the multi-row case.
That's a historical artifact no doubt, but in view of the lack of field
complaints, I'm not excited about fixing it right now.
However, check_srf_call_placement() needs to know about this, first because
it should throw an error in the unsupported case, and second because it
should set p_hasTargetSRFs in the single-row case (because we treat that
like a SELECT tlist). That's an oversight in commit a4c35ea1c.
To fix, split EXPR_KIND_VALUES into two values. So far as I can see,
this is the only place where we need to distinguish the two cases at
present; but there might be more later.
Tom Lane [Mon, 16 Jan 2017 18:53:40 +0000 (13:53 -0500)]
Fix NULL pointer dereference in tuplesort.c.
Oversight in commit e94568ecc. This could cause a crash when an external
datum tuplesort of a pass-by-value type required multiple passes.
Per report from Mithun Cy.
Magnus Hagander [Mon, 16 Jan 2017 12:56:43 +0000 (13:56 +0100)]
Make pg_basebackup use temporary replication slots
Temporary replication slots will be used by default when wal streaming
is used and no slot name is specified with -S. If a slot name is
specified, then a permanent slot with that name is used. If --no-slot is
specified, then no permanent or temporary slot will be used.
Temporary slots are only used on 10.0 and newer, of course.
Tom Lane [Sun, 15 Jan 2017 19:09:35 +0000 (14:09 -0500)]
Fix matching of boolean index columns to sort ordering.
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass. Add a check to
allow the same type of deduction to be made in this case too.
Per a complaint from Dima Pavlov. Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.
Tom Lane [Sat, 14 Jan 2017 21:17:30 +0000 (16:17 -0500)]
Teach contrib/pg_stat_statements to handle multi-statement commands better.
Make use of the statement boundary info added by commit ab1f0c822
to let pg_stat_statements behave more sanely when multiple SQL queries
are jammed into one query string. It now records just the relevant
part of the source string, not the whole thing, for each individual
query.
Even when no multi-statement strings are involved, users may notice small
changes in the output: leading and trailing whitespace and semicolons will
be stripped from statements, which did not happen before.
Also, significantly expand pg_stat_statements' regression test script.
Fabien Coelho, reviewed by Craig Ringer and Kyotaro Horiguchi,
some mods by me
Tom Lane [Sat, 14 Jan 2017 21:02:35 +0000 (16:02 -0500)]
Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements. It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list. This patch brings
similar consistency to the outputs of raw parsing and planning steps:
* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.
* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements. In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node. This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.
Now, every list of statements has a consistent head-node type depending
on how far along it is in processing. This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.
Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way. It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)
Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list. While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.
The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement. This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)
There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.
Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes. This allows
more intelligent handling of cases where a source query string contains
multiple statements. This patch doesn't actually do anything with the
information, but a follow-on patch will. (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)
catversion bump because addition of location fields to struct Query
affects stored rules.
This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.
Tom Lane [Sat, 14 Jan 2017 18:27:47 +0000 (13:27 -0500)]
Throw suitable error for COPY TO STDOUT/FROM STDIN in a SQL function.
A client copy can't work inside a function because the FE/BE wire protocol
doesn't support nesting of a COPY operation within query results. (Maybe
it could, but the protocol spec doesn't suggest that clients should support
this, and libpq for one certainly doesn't.)
In most PLs, this prohibition is enforced by spi.c, but SQL functions don't
use SPI. A comparison of _SPI_execute_plan() and init_execution_state()
shows that rejecting client COPY is the only discrepancy in what they
allow, so there's no other similar bugs.
This is an astonishingly ancient oversight, so back-patch to all supported
branches.
Peter Eisentraut [Fri, 13 Jan 2017 17:00:00 +0000 (12:00 -0500)]
pg_ctl: Change default to wait for all actions
The different actions in pg_ctl had different defaults for -w and -W,
mostly for historical reasons. Most users will want the -w behavior, so
make that the default.
Remove the -w option in most example and test code, so avoid confusion
and reduce verbosity. pg_upgrade is not touched, so it can continue to
work with older installations.
Reviewed-by: Beena Emerson <memissemerson@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
Tom Lane [Fri, 13 Jan 2017 22:32:37 +0000 (17:32 -0500)]
Fix some more regression test row-order-instability issues.
Commit 0563a3a8b just introduced another instance of the same unsafe
testing methodology that appeared in 2ac3ef7a0, which I corrected in 257d81572. Robert/Amit, please stop doing that.
Also look through the rest of f0e44751d's test cases, and correct some
other queries with underdetermined ordering of results from the system
catalogs. These haven't failed in the buildfarm yet, but I don't
have any confidence in that staying true.
Tom Lane [Fri, 13 Jan 2017 21:59:52 +0000 (16:59 -0500)]
In PL/Tcl tests, don't choke if optional error fields are missing.
This fixes a portability issue introduced by commit 961bed020: with a
compiler that doesn't support PG_FUNCNAME_MACRO, the "funcname" field of
errorCode won't be provided, leading to a failure of the unset command.
I added -nocomplain to the unset commands for filename and lineno too, just
in case, though I know of no platform that wouldn't populate those fields.
(BTW, -nocomplain is new in Tcl 8.4, but fortunately we dropped support
for pre-8.4 Tcl some time ago.)
Peter Eisentraut [Fri, 13 Jan 2017 17:00:00 +0000 (12:00 -0500)]
pg_upgrade: Fix for changed pg_ctl default stop mode
In 9.5, the default pg_ctl stop mode was changed from "smart" to "fast".
pg_upgrade still thought the default mode was "smart" and only specified
the mode when "fast" was asked for. This results in using "fast" all
the time. It's not clear what the effect in practice is, but fix it
nonetheless to restore the previous behavior.
Robert Haas [Fri, 13 Jan 2017 19:03:52 +0000 (14:03 -0500)]
Fix a bug in how we generate partition constraints.
Move the code for doing parent attnos to child attnos mapping for Vars
in partition constraint expressions to a separate function
map_partition_varattnos() and call it from the appropriate places.
Doing it in get_qual_from_partbound(), as is now, would produce wrong
result in certain multi-level partitioning cases, because it only
considers the current pair of parent-child relations. In certain
multi-level partitioning cases, attnums for the same key attribute(s)
might differ between various levels causing the same attribute to be
numbered differently in different instances of the Var corresponding
to a given attribute.
With this commit, in generate_partition_qual(), we first generate the
the whole partition constraint (considering all levels of partitioning)
and then do the mapping, so that Vars in the final expression are
numbered according the leaf relation (to which it is supposed to apply).
Robert Haas [Fri, 13 Jan 2017 18:29:31 +0000 (13:29 -0500)]
Fix cardinality estimates for parallel joins.
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong. The previous coding
got this completely wrong for parallel joins.
Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet. Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.
Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.
Tom Lane [Thu, 12 Jan 2017 23:59:46 +0000 (18:59 -0500)]
Fix field order in struct catcache.
Somebody failed to grasp the point of having the #ifdef CATCACHE_STATS
fields at the end of the struct. Put that back the way it should be,
and add a comment making it more explicit why it should be that way.
Stephen Frost [Wed, 11 Jan 2017 20:45:50 +0000 (15:45 -0500)]
pg_restore: Don't allow non-positive number of jobs
pg_restore will currently accept invalid values for the number of
parallel jobs to run (eg: -1), unlike pg_dump which does check that the
value provided is reasonable.
Worse, '-1' is actually a valid, independent, parameter (as an alias for
--single-transaction), leading to potentially completely unexpected
results from a command line such as:
-> pg_restore -j -1
Where a user would get neither parallel jobs nor a single-transaction.
Add in validity checking of the parallel jobs option, as we already have
in pg_dump, before we try to open up the archive. Also move the check
that we haven't been asked to run more parallel jobs than possible on
Windows to the same place, so we do all the option validity checking
before opening the archive.
Back-patch all the way, though for 9.2 we're adding the Windows-specific
check against MAXIMUM_WAIT_OBJECTS as that check wasn't back-patched
originally.
Stephen Frost [Tue, 10 Jan 2017 16:34:51 +0000 (11:34 -0500)]
pg_dump: Strict names with no matching schema
When using pg_dump --strict-names and a schema pattern which doesn't
match any schemas (eg: --schema='nonexistant*'), we were incorrectly
throwing an error claiming no tables were found when, really, there
were no schemas found:
-> pg_dump --strict-names --schema='nonexistant*'
pg_dump: no matching tables were found for pattern "nonexistant*"
Fix that by changing the error message to say 'schemas' instead, since
that is what we are actually complaining about.
Noticed while testing pg_dump error cases.
Back-patch to 9.6 where --strict-names and this error message were
introduced.
Tom Lane [Mon, 9 Jan 2017 22:47:02 +0000 (17:47 -0500)]
Fix error handling in pltcl_returnnext.
We can't throw elog(ERROR) out of a Tcl command procedure; we have
to catch the error and return TCL_ERROR to the Tcl interpreter.
pltcl_returnnext failed to meet this requirement, so that errors
detected by pltcl_build_tuple_result or other functions called here
led to longjmp'ing out of the Tcl interpreter and thereby leaving it
in a bad state. Use the existing subtransaction support to prevent
that. Oversight in commit 26abb50c4, found more or less accidentally
by the buildfarm thanks to the tests added in 961bed020.
Alvaro Herrera [Mon, 9 Jan 2017 22:26:58 +0000 (19:26 -0300)]
Fix ALTER TABLE / SET TYPE for irregular inheritance
If inherited tables don't have exactly the same schema, the USING clause
in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the
children tables since commit 9550e8348b79. Starting with that commit,
the attribute numbers in the USING expression are fixed during parse
analysis. This can lead to bogus errors being reported during
execution, such as:
ERROR: attribute 2 has wrong type
DETAIL: Table has type smallint, but query expects integer.
Since it wouldn't do to revert to the original coding, we now apply a
transformation to map the attribute numbers to the correct ones for each
child.
Reported by Justin Pryzby
Analysis by Tom Lane; patch by me.
Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com
Alvaro Herrera [Mon, 9 Jan 2017 21:19:29 +0000 (18:19 -0300)]
BRIN revmap pages are not standard pages ...
... and therefore we ought not to tell XLogRegisterBuffer the opposite,
when writing XLog for a brin update that moves the index tuple to a
different page. Otherwise, xlog insertion would try to "compress the
hole" when producing a full-page image for it; but since we don't update
pd_lower/upper, the hole covers the whole page. On WAL replay, the
revmap page becomes empty and so the entire portion of the index is
useless and needs to be recomputed.
This is low-probability: a BRIN update only moves an index tuple to a
different page when the summary tuple is larger than the existing one,
which doesn't happen with fixed-width datatypes. Also, the revmap
page must be first after a checkpoint.
Report and patch: Kuntal Ghosh
Bug is alleged to have detected by a WAL-consistency-checking tool.
Discussion: https://postgr.es/m/CAGz5QCJ=00UQjScSEFbV=0qO5ShTZB9WWz_Fm7+Wd83zPs9Geg@mail.gmail.com
I posted a test case demonstrating the problem, but I'm refraining from
adding it to the test suite; if the WAL consistency tool makes it in,
that will be a better way to catch this from regressing. (We should
definitely have someting that causes not-same-page updates, though.)
Tom Lane [Sat, 7 Jan 2017 21:02:16 +0000 (16:02 -0500)]
Get rid of ParseState.p_value_substitute; use a columnref hook instead.
I noticed that p_value_substitute, which is a single-purpose kluge I added
in 2002 (commit b0422b215), could be replaced by having domainAddConstraint
install a parser hook that looks for the name "value". The parser hook
code only dates back to 2009, so it's not surprising that we had to kluge
this in 2002, but we can do it more cleanly now.
Tom Lane [Sat, 7 Jan 2017 20:34:28 +0000 (15:34 -0500)]
Improve documentation of struct ParseState.
I got annoyed about how some fields of ParseState were documented in the
struct's block comment and some weren't; not all of the latter are trivial.
Fix that. Also reorder a couple of fields that seem to have been placed
rather randomly, or maybe with an idea of avoiding padding space; but there
are never so many ParseStates in existence at one time that we ought to
value pad space over readability.
Stephen Frost [Fri, 6 Jan 2017 21:29:31 +0000 (16:29 -0500)]
Add basic pg_dumpall/pg_restore TAP tests
For reasons unknown, pg_dumpall and pg_restore managed to escape the
basic set of TAP tests that were added for pg_dump in 6bd356c3, so
let's get them added now. A few minor adjustments are also made to the
dump/restore tests to improve code coverage for pg_restore/pg_dumpall.
Tom Lane [Fri, 6 Jan 2017 21:21:57 +0000 (16:21 -0500)]
Merge two copies of tuple-building code in pltcl.c.
Make pltcl_trigger_handler() construct modified tuples using
pltcl_build_tuple_result(), rather than its own copy of essentially
the same logic. This results in slightly different message wording for
the error cases, and in one case a different SQLSTATE, but it seems
unlikely that any existing applications are depending on any of those
details.
While at it, fix a typo in commit 26abb50c4: pltcl_build_tuple_result was
applying encoding conversion in the wrong direction. That would be a
back-patchable bug fix, except the code hasn't shipped yet.
Stephen Frost [Fri, 6 Jan 2017 20:27:47 +0000 (15:27 -0500)]
Protect against NULL-dereference in pg_dump
findTableByOid() is allowed to return NULL and we should therefore be
checking for that case. getOwnedSeqs() and dumpSequence() shouldn't
ever actually see this happen, but given odd circumstances it might and
commit f9e439b1 probably shouldn't have removed that check.
Pointed out by Coverity. Initial patch from Michael Paquier.
Back-patch to 9.6, where that commit had removed the check.
Tom Lane [Fri, 6 Jan 2017 19:12:52 +0000 (14:12 -0500)]
Invalidate cached plans on FDW option changes.
This fixes problems where a plan must change but fails to do so,
as seen in a bug report from Rajkumar Raghuwanshi.
For ALTER FOREIGN TABLE OPTIONS, do this through the standard method of
forcing a relcache flush on the table. For ALTER FOREIGN DATA WRAPPER
and ALTER SERVER, just flush the whole plan cache on any change in
pg_foreign_data_wrapper or pg_foreign_server. That matches the way
we handle some other low-probability cases such as opclass changes, and
it's unclear that the case arises often enough to be worth working harder.
Besides, that gives a patch that is simple enough to back-patch with
confidence.
Back-patch to 9.3. In principle we could apply the code change to 9.2 as
well, but (a) we lack postgres_fdw to test it with, (b) it's doubtful that
anyone is doing anything exciting enough with FDWs that far back to need
this desperately, and (c) the patch doesn't apply cleanly.
Patch originally by Amit Langote, reviewed by Etsuro Fujita and Ashutosh
Bapat, who each contributed substantial changes as well.
This commit purported to use a variable hash seed for Partial
HashAggregate, but actually did the opposite - it made us use a
variable seed for any HashAggregate that is NOT partial. Woops.
Robert Haas [Thu, 5 Jan 2017 18:12:16 +0000 (13:12 -0500)]
Fix possible leak of semaphore count.
Commit 4aec49899e5782247e134f94ce1c6ee926f88e1c reorganized the order
of operations here so that we no longer increment the number of "extra
waits" before locking the semaphore, but it did not change the
starting value of extraWaits from 0 to -1 to compensate. In the worst
case, this could leak a semaphore count, but that seems to be unlikely
in practice.
Robert Haas [Thu, 5 Jan 2017 17:27:09 +0000 (12:27 -0500)]
Fix possible crash reading pg_stat_activity.
With the old code, a backend that read pg_stat_activity without ever
having executed a parallel query might see a backend in the midst of
executing one waiting on a DSA LWLock, resulting in a crash. The
solution is for backends to register the tranche at startup time, not
the first time a parallel query is executed.
Report by Andreas Seltenreich. Patch by me, reviewed by Thomas Munro.
Tom Lane [Thu, 5 Jan 2017 16:33:51 +0000 (11:33 -0500)]
Fix handling of empty arrays in array_fill().
array_fill(..., array[0]) produced an empty array, which is probably
what users expect, but it was a one-dimensional zero-length array
which is not our standard representation of empty arrays. Also, for
no very good reason, it rejected empty input arrays; that case should
be allowed and produce an empty output array.
In passing, remove the restriction that the input array(s) have lower
bound 1. That seems rather pointless, and it would have needed extra
complexity to make the check deal with empty input arrays.
Per bug #14487 from Andrew Gierth. It's been broken all along, so
back-patch to all supported branches.
Tom Lane [Wed, 4 Jan 2017 23:00:11 +0000 (18:00 -0500)]
Handle OID column inheritance correctly in ALTER TABLE ... INHERIT.
Inheritance operations must treat the OID column, if any, much like
regular user columns. But MergeAttributesIntoExisting() neglected to
do that, leading to weird results after a table with OIDs is associated
to a parent with OIDs via ALTER TABLE ... INHERIT.
Report and patch by Amit Langote, reviewed by Ashutosh Bapat, some
adjustments by me. It's been broken all along, so back-patch to
all supported branches.
Robert Haas [Wed, 4 Jan 2017 21:30:16 +0000 (16:30 -0500)]
Improve documentation of timestamp internal representation.
Be more clear that we represent timestamps in microseconds when
integer timestamps are used, and in fractional seconds when
floating-point timestamps are used.
Robert Haas [Wed, 4 Jan 2017 19:36:34 +0000 (14:36 -0500)]
Fix reporting of constraint violations for table partitioning.
After a tuple is routed to a partition, it has been converted from the
root table's row type to the partition's row type. ExecConstraints
needs to report the failure using the original tuple and the parent's
tuple descriptor rather than the ones for the selected partition.
Tom Lane [Wed, 4 Jan 2017 18:36:44 +0000 (13:36 -0500)]
Prefer int-wide pg_atomic_flag over char-wide when using gcc intrinsics.
configure can only probe the existence of gcc intrinsics, not how well
they're implemented, and unfortunately the answer is sometimes "badly".
In particular we've found that multiple compilers fail to implement
char-width __sync_lock_test_and_set() correctly on PPC; and even a correct
implementation would necessarily be pretty inefficient, since that hardware
has only a word-wide primitive to work with.
Given the knowledge we've accumulated in s_lock.h, it appears that it's
best to rely on int-width TAS operations on most non-Intel architectures.
Hence, pick int not char when both are nominally available to us in
generic-gcc.h (note that that code is not used for x86[_64]).
Back-patch to fix regression test failures on FreeBSD/PPC. Ordinarily
back-patching a change like this would be verboten because of ABI breakage.
But since pg_atomic_flag is not yet used in any Postgres data structure,
there's no ABI to break. It seems safer to back-patch to avoid possible
gotchas, if someday we do back-patch something that uses pg_atomic_flag.
Robert Haas [Wed, 4 Jan 2017 18:05:29 +0000 (13:05 -0500)]
Move partition_tuple_slot out of EState.
Commit 2ac3ef7a01df859c62d0a02333b646d65eaec5ff added a TupleTapleSlot
for partition tuple slot to EState (es_partition_tuple_slot) but it's
more logical to have it as part of ModifyTableState
(mt_partition_tuple_slot) and CopyState (partition_tuple_slot).
Tom Lane [Wed, 4 Jan 2017 17:43:52 +0000 (12:43 -0500)]
Re-allow SSL passphrase prompt at server start, but not thereafter.
Leave OpenSSL's default passphrase collection callback in place during
the first call of secure_initialize() in server startup. Although that
doesn't work terribly well in daemon contexts, some people feel we should
not break it for anyone who was successfully using it before. We still
block passphrase demands during SIGHUP, meaning that you can't adjust SSL
configuration on-the-fly if you used a passphrase, but this is no worse
than what it was before commit de41869b6. And we block passphrase demands
during EXEC_BACKEND reloads; that behavior wasn't useful either, but at
least now it's documented.
Tweak some related log messages for more readability, and avoid issuing
essentially duplicate messages about reload failure caused by a passphrase.
Robert Haas [Wed, 4 Jan 2017 17:03:40 +0000 (12:03 -0500)]
Update obsolete comments in lwlock.h.
The typical size of an LWLock is now 16 bytes even on 64-bit platforms,
and the size of slock_t is now irrelevant. But pg_atomic_uint32 can
(perhaps surprisingly) still be larger than 4 bytes, so there's still
some marginal point to allowing LWLOCK_MINIMAL_SIZE == 64.