Tom Lane [Tue, 20 Mar 2018 03:59:17 +0000 (23:59 -0400)]
Prevent query-lifespan memory leakage of SP-GiST traversal values.
The original coding of the SP-GiST scan traversalValue feature (commit ccd6eb49a) arranged for traversal values to be stored in the query's main
executor context. That's fine if there's only one index scan per query,
but if there are many, we have a memory leak as successive scans create
new traversal values. Fix it by creating a separate memory context for
traversal values, which we can reset during spgrescan(). Back-patch
to 9.6 where this code was introduced.
In principle, adding the traversalCxt field to SpGistScanOpaqueData
creates an ABI break in the back branches. But I (tgl) have little
sympathy for extensions including spgist_private.h, so I'm not very
worried about that. Alternatively we could stick the new field at the
end of the struct in back branches, but that has its own downsides.
Tom Lane [Mon, 19 Mar 2018 22:49:53 +0000 (18:49 -0400)]
Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:
1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query. (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.
The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause. I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.
While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching. That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.
Andrew Dunstan [Mon, 19 Mar 2018 21:55:36 +0000 (08:25 +1030)]
Don't use an Msys virtual path to create a tablespace
The new unlogged_reinit recovery tests create a new tablespace using
TestLib.pm's tempdir. However, on msys that function returns a virtual
path that isn't understood by Postgres. Here we add a new function to
TestLib.pm to turn such a path into a real path on the underlying file
system, and use it in the new test to create the tablespace. The new
function is essentially a NOOP everywhere but msys.
Tom Lane [Mon, 19 Mar 2018 21:23:07 +0000 (17:23 -0400)]
Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY.
Jeff Janes discovered that commit 7ca25b7de made one of the queries run by
REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly. The root cause is
bad cardinality estimation for correlated quals, but a principled solution
to that problem is some way off, especially since the planner lacks any
statistics about whole-row variables. Moreover, in non-error cases this
query produces no rows, meaning it must be run to completion; but use of
LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan,
exactly not what we want. Remove the LIMIT clause, and instead rely on
the count parameter we pass to SPI_execute() to prevent excess work if the
query does return some rows.
While we've heard no field reports of planner misbehavior with this query,
it could be that people are having performance issues that haven't reached
the level of pain needed to cause a bug report. In any case, that LIMIT
clause can't possibly do anything helpful with any existing version of the
planner, and it demonstrably can cause bad choices in some cases, so
back-patch to 9.4 where the code was introduced.
Alvaro Herrera [Mon, 19 Mar 2018 20:43:57 +0000 (17:43 -0300)]
Fix state reversal after partition tuple routing
We make some changes to ModifyTableState and the EState it uses whenever
we route tuples to partitions; but we weren't restoring properly in all
cases, possibly causing crashes when partitions with different tuple
descriptors are targeted by tuples inserted in the same command.
Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the
needed state changing logic, and have it invoked one level above its
current place (ie. put it in ExecModifyTable instead of ExecInsert);
this makes it all more readable.
Add a test case to exercise this.
We don't support having views as partitions; and since only views can
have INSTEAD OF triggers, there is no point in testing for INSTEAD OF
when processing insertions into a partitioned table. Remove code that
appears to support this (but which is actually never relevant.)
In passing, fix location of some very confusing comments in
ModifyTableState.
Reported-by: Amit Langote
Author: Etsuro Fujita, Amit Langote
Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp
Robert Haas [Mon, 19 Mar 2018 15:55:38 +0000 (11:55 -0400)]
Generate a separate upper relation for each stage of setop planning.
Commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f made setop planning
stages return paths rather than plans, but all such paths were loosely
associated with a single RelOptInfo, and only the final path was added
to the RelOptInfo. Even at the time, it was foreseen that this should
be changed, because there is otherwise no good way for a single stage
of setop planning to return multiple paths. With this patch, each
stage of set operation planning now creates a separate RelOptInfo;
these are distinguished by using appropriate relid sets. Note that
this patch does nothing whatsoever about actually returning multiple
paths for the same set operation; it just makes it possible for a
future patch to do so.
Along the way, adjust things so that create_upper_paths_hook is called
for each of these new RelOptInfos rather than just once, since that
might be useful to extensions using that hook. It might be a good
to provide an FDW API here as well, but I didn't try to do that for
now.
Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar
Raghuwanshi.
Robert Haas [Mon, 19 Mar 2018 15:54:56 +0000 (11:54 -0400)]
Rewrite recurse_union_children to iterate, rather than recurse.
Also, rename it to plan_union_chidren, so the old name wasn't
very descriptive. This results in a small net reduction in code,
seems at least to me to be easier to understand, and saves
space on the process stack.
Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar
Raghuwanshi.
Tom Lane [Sun, 18 Mar 2018 19:10:28 +0000 (15:10 -0400)]
Doc: note that statement-level view triggers require an INSTEAD OF trigger.
If a view lacks an INSTEAD OF trigger, DML on it can only work by rewriting
the command into a command on the underlying base table(s). Then we will
fire triggers attached to those table(s), not those for the view. This
seems appropriate from a consistency standpoint, but nowhere was the
behavior explicitly documented, so let's do that.
There was some discussion of throwing an error or warning if a statement
trigger is created on a view without creating a row INSTEAD OF trigger.
But a simple implementation of that would result in dump/restore ordering
hazards. Given that it's been like this all along, and we hadn't heard
a complaint till now, a documentation improvement seems sufficient.
Per bug #15106 from Pu Qun. Back-patch to all supported branches.
Magnus Hagander [Sun, 18 Mar 2018 12:08:25 +0000 (13:08 +0100)]
Fix pg_recvlogical for pre-10 versions
In e170b8c8, protection against modified search_path was added. However,
PostgreSQL versions prior to 10 does not accept SQL commands over a
replication connection, so the protection would generate a syntax error.
Since we cannot run SQL commands on it, we are also not vulnerable to
the issue that e170b8c8 fixes, so we can just skip this command for
older versions.
Tom Lane [Sat, 17 Mar 2018 19:38:15 +0000 (15:38 -0400)]
Fix overflow handling in plpgsql's integer FOR loops.
The test to exit the loop if the integer control value would overflow
an int32 turns out not to work on some ICC versions, as it's dependent
on the assumption that the compiler will execute the code as written
rather than "optimize" it. ICC lacks any equivalent of gcc's -fwrapv
switch, so it was optimizing on the assumption of no integer overflow,
and that breaks this. Rewrite into a form that in fact does not
do any overflowing computations.
Per Tomas Vondra and buildfarm member fulmar. It's been like this
for a long time, although it was not till we added a regression test
case covering the behavior (in commit dd2243f2a) that the problem
became apparent. Back-patch to all supported versions.
Tom Lane [Sat, 17 Mar 2018 18:59:31 +0000 (14:59 -0400)]
Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan.
"UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message
like "cannot extract system attribute from virtual tuple", if the cursor
was using a index-only scan for the target table. Fix it by digging the
current TID out of the indexscan state.
It seems likely that the same failure could occur for CustomScan plans
and perhaps some FDW plan types, so that leaving this to be treated as an
internal error with an obscure message isn't as good an idea as it first
seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver
a more on-topic message. I chose to make the message match what you get
for the case where execCurrentOf can't identify the target scan node at
all, "cursor "foo" is not a simply updatable scan of table "bar"".
Perhaps it should be different, but we can always adjust that later.
In the future, it might be nice to provide hooks that would let custom
scan providers and/or FDWs deal with this in other ways; but that's
not a suitable topic for a back-patchable bug fix.
It's been like this all along, so back-patch to all supported branches.
Peter Eisentraut [Sat, 17 Mar 2018 12:56:50 +0000 (08:56 -0400)]
Set libpq sslcompression to off by default
Since SSL compression is no longer recommended, turn the default in
libpq from on to off.
OpenSSL 1.1.0 and many distribution packages already turn compression
off by default, so such a server won't accept compression anyway. So
this will mainly affect users of older OpenSSL installations.
Also update the documentation to make clear that this setting is no
longer recommended.
Peter Eisentraut [Mon, 26 Feb 2018 18:28:38 +0000 (13:28 -0500)]
Add ssl_passphrase_command setting
This allows specifying an external command for prompting for or
otherwise obtaining passphrases for SSL key files. This is useful
because in many cases there is no TTY easily available during service
startup.
Also add a setting ssl_passphrase_command_supports_reload, which allows
supporting SSL configuration reload even if SSL files need passphrases.
Andres Freund [Sat, 17 Mar 2018 06:13:12 +0000 (23:13 -0700)]
Make ExplainPropertyInteger accept 64bit input, remove *Long variant.
'long' is not useful type across platforms, as it's 32bit on 32 bit
platforms, and even on some 64bit platforms (e.g. windows) it's still
only 32bits wide.
As ExplainPropertyInteger should never be performance critical, change
it to accept a 64bit argument and remove ExplainPropertyLong.
Author: Andres Freund
Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de
Tom Lane [Fri, 16 Mar 2018 20:03:45 +0000 (16:03 -0400)]
Fix query-lifespan memory leakage in repeatedly executed hash joins.
ExecHashTableCreate allocated some memory that wasn't freed by
ExecHashTableDestroy, specifically the per-hash-key function information.
That's not a huge amount of data, but if one runs a query that repeats
a hash join enough times, it builds up. Fix by arranging for the data
in question to be kept in the hashtable's hashCxt instead of leaving it
"loose" in the query-lifespan executor context. (This ensures that we'll
also clean up anything that the hash functions allocate in fn_mcxt.)
Per report from Amit Khandekar. It's been like this forever, so back-patch
to all supported branches.
Tom Lane [Fri, 16 Mar 2018 17:44:34 +0000 (13:44 -0400)]
Doc: explicitly point out that enum values can't be dropped.
This was not stated in so many words anywhere. Document it to make
clear that it's a design limitation and not just an oversight or
documentation omission.
Peter Eisentraut [Sat, 17 Feb 2018 01:44:15 +0000 (20:44 -0500)]
Rename TransactionChain functions
We call this thing a "transaction block" everywhere except in a few
functions, where it is mysteriously called a "transaction chain". In
the SQL standard, a transaction chain is something different. So rename
these functions to match the common terminology.
Tom Lane [Fri, 16 Mar 2018 16:48:13 +0000 (12:48 -0400)]
Mop-up for letting VOID-returning SQL functions end with a SELECT.
Part of the intent in commit fd1a421fe was to allow SQL functions that are
declared to return VOID to contain anything, including an unrelated final
SELECT, the same as SQL-language procedures can. However, the planner's
inlining logic didn't get that memo. Fix it, and add some regression tests
covering this area, since evidently we had none.
In passing, clean up some typos in comments in create_function_3.sql,
and get rid of its none-too-safe assumption that DROP CASCADE notice
output is immutably ordered.
Tom Lane [Thu, 15 Mar 2018 21:08:51 +0000 (17:08 -0400)]
Clean up duplicate table and function names in regression tests.
Many of the objects we create during the regression tests are put in the
public schema, so that using the same names in different regression tests
creates a hazard of test failures if any two such scripts run concurrently.
This patch cleans up a bunch of latent hazards of that sort, as well as two
live hazards.
The current situation in this regard is far worse than it was a year or two
back, because practically all of the partitioning-related test cases have
reused table names with enthusiasm. I despaired of cleaning up that mess
within the five most-affected tests (create_table, alter_table, insert,
update, inherit); fortunately those don't run concurrently.
Other than partitioning problems, most of the issues boil down to using
names like "foo", "bar", "tmp", etc, without thought for the fact that
other test scripts might use similar names concurrently. I've made an
effort to make all such names more specific.
One of the live hazards was that commit 7421f4b8 caused with.sql to
create a table named "test", conflicting with a similarly-named table
in alter_table.sql; this was exposed in the buildfarm recently.
The other one was that join.sql and transactions.sql both create tables
named "foo" and "bar"; but join.sql's uses of those names date back
only to December or so.
Since commit 7421f4b8 was back-patched to v10, back-patch a minimal
fix for that problem. The rest of this is just future-proofing.
Robert Haas [Thu, 15 Mar 2018 18:43:58 +0000 (14:43 -0400)]
Split create_grouping_paths into degenerate and non-degenerate cases.
There's no functional change here, or at least I hope there isn't,
just code rearrangement. The rearrangement is motivated by
partition-wise aggregate, which doesn't need to consider the
degenerate case but wants to reuse the logic for the ordinary case.
Based loosely on a patch from Ashutosh Bapat and Jeevan Chalke, but I
whacked it around pretty heavily. The larger patch series of which
this patch is a part was also reviewed and tested by Antonin Houska,
Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, Rafia Sabih, and me.
Tom Lane [Thu, 15 Mar 2018 18:00:31 +0000 (14:00 -0400)]
Clean up duplicate role and schema names in regression tests.
Since these names are global, using the same ones in different regression
tests creates a hazard of test failures if any two such scripts run
concurrently. Let's establish a policy of not doing that. In the cases
where a conflict existed, I chose to rename both sides: in principle one
script or the other could've been left in possession of the common name,
but that seems to just invite more trouble of the same sort.
There are a number of places where scripts are using names that seem
unduly generic, but in the absence of actual conflicts I left them alone.
In addition, fix insert.sql's use of "someone_else" as a role name.
That's a flat out violation of longstanding project policy, so back-patch
that change to v10 where the usage appeared. The rest of this is just
future-proofing, as no two of these scripts are actually run concurrently
in the existing parallel_schedule.
Conflicts of schema-qualified names also exist, but will be dealt with
separately.
Peter Eisentraut [Thu, 15 Mar 2018 15:10:41 +0000 (11:10 -0400)]
Fix more format truncation issues
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This
is a more aggressive variant of the fixes in 6275f5d28a1577563f53f2171689d4f890a46881, which GCC 7 warned about by
default.
The issues are all harmless, but some dubious coding patterns are
cleaned up.
One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96. Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.
But this doesn't actually add those warning options. It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that. This is more of a
once-in-a-while cleanup.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Robert Haas [Thu, 15 Mar 2018 15:33:52 +0000 (11:33 -0400)]
Pass additional arguments to a couple of grouping-related functions.
get_number_of_groups() and make_partial_grouping_target() currently
fish information directly out of the PlannerInfo; in the former case,
the target list, and in the latter case, the HAVING qual. This works
fine if there's only one grouping relation, but if the pending patch
for partition-wise aggregate gets committed, we'll have multiple
grouping relations and must therefore use appropriately translated
versions of these values for each one. To make that simpler, pass the
values to be used as arguments.
Jeevan Chalke. The larger patch series of which this patch is a part
was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi,
David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia
Sabih, and me.
Alvaro Herrera [Thu, 15 Mar 2018 12:51:12 +0000 (09:51 -0300)]
test_ddl_deparse: Don't use pg_class as source for a matview
Doing so causes a pg_upgrade of a database containing these objects to
fail whenever pg_class changes. And it's pointless anyway: we have more
interesting tables anyhow.
Alvaro Herrera [Thu, 15 Mar 2018 00:34:26 +0000 (21:34 -0300)]
logical replication: fix OID type mapping mechanism
The logical replication type map seems to have been misused by its only
caller -- it would try to use the remote OID as input for local type
routines, which unsurprisingly could result in bogus "cache lookup
failed for type XYZ" errors, or random other type names being picked up
if they happened to use the right OID. Fix that, changing
Oid logicalrep_typmap_getid(Oid remoteid) to
char *logicalrep_typmap_gettypname(Oid remoteid)
which is more useful. If the remote type is not part of the typmap,
this simply prints "unrecognized type" instead of choking trying to
figure out -- a pointless exercise (because the only input for that
comes from replication messages, which are not under the local node's
control) and dangerous to boot, when called from within an error context
callback.
Once that is done, it comes to light that the local OID in the typmap
entry was not being used for anything; the type/schema names are what we
need, so remove local type OID from that struct.
Once you do that, it becomes pointless to attach a callback to regular
syscache invalidation. So remove that also.
Peter Eisentraut [Wed, 14 Mar 2018 18:59:40 +0000 (14:59 -0400)]
Remove pg_class.relhaspkey
It is not used for anything internally, and it cannot be relied on for
external uses, so it can just be removed. To correct recommended way to
check for a primary key is in pg_index.
Stephen Frost [Wed, 14 Mar 2018 17:51:15 +0000 (13:51 -0400)]
Fix function-header comments in planner.c
In b5635948ab1, a couple of function header comments weren't changed, or
weren't changed correctly, to reflect the arguments being passed into
the functions. Specifically, get_number_of_groups() had the wrong
argument name in the commit and create_grouping_paths() wasn't
updated even though the arguments had been changed.
The issue with create_grouping_paths() was noticed by Ashutosh Bapat,
while I discovered the issue with get_number_of_groups() by looking to
see if there were any similar issues from that commit.
Peter Eisentraut [Wed, 14 Mar 2018 15:47:21 +0000 (11:47 -0400)]
Support INOUT arguments in procedures
In a top-level CALL, the values of INOUT arguments will be returned as a
result row. In PL/pgSQL, the values are assigned back to the input
arguments. In other languages, the same convention as for return a
record from a function is used. That does not require any code changes
in the PL implementations.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Alvaro Herrera [Wed, 14 Mar 2018 14:53:56 +0000 (11:53 -0300)]
Log when a BRIN autosummarization request fails
Autovacuum's 'workitem' request queue is of limited size, so requests
can fail if they arrive more quickly than autovacuum can process them.
Emit a log message when this happens, to provide better visibility of
this.
Backpatch to 10. While this represents an API change for
AutoVacuumRequestWork, that function is not yet prepared to deal with
external modules calling it, so there doesn't seem to be any risk (other
than log spam, that is.)
Andres Freund [Fri, 9 Mar 2018 01:07:16 +0000 (17:07 -0800)]
Expand AND/OR regression tests around NULL handling.
Previously there were no tests verifying that NULL handling in AND/OR
was correct (i.e. that NULL rather than false is returned if
expression doesn't return true).
Robert Haas [Tue, 13 Mar 2018 20:34:08 +0000 (16:34 -0400)]
Let Parallel Append over simple UNION ALL have partial subpaths.
A simple UNION ALL gets flattened into an appendrel of subquery
RTEs, but up until now it's been impossible for the appendrel to use
the partial paths for the subqueries, so we can implement the
appendrel as a Parallel Append but only one with non-partial paths
as children.
There are three separate obstacles to removing that limitation.
First, when planning a subquery, propagate any partial paths to the
final_rel so that they are potentially visible to outer query levels
(but not if they have initPlans attached, because that wouldn't be
safe). Second, after planning a subquery, propagate any partial paths
for the final_rel to the subquery RTE in the outer query level in the
same way we do for non-partial paths. Third, teach finalize_plan() to
account for the possibility that the fake parameter we use for rescan
signalling when the plan contains a Gather (Merge) node may be
propagated from an outer query level.
Patch by me, reviewed and tested by Amit Khandekar, Rajkumar
Raghuwanshi, and Ashutosh Bapat. Test cases based on examples by
Rajkumar Raghuwanshi.
Tom Lane [Tue, 13 Mar 2018 17:24:27 +0000 (13:24 -0400)]
When updating reltuples after ANALYZE, just extrapolate from our sample.
The existing logic for updating pg_class.reltuples trusted the sampling
results only for the pages ANALYZE actually visited, preferring to
believe the previous tuple density estimate for all the unvisited pages.
While there's some rationale for doing that for VACUUM (first that
VACUUM is likely to visit a very nonrandom subset of pages, and second
that we know for sure that the unvisited pages did not change), there's
no such rationale for ANALYZE: by assumption, it's looked at an unbiased
random sample of the table's pages. Furthermore, in a very large table
ANALYZE will have examined only a tiny fraction of the table's pages,
meaning it cannot slew the overall density estimate very far at all.
In a table that is physically growing, this causes reltuples to increase
nearly proportionally to the change in relpages, regardless of what is
actually happening in the table. This has been observed to cause reltuples
to become so much larger than reality that it effectively shuts off
autovacuum, whose threshold for doing anything is a fraction of reltuples.
(Getting to the point where that would happen seems to require some
additional, not well understood, conditions. But it's undeniable that if
reltuples is seriously off in a large table, ANALYZE alone will not fix it
in any reasonable number of iterations, especially not if the table is
continuing to grow.)
Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone,
and in ANALYZE, just extrapolate from the sample pages on the assumption
that they provide an accurate model of the whole table. If, by very bad
luck, they don't, at least another ANALYZE will fix it; in the old logic
a single bad estimate could cause problems indefinitely.
In HEAD, let's remove vac_estimate_reltuples' is_analyze argument
altogether; it was never used for anything and now it's totally pointless.
But keep it in the back branches, in case any third-party code is calling
this function.
Per bug #15005. Back-patch to all supported branches.
David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me
Tom Lane [Tue, 13 Mar 2018 16:28:15 +0000 (12:28 -0400)]
Avoid holding AutovacuumScheduleLock while rechecking table statistics.
In databases with many tables, re-fetching the statistics takes some time,
so that this behavior seriously decreases the available concurrency for
multiple autovac workers. There's discussion afoot about more complete
fixes, but a simple and back-patchable amelioration is to claim the table
and release the lock before rechecking stats. If we find out there's no
longer a reason to process the table, re-taking the lock to un-claim the
table is cheap enough.
(This patch is quite old, but got lost amongst a discussion of more
aggressive fixes. It's not clear when or if such a fix will be
accepted, but in any case it'd be unlikely to get back-patched.
Let's do this now so we have some improvement for the back branches.)
In passing, make the normal un-claim step take AutovacuumScheduleLock
not AutovacuumLock, since that is what is documented to protect the
wi_tableoid field. This wasn't an actual bug in view of the fact that
readers of that field hold both locks, but it creates some concurrency
penalty against operations that need only AutovacuumLock.
Peter Eisentraut [Mon, 12 Mar 2018 16:17:58 +0000 (12:17 -0400)]
Change internal integer representation of Value node
A Value node would store an integer as a long. This causes needless
portability risks, as long can be of varying sizes. Change it to use
int instead. All code using this was already careful to only store
32-bit values anyway.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Fix CREATE TABLE / LIKE with bigint identity column
CREATE TABLE / LIKE with a bigint identity column would fail on
platforms where long is 32 bits. Copying the sequence values used
makeInteger(), which would truncate the 64-bit sequence data to 32 bits.
To fix, use makeFloat() instead, like the parser. (This does not
actually make use of floats, but stores the values as strings.)
Bug: #15096 Reviewed-by: Michael Paquier <michael@paquier.xyz>
Alvaro Herrera [Mon, 12 Mar 2018 22:42:32 +0000 (19:42 -0300)]
Avoid having two PKs in a partition
If a table containing a primary key is attach as partition to a
partitioned table which has a primary key with a different definition,
we would happily create a second one in the new partition. Oops. It
turns out that this is because an error check in DefineIndex is executed
only if you tell it that it's being run by ALTER TABLE, and the original
code here wasn't. Change it so that it does.
Added a couple of test cases for this, also. A previously working test
started to fail in a different way than before patch because the new
check is called earlier; change the PK to plain UNIQUE so that the new
behavior isn't invoked, so that the test continues to verify what we
want it to verify.
Tom Lane [Sun, 11 Mar 2018 22:10:42 +0000 (18:10 -0400)]
Fix improper uses of canonicalize_qual().
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Tom Lane [Sat, 10 Mar 2018 18:18:21 +0000 (13:18 -0500)]
In psql, restore old behavior of Query_for_list_of_functions.
Historically, tab completion for functions has offered the names of
aggregates as well. This is essential in at least one context, namely
GRANT/REVOKE, because there is no GRANT ON AGGREGATE syntax. There
are other cases where a command that nominally is for functions will
allow aggregates as well, though not all do.
Commit fd1a421fe changed this query to disallow aggregates, but that
doesn't seem to have been thought through very carefully. Change it
to allow aggregates (but still ignore procedures).
We might at some point tighten this up, but it'd require sorting through
all the uses of this query to see which ones should offer aggregate
names and which shouldn't. Given the lack of field complaints about
the historical laxity here, that's work I'm not eager to do right now.
Tom Lane [Fri, 9 Mar 2018 21:58:26 +0000 (16:58 -0500)]
Improve predtest.c's internal docs, and enhance its functionality a bit.
Commit b08df9cab left things rather poorly documented as far as the
exact semantics of "clause_is_check" mode went. Also, that mode did
not really work correctly for predicate_refuted_by; although given the
lack of specification as to what it should do, as well as the lack
of any actual use-case, that's perhaps not surprising.
Rename "clause_is_check" to "weak" proof mode, and provide specifications
for what it should do. I defined weak refutation as meaning "truth of A
implies non-truth of B", which makes it possible to use the mode in the
part of relation_excluded_by_constraints that checks for mutually
contradictory WHERE clauses. Fix up several places that did things wrong
for that definition. (As far as I can see, these errors would only lead
to failure-to-prove, not incorrect claims of proof, making them not
serious bugs even aside from the fact that v10 contains no use of this
mode. So there seems no need for back-patching.)
In addition, teach predicate_refuted_by_recurse that it can use
predicate_implied_by_recurse after all when processing a strong NOT-clause,
so long as it asks for the correct proof strength. This is an optimization
that could have been included in commit b08df9cab, but wasn't.
Also, simplify and generalize the logic that checks for whether nullness of
the argument of IS [NOT] NULL would force overall nullness of the predicate
or clause. (This results in a change in the partition_prune test's output,
as it is now able to prune an all-nulls partition that it did not recognize
before.)
In passing, in PartConstraintImpliedByRelConstraint, remove bogus
conversion of the constraint list to explicit-AND form and then right back
again; that accomplished nothing except forcing a useless extra level of
recursion inside predicate_implied_by.
Tom Lane [Fri, 9 Mar 2018 00:44:14 +0000 (19:44 -0500)]
Fix test_predtest's idea of what weak refutation means.
I'd initially supposed that predicate_refuted_by(..., true) ought to
say that "A refutes B" means "non-falsity of A implies non-truth of B".
But it seems better to define it as "truth of A implies non-truth of B".
This is more useful in the current system, slightly easier to prove,
and in closer correspondence to the existing code behavior.
With this change, test_predtest no longer claims that any existing
test cases show false proof reports, though there still are cases
where we could prove something and fail to.
Robert Haas [Thu, 8 Mar 2018 19:25:31 +0000 (14:25 -0500)]
Correctly assess parallel-safety of tlists when SRFs are used.
Since commit 69f4b9c85f168ae006929eec44fc44d569e846b9, the existing
code was no longer assessing the parallel-safety of the real tlist
for each upper rel, but rather the first of possibly several tlists
created by split_pathtarget_at_srfs(). Repair.
Even though this is clearly wrong, it's not clear that it has any
user-visible consequences at the moment, so no back-patch for now. If
we discover later that it does have user-visible consequences, we
might need to back-patch this to v10.
Patch by me, per a report from Rajkumar Raghuwanshi.
Tom Lane [Thu, 8 Mar 2018 18:25:26 +0000 (13:25 -0500)]
Add test scaffolding for exercising optimizer's predicate-proof logic.
The predicate-proof code in predtest.c is a bit hard to test as-is:
you have to construct a query whose plan changes depending on the success
of a test, and in tests that have been around for awhile, it's always
possible that the plan shape is now being determined by some other factor.
Our existing regression tests aren't doing real well at providing full
code coverage of predtest.c, either. So, let's add a small test module
that allows directly inspecting the results of predicate_implied_by()
and predicate_refuted_by() for arbitrary boolean expressions.
I chose the set of tests committed here in order to get reasonably
complete code coverage of predtest.c just from running this test
module, and to cover some cases called out as being interesting in
the existing comments. We might want to add more later. But this
set already shows a few cases where perhaps things could be improved.
Indeed, this exercise proves that predicate_refuted_by() is buggy for
the case of clause_is_check = true, though fortunately we aren't using
that case anywhere yet. I'll look into doing something about that in
a separate commit. For now, just memorialize the current behavior.
Tom Lane [Thu, 8 Mar 2018 16:33:27 +0000 (11:33 -0500)]
Revert "Temporarily instrument postgres_fdw test to look for statistics changes."
This reverts commit c2c537c56dc30ec3cdc12051f4ea5363aa66d73c.
It's now clear that whatever is going on there, it can't be blamed
on unexpected ANALYZE runs, because the statistics are the same
just before the failing query as they were at the start of the test.
Tom Lane [Thu, 8 Mar 2018 16:26:20 +0000 (11:26 -0500)]
In initdb, don't bother trying max_connections = 10.
The server won't actually start with that setting anymore, not since
we raised the default max_wal_senders to 10. Per discussion, we don't
wish to back down on that default, so instead raise the effective floor
for max_connections (to 20). It's still possible to configure a smaller
setting manually, but initdb won't set it that way.
Since that change happened in v10, back-patch to v10.
Tom Lane [Thu, 8 Mar 2018 16:25:26 +0000 (11:25 -0500)]
Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections.
We were independently checking ReservedBackends < MaxConnections and
max_wal_senders < MaxConnections, but because walsenders aren't allowed
to use superuser-reserved connections, that's really the wrong thing.
Correct behavior is to insist on ReservedBackends + max_wal_senders being
less than MaxConnections. Fix the code and associated documentation.
This has been wrong for a long time, but since the situation probably
hardly ever arises in the field (especially pre-v10, when the default
for max_wal_senders was zero), no back-patch.
Alvaro Herrera [Wed, 7 Mar 2018 14:47:35 +0000 (11:47 -0300)]
Add missing debug lines during bootstrap
Noticed while playing with changes that mess with the bootstrap
sequence; the operations patched here failed to emit anything, leading
the developer to think that the bug was in the previous operation that
did emit a message.
Commit 1804284042e659e7d16904e7bbb0ad546394b6a3 established that single-batch
parallel-aware hash joins could create one large shared hash table using the
combined work_mem budget of all participants. The costing accidentally
assumed that parallel-oblivious hash joins could also do that. The
documentation for initial_cost_hashjoin() also failed to mention the new
argument. Repair.
Alvaro Herrera [Tue, 6 Mar 2018 19:22:03 +0000 (16:22 -0300)]
Refrain from duplicating data in reorderbuffers
If a walsender exits leaving data in reorderbuffers, the next walsender
that tries to decode the same transaction would append its decoded data
in the same spill files without truncating it first, which effectively
duplicate the data. Avoid that by removing any leftover reorderbuffer
spill files when a walsender starts.
Backpatch to 9.4; this bug has been there from the very beginning of
logical decoding.
Author: Craig Ringer, revised by me
Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada
Alvaro Herrera [Tue, 6 Mar 2018 16:17:13 +0000 (13:17 -0300)]
Fix bogus Name assignment in CreateStatistics
Apparently, it doesn't work to use a plain cstring as a Name datum: you
may end up having random bytes because of failing to zero the bytes
after the terminating \0, as indicated by valgrind. I introduced this
bug in 5564c1181548, so backpatch this fix to REL_10_STABLE, like that
commit.
While at it, fix a slightly misleading comment, pointed out by David
Rowley.
Andres Freund [Tue, 6 Mar 2018 01:49:59 +0000 (17:49 -0800)]
Fix parent node of WCO expressions in partitioned tables.
Since edd44738bc8814 WCO expressions of partitioned tables are
initialized with the first subplan as parent. That's not correct, as
the correct context is the ModifyTableState node. That's also what is
used for RETURNING processing, initialized nearby.
This appears not to cause any visible problems for in core code, but
is problematic for in development patch.
Andres Freund [Tue, 6 Mar 2018 00:21:05 +0000 (16:21 -0800)]
Add parenthesized options syntax for ANALYZE.
This is analogous to the syntax allowed for VACUUM. This allows us to
avoid making new options reserved keywords and makes it easier to
allow arbitrary argument order. Oh, and it's consistent with the other
commands, too.
Author: Nathan Bossart Reviewed-By: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com
Alvaro Herrera [Mon, 5 Mar 2018 22:37:19 +0000 (19:37 -0300)]
Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so. Patch it up so that it does. Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.
While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.
Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.
In pg11, comments on statistics objects are cloned too. In pg10 they
are not, because I (Álvaro) was too coward to change the parse node as
required to support it. Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason. Any corresponding user-visible changes (docs) are backpatched,
though.
Reported-by: Stephen Froehlich
Author: David Rowley Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
Tom Lane [Mon, 5 Mar 2018 21:20:06 +0000 (16:20 -0500)]
Temporarily instrument postgres_fdw test to look for statistics changes.
It seems fairly hard to explain recent buildfarm failures without the
theory that something is doing an ANALYZE behind our backs. Probe
for this directly to see if it's true.
In principle the outputs of these queries should be stable, since the table
in question is small enough that ANALYZE's sample will include all rows.
But even if that turns out to be wrong, we can put up with some failures
for a bit. I don't intend to leave this here indefinitely.
Tom Lane [Mon, 5 Mar 2018 20:37:23 +0000 (15:37 -0500)]
Add infrastructure to support server-version-dependent tab completion.
Up to now we've not worried about whether psql's tab completion queries
would work against prior server versions. But since we support older
server versions in describe.c, we really ought to do so here as well.
Failing to take care of this not only leads to loss of tab-completion
service when using an older server, but risks aborting a user's open
transaction when we send an incompatible query to the server.
The recent changes in pg_proc.prokind are one reason to take this more
seriously now than before, and the proposed patch for completion after
SELECT needs some such capability as well.
Hence, create some infrastructure to allow selection of one of several
versions of a query depending on server version. For cases where we
just need to select one of several query strings, introduce VersionedQuery
and COMPLETE_WITH_VERSIONED_QUERY(). Likewise extend the SchemaQuery
infrastructure to allow versioning of those.
I went ahead and fixed the prokind issues, to serve as an illustration
of how to use versioned SchemaQuery. To have some illustration of
VersionedQuery, change the support for completion of publication and
subscription names so that psql will not send sure-to-fail queries to
pre-v10 servers. There is much more that should be done to make tab
completion more friendly to older servers, but this commit is mainly
meant to get the infrastructure in place, so I stopped here.
Robert Haas [Mon, 5 Mar 2018 20:12:49 +0000 (15:12 -0500)]
shm_mq: Fix detach race condition.
Commit 34db06ef9a1d7f36391c64293bf1e0ce44a33915 adopted a lock-free
design for shm_mq.c, but it introduced a race condition that could
lose messages. When shm_mq_receive_bytes() detects that the other end
has detached, it must make sure that it has seen the final version of
mq_bytes_written, or it might miss a message sent before detaching.
Fujii Masao [Mon, 5 Mar 2018 17:08:18 +0000 (02:08 +0900)]
Fix pg_rewind to handle relation data files in tablespaces properly.
pg_rewind checks whether each file is a relation data file, from its path.
Previously this check logic had the bug which made pg_rewind fail to
recognize any relation data files in tablespaces. Which also caused
an assertion failure in pg_rewind.
Tom Lane [Sun, 4 Mar 2018 01:31:35 +0000 (20:31 -0500)]
Fix assorted issues in convert_to_scalar().
If convert_to_scalar is passed a pair of datatypes it can't cope with,
its former behavior was just to elog(ERROR). While this is OK so far as
the core code is concerned, there's extension code that would like to use
scalarltsel/scalargtsel/etc as selectivity estimators for operators that
work on non-core datatypes, and this behavior is a show-stopper for that
use-case. If we simply allow convert_to_scalar to return FALSE instead of
outright failing, then the main logic of scalarltsel/scalargtsel will work
fine for any operator that behaves like a scalar inequality comparison.
The lack of conversion capability will mean that we can't estimate to
better than histogram-bin-width precision, since the code will effectively
assume that the comparison constant falls at the middle of its bin. But
that's still a lot better than nothing. (Someday we should provide a way
for extension code to supply a custom version of convert_to_scalar, but
today is not that day.)
While poking at this issue, we noted that the existing code for handling
type bytea in convert_to_scalar is several bricks shy of a load.
It assumes without checking that if the comparison value is type bytea,
the bounds values are too; in the worst case this could lead to a crash.
It also fails to detoast the input values, so that the comparison result is
complete garbage if any input is toasted out-of-line, compressed, or even
just short-header. I'm not sure how often such cases actually occur ---
the bounds values, at least, are probably safe since they are elements of
an array and hence can't be toasted. But that doesn't make this code OK.
Back-patch to all supported branches, partly because author requested that,
but mostly because of the bytea bugs. The change in API for the exposed
routine convert_network_to_scalar() is theoretically a back-patch hazard,
but it seems pretty unlikely that any third-party code is calling that
function directly.