Peter Eisentraut [Mon, 22 Jan 2018 13:30:16 +0000 (08:30 -0500)]
Transaction control in PL procedures
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language. Add similar underlying functions to SPI. Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call. Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.
- SPI
Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags. The only option flag right now is
SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed. This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called. A nonatomic SPI connection uses different
memory management. A normal SPI connection allocates its memory in
TopTransactionContext. For nonatomic connections we use PortalContext
instead. As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.
SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.
- portalmem.c
Some adjustments were made in the code that cleans up portals at
transaction abort. The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.
In AtAbort_Portals(), remove the code that marks an active portal as
failed. As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort. And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception. So the code in AtAbort_Portals() is never used
anyway.
In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much. This mirrors similar code in
PreCommit_Portals().
- PL/Perl
Gets new functions spi_commit() and spi_rollback()
- PL/pgSQL
Gets new commands COMMIT and ROLLBACK.
Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.
- PL/Python
Gets new functions plpy.commit and plpy.rollback.
- PL/Tcl
Gets new commands commit and rollback.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Bruce Momjian [Sun, 21 Jan 2018 02:47:02 +0000 (21:47 -0500)]
doc: update intermediate certificate instructions
Document how to properly create root and intermediate certificates using
v3_ca extensions and where to place intermediate certificates so they
are properly transferred to the remote side with the leaf certificate to
link to the remote root certificate. This corrects docs that used to
say that intermediate certificates must be stored with the root
certificate.
Also add instructions on how to create root, intermediate, and leaf
certificates.
Peter Eisentraut [Sat, 20 Jan 2018 13:02:01 +0000 (08:02 -0500)]
Improve type conversion of SPI_processed in Python
The previous code converted SPI_processed to a Python float if it didn't
fit into a Python int. But Python longs have unlimited precision, so
use that instead in all cases.
Apparently, Peter's compiler has faith that the switch test values here
could never not be valid values of their enums. Mine does not, and
I tend to agree with it.
Peter Eisentraut [Fri, 19 Jan 2018 22:22:38 +0000 (17:22 -0500)]
PL/Python: Simplify PLyLong_FromInt64
We don't actually need two code paths, one for 32 bits and one for 64
bits. Since the existing code already assumed that "long long" is
available, we can just use PyLong_FromLongLong() for 64 bits as well.
In Python 2.5 and later, PyLong_FromLong() and PyLong_FromLongLong() use
the same code, so there will be no difference for 64-bit platforms. In
Python 2.4, the code is different, but performance testing showed no
noticeable difference in PL/Python, and that Python version is ancient
anyway.
Robert Haas [Fri, 19 Jan 2018 20:33:06 +0000 (15:33 -0500)]
Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
Alvaro Herrera [Fri, 19 Jan 2018 19:34:44 +0000 (16:34 -0300)]
Fix CompareIndexInfo's attnum comparisons
When an index column is an expression, it makes no sense to compare its
attribute numbers.
This seems to account for remaining buildfarm fallout from 8b08f7d4820f.
At least, it solves the issue in my local 32bit VM -- let's see what the
rest thinks.
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that. It's only used in
aclcheck_error. By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Peter Eisentraut [Wed, 11 Oct 2017 22:35:19 +0000 (18:35 -0400)]
Replace GrantObjectType with ObjectType
There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with b256f2426433c56b4bea3a8102757749885b81ba. But this conversion was never
done for the ACL commands until now.
This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Stephen Frost <sfrost@snowman.net>
Alvaro Herrera [Fri, 19 Jan 2018 14:49:22 +0000 (11:49 -0300)]
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
Alvaro Herrera [Fri, 19 Jan 2018 13:15:08 +0000 (10:15 -0300)]
Fix StoreCatalogInheritance1 to use 32bit inhseqno
For no apparent reason, this function was using a 16bit-wide inhseqno
value, rather than the correct 32 bit width which is what is stored in
the pg_inherits catalog. This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:
Needless to say, having so many parents is an uncommon situations, which
explains why this error has never been reported despite being having
been introduced with the Postgres95 1.01 sources in commit d31084e9d111:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349
Backpatch all the way back.
David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
Robert Haas [Fri, 19 Jan 2018 12:48:44 +0000 (07:48 -0500)]
Transfer state pertaining to pending REINDEX operations to workers.
This will allow the pending patch for parallel CREATE INDEX to work
on system catalogs, and to provide the same level of protection
against use of user indexes while they are being rebuilt that we
have for non-parallel CREATE INDEX.
Peter Eisentraut [Thu, 18 Jan 2018 16:24:07 +0000 (11:24 -0500)]
Add tests for session_replication_role
This was hardly tested at all. The trigger case was lightly tested by
the logical replication tests, but rules and event triggers were not
tested at all.
Tom Lane [Thu, 18 Jan 2018 16:09:44 +0000 (11:09 -0500)]
Extend configure's __int128 test to check for a known gcc bug.
On Sparc64, use of __attribute__(aligned(8)) with __int128 causes faulty
code generation in gcc versions at least through 5.5.0. We can work around
that by disabling use of __int128, so teach configure to test for the bug.
This solution doesn't fix things for the case of cross-compiling with a
buggy compiler; to support that nicely, we'd need to add a manual disable
switch. Unless more such cases turn up, it doesn't seem worth the work.
Affected users could always edit pg_config.h manually.
In passing, fix some typos in the existing configure test for __int128.
They're harmless because we only compile that code not run it, but
they're still confusing for anyone looking at it closely.
This is needed in support of commit 751804998, so back-patch to 9.5
as that was.
Robert Haas [Wed, 17 Jan 2018 21:18:39 +0000 (16:18 -0500)]
postgres_fdw: Avoid 'outer pathkeys do not match mergeclauses' error.
When pushing down a join to a foreign server, postgres_fdw constructs
an alternative plan to be used for any EvalPlanQual rechecks that
prove to be necessary. This plan is stored as the outer subplan of
the Foreign Scan implementing the pushed-down join. Previously, this
alternative plan could have a different nominal sort ordering than its
parent, which seemed OK since there will only be one tuple per base
table anyway in the case of an EvalPlanQual recheck. Actually,
though, it caused a problem if that path was used as a building block
for the EvalPlanQual recheck plan of a higher-level foreign join,
because we could end up with a merge join one of whose inputs was not
labelled with the correct sort order. Repair by injecting an extra
Sort node into the EvalPlanQual recheck plan whenever it would
otherwise fail to be sorted at least as well as its parent Foreign
Scan.
Report by Jeff Janes. Patch by me, reviewed by Tom Lane, who also
provided the test case and comment text.
Tom Lane [Wed, 17 Jan 2018 19:44:15 +0000 (14:44 -0500)]
Remove useless lookup of root partitioned rel in ExecInitModifyTable().
node->partitioned_rels is only set in UPDATE/DELETE cases, but
ExecInitModifyTable only uses its "rel" variable in INSERT cases,
so the extra logic to find the root rel is just a waste of complexity
and cycles.
Simon Riggs [Wed, 17 Jan 2018 11:38:34 +0000 (11:38 +0000)]
Ability to advance replication slots
Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().
For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.
Andrew Dunstan [Wed, 17 Jan 2018 00:07:13 +0000 (19:07 -0500)]
Centralize json and jsonb handling of datetime types
The creates a single function JsonEncodeDateTime which will format these
data types in an efficient and consistent manner. This will be all the
more important when we come to jsonpath so we don't have to implement yet
more code doing the same thing in two more places.
This also extends the code to handle time and timetz types which were
not previously handled specially. This requires exposing the time2tm and
timetz2tm functions.
Tom Lane [Fri, 12 Jan 2018 21:52:49 +0000 (16:52 -0500)]
Fix postgres_fdw to cope with duplicate GROUP BY entries.
Commit 7012b132d, which added the ability to push down aggregates and
grouping to the remote server, wasn't careful to ensure that the remote
server would have the same idea we do about which columns are the grouping
columns, in cases where there are textually identical GROUP BY expressions.
Such cases typically led to "targetlist item has multiple sortgroupref
labels" errors.
To fix this reliably, switch over to using "GROUP BY column-number" syntax
rather than "GROUP BY expression" in transmitted queries, and adjust
foreign_grouping_ok() to be more careful about duplicating the sortgroupref
labeling of the local pathtarget.
Per bug #14890 from Sean Johnston. Back-patch to v10 where the buggy code
was introduced.
Tom Lane [Fri, 12 Jan 2018 20:46:37 +0000 (15:46 -0500)]
Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT.
If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.
An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks. Instead, let's just drop the check on attinhcount. In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.
This problem has existed for a long time, so back-patch to all supported
branches. Also add an isolation test verifying related behaviors.
Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.
Tom Lane [Fri, 12 Jan 2018 17:24:50 +0000 (12:24 -0500)]
Fix incorrect handling of subquery pullup in the presence of grouping sets.
If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.
To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing. This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.
Back-patch to 9.5 where grouping sets were introduced.
Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth
Alvaro Herrera [Fri, 12 Jan 2018 14:21:42 +0000 (11:21 -0300)]
Remove hard-coded schema knowledge about pg_attribute from genbki.pl
Add the ability to label a column's default value in the catalog header,
and implement this for pg_attribute. A new function in Catalog.pm is
used to fill in a tuple with defaults. The build process will complain
loudly if a catalog entry is incomplete,
Commit 8137f2c3232 labeled variable length columns for the C preprocessor.
Expose that label to genbki.pl so we can exclude those columns from schema
macros in a general fashion. Also, format schema macro entries according
to their types.
This means slightly less code maintenance, but more importantly it's a
proving ground for mechanisms intended to be used in later commits.
While at it, I (Álvaro) couldn't resist making some changes in
genbki.pl: rename some functions to actually indicate their purpose
instead of actively misleading onlookers; and don't iterate on the whole
of pg_type to find the entry for each catalog row, using a hash instead
of an array.
Author: John Naylor, some changes by Álvaro Herrera
Discussion: https://postgr.es/m/CAJVSVGVJHwD8sfDfZW9TbCHWKf=C1YDRM-rF=2JenRU_y+VcFg@mail.gmail.com
Refactor subscription tests to use PostgresNode's wait_for_catchup
This was nearly the same code. Extend wait_for_catchup to allow waiting
for pg_current_wal_lsn() and use that in the subscription tests. Also
change one use in the pg_rewind tests to use this.
Also remove some broken code in wait_for_catchup and
wait_for_slot_catchup. The error message in case the waiting failed
wanted to show the current LSN, but the way it was written never
worked. So since nobody ever cared, just remove it.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Tom Lane [Thu, 11 Jan 2018 17:16:18 +0000 (12:16 -0500)]
Add QueryEnvironment to ExplainOneQuery_hook's parameter list.
This should have been done in commit 18ce3a4ab, which added that parameter
to ExplainOneQuery, but it was overlooked. This makes it impossible for
a user of the hook to pass the queryEnv down to ExplainOnePlan.
It's too late to change this API in v10, I suppose, but fortunately
passing NULL to ExplainOnePlan will work in nearly all interesting
cases in v10. That might not be true forever, so we'd better fix it.
Teodor Sigaev [Thu, 11 Jan 2018 11:49:36 +0000 (14:49 +0300)]
llow negative coordinate for ~> (cube, int) operator
~> (cube, int) operator was especially designed for knn-gist search.
However, knn-gist supports only ascending ordering of results. Nevertheless
it would be useful to support descending ordering by ~> (cube, int) operator.
We provide workaround for that: negative coordinate give us inversed value
of corresponding cube bound. Therefore, knn search using negative coordinate
gives us an effect of descending ordering by cube bound.
Author: Alexander Korotkov
Reviewed by: Tomas Vondra, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/a9657f6a-b497-36ff-e56-482a2c7e3292@2ndquadrant.com
Teodor Sigaev [Thu, 11 Jan 2018 11:41:14 +0000 (14:41 +0300)]
Fix behavior of ~> (cube, int) operator
~> (cube, int) operator was especially designed for knn-gist search.
However, it appears that knn-gist search can't work correctly with current
behavior of this operator when dataset contains cubes of variable
dimensionality. In this case, the same value of second operator argument
can point to different dimension depending on dimensionality of particular cube.
Such behavior is incompatible with gist indexing of cubes, and knn-gist doesn't
work correctly for it.
This patch changes behavior of ~> (cube, int) operator by introducing dimension
numbering where value of second argument unambiguously identifies number of
dimension. With new behavior, this operator can be correctly supported by
knn-gist. Relevant changes to cube operator class are also included.
Backpatch to v9.6 where operator was introduced.
Since behavior of ~> (cube, int) operator is changed, depending entities
must be refreshed after upgrade. Such as, expression indexes using this
operator must be reindexed, materialized views must be rebuilt, stored
procedures and client code must be revised to correctly use new behavior.
That should be mentioned in release notes.
Noticed by: Tomas Vondra
Author: Alexander Korotkov
Reviewed by: Tomas Vondra, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/a9657f6a-b497-36ff-e56-482a2c7e3292@2ndquadrant.com
Tom Lane [Wed, 10 Jan 2018 22:13:29 +0000 (17:13 -0500)]
Fix sample INSTR() functions in the plpgsql documentation.
These functions are stated to be Oracle-compatible, but they weren't.
Yugo Nagata noticed that while our code returns zero for a zero or
negative fourth parameter (occur_index), Oracle throws an error.
Further testing by me showed that there was also a discrepancy in the
interpretation of a negative third parameter (beg_index): Oracle thinks
that a negative beg_index indicates the last place where the target
substring can *begin*, whereas our code thinks it is the last place
where the target can *end*.
Adjust the sample code to behave like Oracle in both these respects.
Also change it to be a CDATA[] section, simplifying copying-and-pasting
out of the documentation source file. And fix minor problems in the
introductory comment, which wasn't very complete or accurate.
Back-patch to all supported branches. Although this patch only touches
documentation, we should probably call it out as a bug fix in the next
minor release notes, since users who have adopted the functions will
likely want to update their versions.
Peter Eisentraut [Tue, 12 Dec 2017 15:26:47 +0000 (10:26 -0500)]
Use portal pinning in PL/Perl and PL/Python
PL/pgSQL "pins" internally generated portals so that user code cannot
close them by guessing their names. Add this functionality to PL/Perl
and PL/Python as well, preventing users from manually closing cursors
created by spi_query and plpy.cursor, respectively. (PL/Tcl does not
currently offer any cursor functionality.)
Tom Lane [Wed, 10 Jan 2018 20:50:54 +0000 (15:50 -0500)]
Remove dubious micro-optimization in ckpt_buforder_comparator().
It seems incorrect to assume that the list of CkptSortItems can never
contain duplicate page numbers: concurrent activity could result in some
page getting dropped from a low-numbered buffer and later loaded into a
high-numbered buffer while BufferSync is scanning the buffer pool.
If that happened, the comparator would give self-inconsistent results,
potentially confusing qsort(). Saving one comparison step is not worth
possibly getting the sort wrong.
So far as I can tell, nothing would actually go wrong given our current
implementation of qsort(). It might get a bit slower than expected
if there were a large number of duplicates of one value, but that's
surely a probability-epsilon case. Still, the comment is wrong,
and if we ever switched to another sort implementation it might be
less forgiving.
In passing, avoid casting away const-ness of the argument pointers;
I've not seen any compiler complaints from that, but it seems likely
that some compilers would not like it.
Back-patch to 9.6 where this code came in, just in case I've underestimated
the possible consequences.
Peter Eisentraut [Fri, 15 Dec 2017 20:24:10 +0000 (15:24 -0500)]
Move portal pinning from PL/pgSQL to SPI
PL/pgSQL "pins" internally generated (unnamed) portals so that user code
cannot close them by guessing their names. This logic is also useful in
other languages and really for any code. So move that logic into SPI.
An unnamed portal obtained through SPI_cursor_open() and related
functions is now automatically pinned, and SPI_cursor_close()
automatically unpins a portal that is pinned.
In the core distribution, this affects PL/Perl and PL/Python, preventing
users from manually closing cursors created by spi_query and
plpy.cursor, respectively. (PL/Tcl does not currently offer any cursor
functionality.)
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Peter Eisentraut [Wed, 10 Jan 2018 14:22:07 +0000 (09:22 -0500)]
Give more accurate error message for dropping pinned portal
The previous code gave the same error message for attempting to drop
pinned and active portals, but those are separate states, so give
separate error messages.
Andres Freund [Tue, 9 Jan 2018 21:25:38 +0000 (13:25 -0800)]
Expression evaluation based aggregate transition invocation.
Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:
- repeated expression evaluations have some cost
- the transition functions invocations are poorly predicted, as
commonly there are multiple aggregates in a query, resulting in the
same call-stack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
transition function invocation
Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.
This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.
There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive
aggstate->all_pergroups lookups.
- right now there's still an advance_transition_function invocation in
nodeAgg.c, leading to some code duplication.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Alvaro Herrera [Tue, 9 Jan 2018 18:54:39 +0000 (15:54 -0300)]
Change some bogus PageGetLSN calls to BufferGetLSNAtomic
As src/backend/access/transam/README says, PageGetLSN may only be called
by processes holding either exclusive lock on buffer, or a shared lock
on buffer plus buffer header lock. Therefore any place that only holds
a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
which internally obtains buffer header lock prior to reading the LSN.
A few callsites failed to comply with this rule. This was detected by
running all tests under a new (not committed) assertion that verifies
PageGetLSN locking contract. All but one of the callsites that failed
the assertion are fixed by this patch. Remaining callsites were
inspected manually and determined not to need any change.
The exception (unfixed callsite) is in TestForOldSnapshot, which only
has a Page argument, making it impossible to access the corresponding
Buffer from it. Fixing that seems a much larger patch that will have to
be done separately; and that's just as well, since it was only
introduced in 9.6 and other bugs are much older.
Some of these bugs are ancient; backpatch all the way back to 9.3.
Peter Eisentraut [Sat, 16 Dec 2017 22:43:41 +0000 (17:43 -0500)]
Remove PortalGetQueryDesc()
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Peter Eisentraut [Sat, 16 Dec 2017 22:26:26 +0000 (17:26 -0500)]
Update portal-related memory context names and API
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.
Rename PortalData's "heap" field to portalContext. The "heap" naming
seems quite antiquated and confusing. Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tom Lane [Tue, 9 Jan 2018 18:25:53 +0000 (13:25 -0500)]
Rewrite list_qsort() to avoid trashing its input list.
The initial implementation of list_qsort(), from commit ab7271677,
re-used the ListCells of the input list while not touching the List
header. This meant that anybody who still had a pointer to the
original header would now be in possession of a corrupted list,
a problem that seems sure to bite us eventually.
One possible solution is to re-use the original List header as well,
giving the function the semantics of update-in-place. However, that
doesn't seem like a very good idea either given the way that the
function is used in the planner: create_path functions aren't normally
supposed to modify their input lists. It doesn't look like there would
be a problem today, but it's not hard to foresee a time when modifying
a list of Paths in-place could have side-effects on some other append
path.
On the whole, and in view of the likelihood that this function might
be used in other contexts in the future, it seems best to get rid of
the micro-optimization of re-using the input list cells. Just build
a new list.
Tom Lane [Tue, 9 Jan 2018 18:07:52 +0000 (13:07 -0500)]
Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost. However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost. In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.
We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa. When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order. (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)
This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.
Tom Lane [Tue, 9 Jan 2018 17:34:46 +0000 (12:34 -0500)]
While waiting for a condition variable, detect postmaster death.
The general assumption for postmaster child processes is that they
should just exit(1), reasonably promptly, if the postmaster disappears.
condition_variable.c neglected this consideration and could be left
waiting forever, if the counterpart process it is waiting for has
done the right thing and exited.
We had some discussion of adjusting the WaitEventSet API to make it
harder to make this type of mistake in future; but for the moment,
and for v10, let's make this narrow fix.
Tom Lane [Tue, 9 Jan 2018 17:09:30 +0000 (12:09 -0500)]
Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended. The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.
Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop. What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely. This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs. We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.
(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)
While at it, upgrade the rather skimpy comments in this function.
Tom Lane [Tue, 9 Jan 2018 16:39:10 +0000 (11:39 -0500)]
Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs.
The original coding here insisted that callers manually cancel any prepared
sleep for one condition variable before starting a sleep on another one.
While that's not a huge burden today, it seems like a gotcha that will bite
us in future if the use of condition variables increases; anything we can
do to make the use of this API simpler and more robust is attractive.
Hence, allow these functions to automatically switch their attention to
a different CV when required. This is safe for the same reason it was OK
for commit aced5a92b to let a broadcast operation cancel any prepared CV
sleep: whenever we return to the other test-and-sleep loop, we will
automatically re-prepare that CV, paying at most an extra test of that
loop's exit condition.
Back-patch to v10 where condition variables were introduced. Ordinarily
we would probably not back-patch a change like this, but since it does not
invalidate any coding pattern that was legal before, it seems safe enough.
Furthermore, there's an open bug in replorigin_drop() for which the
simplest fix requires this. Even if we chose to fix that in some more
complicated way, the hazard would remain that we might back-patch some
other bug fix that requires this behavior.
Robert Haas [Tue, 9 Jan 2018 15:12:58 +0000 (10:12 -0500)]
Don't allow VACUUM VERBOSE ANALYZE VERBOSE.
There are plans to extend the syntax for ANALYZE, so we need to break
the link between VacuumStmt and AnalyzeStmt. But apart from that, the
syntax above is undocumented and, if discovered by users, might give
the impression that the VERBOSE option for VACUUM differs from the
verbose option from ANALYZE, which it does not.
Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada
Teodor Sigaev [Tue, 9 Jan 2018 15:02:04 +0000 (18:02 +0300)]
Improve scripting language in pgbench
Added:
- variable now might contain integer, double, boolean and null values
- functions ln, exp
- logical AND/OR/NOT
- bitwise AND/OR/NOT/XOR
- bit right/left shift
- comparison operators
- IS [NOT] (NULL|TRUE|FALSE)
- conditional choice (in form of when/case/then)
New operations and functions allow to implement more complicated test scenario.
Author: Fabien Coelho with minor editorization by me Reviewed-By: Pavel Stehule, Jeevan Ladhe, me
Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.1604030742390.31618@sto
Bruce Momjian [Tue, 9 Jan 2018 03:43:51 +0000 (22:43 -0500)]
pg_upgrade: prevent check on live cluster from generating error
Previously an inaccurate but harmless error was generated when running
--check on a live server before reporting the servers as compatible.
The fix is to split error reporting and exit control in the exec_prog()
API.
Reported-by: Daniel Westermann
Backpatch-through: 10
Tom Lane [Mon, 8 Jan 2018 23:07:04 +0000 (18:07 -0500)]
Improve error detection capability in proclists.
Previously, although the initial state of a proclist_node is expected
to be next == prev == 0, proclist_delete_offset would reset nodes to
next == prev == INVALID_PGPROCNO when removing them from a list.
This is the same state that a node in a singleton list has, so that
it's impossible to distinguish not-in-a-list from in-a-list. Change
proclist_delete_offset to reset removed nodes to next == prev == 0,
making it possible to distinguish those cases, and then add Asserts
to the list add and delete functions that the supplied node isn't
or is in a list at entry. Also tighten assertions about the node
being in the particular list (not some other one) where it is possible
to check that in O(1) time.
In ConditionVariablePrepareToSleep, since we don't expect the process's
cvWaitLink to already be in a list, remove the more-or-less-useless
proclist_contains check; we'd rather have proclist_push_tail's new
assertion fire if that happens.
Improve various comments related to proclists, too.
Patch by me, reviewed by Thomas Munro. This isn't back-patchable, since
there could theoretically be inlined copies of proclist_delete_offset in
third-party modules. But it's only improving debuggability anyway.
Tom Lane [Mon, 8 Jan 2018 01:40:40 +0000 (20:40 -0500)]
Back off chattiness in RemovePgTempFiles().
In commit 561885db0, as part of normalizing RemovePgTempFiles's error
handling, I removed its behavior of silently ignoring ENOENT failures
during directory opens. Thomas Munro points out that this is a bad idea at
the top level, because we don't create pgsql_tmp directories until needed.
Thus this coding could produce LOG messages in perfectly normal situations,
which isn't what I intended. Restore the suppression of ENOENT logging,
but only at top level --- it would still be unexpected for a nested temp
directory to disappear between seeing it in the parent directory and
opening it.
Tom Lane [Sat, 6 Jan 2018 01:33:26 +0000 (20:33 -0500)]
Remove return values of ConditionVariableSignal/Broadcast.
In the wake of commit aced5a92b, the semantics of these results are
a bit squishy: we can tell whether we signaled some other process(es),
but we do not know which ones were real waiters versus mere sentinels
for ConditionVariableBroadcast operations. It does not help much that
ConditionVariableBroadcast will attempt to pass on the signal to the
next real waiter, because (a) there might not be one, and (b) that will
only happen awhile later, anyway. So these results could overstate how
much effect the calls really had.
However, no existing caller of either function pays any attention to its
result value, so it seems reasonable to just define that as a required
property of a correct algorithm. To encourage correctness and save some
tiny number of cycles, change both functions to return void.
Patch by me, per an observation by Thomas Munro. No back-patch, since
if any third parties happen to be using these functions, they might not
appreciate an API break in a minor release.
Tom Lane [Sat, 6 Jan 2018 00:42:49 +0000 (19:42 -0500)]
Reorder steps in ConditionVariablePrepareToSleep for more safety.
In the admittedly-very-unlikely case that AddWaitEventToSet fails,
ConditionVariablePrepareToSleep would error out after already having
set cv_sleep_target, which is probably bad, and after having already
set cv_wait_event_set, which is very bad. Transaction abort might or
might not clean up cv_sleep_target properly; but there is nothing
that would be aware that the WaitEventSet wasn't fully constructed,
so that all future condition variable sleeps would be broken.
We can easily guard against these hazards with slight restructuring.
Back-patch to v10 where condition_variable.c was introduced.
Tom Lane [Sat, 6 Jan 2018 00:21:30 +0000 (19:21 -0500)]
Rewrite ConditionVariableBroadcast() to avoid live-lock.
The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible". Thomas Munro found out
it was a bit too dumb. An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle. Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.
To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore. There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel. But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.
Back-patch to v10 where condition_variable.c was introduced.
Robert Haas [Fri, 5 Jan 2018 20:18:03 +0000 (15:18 -0500)]
Factor error generation out of ExecPartitionCheck.
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Alvaro Herrera [Fri, 5 Jan 2018 15:17:10 +0000 (12:17 -0300)]
Fix failure to delete spill files of aborted transactions
Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large. These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known. The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.
To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction. This
causes the spilled files to be removed appropriately.
Another attempt at fixing build with various OpenSSL versions
It seems we can't easily work around the lack of
X509_get_signature_nid(), so revert the previous attempts and just
disable the tls-server-end-point feature if we don't have it.
Robert Haas [Thu, 4 Jan 2018 21:25:49 +0000 (16:25 -0500)]
Minor preparatory refactoring for UPDATE row movement.
Generalize is_partition_attr to has_partition_attrs and make it
accessible from outside tablecmds.c. Change map_partition_varattnos
to clarify that it can be used for mapping between any two relations
in a partitioning hierarchy, not just parent -> child.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Some comment changes by me.
Robert Haas [Thu, 4 Jan 2018 20:48:15 +0000 (15:48 -0500)]
Simplify and encapsulate tuple routing support code.
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things. Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me
Implement channel binding tls-server-end-point for SCRAM
This adds a second standard channel binding type for SCRAM. It is
mainly intended for third-party clients that cannot implement
tls-unique, for example JDBC.
Author: Michael Paquier <michael.paquier@gmail.com>
Tom Lane [Thu, 4 Jan 2018 19:59:00 +0000 (14:59 -0500)]
Fix incorrect computations of length of null bitmap in pageinspect.
Instead of using our standard macro for this calculation, this code
did it itself ... and got it wrong, leading to incorrect display of
the null bitmap in some cases. Noted and fixed by Maksim Milyutin.
In passing, remove a uselessly duplicative error check.
Errors were introduced in commit d6061f83a; back-patch to 9.6
where that came in.
Refactor channel binding code to fetch cbind_data only when necessary
As things stand now, channel binding data is fetched from OpenSSL and
saved into the SCRAM exchange context for any SSL connection attempted
for a SCRAM authentication, resulting in data fetched but not used if no
channel binding is used or if a different channel binding type is used
than what the data is here for.
Refactor the code in such a way that binding data is fetched from the
SSL stack only when a specific channel binding is used for both the
frontend and the backend. In order to achieve that, save the libpq
connection context directly in the SCRAM exchange state, and add a
dependency to SSL in the low-level SCRAM routines.
This makes the interface in charge of initializing the SCRAM context
cleaner as all its data comes from either PGconn* (for frontend) or
Port* (for the backend).
Author: Michael Paquier <michael.paquier@gmail.com>
Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows
Some versions of Windows don't define LDAPS_PORT.
Also, Windows' ldap_sslinit() is documented to use LDAPS even if you
said secure=0 when the port number happens to be 636 or 3269. Let's
avoid using the port number to imply that you want LDAPS, so that
connection strings have the same meaning on Windows and Unix.
Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com
Robert Haas [Thu, 4 Jan 2018 12:56:09 +0000 (07:56 -0500)]
Code review for Parallel Append.
- Remove unnecessary #include mistakenly added in execnodes.h.
- Fix mistake in comment in choose_next_subplan_for_leader.
- Adjust row estimates in cost_append for a possibly-different
parallel divisor.
- Clamp row estimates in cost_append after operations that may
not produce integers.
Tom Lane [Wed, 3 Jan 2018 22:53:06 +0000 (17:53 -0500)]
Clean up tupdesc.c for recent changes.
TupleDescCopy needs to have the same effects as CreateTupleDescCopy in
that, since it doesn't copy constraints, it should clear the per-attribute
fields associated with them. Oversight in commit cc5f81366.
Since TupleDescCopy has already established the presumption that it
can just flat-copy the entire attribute array in one go, propagate
that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr.
(I'm suspicious that this would lead to valgrind complaints if we
had any trailing padding in the struct, but we do not, and anyway
fixing that seems like a job for a separate commit.)
Add some better comments.
Thomas Munro, reviewed by Vik Fearing, some additional hacking by me
Alvaro Herrera [Wed, 3 Jan 2018 14:16:34 +0000 (11:16 -0300)]
Revert "Fix isolation test to be less timing-dependent"
This reverts commit 2268e6afd596. It turned out that inconsistency in
the report is still possible, so go back to the simpler formulation of
the test and instead add an alternate expected output.
Andres Freund [Wed, 3 Jan 2018 20:00:11 +0000 (12:00 -0800)]
Rename pg_rewind's copy_file_range() to avoid conflict with new linux syscall.
Upcoming versions of glibc will contain copy_file_range(2), a wrapper
around a new linux syscall for in-kernel copying of data ranges. This
conflicts with pg_rewinds function of the same name.
Therefore rename pg_rewinds version. As our version isn't a generic
copying facility we decided to choose a rewind specific function name.
Per buildfarm animal caiman and subsequent discussion with Tom Lane.
Author: Andres Freund
Discussion:
https://postgr.es/m/20180103033425.w7jkljth3e26sduc@alap3.anarazel.de
https://postgr.es/m/31122.1514951044@sss.pgh.pa.us
Backpatch: 9.5-, where pg_rewind was introduced
Andrew Dunstan [Wed, 3 Jan 2018 20:26:39 +0000 (15:26 -0500)]
Fix use of config-specific libraries for Windows OpenSSL
Commit 614350a3 allowed for an different builds of OpenSSL libraries on
Windows, but ignored the fact that the alternative builds don't have
config-specific libraries. This patch fixes the Solution file to ask for
the correct libraries.
per offline discussions with Leonardo Cecchi and Marco Nenciarini,