Noah Misch [Sun, 12 Mar 2017 23:35:34 +0000 (19:35 -0400)]
Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of four-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
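A hedged sketch of the recommended style (the function is hypothetical; the macros are the real fmgr.h/postgres.h ones): fetch arguments in packed form and read them only through the alignment-agnostic accessors.
    #include "postgres.h"
    #include "fmgr.h"
    PG_FUNCTION_INFO_V1(my_text_octet_len);     /* hypothetical example */
    Datum
    my_text_octet_len(PG_FUNCTION_ARGS)
    {
        /* packed detoast: may leave a 1-byte header, forces no copy */
        text       *t = PG_GETARG_TEXT_PP(0);
        /* hence VARDATA_ANY()/VARSIZE_ANY_EXHDR(), not VARDATA()/VARSIZE() */
        const char *data = VARDATA_ANY(t);
        int         len = VARSIZE_ANY_EXHDR(t);
        (void) data;            /* the bytes may be unaligned; that's fine */
        PG_RETURN_INT32(len);
    }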
Noah Misch [Sun, 12 Mar 2017 23:35:33 +0000 (19:35 -0400)]
Recommend wrappers of PG_DETOAST_DATUM_PACKED().
When commit 3e23b68dac006e8deb0afa327e855258df8de064 introduced
single-byte varlena headers, its fmgr.h changes presented
PG_GETARG_TEXT_PP() and PG_GETARG_TEXT_P() as equals. Its postgres.h
changes presented PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the
exceptional case. Now, instead, firmly recommend PG_GETARG_TEXT_PP()
over PG_GETARG_TEXT_P(); likewise for other ...PP() macros. This shaves
cycles and invites consistency of style.
Noah Misch [Sun, 12 Mar 2017 23:35:31 +0000 (19:35 -0400)]
Fix pg_file_write() error handling.
Detect fclose() failures; given "ln -s /dev/full $PGDATA/devfull",
"pg_file_write('devfull', 'x', true)" now fails as it should. Don't
leak a stream when fwrite() fails. Remove a born-ineffective test that
aimed to skip zero-length writes. Back-patch to 9.2 (all supported
versions).
Noah Misch [Sun, 12 Mar 2017 23:35:31 +0000 (19:35 -0400)]
Assume deconstruct_array() outputs are untoasted.
In functions that issue a deconstruct_array() call, consistently use
plain VARSIZE()/VARDATA() on the array elements. Prior practice was
divided between those and VARSIZE_ANY_EXHDR()/VARDATA_ANY().
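A sketch of the now-uniform style, assuming a text[] input (the helper is made up): deconstruct_array() hands back fully detoasted, unpacked elements, so the plain accessors are always safe here.
    #include "postgres.h"
    #include "catalog/pg_type.h"
    #include "utils/array.h"
    static int
    total_text_bytes(ArrayType *arr)    /* hypothetical helper */
    {
        Datum      *elems;
        bool       *nulls;
        int         nelems;
        int         i;
        int         total = 0;
        deconstruct_array(arr, TEXTOID, -1, false, 'i',
                          &elems, &nulls, &nelems);
        for (i = 0; i < nelems; i++)
        {
            if (nulls[i])
                continue;
            /* plain VARSIZE(), not VARSIZE_ANY_EXHDR(): never packed */
            total += VARSIZE(DatumGetPointer(elems[i])) - VARHDRSZ;
        }
        return total;
    }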
Tom Lane [Sun, 12 Mar 2017 20:01:33 +0000 (16:01 -0400)]
Add "break"s to make it clearer what will happen in a nested switch.
This could only matter if the guessed_type variable had a value that wasn't
a member of the PasswordType enum; but just in case, let's be sure that
control falls out to reach the elog(ERROR) at the end of the function.
Per gripe from Coverity.
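The shape of the fix, sketched with simplified names (the helper calls are invented; guessed_type and the PasswordType enum are from the commit): without the marked break, a corrupt enum value falling out of the inner switch would run the next case's code rather than reach the elog().
    switch (guessed_type)
    {
        case PASSWORD_TYPE_PLAINTEXT:
            switch (target_type)
            {
                case PASSWORD_TYPE_PLAINTEXT:
                    return pstrdup(password);
                case PASSWORD_TYPE_MD5:
                    return md5_encrypt(password);   /* hypothetical call */
            }
            break;      /* reached only with a corrupt target_type */
        case PASSWORD_TYPE_MD5:
            return pstrdup(password);
    }
    elog(ERROR, "unrecognized password type: %d", (int) guessed_type);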
Tom Lane [Sun, 12 Mar 2017 19:52:50 +0000 (15:52 -0400)]
Remove dead code in nodeGatherMerge.c.
Coverity noted that the last line of gather_merge_getnext() was
unreachable, since each arm of the preceding "if" ends in a "return".
Drop it as an oversight. In passing, improve some nearby comments.
Andres Freund [Sat, 11 Mar 2017 23:36:50 +0000 (15:36 -0800)]
Improve expression evaluation test coverage.
Upcoming patches are revamping expression evaluation significantly. It
therefore seems prudent to try to ensure that the coverage of the
existing evaluation code is high.
This commit adds coverage for the cases that can reasonably be
tested. There's still a bunch of unreachable error messages and such,
but otherwise this achieves nearly full regression test coverage (with
the exception of the unused GetAttributeByNum/GetAttributeByName).
Author: Andres Freund
Discussion: https://postgr.es/m/20170310194021.ek4bs4bl2khxkmll@alap3.anarazel.de
Joe Conway [Sat, 11 Mar 2017 21:32:18 +0000 (13:32 -0800)]
Fix ancient connection leak in dblink
When using unnamed connections with dblink, every time a new
connection is made, the old one is leaked. Fix that.
This has been an issue probably since dblink was first committed.
Someone complained almost ten years ago, but apparently I decided
not to pursue it at the time, and neither did anyone else, so it
slipped between the cracks. Now that someone else has complained,
fix in all supported branches.
Tom Lane [Fri, 10 Mar 2017 21:32:18 +0000 (16:32 -0500)]
Improve postmaster's logging of listen socket creation.
When one of the kernel calls in the socket()/bind()/listen() sequence
fails, include the specific address we're trying to bind to in the log
message. This greatly eases debugging of network misconfigurations.
Also, after successfully setting up a listen socket, report its address
in the log, to ease verification that the expected addresses were bound.
There was some debate about whether to print this message at LOG level or
only DEBUG1, but the majority of votes were for the former.
Tom Lane [Fri, 10 Mar 2017 20:18:38 +0000 (15:18 -0500)]
Reduce log verbosity of startup/shutdown for launcher subprocesses.
There's no really good reason why the autovacuum launcher and logical
replication launcher should announce themselves at startup and shutdown
by default. Users don't care that those processes exist, and it's
inconsistent that those background processes announce themselves while
others don't. So, reduce those messages from LOG to DEBUG1 level.
I was sorely tempted to reduce the "starting logical replication worker
for subscription ..." message to DEBUG1 as well, but forbore for now.
Those processes might possibly be of direct interest to users, at least
until logical replication is a lot better shaken out than it is today.
Robert Haas [Fri, 10 Mar 2017 19:49:56 +0000 (14:49 -0500)]
Revert "Use group updates when setting transaction status in clog."
This reverts commit ccce90b398673d55b0387b3de66639b1b30d451b. This
optimization is unsafe, at least in the presence of rollbacks and rollbacks to
savepoints, but I'm concerned there may be other problematic cases as
well. Therefore, I've decided to revert this pending further
investigation.
Tom Lane [Fri, 10 Mar 2017 19:15:09 +0000 (14:15 -0500)]
Sanitize newlines in object names in "pg_restore -l" output.
Commits 89e0bac86 et al replaced newlines with spaces in object names
printed in SQL comments, but we neglected to consider that the same
names are also printed by "pg_restore -l", and a newline would render
the output unparseable by "pg_restore -L". Apply the same replacement
in "-l" output. Since "pg_restore -L" doesn't actually examine any
object names, only the dump ID field that starts each line, this is
enough to fix things for its purposes.
The previous fix was treated as a security issue, and we might have
done that here as well, except that the issue was reported publicly
to start with. Anyway it's hard to see how this could be exploited
for SQL injection; "pg_restore -L" doesn't do much with the file
except parse it for leading integers.
Per bug #14587 from Milos Urbanek. Back-patch to all supported versions.
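The substitution itself is simple; a sketch of the idea (pg_dump's actual helper has a different name and shape):
    /* flatten newlines so each "pg_restore -l" entry stays on one line */
    static void
    flatten_newlines(char *str)     /* hypothetical name */
    {
        for (; *str; str++)
        {
            if (*str == '\n' || *str == '\r')
                *str = ' ';
        }
    }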
Tom Lane [Fri, 10 Mar 2017 18:15:47 +0000 (13:15 -0500)]
Change the relkind for partitioned tables from 'P' to 'p'.
Seven of the eight other relkind codes are lower-case, so it wasn't
consistent for this one to be upper-case. Fix it while we still can.
Historical notes: the reason for the lone exception, i.e. sequences being
'S', is that 's' was once used for "special" relations. Also, at one time
the partitioned-tables patch used both 'P' and 'p', but that got changed,
leaving only a surprising choice behind.
This also fixes a couple little bits of technical debt, such as
type_sanity.sql not knowing that 'm' is a legal value for relkind.
Tom Lane [Fri, 10 Mar 2017 04:36:44 +0000 (23:36 -0500)]
Fix hard-coded relkind constants in assorted other files.
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
I think I've now gotten all the hard-coded references in C code.
Unfortunately there's no equally convenient way to parameterize
SQL files ...
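The style being enforced, in sketch form (RELKIND_PARTITIONED_TABLE is the real macro from catalog/pg_class.h; the helper is illustrative):
    #include "postgres.h"
    #include "catalog/pg_class.h"
    static bool
    is_partitioned(Form_pg_class classform)
    {
        /* compare against the macro, never the bare character literal 'p' */
        return classform->relkind == RELKIND_PARTITIONED_TABLE;
    }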
Tom Lane [Fri, 10 Mar 2017 03:55:46 +0000 (22:55 -0500)]
contrib/amcheck needs RecentGlobalXmin to be PGDLLIMPORT'ified.
Per buildfarm. Maybe some of the other xmin variables in snapmgr.h
ought to get this too, but for the moment I'm just interested in
un-breaking the buildfarm.
Tom Lane [Fri, 10 Mar 2017 03:42:16 +0000 (22:42 -0500)]
Fix hard-coded relkind constants in assorted src/bin files.
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
Tom Lane [Fri, 10 Mar 2017 01:45:52 +0000 (20:45 -0500)]
Fix portability problem in Catalog.pm.
Commit 7666e73a2 introduced a dependency on filehandles' input_line_number
method, but apparently that's a Perl neologism. Use $. instead, which
works at least back to Perl 5.10, and hopefully back to 5.8.
Tom Lane [Fri, 10 Mar 2017 01:42:42 +0000 (20:42 -0500)]
Fix hard-coded relkind constants in psql/describe.c.
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
Andres Freund [Thu, 9 Mar 2017 23:50:40 +0000 (15:50 -0800)]
Add amcheck extension to contrib.
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions that validate B-Tree consistency
to varying degrees. Check the extensive docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
Tom Lane [Fri, 10 Mar 2017 00:18:57 +0000 (19:18 -0500)]
Fix hard-coded relkind constants in pg_dump.c.
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros. There were only
a few such violations, and all relatively new AFAICT.
Existing style is mostly to inject relkind values into constructed
query strings using %c. I did not bother to touch places that did it
like that, but really a better technique is to stringify the RELKIND
macro, at least in places where you'd want single quotes around the
code character. That avoids any runtime effort and keeps the RELKIND
symbol close to where it's used.
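A sketch of that stringification technique (the query text is illustrative): because RELKIND_RELATION is defined as the character literal 'r', CppAsString2() expands it to the string "'r'", single quotes included, at compile time.
    /* in pg_dump-style frontend code; needs pqexpbuffer.h and
     * catalog/pg_class.h; CppAsString2() comes from c.h */
    appendPQExpBuffer(query,
                      "SELECT c.oid FROM pg_class c"
                      " WHERE c.relkind = " CppAsString2(RELKIND_RELATION));
    /* the WHERE clause becomes: c.relkind = 'r' */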
Tom Lane [Thu, 9 Mar 2017 23:54:45 +0000 (18:54 -0500)]
Make CppAsString2() more visible in c.h.
For some reason this standard C string-processing hack was buried in an
NLS-related section of c.h. Put it beside CppAsString() so that people
are more likely to find it and not be tempted to reinvent local copies,
as I nearly did. And provide a more helpful comment, too.
Robert Haas [Thu, 9 Mar 2017 22:44:52 +0000 (17:44 -0500)]
Use group updates when setting transaction status in clog.
Commit 0e141c0fbb211bdd23783afa731e3eef95c9ad7a introduced a mechanism
to reduce contention on ProcArrayLock by having a single process clear
XIDs in the procArray on behalf of multiple processes, reducing the
need to hand the lock around. Use a similar mechanism to reduce
contention on CLogControlLock. Testing shows that this very
significantly reduces the amount of time waiting for CLogControlLock
on high-concurrency pgbench tests run on large multi-socket
machines; whether that translates into a TPS improvement depends on
how much of that contention is simply shifted to some other lock,
particularly WALWriteLock.
Amit Kapila, with some cosmetic changes by me. Extensively reviewed,
tested, and benchmarked over a period of about 15 months by Simon
Riggs, Robert Haas, Andres Freund, Jesper Pedersen, and especially by
Tomas Vondra and Dilip Kumar.
Tom Lane [Thu, 9 Mar 2017 22:20:11 +0000 (17:20 -0500)]
Fix timestamptz regression test to still work with latest IANA zone data.
The IANA timezone crew continues to chip away at their project of removing
timezone abbreviations that have no real-world currency from their
database. The tzdata2017a update removes all such abbreviations for
South American zones, as well as much of the Pacific. This breaks some
test cases in timestamptz.sql that were expecting America/Santiago and
America/Caracas to have non-numeric abbreviations.
The test cases involving America/Santiago seem to have selected that
zone more or less at random, so just replace it with America/New_York,
which is of similar longitude. The cases involving America/Caracas are
harder since they were chosen to test a time-varying zone abbreviation
around a point where it changed meaning in the backwards direction.
Fortunately, Europe/Moscow has a similar case in 2014, and the MSK/MSD
abbreviations are well enough attested that IANA seems unlikely to
decide to remove them from the database in future.
With these changes, this regression test should pass when using any IANA
zone database from 2015 or later. One could wish that there were a few
years more daylight on how out-of-date your zone database can be ... but
really the --with-system-tzdata option is only meant for use on platforms
where the zone database is kept up-to-date pretty faithfully, so I do not
think this is a big objection.
Stephen Frost [Thu, 9 Mar 2017 21:34:25 +0000 (16:34 -0500)]
Add relkind checks to certain contrib modules
The contrib extensions pageinspect, pg_visibility and pgstattuple only
work against regular relations which have storage. They don't work
against foreign tables, partitioned (parent) tables, views, et al.
Add checks to the user-callable functions to return a useful error
message to the user if they mistakenly pass an invalid relation to a
function which doesn't accept that kind of relation.
In passing, improve some of the existing checks to use ereport() instead
of elog(), add a function to consolidate common checks where
appropriate, and add some regression tests.
Author: Amit Langote, with various changes by me
Reviewed by: Michael Paquier and Corey Huinker
Discussion: https://postgr.es/m/ab91fd9d-4751-ee77-c87b-4dd704c1e59c@lab.ntt.co.jp
Robert Haas [Thu, 9 Mar 2017 18:02:34 +0000 (13:02 -0500)]
Document some new parallel query capabilities.
This updates the text for parallel index scan, parallel index-only
scan, parallel bitmap heap scan, and parallel merge join. It also
expands the discussion of parallel joins slightly.
Enable replication connections by default in pg_hba.conf
initdb now initializes a pg_hba.conf that allows replication connections
from the local host, same as it does for regular connections. The
connecting user still needs to have the REPLICATION attribute or be a
superuser.
The intent is to allow pg_basebackup from the local host to succeed
without requiring additional configuration.
Michael Paquier <michael.paquier@gmail.com> and me
Robert Haas [Thu, 9 Mar 2017 12:40:36 +0000 (07:40 -0500)]
Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Tom Lane [Thu, 9 Mar 2017 01:58:17 +0000 (20:58 -0500)]
Fix inclusions of c.h from .h files.
We have a project policy that every .c file should start by including
postgres.h, postgres_fe.h, or c.h as appropriate; and then there is no
need for any .h file to explicitly include any of these. Fix a few
headers that were violating this policy by including c.h.
Tom Lane [Thu, 9 Mar 2017 01:41:06 +0000 (20:41 -0500)]
Fix inclusions of postgres_fe.h from .h files.
We have a project policy that every .c file should start by including
postgres.h, postgres_fe.h, or c.h as appropriate; and then there is no
need for any .h file to explicitly include any of these. Fix a few
headers that were violating this policy by including postgres_fe.h.
Tom Lane [Wed, 8 Mar 2017 22:21:08 +0000 (17:21 -0500)]
Bring plpgsql into line with header inclusion policy.
We have a project policy that every .c file should start by including
postgres.h, postgres_fe.h, or c.h as appropriate; and then there is no
need for any .h file to explicitly include any of these. (The core
reason for this policy is to make it easy to verify that pg_config_os.h
is included before any system headers such as <stdio.h>; without that,
we have portability issues on some platforms due to variation in largefile
options across different modules in the backend. Also, if .h files were
responsible for choosing which of these key headers to include, .h files
that need to be includable in either frontend or backend compiles would be
in trouble.)
plpgsql was blithely ignoring this policy, so whack it upside the head
until it complies. I also chose to standardize on including plpgsql's
own .h files after all core-system headers that it pulls in. That
could've been done either way, but this way seems saner.
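For illustration, the ordering the policy prescribes for a backend .c file (a frontend file starts with postgres_fe.h instead):
    /* always the first include of a backend .c file */
    #include "postgres.h"
    /* then system headers; safe because pg_config_os.h is already in effect */
    #include <limits.h>
    /* then other PostgreSQL headers, which never pull in postgres.h themselves */
    #include "plpgsql.h"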
Tom Lane [Wed, 8 Mar 2017 22:01:13 +0000 (17:01 -0500)]
Document intentional violations of header inclusion policy.
Although there are good reasons for our policy of including postgres.h
as the first #include in every .c file, never from .h files, there are
two places where it seems expedient to violate the policy because the
alternative is to modify externally-supplied .c files. (In the case
of the regexp library, the idea that it's externally-supplied is kind
of at odds with reality, but I haven't entirely given up hope that it
will become a standalone project some day.) Add some comments to make
it explicit that this is a policy violation and provide the reasoning.
In passing, move #include "miscadmin.h" out of regcomp.c and into
regcustom.h, which is where it should be if we're taking this reasoning
seriously at all.
Tom Lane [Wed, 8 Mar 2017 21:10:00 +0000 (16:10 -0500)]
Suppress compiler warning in non-USE_LIBXML builds.
Compilers that don't realize that ereport(ERROR) doesn't return
complained that XmlTableGetValue() failed to return a value.
Also, make XmlTableFetchRow's non-USE_LIBXML case look more like
the other ones. As coded, it could lead to "unreachable code"
warnings with USE_LIBXML enabled.
Tom Lane [Wed, 8 Mar 2017 20:38:26 +0000 (15:38 -0500)]
Put back <float.h> in a few files that need it for _isnan().
Further fallout from commit c29aff959: there are some files that need
<float.h>, and were getting it from datatype/timestamp.h, but it was not
apparent in my (tgl's) testing because the requirement for <float.h>
exists only on certain Windows toolchains.
Stephen Frost [Wed, 8 Mar 2017 20:14:03 +0000 (15:14 -0500)]
Expose explain's SUMMARY option
This exposes the existing explain summary option to users, letting them
choose whether the planning time and totalled run time are included
in the EXPLAIN result. The existing default behavior is
retained if SUMMARY is not specified: running explain without analyze
will not print the summary lines (just the planning time, currently)
while running explain with analyze will include the summary lines (both
the planning time and the totalled execution time).
Users who wish to see the summary information for plain explain can now
use: EXPLAIN (SUMMARY ON) query;
Users who do not want the summary printed for an analyze run can use:
EXPLAIN (ANALYZE ON, SUMMARY OFF) query;
With this, we can now also have EXPLAIN ANALYZE queries included in our
regression tests by using:
EXPLAIN (ANALYZE ON, TIMING OFF, SUMMARY OFF) query;
I went ahead and added an example of this, which will hopefully not make
the buildfarm complain.
Tom Lane [Wed, 8 Mar 2017 17:21:12 +0000 (12:21 -0500)]
Use doubly-linked block lists in aset.c to reduce large-chunk overhead.
Large chunks (those too large for any palloc freelist) are managed as
separate blocks. Formerly, realloc'ing or pfree'ing such a chunk required
O(N) time in a context with N blocks, since we had to traipse down the
singly-linked block list to locate the block's predecessor before we could
fix the list links. This can result in O(N^2) runtime in situations where
large numbers of such chunks are manipulated within one context. Cases
like that were not foreseen in the original design of aset.c, and indeed
didn't arise until fairly recently. But such problems can now occur in
reorderbuffer.c and in hash joining, both of which make repeated large
requests without scaling up their request size as they do so, and which
will free their requests in not-necessarily-LIFO order.
To fix, change the block list from singly-linked to doubly-linked.
This adds another 4 or 8 bytes to ALLOC_BLOCKHDRSZ, but that doesn't
seem like unacceptable overhead, since aset.c's blocks are normally
8K or more, and never less than 1K in current practice.
In passing, get rid of some redundant AllocChunkGetPointer() calls in
AllocSetRealloc (the compiler might be smart enough to optimize these
away anyway, but no need to assume that) and improve AllocSetCheck's
checking of block header fields.
Back-patch to 9.4 where reorderbuffer.c appeared. We could take this
further back, but currently there's no evidence that it would be useful.
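The essence of the change, sketched with illustrative field names rather than aset.c's actual block layout: with a back-link, delinking a block is O(1) and needs no walk from the head of the list.
    typedef struct BlockHdr
    {
        struct BlockHdr *prev;      /* the newly added back-link */
        struct BlockHdr *next;
    } BlockHdr;
    static void
    unlink_block(BlockHdr **head, BlockHdr *b)
    {
        if (b->prev)
            b->prev->next = b->next;
        else
            *head = b->next;        /* b was the first block */
        if (b->next)
            b->next->prev = b->prev;
    }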
Robert Haas [Wed, 8 Mar 2017 17:05:43 +0000 (12:05 -0500)]
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Fujii Masao [Wed, 8 Mar 2017 16:44:23 +0000 (01:44 +0900)]
Prevent logical rep workers with removed subscriptions from starting.
Any logical rep worker must have its subscription entry in
pg_subscription. To ensure this, we need to prevent the launcher
from starting a new worker for a subscription that a DROP SUBSCRIPTION
command is in the middle of removing. To implement this, previously
LogicalRepLauncherLock was introduced and held until the end of the
transaction running DROP SUBSCRIPTION. But using an LWLock for that
purpose was not valid.
Instead, this commit changes DROP SUBSCRIPTION so that it takes
AccessExclusiveLock on pg_subscription, in order to ensure that
the launcher cannot see any subscriptions being removed. Also this
commit gets rid of LogicalRepLauncherLock.
Alvaro Herrera [Wed, 8 Mar 2017 16:29:48 +0000 (13:29 -0300)]
Fix XMLTABLE on older libxml2
libxml2 older than 2.9.1 does not have xmlXPathSetContextNode (that
function first appeared in 2013, so older platforms still lack it).
The function is fairly trivial, so I have inlined it in the one added
caller. This passes
tests on my machine; let's see what the buildfarm thinks about it.
Alvaro Herrera [Wed, 8 Mar 2017 15:39:37 +0000 (12:39 -0300)]
Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Fujii Masao [Wed, 8 Mar 2017 14:43:38 +0000 (23:43 +0900)]
Fix connection leak in DROP SUBSCRIPTION command, take 2.
Commit 898a792eb8283e31efc0b6fcbc03bbcd5f7df667 fixed the connection
leak issue, but it was an unreliable fix: it assumed that the
walrcv_command() subroutine cannot throw an error, which is an
untenable assumption. For example, if that subroutine is ever changed
to throw an error, the connection leak will reappear.
This patch ensures that the connection is closed even when
walrcv_command() subroutine throws an error.
Patch by me, reviewed by Petr Jelinek and Michael Paquier
Robert Haas [Wed, 8 Mar 2017 13:15:24 +0000 (08:15 -0500)]
Fix parallel index and index-only scans to fall back to serial.
Parallel executor nodes can't assume that parallel execution will
happen in every case where the plan calls for it, because it might
not work out that way. However, parallel index scan and parallel
index-only scan failed to do the right thing here. Repair.
Robert Haas [Wed, 8 Mar 2017 13:02:03 +0000 (08:02 -0500)]
tidbitmap: Support shared iteration.
When a shared iterator is used, each call to tbm_shared_iterate()
returns a result that has not yet been returned to any process
attached to the shared iterator. In other words, each cooperating
process gets a disjoint subset of the full result set, but all
results are returned exactly once.
This is infrastructure for parallel bitmap heap scan.
Dilip Kumar. The larger patch set of which this is a part has been
reviewed and tested by (at least) Andres Freund, Amit Khandekar,
Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro.
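A conceptual sketch of that contract, not tidbitmap.c's implementation (names are invented): all attached workers draw from one shared cursor, so every result is handed out exactly once.
    #include "postgres.h"
    #include "port/atomics.h"
    typedef struct SharedIterState      /* hypothetical */
    {
        pg_atomic_uint32 next;          /* cursor shared by all workers */
        uint32      nresults;
    } SharedIterState;
    /* returns an index no other attached process will receive, or -1 */
    static int
    shared_iterate(SharedIterState *state)
    {
        uint32      i = pg_atomic_fetch_add_u32(&state->next, 1);
        return (i < state->nresults) ? (int) i : -1;
    }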
Robert Haas [Tue, 7 Mar 2017 22:03:51 +0000 (17:03 -0500)]
hash: Refactor hash index creation.
The primary goal here is to move all of the related page modifications
to a single section of code, in preparation for adding write-ahead
logging. In passing, rename _hash_metapinit to _hash_init, since it
initializes more than just the metapage.
Amit Kapila. The larger patch series of which this is a part has been
reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood,
Jeff Janes, and Jesper Pedersen.
Tom Lane [Tue, 7 Mar 2017 17:40:44 +0000 (12:40 -0500)]
Invent start_proc parameters for PL/Tcl.
Define GUCs pltcl.start_proc and pltclu.start_proc. When set to a
nonempty value at the time a new Tcl interpreter is created, the
parameterless pltcl or pltclu function named by the GUC is called to
allow user-controlled initialization to occur within the interpreter.
This is modeled on plv8's start_proc parameter, and also has much in
common with plperl's on_init feature. It allows users to fully
replace the "modules" feature that was removed in commit 817f2a586.
Since an initializer function could subvert later Tcl code in nearly
arbitrary ways, mark both GUCs as SUSET for now. It would be nice
to find a way to relax that someday; but the corresponding GUCs in
plperl are also SUSET, and there's not been much complaint.
Tom Lane [Tue, 7 Mar 2017 17:05:57 +0000 (12:05 -0500)]
Clean up test_ifaddrs a bit.
We customarily #include <netinet/in.h> before <arpa/inet.h>; according
to our git history (cf commit 527f8babc) there used to be platform(s)
where <arpa/inet.h> didn't compile otherwise. That's probably not
really an issue anymore, but since test_ifaddrs.c is the one and only
place in our code that's not following that rule, bring it into line.
Also remove #include <sys/socket.h>, as that's duplicative given that
libpq/ifaddr.h does so (via pqcomm.h).
In passing, add a .gitignore file so nobody accidentally commits the
test_ifaddrs executable, as I nearly did.
I see no particular need to back-patch this, as it's just neatnik-ism
considering we don't build test_ifaddrs by default, or even document
it anywhere.
Robert Haas [Tue, 7 Mar 2017 16:49:49 +0000 (11:49 -0500)]
Consider parallel merge joins.
Commit 45be99f8cd5d606086e0a458c9c72910ba8a613d took the position
that performing a merge join in parallel was not likely to work out
well, but this conclusion was greeted with skepticism even at the
time. Whether it was true then or not, it's clearly not true any
more now that we have parallel index scan.
Tom Lane [Tue, 7 Mar 2017 16:36:35 +0000 (11:36 -0500)]
Fix pgbench's failure to honor the documented long-form option "--builtin".
Not only did it not accept --builtin as a synonym for -b, but what it did
accept as a synonym was --tpc-b (huh?), which it got even further wrong
by marking as no_argument, so that if you did try that you got a core
dump. I suppose this is leftover from some early design for the new
switches added by commit 8bea3d221, but it's still pretty sloppy work.
Per bug #14580 from Stepan Pesternikov. Back-patch to 9.6 where the
error was introduced.
Tom Lane [Tue, 7 Mar 2017 15:42:11 +0000 (10:42 -0500)]
Remove vestigial grammar support for CHARACTER ... CHARACTER SET option.
The SQL standard says that you should be able to write "CHARACTER SET foo"
as part of the declaration of a char-type column. We don't implement that,
but a rough form of support has existed in gram.y since commit f10b63923.
That's now sat there for nigh 20 years without anyone fleshing it out ---
and even if someone did, the contemplated approach of having separate data
type name(s) for every character set certainly isn't what we'd do today.
Let's just remove the grammar production; if anyone is ever motivated to
work on this, reinventing the grammar support is a trivial fraction of
what they'd have to do. And we've never documented anything about
supporting such a clause.
Robert Haas [Tue, 7 Mar 2017 15:33:29 +0000 (10:33 -0500)]
Preparatory refactoring for parallel merge join support.
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.
Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.
Robert Haas [Tue, 7 Mar 2017 15:22:07 +0000 (10:22 -0500)]
Fix parallel hash join path search.
When the very cheapest path is not parallel-safe, we want to instead use
the cheapest unparameterized path that is. The old code searched
innerrel->cheapest_parameterized_paths, but that isn't right, because
the path we want may not be in that list. Search innerrel->pathlist
instead.
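In sketch form (simplified from the actual joinpath.c logic): scan the full pathlist, which add_path() keeps sorted by cost, and take the first unparameterized parallel-safe entry.
    #include "postgres.h"
    #include "nodes/relation.h"
    static Path *
    first_parallel_safe_unparameterized(RelOptInfo *innerrel)   /* sketch */
    {
        ListCell   *lc;
        foreach(lc, innerrel->pathlist)
        {
            Path       *path = (Path *) lfirst(lc);
            if (path->parallel_safe && path->param_info == NULL)
                return path;    /* first match is the cheapest one */
        }
        return NULL;
    }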
Stephen Frost [Tue, 7 Mar 2017 14:31:52 +0000 (09:31 -0500)]
psql: Add \gx command
It can often be useful to use expanded mode output (\x) for just a
single query. Introduce a \gx which acts exactly like \g except that it
will force expanded output mode for that one \gx call. This is simpler
than having to use \x as a toggle and also means that the user doesn't
have to worry about the current state of the expanded variable, or
resetting it later, to ensure a given query is always returned in
expanded mode.
Primarily Christoph's patch, though I did tweak the documentation and help
text a bit, and re-indented the tab completion section.
Author: Christoph Berg
Reviewed By: Daniel Verite
Discussion: https://postgr.es/m/20170127132737.6skslelaf4txs6iw%40msg.credativ.de
Simon Riggs [Tue, 7 Mar 2017 14:00:54 +0000 (22:00 +0800)]
Allow pg_dumpall to dump roles w/o user passwords
Add new option --no-role-passwords which dumps roles without passwords.
Since we don't need passwords, we use pg_roles in preference to
pg_authid, since access to pg_authid may be restricted for security
reasons in some configurations.
Support SCRAM-SHA-256 authentication (RFC 5802 and 7677).
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS, is left for later.
The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble if you use a password with
non-ASCII characters and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.
Bumps PG_CONTROL_VERSION, because of the new field in the control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Stephen Frost [Tue, 7 Mar 2017 04:29:02 +0000 (23:29 -0500)]
pg_dump: Properly handle public schema ACLs with --clean
pg_dump has always handled the public schema in a special way when it
comes to the "--clean" option. To wit, we do not drop or recreate the
public schema in "normal" mode, but when we are run in "--clean" mode
then we do drop and recreate the public schema.
When running in "--clean" mode, the public schema is dropped and then
recreated and it is recreated with the normal schema-default privileges
of "nothing". This is unlike how the public schema starts life, which
is to have CREATE and USAGE GRANT'd to the PUBLIC role, and that is what
is recorded in pg_init_privs.
Due to this, in "--clean" mode, pg_dump would mistakenly only dump out
the set of privileges required to go from the initdb-time privileges on
the public schema to whatever the current-state privileges are. If the
privileges were not changed from initdb time, then no privileges would
be dumped out for the public schema, but with the schema being dropped
and recreated, the result was that the public schema would have no ACLs
on it instead of what it should have, which is the initdb-time
privileges.
Practically speaking, this meant that pg_dump with --clean mode dumping
a database where the ACLs on the public schema were not changed from the
default would, upon restore, result in a public schema with *no*
privileges GRANT'd, not matching the state of the existing database
(where the initdb-time privileges would have been CREATE and USAGE to
the PUBLIC role for the public schema).
To fix, adjust the query in getNamespaces() to ignore the pg_init_privs
entry for the public schema when running in "--clean" mode, meaning that
the privileges for the public schema would be dumped, correctly, as if
it was going from a newly-created schema to the current state (which is,
indeed, what will happen during the restore thanks to the DROP/CREATE).
Only the public schema is handled in this special way by pg_dump, no
other initdb-time objects are dropped/recreated in --clean mode.
Tom Lane [Tue, 7 Mar 2017 00:33:59 +0000 (19:33 -0500)]
Repair incorrect pg_dump labeling for some comments and security labels.
We attached no schema label to comments for procedural languages, casts,
transforms, operator classes, operator families, or text search objects.
The first three categories of objects don't really have schemas, but
pg_dump treats them as if they do, and it seems like the TocEntry fields
for their comments had better match the TocEntry fields for the parent
objects. (As an example of a possible hazard, the type names in a CAST
will be formatted with the assumption of a particular search_path, so
failing to ensure that this same path is active for the COMMENT ON command
could lead to an error or to attaching the comment to the wrong cast.)
In the last six cases, this was a flat-out error --- possibly mine to
begin with, but it was a long time ago.
The security label for a procedural language was likewise not correctly
labeled as to schema, and both the comment and security label for a
procedural language were not correctly labeled as to owner.
In simple cases the restore would accidentally work correctly anyway, since
these comments and security labels would normally get emitted right after
the owning object, and so the search path and active user would be correct
anyhow. But it could fail in corner cases; for example a schema-selective
restore would omit comments it should include.
Giuseppe Broccolo noted the oversight, and proposed the correct fix, for
text search dictionary objects; I found the rest by cross-checking other
dumpComment() calls. These oversights are ancient, so back-patch all
the way.
Andres Freund [Wed, 23 Nov 2016 08:23:42 +0000 (00:23 -0800)]
Make simplehash.h grow hashtable in additional cases.
Increase the size when either the distance between actual and optimal
slot grows too large, or when too many subsequent entries would have
to be moved.
This addresses reports that the simplehash performed, sometimes
considerably, worse than dynahash.
The reason turned out to be that insertions into the hashtable were,
due to the use of parallel query, in effect done from another
hashtable, in hash-value order. If the target hashtable, due to
mis-estimation, was sized a lot smaller than the source table(s),
that led to very imbalanced tables; a lot of entries in many close-by
buckets from the source tables were inserted into a single, wider,
bucket on the target table. As the growth factor was solely computed
based on the fillfactor, the performance of the table decreased
further and further.
b81b5a96f424531b was an attempt to address this problem for hash
aggregates (but not for bitmap scans), but it turns out that the
current method of mixing hash values often actually leaves neighboring
hash-values close to each other, just in a different value range. It
might be worth revisiting that independently of the performance issues
addressed in this patch.
To address that problem, resize tables in two additional cases: first,
when the optimal position for an entry would be far from its actual
position; second, when many entries would have to be moved to make
space for the new entry (while satisfying the robin hood property).
Due to the additional resizing threshold it seems possible, and
testing confirms that so far, that a higher fillfactor doesn't hurt
performance and saves a bit of memory. It seems better to increase it
now, before a release containing any of this code, rather than wonder
in some later release.
The various boundaries aren't determined in a particularly scientific
manner, they might need some fine-tuning.
In all my tests the new code now, even with parallelism, performs at
least as well as the old code, and in several scenarios significantly
better.
Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh
Discussion:
https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com
https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com
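An illustrative sketch of the two extra growth triggers (thresholds and names are invented, not simplehash.h's actual ones; bucket wraparound is ignored for brevity):
    /* hypothetical thresholds, for illustration only */
    #define SH_MAX_DISPLACEMENT 16      /* distance from optimal bucket */
    #define SH_MAX_MOVED        64      /* entries shifted by one insert */
    static bool
    insert_wants_grow(uint32 optimal, uint32 actual, uint32 nmoved)
    {
        if (actual - optimal > SH_MAX_DISPLACEMENT)
            return true;        /* entry would land too far from home */
        if (nmoved > SH_MAX_MOVED)
            return true;        /* robin-hood shifting got too expensive */
        return false;
    }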
Stephen Frost [Mon, 6 Mar 2017 22:03:57 +0000 (17:03 -0500)]
pg_upgrade: Fix large object COMMENTS, SECURITY LABELS
When performing a pg_upgrade, we copy the files behind pg_largeobject
and pg_largeobject_metadata, allowing us to avoid having to dump out and
reload the actual data for large objects and their ACLs.
Unfortunately, that isn't all of the information which can be associated
with large objects. Currently, we also support COMMENTs and SECURITY
LABELs with large objects and these were being silently dropped during a
pg_upgrade as pg_dump would skip everything having to do with a large
object and pg_upgrade only copied the tables mentioned to the new
cluster.
As the file copies happen after the catalog dump and reload, we can't
simply include the COMMENTs and SECURITY LABELs in pg_dump's binary-mode
output but we also have to include the actual large object definition as
well. With the definition, comments, and security labels in the pg_dump
output and the file copies performed by pg_upgrade, all of the data and
metadata associated with large objects is able to be successfully pulled
forward across a pg_upgrade.
In 9.6 and master, we can simply adjust the dump bitmask to indicate
which components we don't want. In 9.5 and earlier, we have to put
explicit checks in dumpBlob() and dumpBlobs() to not include the ACL
or the data when in binary-upgrade mode.
The privileges regression test was adjusted to allow another test
(large_object.sql) to be added, which explicitly leaves a large object
with a comment in place, providing coverage of that case with
pg_upgrade.
Tom Lane [Mon, 6 Mar 2017 21:50:47 +0000 (16:50 -0500)]
Avoid dangling pointer to relation name in RLS code path in DoCopy().
With RLS active, "COPY tab TO ..." failed under -DRELCACHE_FORCE_RELEASE,
and would sometimes fail without that, because it used the relation name
directly from the relcache as part of the parsetree it's building. That
becomes a potentially-dangling pointer as soon as the relcache entry is
closed, a bit further down. Typical symptom if the relcache entry chanced
to get cleared would be "relation does not exist" error with a garbage
relation name, or possibly a core dump; but if you were really truly
unlucky, the COPY might copy from the wrong table.
Per report from Andrew Dunstan that regression tests fail with
-DRELCACHE_FORCE_RELEASE. The core tests now pass for me (but have
not tried "make check-world" yet).
Peter Eisentraut [Wed, 28 Dec 2016 17:00:00 +0000 (12:00 -0500)]
Allow dropping multiple functions at once
The generic drop support already supported dropping multiple objects of
the same kind at once. But the previous representation
of function signatures across two grammar symbols and structure members
made this cumbersome to do for functions, so it was not supported. Now
that function signatures are represented by a single structure, it's
trivial to add this support. Same for aggregates and operators.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Peter Eisentraut [Wed, 28 Dec 2016 17:00:00 +0000 (12:00 -0500)]
Replace LookupFuncNameTypeNames() with LookupFuncWithArgs()
The old function took function name and function argument list as
separate arguments. Now that all function signatures are passed around
as ObjectWithArgs structs, this is no longer necessary and can be
replaced by a function that takes ObjectWithArgs directly. Similarly
for aggregates and operators.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>