granicus.if.org Git - postgresql/log

Fix latent(?) race condition in LockReleaseAll.

We have for a long time checked the head pointer of each of the backend's
proclock lists and skipped acquiring the corresponding locktable partition
lock if the head pointer was NULL.  This was safe enough in the days when
proclock lists were changed only by the owning backend, but it is pretty
questionable now that the fast-path patch added cases where backends add
entries to other backends' proclock lists.  However, we don't really wish
to revert to locking each partition lock every time, because in simple
transactions that would add a lot of useless lock/unlock cycles on
already-heavily-contended LWLocks.  Fortunately, the only way that another
backend could be modifying our proclock list at this point would be if it
was promoting a formerly fast-path lock of ours; and any such lock must be
one that we'd decided not to delete in the previous loop over the locallock
table.  So it's okay if we miss seeing it in this loop; we'd just decide
not to delete it again.  However, once we've detected a non-empty list,
we'd better re-fetch the list head pointer after acquiring the partition
lock.  This guards against possibly fetching a corrupt-but-non-null pointer
if pointer fetch/store isn't atomic.  It's not clear if any practical
architectures are like that, but we've never assumed that before and don't
wish to start here.  In any case, the situation certainly deserves a code
comment.

While at it, refactor the partition traversal loop to use a for() construct
instead of a while() loop with goto's.

Back-patch, just in case the risk is real and not hypothetical.

Unbreak buildfarm

I removed an intermediate commit before pushing and forgot to test the
resulting tree :-(

Use a more granular approach to follow update chains

Instead of simply checking the KEYS_UPDATED bit, we need to check
whether each lock held on the future version of the tuple conflicts with
the lock we're trying to acquire.

Per bug report #8434 by Tomonari Katsumata

Compare Xmin to previous Xmax when locking an update chain

Not doing so causes us to traverse an update chain that has been broken
by concurrent page pruning. All other code that traverses update chains
uses this check as one of the cases in which to stop iterating, so
replicate it here too. Failure to do so leads to erroneous CLOG,
subtrans or multixact lookups.

Per discussion following the bug report by J Smith in
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
as diagnosed by Andres Freund.

Don't try to set InvalidXid as page pruning hint

If a transaction updates/deletes a tuple just before aborting, and a
concurrent transaction tries to prune the page concurrently, the pruner
may see HeapTupleSatisfiesVacuum return HEAPTUPLE_DELETE_IN_PROGRESS,
but a later call to HeapTupleGetUpdateXid() return InvalidXid. This
would cause an assertion failure in development builds, but would be
otherwise Mostly Harmless.

Fix by checking whether the updater Xid is valid before trying to apply
it as page prune point.

Reported by Andres in 20131124000203.GA4403@alap2.anarazel.de

Cope with heap_fetch failure while locking an update chain

The reason for the fetch failure is that the tuple was removed because
it was dead; so the failure is innocuous and can be ignored. Moreover,
there's no need for further work and we can return success to the caller
immediately. EvalPlanQualFetch is doing something very similar to this
already.

Report and test case from Andres Freund in
20131124000203.GA4403@alap2.anarazel.de

doc: Set chunk.first.sections in XSLT, for consistency with DSSSL output

pg_buffercache docs: adjust order of fields

Adjust order of fields to match view order.

Jaime Casanova

doc: Put data types in alphabetical order

From: Andreas Karlsson <andreas@proxel.se>

Fix stale-pointer problem in fast-path locking logic.

When acquiring a lock in fast-path mode, we must reset the locallock
object's lock and proclock fields to NULL.  They are not necessarily that
way to start with, because the locallock could be left over from a failed
lock acquisition attempt earlier in the transaction.  Failure to do this
led to all sorts of interesting misbehaviors when LockRelease tried to
clean up no-longer-related lock and proclock objects in shared memory.
Per report from Dan Wood.

In passing, modify LockRelease to elog not just Assert if it doesn't find
lock and proclock objects for a formerly fast-path lock, matching the code
in FastPathGetRelationLockEntry and LockRefindAndRelease.  This isn't a
bug but it will help in diagnosing any future bugs in this area.

Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry
to break out of their loops over the fastpath array once they've found the
sole matching entry.  This was inconsistently done in some search loops
and not others.

Improve assorted related comments, too.

Back-patch to 9.2 where the fast-path mechanism was introduced.

Minor correction of READ COMMITTED isolation level docs.

Per report from AK

Minor corrections in lmgr/README.

Correct an obsolete statement that no backend touches another backend's
PROCLOCK lists. This was probably wrong even when written (the deadlock
checker looks at everybody's lists), and it's certainly quite wrong now
that fast-path locking can require creation of lock and proclock objects
on behalf of another backend. Also improve some statements in the hot
standby explanation, and do one or two other trivial bits of wordsmithing/
reformatting.

Get rid of the post-recovery cleanup step of GIN page splits.

Replace it with an approach similar to what GiST uses: when a page is split,
the left sibling is marked with a flag indicating that the parent hasn't been
updated yet. When the parent is updated, the flag is cleared. If an insertion
steps on a page with the flag set, it will finish split before proceeding
with the insertion.

The post-recovery cleanup mechanism was never totally reliable, as insertion
to the parent could fail e.g because of running out of memory or disk space,
leaving the tree in an inconsistent state.

This also divides the responsibility of WAL-logging more clearly between
the generic ginbtree.c code, and the parts specific to entry and posting
trees. There is now a common WAL record format for insertions and deletions,
which is written by ginbtree.c, followed by tree-specific payload, which is
returned by the placetopage- and split- callbacks.

More GIN refactoring.

Separate the insertion payload from the more static portions of GinBtree.
GinBtree now only contains information related to searching the tree, and
the information of what to insert is passed separately.

Add root block number to GinBtree, instead of passing it around all the
functions as argument.

Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is
responsible for finding the parent and inserting the downlink to it.

Fix plpython3 expected output.

I neglected this in the previous commit that updated the plpython2 output,
which I forgot to "git add" earlier.

As pointed out by Rodolfo Campero and Marko Kreen.

Don't update relfrozenxid if any pages were skipped.

Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages have been scanned. If the new
number of pages happened to be smaller or equal to the number of pages
scanned, it incorrectly decided that all pages were scanned.

This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds with some rows suddenly missing. This likely has escaped notice
so far because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.

The incorrect logic was introduced by commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. Backpatch this fix down to 8.4,
like that commit.

Andres Freund, with some modifications by me.

Documentation fix for ecpg.

The latest fixes removed a limitation that was still in the docs, so Zoltan updated the docs, too.

ECPG: Fix searching for quoted cursor names case-sensitively.

Patch by Böszörményi Zoltán <zb@cybertec.at>

Add --xlogdir option to pg_basebackup, for specifying the pg_xlog directory.

Haribabu kommi, slightly modified by me.

Fix typo in release note.

Backpatch to 9.1.

Josh Kupershmidt

Implement information_schema.parameters.parameter_default column

Reviewed-by: Ali Dar <ali.munir.dar@gmail.com>
Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com>
Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>

doc: Add id to index in XSLT build

That way, the HTML file name of the index will be the same as currently
for the DSSSL build.

Oops, forgot to "git add" last minute changes to regression test.

ECPG: Fix offset to NULL/size indicator array.

Patch by Boszormenyi Zoltan <zb@cybertec.at>

ECPG: Simplify free_variable()

Patch by Boszormenyi Zoltan <zb@cybertec.at>

ECPG: Add EXEC SQL CLOSE C to the tests.

Patch by Boszormenyi Zoltan <zb@cybertec.at>

ECPG: Free the malloc()'ed variables in the test so it comes out clean on
Valgrind runs.

Patch by Boszormenyi Zoltan <zb@cybertec.at>

ECPG: Make the preprocessor emit ';' if the variable type for a list of
variables is varchar. This fixes this test case:

int main(void)
{
    exec sql begin declare section;
    varchar a[50], b[50];
    exec sql end declare section;

    return 0;
}

Since varchars are internally turned into custom structs and
the type name is emitted for these variable declarations,
the preprocessed code previously had:

struct varchar_1  { ... }  a _,_  struct varchar_2  { ... }  b ;

The comma in the generated C file was a syntax error.

There are no regression test changes since it's not exercised.

Patch by Boszormenyi Zoltan <zb@cybertec.at>

Handle domains over arrays like plain arrays in PL/python.

Domains over arrays are now converted to/from python lists when passed as
arguments or return values. Like regular arrays.

This has some potential to break applications that rely on the old behavior
that they are passed as strings, but in practice there probably aren't many
such applications out there.

Rodolfo Campero

Add missing entry for session_preload_libraries in sample config.

The omission was apparently an oversight in the original patch.

Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behavior

Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a
transaction block from error (post-9.3) to warning. (Was nothing in <=
9.3.) Also change ABORT outside of a transaction block from notice to
warning.

More improvement to comment parsing in ecpg.

ECPG is not supposed to allow and output nested comments in C. These comments
are only allowed in the SQL parts and must not be written into the C file.
Also the different handling of different comments is documented.

Fix ecpg parsing of sizeof().

The last fix used the wrong non-terminal to define valid types.

Lessen library-loading log level.

Previously, messages were emitted at the LOG level every time a
backend preloaded a library. That was acceptable (though unnecessary)
for shared_preload_libraries; but it was excessive for
local_preload_libraries and session_preload_libraries. Reduce to
DEBUG1.

Also, there was logic in the EXEC_BACKEND case to avoid repeated
messages for shared_preload_libraries by demoting them to
DEBUG2. DEBUG1 seems more appropriate there, as well, so eliminate
that special case.

Peter Geoghegan.

Fix new and latent bugs with errno handling in secure_read/secure_write.

These functions must be careful that they return the intended value of
errno to their callers.  There were several scenarios where this might
not happen:

1. The recent SSL renegotiation patch added a hunk of code that would
execute after setting errno.  In the first place, it's doubtful that we
should consider renegotiation to be successfully completed after a failure,
and in the second, there's no real guarantee that the called OpenSSL
routines wouldn't clobber errno.  Fix by not executing that hunk except
during success exit.

2. errno was left in an unknown state in case of an unrecognized return
code from SSL_get_error().  While this is a "can't happen" case, it seems
like a good idea to be sure we know what would happen, so reset errno to
ECONNRESET in such cases.  (The corresponding code in libpq's fe-secure.c
already did this.)

3. There was an (undocumented) assumption that client_read_ended() wouldn't
change errno.  While true in the current state of the code, this seems less
than future-proof.  Add explicit saving/restoring of errno to make sure
that changes in the called functions won't break things.

I see no need to back-patch, since #1 is new code and the other two issues
are mostly hypothetical.

Per discussion with Amit Kapila.

Allow C array definitions to use sizeof().

When parsing C variable definitions ecpg should allow sizeof() operators as array dimensions.

Distinguish between C and SQL mode for C-style comments.

SQL standard asks for allowing nested comments, while C does not. Therefore the
two comments, while mostly similar, have to be parsed seperately.

Defend against bad trigger definitions in contrib/lo's lo_manage() trigger.

This function formerly crashed if called as a statement-level trigger,
or if a column-name argument wasn't given.

In passing, add the trigger name to all error messages from the function.
(None of them are expected cases, so this shouldn't pose any compatibility
risk.)

Marc Cousin, reviewed by Sawada Masahiko

PL/Tcl: Add event trigger support

From: Dimitri Fontaine <dimitri@2ndQuadrant.fr>

Fix array slicing of int2vector and oidvector values.

The previous coding labeled expressions such as pg_index.indkey[1:3] as
being of int2vector type; which is not right because the subscript bounds
of such a result don't, in general, satisfy the restrictions of int2vector.
To fix, implicitly promote the result of slicing int2vector to int2[],
or oidvector to oid[].  This is similar to what we've done with domains
over arrays, which is a good analogy because these types are very much
like restricted domains of the corresponding regular-array types.

A side-effect is that we now also forbid array-element updates on such
columns, eg while "update pg_index set indkey[4] = 42" would have worked
before if you were superuser (and corrupted your catalogs irretrievably,
no doubt) it's now disallowed.  This seems like a good thing since, again,
some choices of subscripting would've led to results not satisfying the
restrictions of int2vector.  The case of an array-slice update was
rejected before, though with a different error message than you get now.
We could make these cases work in future if we added a cast from int2[]
to int2vector (with a cast function checking the subscript restrictions)
but it seems unlikely that there's any value in that.

Per report from Ronan Dunklau.  Back-patch to all supported branches
because of the crash risks involved.

Ensure _dosmaperr() actually sets errno correctly.

If logging is enabled, either ereport() or fprintf() might stomp on errno
internally, causing this function to return the wrong result. That might
only end in a misleading error report, but in any code that's examining
errno to decide what to do next, the consequences could be far graver.

This has been broken since the very first version of this file in 2006
... it's a bit astonishing that we didn't identify this long ago.

Reported by Amit Kapila, though this isn't his proposed fix.

Fix thinko in SPI_execute_plan() calls

Two call sites were apparently thinking that the last argument of
SPI_execute_plan() is the number of query parameters, but it is actually
the row limit. Change the calls to 0, since we don't care about the
limit there. The previous code didn't break anything, but it was still
wrong.

Avoid potential buffer overflow crash

A pointer to a C string was treated as a pointer to a "name" datum and
passed to SPI_execute_plan(). This pointer would then end up being
passed through datumCopy(), which would try to copy the entire 64 bytes
of name data, thus running past the end of the C string. Fix by
converting the string to a proper name structure.

Found by LLVM AddressSanitizer.

Flatten join alias Vars before pulling up targetlist items from a subquery.

pullup_replace_vars()'s decisions about whether a pulled-up replacement
expression needs to be wrapped in a PlaceHolderVar depend on the assumption
that what looks like a Var behaves like a Var.  However, if the Var is a
join alias reference, later flattening of join aliases might replace the
Var with something that's not a Var at all, and should have been wrapped.

To fix, do a forcible pass of flatten_join_alias_vars() on the subquery
targetlist before we start to copy items out of it.  We'll re-run that
processing on the pulled-up expressions later, but that's harmless.

Per report from Ken Tanzer; the added regression test case is based on his
example.  This bug has been there since the PlaceHolderVar mechanism was
invented, but has escaped detection because the circumstances that trigger
it are fairly narrow.  You need a flattenable query underneath an outer
join, which contains another flattenable query inside a join of its own,
with a dangerous expression (a constant or something else non-strict)
in that one's targetlist.

Having seen this, I'm wondering if it wouldn't be prudent to do all
alias-variable flattening earlier, perhaps even in the rewriter.
But that would probably not be a back-patchable change.

Fix quoting in help messages in uuid-ossp extension scripts.

The command we're telling people to type needs to include double-quoting
around the unfortunately-chosen extension name. Twiddle the textual
quoting so that it looks somewhat sane. Per gripe from roadrunner6.

Fix Hot-Standby initialization of clog and subtrans.

These bugs can cause data loss on standbys started with hot_standby=on at
the moment they start to accept read only queries, by marking committed
transactions as uncommited. The likelihood of such corruptions is small
unless the primary has a high transaction rate.

5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup logic
by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state
was reached, missing the fact that both clog and subtrans are written to
before that. This only failed to fail in common cases because the usage
of ExtendCLOG in procarray.c was superflous since clog extensions are
actually WAL logged.

f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the missing
extensions of pg_subtrans due to the former commit's changes - which are
not WAL logged - by performing the extensions when switching to a state
> STANDBY_INITIALIZED and not performing xid assignments before that -
again missing the fact that ExtendCLOG is unneccessary - but screwed up
twice: Once because latestObservedXid wasn't updated anymore in that
state due to the earlier commit and once by having an off-by-one error in
the loop performing extensions. This means that whenever a
CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed
between the start of the checkpoint recovery started from and the first
xl_running_xact record old transactions commit bits in pg_clog could be
overwritten if they started and committed in that window.

Fix this mess by not performing ExtendCLOG() in HS at all anymore since
it's unneeded and evidently dangerous and by performing subtrans
extensions even before reaching STANDBY_SNAPSHOT_PENDING.

Analysis and patch by Andres Freund. Reported by Christophe Pettus.
Backpatch down to 9.0, like the previous commit that caused this.

Avoid acquiring spinlock when checking if recovery has finished, for speed.

RecoveryIsInProgress() can be called very frequently. During normal
operation, it just checks a backend-local variable and returns quickly,
but during hot standby, it checks a spinlock-protected shared variable.
Those spinlock acquisitions can become a point of contention on a busy
hot standby system.

Replace the spinlock acquisition with a memory barrier.

Per discussion with Andres Freund, Ants Aasma and Merlin Moncure.

Tweak streamutil.c further to avoid scan-build warning

The previous change added a new scan-build warning about need_password
assigned but not read.

Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.

This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me

Fix pg_isready to handle -d option properly.

Previously, -d option for pg_isready was broken. When the name of the
database was specified by -d option, pg_isready failed with an error.
When the conninfo specified by -d option contained the setting of the
host name but not Numeric IP address (i.e., hostaddr), pg_isready
displayed wrong connection message. -d option could not handle a valid
URI prefix at all. This commit fixes these bugs of pg_isready.

Backpatch to 9.3, where pg_isready was introduced.

Per report from Josh Berkus and Robert Haas.
Original patch by Fabrízio de Royes Mello, heavily modified by me.

More GIN refactoring.

Split off the portion of ginInsertValue that inserts the tuple to current
level into a separate function, ginPlaceToPage. ginInsertValue's charter
is now to recurse up the tree to insert the downlink, when a page split is
required.

This is in preparation for a patch to change the way incomplete splits are
handled, which will need to do these operations separately. And IMHO makes
the code more readable anyway.

Refactor the internal GIN B-tree interface for forming a downlink.

This creates a new gin-btree callback function for creating a downlink for
a page. Previously, ginxlog.c duplicated the logic used during normal
operation.

Further GIN refactoring.

Merge some functions that were always called together. Makes the code
little bit more readable.

ecpg: Split off mmfatal() from mmerror()

This allows decorating mmfatal() with noreturn compiler hints, leading
to better diagnostics.

docs: update page format to specify page checksum field

Backpatch to 9.3

Per report from Steffen Hildebrandt

pg_upgrade: avoid ALTER COLUMN TYPE on inherited columns

This only affects upgrades from 8.3 currently, and is harmless as the
child just generates an error in the script, but we should get it right
in case we ever need this for more complex uses.

Per report from Peter Eisentraut

Add tab completion for \pset in psql.

Pavel Stehule, reviewed by Ian Lawrence Barwick

pg_upgrade: Report full disk better

Previously, pg_upgrade would abort copy_file() on a short write without
setting errno, which the caller would report as an error with the
message "Success". We assume ENOSPC in that case, as we do elsewhere in
the code. Also set errno in some other error cases in copy_file() to
avoid bogus "Success" error messages.

This was broken in 6b711cf37c228749b6a8cef50e16e3c587d18dd4, so 9.2 and
before are OK.

unaccent: Revert patch 9299f6179838cef8aa1123f6fb76f0d3d6f2decc

The reverted patch to change functions from strict to immutable was
incorrect and needs additional research.

Spell SQL keywords in uppercase in pg_dump's query.

The server won't care, but let's be consistent.

David Rowley.

Replace appendPQExpBuffer(..., <constant>) with appendPQExpBufferStr

Arguably makes the code a bit more readable, and might give a small
performance gain.

David Rowley

Use cstring_to_text_with_len when length is known.

This avoids a potentially-expensive extra call to strlen().

David Rowley

Count locked pages that don't need vacuuming as scanned.

Previously, if VACUUM skipped vacuuming a page because it's pinned, it
didn't count that page as scanned. However, that meant that relfrozenxid
was not bumped up either, which prevented anti-wraparound vacuum from
doing its job.

Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes.
Backpatch to 9.2, where the skip-locked-pages behavior was introduced.

Add make_date() and make_time() functions.

Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma

Improve performance of numeric sum(), avg(), stddev(), variance(), etc.

This patch improves performance of most built-in aggregates that formerly
used a NUMERIC or NUMERIC array as their transition type; this includes
not only aggregates on numeric inputs, but some aggregates on integer
inputs where overflow of an int8 value is a possibility.  The code now
uses a special-purpose data structure to avoid array construction and
deconstruction overhead, as well as packing and unpacking overhead for
numeric values.

These aggregates' transition type is now declared as INTERNAL, since
it doesn't correspond to any SQL data type.  To keep the planner from
thinking that that means a lot of storage will be used, we make use
of the just-added pg_aggregate.aggtransspace feature.  The space estimate
is set to 128 bytes, which is at least in the right ballpark.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra

Allow aggregates to provide estimates of their transition state data size.

Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data.  This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good.  This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function.  But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates.  I (tgl) thought it was worth reviewing and committing
this infrastructure separately.  In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra

pg_upgrade: Fix some whitespace oddities

Remove pgbench's hardwired limit on line length in custom script files.

pgbench formerly failed on lines longer than BUFSIZ, unexpectedly
splitting them into multiple commands. Allow it to work with any
length of input line.

Sawada Masahiko

Fix incorrect loop counts in tidbitmap.c.

A couple of places that should have been iterating over WORDS_PER_CHUNK
words were iterating over WORDS_PER_PAGE words instead. This thinko
accidentally failed to fail, because (at least on common architectures
with default BLCKSZ) WORDS_PER_CHUNK is a bit less than WORDS_PER_PAGE,
and the extra words being looked at were always zero so nothing happened.
Still, it's a bug waiting to happen if anybody ever fools with the
parameters affecting TIDBitmap sizes, and it's a small waste of cycles
too. So back-patch to all active branches.

Etsuro Fujita

Speed up printing of INSERT statements in pg_dump.

In --inserts and especially --column-inserts mode, we can get a useful
speedup by generating the common prefix of all a table's INSERT commands
just once, and then printing the prebuilt string for each row. This avoids
multiple invocations of fmtId() and other minor fooling around.

David Rowley

Clean up password prompting logic in streamutil.c.

The previous coding was fairly unreadable and drew double-free warnings
from clang.  I believe the double free was actually not reachable, because
PQconnectionNeedsPassword is coded to not return true if a password was
provided, so that the loop can't iterate more than twice.  Nonetheless
it seems worth rewriting.  No back-patch since this is just cosmetic.

Compute correct em_nullable_relids in get_eclass_for_sort_expr().

Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates.  I'd worried about this in the commit message
for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs.  That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.

Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc.  We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times.  In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins.  There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though.  There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.

Back-patch to 9.2, like the previous patch in this area.

Prevent leakage of cached plans and execution trees in plpgsql DO blocks.

plpgsql likes to cache query plans and simple-expression execution state
trees across calls.  This is a considerable win for multiple executions
of the same function.  However, it's useless for DO blocks, since by
definition those are executed only once and discarded.  Nonetheless,
we were allowing a DO block's expression execution trees to survive
until end of transaction, resulting in a significant intra-transaction
memory leak, as reported by Yeb Havinga.  Worse, if the DO block exited
with an error, the compiled form of the block's code was leaked till
end of session --- along with subsidiary plancache entries.

To fix, make DO blocks keep their expression execution trees in a private
EState that's deleted at exit from the block, and add a PG_TRY block
to plpgsql_inline_handler to make sure that memory cleanup happens
even on error exits.  Also add a regression test covering error handling
in a DO block, because my first try at this broke that.  (The test is
not meant to prove that we don't leak memory anymore, though it could
be used for that with a much larger loop count.)

Ideally we'd back-patch this into all versions supporting DO blocks;
but the patch needs to add a field to struct PLpgSQL_execstate, and that
would break ABI compatibility for third-party plugins such as the plpgsql
debugger.  Given the small number of complaints so far, fixing this in
HEAD only seems like an acceptable choice.

Minor comment corrections for sequence hashtable patch.

There were enough typos in the comments to annoy me ...

Fix buffer overrun in isolation test program.

Commit 061b88c732952c59741374806e1e41c1ec845d50 saved argv0 to a
global buffer without ensuring that it was zero terminated,
allowing references to it to overrun the buffer and access other
memory. This probably would not have presented any security risk,
but could have resulted in very confusing failures if the path to
the executable was very long.

Reported by David Rowley

doc: Restore proper alphabetical order.

Colin 't Hart

Fix bogus hash table creation.

Andres Freund

Use a hash table to store current sequence values.

This speeds up nextval() and currval(), when you touch a lot of different
sequences in the same backend.

David Rowley

Add a regression test case for \d on an index.

Previous commit shows the need for this. The coverage isn't really
thorough, but it's better than nothing.

Fix incorrect column name in psql \d code.

pg_index.indisreplident had at one time in its development been called
indisidentity. describe.c got missed when it was renamed.
Bug introduced in commit 07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65.

Andres Freund

Fix whitespace

Clarify CREATE FUNCTION documentation about handling of typmods.

The previous text was a bit misleading, as well as unnecessarily vague
about what information would be discarded. Per gripe from Craig Skinner.

Fix isolation check for MSVC to handle recent changes.

Fix relfilenodemap.c's handling of cache invalidations.

The old code entered a new hash table entry first, then scanned
pg_class to determine what value to fill in, and then populated the
entry.  This fails to work properly if a cache invalidation happens
as a result of opening pg_class.  Repair.

Along the way, get rid of the idea of blowing away the entire hash
table as a method of processing invalidations.  Instead, just delete
all the entries one by one.  This is probably not quite as cheap but
it's simpler, and shouldn't happen often.

Andres Freund

docs: clarify MVCC introduction to allow for per-statement snapshots

Free ignorelist after each regression test schedule.

It's a trivial amount of RAM held until the end of the regression
test run; but it's probably worth fixing to silence future warnings
from code analyzers.

This was the only memory leak pointed out by clang's static code
analysis tool.

Fix bug in GIN posting tree root creation.

The root page is filled with as many items as fit, and the rest are inserted
using normal insertions. However, I fumbled the variable names, and the code
actually memcpy'd all the items on the page, overflowing the buffer. While
at it, rename the variable to make the distinction more clear.

Reported by Teodor Sigaev. This bug was introduced by my recent
refactorings, so no backpatching required.

Move variable closer to where it is used

This avoids an unused variable warning on Windows when building without
asserts

From: David Rowley <dgrowleyml@gmail.com>

gitattributes: Make syntax compatible with older Git versions

Avoid the use of **, which was only introduced in Git version 1.8.2.

Try again to make pg_isolation_regress work its build directory.

We can't search for the isolationtester binary until after we've set
up the environment, because otherwise when find_other_exec() tries
to invoke it with the -V option, it might fail for inability to
locate a working libpq. So postpone that step.

Andres Freund

doc: Fix typo.

Reported by Thom Brown.

Fix doc links in README file to work with new website layout

Per report from Colin 't Hart

Remove leftovers of IRIX port

This removes the remaining pieces of the IRIX port that was removed by
ea91a6be89575095f61ebf36d67c2df98be093db.

Fix failure with whole-row reference to a subquery.

Simple oversight in commit 1cb108efb0e60d87e4adec38e7636b6e8efbeb57 ---
recursively examining a subquery output column is only sane if the
original Var refers to a single output column. Found by Kevin Grittner.

Fix ruleutils pretty-printing to not generate trailing whitespace.

The pretty-printing logic in ruleutils.c operates by inserting a newline
and some indentation whitespace into strings that are already valid SQL.
This naturally results in leaving some trailing whitespace before the
newline in many cases; which can be annoying when processing the output
with other tools, as complained of by Joe Abbate. We can fix that in
a pretty localized fashion by deleting any trailing whitespace before
we append a pretty-printing newline. In addition, we have to modify the
code inserted by commit 2f582f76b1945929ff07116cd4639747ce9bb8a1 so that
we also delete trailing whitespace when transposing items from temporary
buffers into the main result string, when a temporary item starts with a
newline.

This results in rather voluminous changes to the regression test results,
but it's easily verified that they are only removal of trailing whitespace.

Back-patch to 9.3, because the aforementioned commit resulted in many
more cases of trailing whitespace than had occurred in earlier branches.

Re-allow duplicate aliases within aliased JOINs.

Although the SQL spec forbids duplicate table aliases, historically
we've allowed queries like
    SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z
on the grounds that the aliased join (z) hides the aliases within it,
therefore there is no conflict between the two RTEs named "x".  The
LATERAL patch broke this, on the misguided basis that "x" could be
ambiguous if tab3 were a LATERAL subquery.  To avoid breaking existing
queries, it's better to allow this situation and complain only if
tab3 actually does contain an ambiguous reference.  We need only remove
the check that was throwing an error, because the column lookup code
is already prepared to handle ambiguous references.  Per bug #8444.

Don't abort pg_basebackup when receiving empty WAL block

This is a similar fix as c6ec8793aa59d1842082e14b4b4aae7d4bd883fd
9.2. This should never happen in 9.3 and newer since the special case
cannot happen there, but this patch synchronizes up the code so there
is no confusion on why they're different. An empty block is as harmless
in 9.3 as it was in 9.2, and can safely be ignored.

Fix whitespace issues found by git diff --check, add gitattributes

Set per file type attributes in .gitattributes to fine-tune whitespace
checks. With the associated cleanups, the tree is now clean for git

Fix ECPG compiler warning.

Commit 9b4d52f2095be96ca238ce41f6963ec56376491f failed to notice
that pg_regress_ecpg needed updating.

This patch was independently submitted by both David Rowley
and Andres Freund.

Fix race condition in GIN posting tree page deletion.

If a page is deleted, and reused for something else, just as a search is
following a rightlink to it from its left sibling, the search would continue
scanning whatever the new contents of the page are. That could lead to
incorrect query results, or even something more curious if the page is
reused for a different kind of a page.

To fix, modify the search algorithm to lock the next page before releasing
the previous one, and refrain from deleting pages from the leftmost branch
of the tree.

Add a new Concurrency section to the README, explaining why this works.
There is a lot more one could say about concurrency in GIN, but that's for
another patch.

Backpatch to all supported versions.