<sect1 id="functions-admin">
<title>System Administration Functions</title>
+ <para>
+ The functions described in this section are used to control and
+ monitor a <productname>PostgreSQL</> installation.
+ </para>
+
+ <sect2 id="functions-admin-set">
+ <title>Configuration Settings Functions</title>
+
<para>
<xref linkend="functions-admin-set-table"> shows the functions
available to query and alter run-time configuration parameters.
</programlisting>
</para>
+ </sect2>
+
+ <sect2 id="functions-admin-signal">
+ <title>Server Signalling Functions</title>
+
<indexterm>
<primary>pg_cancel_backend</primary>
</indexterm>
subprocess.
</para>
+ </sect2>
+
+ <sect2 id="functions-admin-backup">
+ <title>Backup Control Functions</title>
+
<indexterm>
<primary>backup</primary>
</indexterm>
<xref linkend="continuous-archiving">.
</para>
+ </sect2>
+
+ <sect2 id="functions-recovery-control">
+ <title>Recovery Control Functions</title>
+
<indexterm>
<primary>pg_is_in_recovery</primary>
</indexterm>
The functions shown in <xref
linkend="functions-recovery-info-table"> provide information
about the current status of the standby.
- These functions may be executed during both recovery and in normal running.
+ These functions may be executed both during recovery and in normal running.
</para>
<table id="functions-recovery-info-table">
the pause, the rate of WAL generation and available disk space.
</para>
+ </sect2>
+
+ <sect2 id="functions-snapshot-synchronization">
+ <title>Snapshot Synchronization Functions</title>
+
+ <indexterm>
+ <primary>pg_export_snapshot</primary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</> allows database sessions to synchronize their
+ snapshots. A <firstterm>snapshot</> determines which data is visible to the
+ transaction that is using the snapshot. Synchronized snapshots are
+ necessary when two or more sessions need to see identical content in the
+ database. If two sessions just start their transactions independently,
+ there is always a possibility that some third transaction commits
+ between the executions of the two <command>START TRANSACTION</> commands,
+ so that one session sees the effects of that transaction and the other
+ does not.
+ </para>
+
+ <para>
+ To solve this problem, <productname>PostgreSQL</> allows a transaction to
+ <firstterm>export</> the snapshot it is using. As long as the exporting
+ transaction remains open, other transactions can <firstterm>import</> its
+ snapshot, and thereby be guaranteed that they see exactly the same view
+ of the database that the first transaction sees. But note that any
+ database changes made by any one of these transactions remain invisible
+ to the other transactions, as is usual for changes made by uncommitted
+ transactions. So the transactions are synchronized with respect to
+ pre-existing data, but act normally for changes they make themselves.
+ </para>
+
+ <para>
+ Snapshots are exported with the <function>pg_export_snapshot</> function,
+ shown in <xref linkend="functions-snapshot-synchronization-table">, and
+ imported with the <xref linkend="sql-set-transaction"> command.
+ </para>
+
+ <table id="functions-snapshot-synchronization-table">
+ <title>Snapshot Synchronization Functions</title>
+ <tgroup cols="3">
+ <thead>
+ <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>
+ <literal><function>pg_export_snapshot()</function></literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>Save the current snapshot and return its identifier</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The function <function>pg_export_snapshot</> saves the current snapshot
+ and returns a <type>text</> string identifying the snapshot. This string
+ must be passed (outside the database) to clients that want to import the
+ snapshot. The snapshot is available for import only until the end of the
+ transaction that exported it. A transaction can export more than one
+ snapshot, if needed. Note that doing so is only useful in <literal>READ
+ COMMITTED</> transactions, since in <literal>REPEATABLE READ</> and
+ higher isolation levels, transactions use the same snapshot throughout
+ their lifetime. Once a transaction has exported any snapshots, it cannot
+ be prepared with <xref linkend="sql-prepare-transaction">.
+ </para>
+
+ <para>
+ See <xref linkend="sql-set-transaction"> for details of how to use an
+ exported snapshot.
+ </para>
+ </sect2>
+
+ <sect2 id="functions-admin-dbobject">
+ <title>Database Object Management Functions</title>
+
<para>
The functions shown in <xref linkend="functions-admin-dbsize"> calculate
the disk space usage of database objects.
the relation.
</para>
+ </sect2>
+
+ <sect2 id="functions-admin-genfile">
+ <title>Generic File Access Functions</title>
+
<para>
The functions shown in <xref
- linkend="functions-admin-genfile"> provide native access to
+ linkend="functions-admin-genfile-table"> provide native access to
files on the machine hosting the server. Only files within the
database cluster directory and the <varname>log_directory</> can be
accessed. Use a relative path for files in the cluster directory,
for log files. Use of these functions is restricted to superusers.
</para>
- <table id="functions-admin-genfile">
+ <table id="functions-admin-genfile-table">
<title>Generic File Access Functions</title>
<tgroup cols="3">
<thead>
</programlisting>
</para>
+ </sect2>
+
+ <sect2 id="functions-advisory-locks">
+ <title>Advisory Lock Functions</title>
+
<para>
- The functions shown in <xref linkend="functions-advisory-locks"> manage
- advisory locks. For details about proper use of these functions, see
- <xref linkend="advisory-locks">.
+ The functions shown in <xref linkend="functions-advisory-locks-table">
+ manage advisory locks. For details about proper use of these functions,
+ see <xref linkend="advisory-locks">.
</para>
- <table id="functions-advisory-locks">
+ <table id="functions-advisory-locks-table">
<title>Advisory Lock Functions</title>
<tgroup cols="3">
<thead>
at session end, even if the client disconnects ungracefully.)
</para>
+ </sect2>
+
</sect1>
<sect1 id="functions-trigger">
<refsynopsisdiv>
<synopsis>
SET TRANSACTION <replaceable class="parameter">transaction_mode</replaceable> [, ...]
+SET TRANSACTION SNAPSHOT <replaceable class="parameter">snapshot_id</replaceable>
SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transaction_mode</replaceable> [, ...]
<phrase>where <replaceable class="parameter">transaction_mode</replaceable> is one of:</phrase>
The available transaction characteristics are the transaction
isolation level, the transaction access mode (read/write or
read-only), and the deferrable mode.
+ In addition, a snapshot can be selected, though only for the current
+ transaction, not as a session default.
</para>
<para>
serializable transactions would create a situation which could not
have occurred for any serial (one-at-a-time) execution of those
transactions, one of them will be rolled back with a
- <literal>serialization_failure</literal> <literal>SQLSTATE</literal>.
+ <literal>serialization_failure</literal> error.
</para>
</listitem>
</varlistentry>
<para>
The <literal>DEFERRABLE</literal> transaction property has no effect
unless the transaction is also <literal>SERIALIZABLE</literal> and
- <literal>READ ONLY</literal>. When all of these properties are set on a
+ <literal>READ ONLY</literal>. When all three of these properties are
+ selected for a
transaction, the transaction may block when first acquiring its snapshot,
after which it is able to run without the normal overhead of a
<literal>SERIALIZABLE</literal> transaction and without any risk of
contributing to or being canceled by a serialization failure. This mode
is well suited for long-running reports or backups.
</para>
+
+ <para>
+ The <literal>SET TRANSACTION SNAPSHOT</literal> command allows a new
+ transaction to run with the same <firstterm>snapshot</> as an existing
+ transaction. The pre-existing transaction must have exported its snapshot
+ with the <literal>pg_export_snapshot</literal> function (see <xref
+ linkend="functions-snapshot-synchronization">). That function returns a
+ snapshot identifier, which must be given to <literal>SET TRANSACTION
+ SNAPSHOT</literal> to specify which snapshot is to be imported. The
+ identifier must be written as a string literal in this command, for example
+ <literal>'000003A1-1'</>.
+ <literal>SET TRANSACTION SNAPSHOT</literal> can only be executed at the
+ start of a transaction, before the first query or
+ data-modification statement (<command>SELECT</command>,
+ <command>INSERT</command>, <command>DELETE</command>,
+ <command>UPDATE</command>, <command>FETCH</command>, or
+ <command>COPY</command>) of the transaction. Furthermore, the transaction
+ must already be set to <literal>SERIALIZABLE</literal> or
+ <literal>REPEATABLE READ</literal> isolation level (otherwise, the snapshot
+ would be discarded immediately, since <literal>READ COMMITTED</> mode takes
+ a new snapshot for each command). If the importing transaction uses
+ <literal>SERIALIZABLE</literal> isolation level, then the transaction that
+ exported the snapshot must also use that isolation level. Also, a
+ non-read-only serializable transaction cannot import a snapshot from a
+ read-only transaction.
+ </para>
+
</refsect1>
<refsect1>
by instead specifying the desired <replaceable
class="parameter">transaction_modes</replaceable> in
<command>BEGIN</command> or <command>START TRANSACTION</command>.
+ But that option is not available for <command>SET TRANSACTION
+ SNAPSHOT</command>.
</para>
<para>
</para>
</refsect1>
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ To begin a new transaction with the same snapshot as an already
+ existing transaction, first export the snapshot from the existing
+ transaction. That will return the snapshot identifier, for example:
+
+<programlisting>
+BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SELECT pg_export_snapshot();
+ pg_export_snapshot
+--------------------
+ 000003A1-1
+(1 row)
+</programlisting>
+
+ Then give the snapshot identifier in a <command>SET TRANSACTION
+ SNAPSHOT</command> command at the beginning of the newly opened
+ transaction:
+
+<programlisting>
+BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SET TRANSACTION SNAPSHOT '000003A1-1';
+</programlisting>
+ </para>
+ </refsect1>
+
<refsect1 id="R1-SQL-SET-TRANSACTION-3">
<title>Compatibility</title>
<para>
- Both commands are defined in the <acronym>SQL</acronym> standard.
+ These commands are defined in the <acronym>SQL</acronym> standard,
+ except for the <literal>DEFERRABLE</literal> transaction mode
+ and the <command>SET TRANSACTION SNAPSHOT</> form, which are
+ <productname>PostgreSQL</productname> extensions.
+ </para>
+
+ <para>
<literal>SERIALIZABLE</literal> is the default transaction
isolation level in the standard. In
<productname>PostgreSQL</productname> the default is ordinarily
not implemented in the <productname>PostgreSQL</productname> server.
</para>
- <para>
- The <literal>DEFERRABLE</literal>
- <replaceable class="parameter">transaction_mode</replaceable>
- is a <productname>PostgreSQL</productname> language extension.
- </para>
-
<para>
The SQL standard requires commas between successive <replaceable
class="parameter">transaction_modes</replaceable>, but for historical
* handle this reference as an internally-tracked registration, so that this
* module is entirely lower-level than ResourceOwners.
*
+ * Likewise, any snapshots that have been exported by pg_export_snapshot
+ * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
+ * tracked by any resource owner.
+ *
* These arrangements let us reset MyProc->xmin when there are no snapshots
* referenced by this transaction. (One possible improvement would be to be
* able to advance Xmin when the snapshot with the earliest Xmin is no longer
* referenced. That's a bit harder though, it requires more locking, and
- * anyway it should be rather uncommon to keep snapshots referenced for too
- * long.)
+ * anyway it should be rather uncommon to keep temporary snapshots referenced
+ * for too long.)
*
*
* Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
*/
#include "postgres.h"
+#include <sys/stat.h>
+#include <unistd.h>
+
#include "access/transam.h"
#include "access/xact.h"
+#include "miscadmin.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/procarray.h"
-#include "utils/memutils.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
#include "utils/tqual.h"
*/
static Snapshot FirstXactSnapshot = NULL;
+/* Define pathname of exported-snapshot files */
+#define SNAPSHOT_EXPORT_DIR "pg_snapshots"
+#define XactExportFilePath(path, xid, num, suffix) \
+ snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d%s", \
+ xid, num, suffix)
+
+/* Current xact's exported snapshots (a list of Snapshot structs) */
+static List *exportedSnapshots = NIL;
+
static Snapshot CopySnapshot(Snapshot snapshot);
static void FreeSnapshot(Snapshot snapshot);
* In transaction-snapshot mode, the first snapshot must live until
* end of xact regardless of what the caller does with it, so we must
* make a copy of it rather than returning CurrentSnapshotData
- * directly.
+ * directly. Furthermore, if we're running in serializable mode,
+ * predicate.c needs to wrap the snapshot fetch in its own processing.
*/
if (IsolationUsesXactSnapshot())
{
SecondarySnapshot->curcid = curcid;
}
+/*
+ * SetTransactionSnapshot
+ * Set the transaction's snapshot from an imported MVCC snapshot.
+ *
+ * Note that this is very closely tied to GetTransactionSnapshot --- it
+ * must take care of all the same considerations as the first-snapshot case
+ * in GetTransactionSnapshot.
+ */
+static void
+SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid)
+{
+ /* Caller should have checked this already */
+ Assert(!FirstSnapshotSet);
+
+ Assert(RegisteredSnapshots == 0);
+ Assert(FirstXactSnapshot == NULL);
+
+ /*
+ * Even though we are not going to use the snapshot it computes, we must
+ * call GetSnapshotData, for two reasons: (1) to be sure that
+ * CurrentSnapshotData's XID arrays have been allocated, and (2) to update
+ * RecentXmin and RecentGlobalXmin. (We could alternatively include those
+ * two variables in exported snapshot files, but it seems better to have
+ * snapshot importers compute reasonably up-to-date values for them.)
+ */
+ CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+
+ /*
+ * Now copy appropriate fields from the source snapshot.
+ */
+ CurrentSnapshot->xmin = sourcesnap->xmin;
+ CurrentSnapshot->xmax = sourcesnap->xmax;
+ CurrentSnapshot->xcnt = sourcesnap->xcnt;
+ Assert(sourcesnap->xcnt <= GetMaxSnapshotXidCount());
+ memcpy(CurrentSnapshot->xip, sourcesnap->xip,
+ sourcesnap->xcnt * sizeof(TransactionId));
+ CurrentSnapshot->subxcnt = sourcesnap->subxcnt;
+ Assert(sourcesnap->subxcnt <= GetMaxSnapshotSubxidCount());
+ memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
+ sourcesnap->subxcnt * sizeof(TransactionId));
+ CurrentSnapshot->suboverflowed = sourcesnap->suboverflowed;
+ CurrentSnapshot->takenDuringRecovery = sourcesnap->takenDuringRecovery;
+ /* NB: curcid should NOT be copied, it's a local matter */
+
+ /*
+ * Now we have to fix what GetSnapshotData did with MyProc->xmin and
+ * TransactionXmin. There is a race condition: to make sure we are not
+ * causing the global xmin to go backwards, we have to test that the
+ * source transaction is still running, and that has to be done atomically.
+ * So let procarray.c do it.
+ *
+ * Note: in serializable mode, predicate.c will do this a second time.
+ * It doesn't seem worth contorting the logic here to avoid two calls,
+ * especially since it's not clear that predicate.c *must* do this.
+ */
+ if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcexid))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("could not import the requested snapshot"),
+ errdetail("The source transaction %u is not running anymore.",
+ sourcexid)));
+
+ /*
+ * In transaction-snapshot mode, the first snapshot must live until end of
+ * xact, so we must make a copy of it. Furthermore, if we're running in
+ * serializable mode, predicate.c needs to do its own processing.
+ */
+ if (IsolationUsesXactSnapshot())
+ {
+ if (IsolationIsSerializable())
+ SetSerializableTransactionSnapshot(CurrentSnapshot, sourcexid);
+ /* Make a saved copy */
+ CurrentSnapshot = CopySnapshot(CurrentSnapshot);
+ FirstXactSnapshot = CurrentSnapshot;
+ /* Mark it as "registered" in FirstXactSnapshot */
+ FirstXactSnapshot->regd_count++;
+ RegisteredSnapshots++;
+ }
+
+ FirstSnapshotSet = true;
+}
+
/*
* CopySnapshot
* Copy the given snapshot.
}
FirstXactSnapshot = NULL;
+ /*
+ * If we exported any snapshots, clean them up.
+ */
+ if (exportedSnapshots != NIL)
+ {
+ TransactionId myxid = GetTopTransactionId();
+ int i;
+ char buf[MAXPGPATH];
+
+ /*
+ * Get rid of the files. Unlink failure is only a WARNING because
+ * (1) it's too late to abort the transaction, and (2) leaving a
+ * leaked file around has little real consequence anyway.
+ */
+ for (i = 1; i <= list_length(exportedSnapshots); i++)
+ {
+ XactExportFilePath(buf, myxid, i, "");
+ if (unlink(buf))
+ elog(WARNING, "could not unlink file \"%s\": %m", buf);
+ }
+
+ /*
+ * As with the FirstXactSnapshot, we needn't spend any effort on
+ * cleaning up the per-snapshot data structures, but we do need to
+ * adjust the RegisteredSnapshots count to prevent a warning below.
+ *
+ * Note: you might be thinking "why do we have the exportedSnapshots
+ * list at all? All we need is a counter!". You're right, but we do
+ * it this way in case we ever feel like improving xmin management.
+ */
+ Assert(RegisteredSnapshots >= list_length(exportedSnapshots));
+ RegisteredSnapshots -= list_length(exportedSnapshots);
+
+ exportedSnapshots = NIL;
+ }
+
/* On commit, complain about leftover snapshots */
if (isCommit)
{
SnapshotResetXmin();
}
+
+
+/*
+ * ExportSnapshot
+ * Export the snapshot to a file so that other backends can import it.
+ * Returns the token (the file name) that can be used to import this
+ * snapshot.
+ */
+static char *
+ExportSnapshot(Snapshot snapshot)
+{
+ TransactionId topXid;
+ TransactionId *children;
+ int nchildren;
+ int addTopXid;
+ StringInfoData buf;
+ FILE *f;
+ int i;
+ MemoryContext oldcxt;
+ char path[MAXPGPATH];
+ char pathtmp[MAXPGPATH];
+
+ /*
+ * It's tempting to call RequireTransactionChain here, since it's not
+ * very useful to export a snapshot that will disappear immediately
+ * afterwards. However, we haven't got enough information to do that,
+ * since we don't know if we're at top level or not. For example, we
+ * could be inside a plpgsql function that is going to fire off other
+ * transactions via dblink. Rather than disallow perfectly legitimate
+ * usages, don't make a check.
+ *
+ * Also note that we don't make any restriction on the transaction's
+ * isolation level; however, importers must check the level if they
+ * are serializable.
+ */
+
+ /*
+ * This will assign a transaction ID if we do not yet have one.
+ */
+ topXid = GetTopTransactionId();
+
+ /*
+ * We cannot export a snapshot from a subtransaction because there's no
+ * easy way for importers to verify that the same subtransaction is still
+ * running.
+ */
+ if (IsSubTransaction())
+ ereport(ERROR,
+ (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+ errmsg("cannot export a snapshot from a subtransaction")));
+
+ /*
+ * We do however allow previous committed subtransactions to exist.
+ * Importers of the snapshot must see them as still running, so get their
+ * XIDs to add them to the snapshot.
+ */
+ nchildren = xactGetCommittedChildren(&children);
+
+ /*
+ * Copy the snapshot into TopTransactionContext, add it to the
+ * exportedSnapshots list, and mark it pseudo-registered. We do this to
+ * ensure that the snapshot's xmin is honored for the rest of the
+ * transaction. (Right now, because SnapshotResetXmin is so stupid, this
+ * is overkill; but later we might make that routine smarter.)
+ */
+ snapshot = CopySnapshot(snapshot);
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ exportedSnapshots = lappend(exportedSnapshots, snapshot);
+ MemoryContextSwitchTo(oldcxt);
+
+ snapshot->regd_count++;
+ RegisteredSnapshots++;
+
+ /*
+ * Fill buf with a text serialization of the snapshot, plus identification
+ * data about this transaction. The format expected by ImportSnapshot
+ * is pretty rigid: each line must be fieldname:value.
+ */
+ initStringInfo(&buf);
+
+ appendStringInfo(&buf, "xid:%u\n", topXid);
+ appendStringInfo(&buf, "dbid:%u\n", MyDatabaseId);
+ appendStringInfo(&buf, "iso:%d\n", XactIsoLevel);
+ appendStringInfo(&buf, "ro:%d\n", XactReadOnly);
+
+ appendStringInfo(&buf, "xmin:%u\n", snapshot->xmin);
+ appendStringInfo(&buf, "xmax:%u\n", snapshot->xmax);
+
+ /*
+ * We must include our own top transaction ID in the top-xid data, since
+ * by definition we will still be running when the importing transaction
+ * adopts the snapshot, but GetSnapshotData never includes our own XID in
+ * the snapshot. (There must, therefore, be enough room to add it.)
+ *
+ * However, it could be that our topXid is after the xmax, in which case
+ * we shouldn't include it because xip[] members are expected to be before
+ * xmax. (We need not make the same check for subxip[] members, see
+ * snapshot.h.)
+ */
+ addTopXid = TransactionIdPrecedes(topXid, snapshot->xmax) ? 1 : 0;
+ appendStringInfo(&buf, "xcnt:%d\n", snapshot->xcnt + addTopXid);
+ for (i = 0; i < snapshot->xcnt; i++)
+ appendStringInfo(&buf, "xip:%u\n", snapshot->xip[i]);
+ if (addTopXid)
+ appendStringInfo(&buf, "xip:%u\n", topXid);
+
+ /*
+ * Similarly, we add our subcommitted child XIDs to the subxid data.
+ * Here, we have to cope with possible overflow.
+ */
+ if (snapshot->suboverflowed ||
+ snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
+ appendStringInfoString(&buf, "sof:1\n");
+ else
+ {
+ appendStringInfoString(&buf, "sof:0\n");
+ appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
+ for (i = 0; i < snapshot->subxcnt; i++)
+ appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
+ for (i = 0; i < nchildren; i++)
+ appendStringInfo(&buf, "sxp:%u\n", children[i]);
+ }
+ appendStringInfo(&buf, "rec:%u\n", snapshot->takenDuringRecovery);
+
+ /*
+ * Now write the text representation into a file. We first write to a
+ * ".tmp" filename, and rename to final filename if no error. This
+ * ensures that no other backend can read an incomplete file
+ * (ImportSnapshot won't allow it because of its valid-characters check).
+ */
+ XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), ".tmp");
+ if (!(f = AllocateFile(pathtmp, PG_BINARY_W)))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", pathtmp)));
+
+ if (fwrite(buf.data, buf.len, 1, f) != 1)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\": %m", pathtmp)));
+
+ /* no fsync() since file need not survive a system crash */
+
+ if (FreeFile(f))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\": %m", pathtmp)));
+
+ /*
+ * Now that we have written everything into a .tmp file, rename the file
+ * to remove the .tmp suffix.
+ */
+ XactExportFilePath(path, topXid, list_length(exportedSnapshots), "");
+
+ if (rename(pathtmp, path) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ pathtmp, path)));
+
+ /*
+ * The basename of the file is what we return from pg_export_snapshot().
+ * It's already in path in a textual format and we know that the path
+ * starts with SNAPSHOT_EXPORT_DIR. Skip over the prefix and the slash
+ * and pstrdup it so as not to return the address of a local variable.
+ */
+ return pstrdup(path + strlen(SNAPSHOT_EXPORT_DIR) + 1);
+}
+
+/*
+ * pg_export_snapshot
+ * SQL-callable wrapper for ExportSnapshot.
+ */
+Datum
+pg_export_snapshot(PG_FUNCTION_ARGS)
+{
+ char *snapshotName;
+
+ snapshotName = ExportSnapshot(GetActiveSnapshot());
+ PG_RETURN_TEXT_P(cstring_to_text(snapshotName));
+}
+
+
+/*
+ * Parsing subroutines for ImportSnapshot: parse a line with the given
+ * prefix followed by a value, and advance *s to the next line. The
+ * filename is provided for use in error messages.
+ */
+static int
+parseIntFromText(const char *prefix, char **s, const char *filename)
+{
+ char *ptr = *s;
+ int prefixlen = strlen(prefix);
+ int val;
+
+ if (strncmp(ptr, prefix, prefixlen) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ ptr += prefixlen;
+ if (sscanf(ptr, "%d", &val) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ ptr = strchr(ptr, '\n');
+ if (!ptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ *s = ptr + 1;
+ return val;
+}
+
+static TransactionId
+parseXidFromText(const char *prefix, char **s, const char *filename)
+{
+ char *ptr = *s;
+ int prefixlen = strlen(prefix);
+ TransactionId val;
+
+ if (strncmp(ptr, prefix, prefixlen) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ ptr += prefixlen;
+ if (sscanf(ptr, "%u", &val) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ ptr = strchr(ptr, '\n');
+ if (!ptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", filename)));
+ *s = ptr + 1;
+ return val;
+}
+
+/*
+ * ImportSnapshot
+ * Import a previously exported snapshot. The argument should be a
+ * filename in SNAPSHOT_EXPORT_DIR. Load the snapshot from that file.
+ * This is called by "SET TRANSACTION SNAPSHOT 'foo'".
+ */
+void
+ImportSnapshot(const char *idstr)
+{
+ char path[MAXPGPATH];
+ FILE *f;
+ struct stat stat_buf;
+ char *filebuf;
+ int xcnt;
+ int i;
+ TransactionId src_xid;
+ Oid src_dbid;
+ int src_isolevel;
+ bool src_readonly;
+ SnapshotData snapshot;
+
+ /*
+ * Must be at top level of a fresh transaction. Note in particular that
+ * we check we haven't acquired an XID --- if we have, it's conceivable
+ * that the snapshot would show it as not running, making for very
+ * screwy behavior.
+ */
+ if (FirstSnapshotSet ||
+ GetTopTransactionIdIfAny() != InvalidTransactionId ||
+ IsSubTransaction())
+ ereport(ERROR,
+ (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+ errmsg("SET TRANSACTION SNAPSHOT must be called before any query")));
+
+ /*
+ * If we are in read committed mode then the next query would execute
+ * with a new snapshot thus making this function call quite useless.
+ */
+ if (!IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("a snapshot-importing transaction must have isolation level SERIALIZABLE or REPEATABLE READ")));
+
+ /*
+ * Verify the identifier: only 0-9, A-F and hyphens are allowed. We do
+ * this mainly to prevent reading arbitrary files.
+ */
+ if (strspn(idstr, "0123456789ABCDEF-") != strlen(idstr))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid snapshot identifier \"%s\"", idstr)));
+
+ /* OK, read the file */
+ snprintf(path, MAXPGPATH, SNAPSHOT_EXPORT_DIR "/%s", idstr);
+
+ f = AllocateFile(path, PG_BINARY_R);
+ if (!f)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid snapshot identifier \"%s\"", idstr)));
+
+ /* get the size of the file so that we know how much memory we need */
+ if (fstat(fileno(f), &stat_buf))
+ elog(ERROR, "could not stat file \"%s\": %m", path);
+
+ /* and read the file into a palloc'd string */
+ filebuf = (char *) palloc(stat_buf.st_size + 1);
+ if (fread(filebuf, stat_buf.st_size, 1, f) != 1)
+ elog(ERROR, "could not read file \"%s\": %m", path);
+
+ filebuf[stat_buf.st_size] = '\0';
+
+ FreeFile(f);
+
+ /*
+ * Construct a snapshot struct by parsing the file content.
+ */
+ memset(&snapshot, 0, sizeof(snapshot));
+
+ src_xid = parseXidFromText("xid:", &filebuf, path);
+ /* we abuse parseXidFromText a bit here ... */
+ src_dbid = parseXidFromText("dbid:", &filebuf, path);
+ src_isolevel = parseIntFromText("iso:", &filebuf, path);
+ src_readonly = parseIntFromText("ro:", &filebuf, path);
+
+ snapshot.xmin = parseXidFromText("xmin:", &filebuf, path);
+ snapshot.xmax = parseXidFromText("xmax:", &filebuf, path);
+
+ snapshot.xcnt = xcnt = parseIntFromText("xcnt:", &filebuf, path);
+
+ /* sanity-check the xid count before palloc */
+ if (xcnt < 0 || xcnt > GetMaxSnapshotXidCount())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", path)));
+
+ snapshot.xip = (TransactionId *) palloc(xcnt * sizeof(TransactionId));
+ for (i = 0; i < xcnt; i++)
+ snapshot.xip[i] = parseXidFromText("xip:", &filebuf, path);
+
+ snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
+
+ if (!snapshot.suboverflowed)
+ {
+ snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
+
+ /* sanity-check the xid count before palloc */
+ if (xcnt < 0 || xcnt > GetMaxSnapshotSubxidCount())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", path)));
+
+ snapshot.subxip = (TransactionId *) palloc(xcnt * sizeof(TransactionId));
+ for (i = 0; i < xcnt; i++)
+ snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
+ }
+ else
+ {
+ snapshot.subxcnt = 0;
+ snapshot.subxip = NULL;
+ }
+
+ snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
+
+ /*
+ * Do some additional sanity checking, just to protect ourselves. We
+ * don't trouble to check the array elements, just the most critical
+ * fields.
+ */
+ if (!TransactionIdIsNormal(src_xid) ||
+ !OidIsValid(src_dbid) ||
+ !TransactionIdIsNormal(snapshot.xmin) ||
+ !TransactionIdIsNormal(snapshot.xmax))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid snapshot data in file \"%s\"", path)));
+
+ /*
+ * If we're serializable, the source transaction must be too, otherwise
+ * predicate.c has problems (SxactGlobalXmin could go backwards). Also,
+ * a non-read-only transaction can't adopt a snapshot from a read-only
+ * transaction, as predicate.c handles the cases very differently.
+ */
+ if (IsolationIsSerializable())
+ {
+ if (src_isolevel != XACT_SERIALIZABLE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("a serializable transaction cannot import a snapshot from a non-serializable transaction")));
+ if (src_readonly && !XactReadOnly)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("a non-read-only serializable transaction cannot import a snapshot from a read-only transaction")));
+ }
+
+ /*
+ * We cannot import a snapshot that was taken in a different database,
+ * because vacuum calculates OldestXmin on a per-database basis; so the
+ * source transaction's xmin doesn't protect us from data loss. This
+ * restriction could be removed if the source transaction were to mark
+ * its xmin as being globally applicable. But that would require some
+ * additional syntax, since that has to be known when the snapshot is
+ * initially taken. (See pgsql-hackers discussion of 2011-10-21.)
+ */
+ if (src_dbid != MyDatabaseId)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot import a snapshot from a different database")));
+
+ /* OK, install the snapshot */
+ SetTransactionSnapshot(&snapshot, src_xid);
+}
+
+/*
+ * XactHasExportedSnapshots
+ * Test whether current transaction has exported any snapshots.
+ */
+bool
+XactHasExportedSnapshots(void)
+{
+ return (exportedSnapshots != NIL);
+}
+
+/*
+ * DeleteAllExportedSnapshotFiles
+ * Clean up any files that have been left behind by a crashed backend
+ * that had exported snapshots before it died.
+ *
+ * This should be called during database startup or crash recovery.
+ */
+void
+DeleteAllExportedSnapshotFiles(void)
+{
+ char buf[MAXPGPATH];
+ DIR *s_dir;
+ struct dirent *s_de;
+
+ if (!(s_dir = AllocateDir(SNAPSHOT_EXPORT_DIR)))
+ {
+ /*
+ * We really should have that directory in a sane cluster setup. But
+ * then again if we don't, it's not fatal enough to make it FATAL.
+ * Since we're running in the postmaster, LOG is our best bet.
+ */
+ elog(LOG, "could not open directory \"%s\": %m", SNAPSHOT_EXPORT_DIR);
+ return;
+ }
+
+ while ((s_de = ReadDir(s_dir, SNAPSHOT_EXPORT_DIR)) != NULL)
+ {
+ if (strcmp(s_de->d_name, ".") == 0 ||
+ strcmp(s_de->d_name, "..") == 0)
+ continue;
+
+ snprintf(buf, MAXPGPATH, SNAPSHOT_EXPORT_DIR "/%s", s_de->d_name);
+ /* Again, unlink failure is not worthy of FATAL */
+ if (unlink(buf))
+ elog(LOG, "could not unlink file \"%s\": %m", buf);
+ }
+
+ FreeDir(s_dir);
+}