Don't take ProcArrayLock while exiting a transaction that has no XID; there is

author Tom Lane <tgl@sss.pgh.pa.us>

Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)

committer Tom Lane <tgl@sss.pgh.pa.us>

Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)
author Tom Lane <tgl@sss.pgh.pa.us>
Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)
committer Tom Lane <tgl@sss.pgh.pa.us>
Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c

index 3f23378b8fa0de65abe97e0e302cd8adc0a2dc3c..f1ca5249cae8768c04fb12425fdb5fcf64242091 100644 (file)
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8,7 +8,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.238 2007/09/05 18:10:47 tgl Exp $
+ *       $PostgreSQL: pgsql/src/backend/access/heap/heapam.c,v 1.239 2007/09/07 20:59:26 tgl Exp $
   *
   *
   * INTERFACE ROUTINES
@@ -1546,9 +1546,8 @@ UpdateXmaxHintBits(HeapTupleHeader tuple, Buffer buffer, TransactionId xid)
   * If use_wal is false, the new tuple is not logged in WAL, even for a
   * non-temp relation.  Safe usage of this behavior requires that we arrange
   * that all new tuples go into new pages not containing any tuples from other
- * transactions, that the relation gets fsync'd before commit, and that the
- * transaction emits at least one WAL record to ensure RecordTransactionCommit
- * will decide to WAL-log the commit.  (See also heap_sync() comments)
+ * transactions, and that the relation gets fsync'd before commit.
+ * (See also heap_sync() comments)
   *
   * use_fsm is passed directly to RelationGetBufferForTuple, which see for
   * more info.
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README

index 87b405917021f070a43b6d67441e15b392995e53..7c69e09cb5db8e8943be7b8f635e11154fc58b76 100644 (file)
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -1,4 +1,4 @@
-$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.7 2007/09/05 18:10:47 tgl Exp $
+$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.8 2007/09/07 20:59:26 tgl Exp $
  
  The Transaction System
  ----------------------
@@ -221,6 +221,110 @@ InvalidSubTransactionId.)  Note that subtransactions do not have their
  own VXIDs; they use the parent top transaction's VXID.
  
  
+Interlocking transaction begin, transaction end, and snapshots
+--------------------------------------------------------------
+
+We try hard to minimize the amount of overhead and lock contention involved
+in the frequent activities of beginning/ending a transaction and taking a
+snapshot.  Unfortunately, we must have some interlocking for this, because
+we must ensure consistency about the commit order of transactions.
+For example, suppose an UPDATE in xact A is blocked by xact B's prior
+update of the same row, and xact B is doing commit while xact C gets a
+snapshot.  Xact A can complete and commit as soon as B releases its locks.
+If xact C's GetSnapshotData sees xact B as still running, then it had
+better see xact A as still running as well, or it will be able to see two
+tuple versions - one deleted by xact B and one inserted by xact A.  Another
+reason why this would be bad is that C would see (in the row inserted by A)
+earlier changes by B, and it would be inconsistent for C not to see any
+of B's changes elsewhere in the database.
+
+Formally, the correctness requirement is "if A sees B as committed,
+and B sees C as committed, then A must see C as committed".
+
+What we actually enforce is strict serialization of commits and rollbacks
+with snapshot-taking: we do not allow any transaction to exit the set of
+running transactions while a snapshot is being taken.  (This rule is
+stronger than necessary for consistency, but is relatively simple to
+enforce, and it assists with some other issues as explained below.)  The
+implementation of this is that GetSnapshotData takes the ProcArrayLock in
+shared mode (so that multiple backends can take snapshots in parallel),
+but xact.c must take the ProcArrayLock in exclusive mode while clearing
+MyProc->xid at transaction end (either commit or abort).
+
+GetSnapshotData must in fact acquire ProcArrayLock before it calls
+ReadNewTransactionId.  Otherwise it would be possible for a transaction A
+postdating the xmax to commit, and then an existing transaction B that saw
+A as committed to commit, before GetSnapshotData is able to acquire
+ProcArrayLock and finish taking its snapshot.  This would violate the
+consistency requirement, because A would be still running and B not
+according to this snapshot.
+
+In short, then, the rule is that no transaction may exit the set of
+currently-running transactions between the time we fetch xmax and the time
+we finish building our snapshot.  However, this restriction only applies
+to transactions that have an XID --- read-only transactions can end without
+acquiring ProcArrayLock, since they don't affect anyone else's snapshot.
+
+Transaction start, per se, doesn't have any interlocking with these
+considerations, since we no longer assign an XID immediately at transaction
+start.  But when we do decide to allocate an XID, we must require
+GetNewTransactionId to store the new XID into the shared ProcArray before
+releasing XidGenLock.  This ensures that when GetSnapshotData calls
+ReadNewTransactionId (which also takes XidGenLock), all active XIDs before
+the returned value of nextXid are already present in the ProcArray and
+can't be missed by GetSnapshotData.  Unfortunately, we can't have
+GetNewTransactionId take ProcArrayLock to do this, else it could deadlock
+against GetSnapshotData.  Therefore, we simply let GetNewTransactionId
+store into MyProc->xid without any lock.  We are thereby relying on
+fetch/store of an XID to be atomic, else other backends might see a
+partially-set XID.  (NOTE: for multiprocessors that need explicit memory
+access fence instructions, this means that acquiring/releasing XidGenLock
+is just as necessary as acquiring/releasing ProcArrayLock for
+GetSnapshotData to ensure it sees up-to-date xid fields.)  This also means
+that readers of the ProcArray xid fields must be careful to fetch a value
+only once, rather than assume they can read it multiple times and get the
+same answer each time.
+
+Another important activity that uses the shared ProcArray is GetOldestXmin,
+which must determine a lower bound for the oldest xmin of any active MVCC
+snapshot, system-wide.  Each individual backend advertises the smallest
+xmin of its own snapshots in MyProc->xmin, or zero if it currently has no
+live snapshots (eg, if it's between transactions or hasn't yet set a
+snapshot for a new transaction).  GetOldestXmin takes the MIN() of the
+valid xmin fields.  It does this with only shared lock on ProcArrayLock,
+which means there is a potential race condition against other backends
+doing GetSnapshotData concurrently: we must be certain that a concurrent
+backend that is about to set its xmin does not compute an xmin less than
+what GetOldestXmin returns.  We ensure that by including all the active
+XIDs into the MIN() calculation, along with the valid xmins.  The rule that
+transactions can't exit without taking exclusive ProcArrayLock ensures that
+concurrent holders of shared ProcArrayLock will compute the same minimum of
+currently-active XIDs: no xact, in particular not the oldest, can exit
+while we hold shared ProcArrayLock.  So GetOldestXmin's view of the minimum
+active XID will be the same as that of any concurrent GetSnapshotData, and
+so it can't produce an overestimate.  If there is no active transaction at
+all, GetOldestXmin returns the result of ReadNewTransactionId.  Note that
+two concurrent executions of GetOldestXmin might not see the same result
+from ReadNewTransactionId --- but if there is a difference, the intervening
+execution(s) of GetNewTransactionId must have stored their XIDs into the
+ProcArray, so the later execution of GetOldestXmin will see them and
+compute the same global xmin anyway.
+
+GetSnapshotData also performs an oldest-xmin calculation (which had better
+match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
+for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
+too expensive.  Note that while it is certain that two concurrent
+executions of GetSnapshotData will compute the same xmin for their own
+snapshots, as argued above, it is not certain that they will arrive at the
+same estimate of RecentGlobalXmin.  This is because we allow XID-less
+transactions to clear their MyProc->xmin asynchronously (without taking
+ProcArrayLock), so one execution might see what had been the oldest xmin,
+and another not.  This is OK since RecentGlobalXmin need only be a valid
+lower bound.  As noted above, we are already assuming that fetch/store
+of the xid fields is atomic, so assuming it for xmin as well is no extra
+risk.
+
+
  pg_clog and pg_subtrans
  -----------------------
  
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c

index 2e972d56f60fb0605618e3c7304d6ecc78049ff4..02b064179f2bc8e302b75756e19b08542b34a810 100644 (file)
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -10,7 +10,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.248 2007/09/05 18:10:47 tgl Exp $
+ *       $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.249 2007/09/07 20:59:26 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
@@ -747,6 +747,8 @@ AtSubStart_ResourceOwner(void)
  
  /*
   *     RecordTransactionCommit
+ *
+ * This is exported only to support an ugly hack in VACUUM FULL.
   */
  void
  RecordTransactionCommit(void)
@@ -1552,46 +1554,53 @@ CommitTransaction(void)
          */
         RecordTransactionCommit();
  
-       /*----------
+       PG_TRACE1(transaction__commit, MyProc->lxid);
+
+       /*
          * Let others know about no transaction in progress by me. Note that
          * this must be done _before_ releasing locks we hold and _after_
          * RecordTransactionCommit.
          *
-        * LWLockAcquire(ProcArrayLock) is required; consider this example:
-        *              UPDATE with xid 0 is blocked by xid 1's UPDATE.
-        *              xid 1 is doing commit while xid 2 gets snapshot.
-        * If xid 2's GetSnapshotData sees xid 1 as running then it must see
-        * xid 0 as running as well, or it will be able to see two tuple versions
-        * - one deleted by xid 1 and one inserted by xid 0.  See notes in
-        * GetSnapshotData.
-        *
          * Note: MyProc may be null during bootstrap.
-        *----------
          */
         if (MyProc != NULL)
         {
-               /*
-                * Lock ProcArrayLock because that's what GetSnapshotData uses.
-                * You might assume that we can skip this step if we had no
-                * transaction id assigned, because the failure case outlined
-                * in GetSnapshotData cannot happen in that case. This is true,
-                * but we *still* need the lock guarantee that two concurrent
-                * computations of the *oldest* xmin will get the same result.
-                */
-               LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-               MyProc->xid = InvalidTransactionId;
-               MyProc->lxid = InvalidLocalTransactionId;
-               MyProc->xmin = InvalidTransactionId;
-               MyProc->inVacuum = false;               /* must be cleared with xid/xmin */
+               if (TransactionIdIsValid(MyProc->xid))
+               {
+                       /*
+                        * We must lock ProcArrayLock while clearing MyProc->xid, so
+                        * that we do not exit the set of "running" transactions while
+                        * someone else is taking a snapshot.  See discussion in
+                        * src/backend/access/transam/README.
+                        */
+                       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
-               /* Clear the subtransaction-XID cache too while holding the lock */
-               MyProc->subxids.nxids = 0;
-               MyProc->subxids.overflowed = false;
+                       MyProc->xid = InvalidTransactionId;
+                       MyProc->lxid = InvalidLocalTransactionId;
+                       MyProc->xmin = InvalidTransactionId;
+                       MyProc->inVacuum = false;       /* must be cleared with xid/xmin */
  
-               LWLockRelease(ProcArrayLock);
-       }
+                       /* Clear the subtransaction-XID cache too while holding the lock */
+                       MyProc->subxids.nxids = 0;
+                       MyProc->subxids.overflowed = false;
  
-       PG_TRACE1(transaction__commit, s->transactionId);
+                       LWLockRelease(ProcArrayLock);
+               }
+               else
+               {
+                       /*
+                        * If we have no XID, we don't need to lock, since we won't
+                        * affect anyone else's calculation of a snapshot.  We might
+                        * change their estimate of global xmin, but that's OK.
+                        */
+                       MyProc->lxid = InvalidLocalTransactionId;
+                       MyProc->xmin = InvalidTransactionId;
+                       MyProc->inVacuum = false;       /* must be cleared with xid/xmin */
+
+                       Assert(MyProc->subxids.nxids == 0);
+                       Assert(MyProc->subxids.overflowed == false);
+               }
+       }
  
         /*
          * This is all post-commit cleanup.  Note that if an error is raised here,
@@ -1815,28 +1824,21 @@ PrepareTransaction(void)
          * Let others know about no transaction in progress by me.      This has to be
          * done *after* the prepared transaction has been marked valid, else
          * someone may think it is unlocked and recyclable.
+        *
+        * We can skip locking ProcArrayLock here, because this action does not
+        * actually change anyone's view of the set of running XIDs: our entry
+        * is duplicate with the gxact that has already been inserted into the
+        * ProcArray.
          */
-
-       /*
-        * Lock ProcArrayLock because that's what GetSnapshotData uses.
-        * You might assume that we can skip this step if we have no
-        * transaction id assigned, because the failure case outlined
-        * in GetSnapshotData cannot happen in that case. This is true,
-        * but we *still* need the lock guarantee that two concurrent
-        * computations of the *oldest* xmin will get the same result.
-        */
-       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
         MyProc->xid = InvalidTransactionId;
         MyProc->lxid = InvalidLocalTransactionId;
         MyProc->xmin = InvalidTransactionId;
         MyProc->inVacuum = false;       /* must be cleared with xid/xmin */
  
-       /* Clear the subtransaction-XID cache too while holding the lock */
+       /* Clear the subtransaction-XID cache too */
         MyProc->subxids.nxids = 0;
         MyProc->subxids.overflowed = false;
  
-       LWLockRelease(ProcArrayLock);
-
         /*
          * This is all post-transaction cleanup.  Note that if an error is raised
          * here, it's too late to abort the transaction.  This should be just
@@ -1987,36 +1989,55 @@ AbortTransaction(void)
          */
         RecordTransactionAbort(false);
  
+       PG_TRACE1(transaction__abort, MyProc->lxid);
+
         /*
          * Let others know about no transaction in progress by me. Note that this
          * must be done _before_ releasing locks we hold and _after_
          * RecordTransactionAbort.
+        *
+        * Note: MyProc may be null during bootstrap.
          */
         if (MyProc != NULL)
         {
-               /*
-                * Lock ProcArrayLock because that's what GetSnapshotData uses.
-                * You might assume that we can skip this step if we have no
-                * transaction id assigned, because the failure case outlined
-                * in GetSnapshotData cannot happen in that case. This is true,
-                * but we *still* need the lock guarantee that two concurrent
-                * computations of the *oldest* xmin will get the same result.
-                */
-               LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-               MyProc->xid = InvalidTransactionId;
-               MyProc->lxid = InvalidLocalTransactionId;
-               MyProc->xmin = InvalidTransactionId;
-               MyProc->inVacuum = false;               /* must be cleared with xid/xmin */
-               MyProc->inCommit = false;               /* be sure this gets cleared */
-
-               /* Clear the subtransaction-XID cache too while holding the lock */
-               MyProc->subxids.nxids = 0;
-               MyProc->subxids.overflowed = false;
-
-               LWLockRelease(ProcArrayLock);
-       }
+               if (TransactionIdIsValid(MyProc->xid))
+               {
+                       /*
+                        * We must lock ProcArrayLock while clearing MyProc->xid, so
+                        * that we do not exit the set of "running" transactions while
+                        * someone else is taking a snapshot.  See discussion in
+                        * src/backend/access/transam/README.
+                        */
+                       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
-       PG_TRACE1(transaction__abort, s->transactionId);
+                       MyProc->xid = InvalidTransactionId;
+                       MyProc->lxid = InvalidLocalTransactionId;
+                       MyProc->xmin = InvalidTransactionId;
+                       MyProc->inVacuum = false;       /* must be cleared with xid/xmin */
+                       MyProc->inCommit = false;       /* be sure this gets cleared */
+
+                       /* Clear the subtransaction-XID cache too while holding the lock */
+                       MyProc->subxids.nxids = 0;
+                       MyProc->subxids.overflowed = false;
+
+                       LWLockRelease(ProcArrayLock);
+               }
+               else
+               {
+                       /*
+                        * If we have no XID, we don't need to lock, since we won't
+                        * affect anyone else's calculation of a snapshot.  We might
+                        * change their estimate of global xmin, but that's OK.
+                        */
+                       MyProc->lxid = InvalidLocalTransactionId;
+                       MyProc->xmin = InvalidTransactionId;
+                       MyProc->inVacuum = false;       /* must be cleared with xid/xmin */
+                       MyProc->inCommit = false;       /* be sure this gets cleared */
+
+                       Assert(MyProc->subxids.nxids == 0);
+                       Assert(MyProc->subxids.overflowed == false);
+               }
+       }
  
         /*
          * Post-abort cleanup.  See notes in CommitTransaction() concerning
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c

index 9bc1c388ee763e8a25e5e9bf0f96c42306e0ca24..d28a6ad11c2f3682f60a49730619f4563e8e764b 100644 (file)
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -8,7 +8,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/commands/copy.c,v 1.285 2007/06/20 02:02:49 neilc Exp $
+ *       $PostgreSQL: pgsql/src/backend/commands/copy.c,v 1.286 2007/09/07 20:59:26 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
@@ -1678,13 +1678,6 @@ CopyFrom(CopyState cstate)
          * rd_newRelfilenodeSubid can be cleared before the end of the transaction.
          * However this is OK since at worst we will fail to make the optimization.
          *
-        * When skipping WAL it's entirely possible that COPY itself will write no
-        * WAL records at all.  This is of concern because RecordTransactionCommit
-        * might decide it doesn't need to log our eventual commit, which we
-        * certainly need it to do.  However, we need no special action here for
-        * that, because if we have a new table or new relfilenode then there
-        * must have been a WAL-logged pg_class update earlier in the transaction.
-        *
          * Also, if the target file is new-in-transaction, we assume that checking
          * FSM for free space is a waste of time, even if we must use WAL because
          * of archiving.  This could possibly be wrong, but it's unlikely.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c

index 322c1d1a27dbb5e6f103f600f8fcf1c8fa18ace5..7e5873b89df1625c12870203bfd9d20a3b0f52b1 100644 (file)
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -26,7 +26,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.296 2007/08/15 21:39:50 tgl Exp $
+ *       $PostgreSQL: pgsql/src/backend/executor/execMain.c,v 1.297 2007/09/07 20:59:26 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
@@ -2635,12 +2635,6 @@ OpenIntoRel(QueryDesc *queryDesc)
  
         /*
          * We can skip WAL-logging the insertions, unless PITR is in use.
-        *
-        * Note that for a non-temp INTO table, this is safe only because we know
-        * that the catalog changes above will have been WAL-logged, and so
-        * RecordTransactionCommit will think it needs to WAL-log the eventual
-        * transaction commit.  Else the commit might be lost, even though all the
-        * data is safely fsync'd ...
          */
         estate->es_into_relation_use_wal = XLogArchivingActive();
         estate->es_into_relation_descriptor = intoRelationDesc;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c

index 15701b959cef14baa319567eb624df62e2412794..b7c4ebcb1b13136064e35d99ce249a72abfb6e10 100644 (file)
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -10,7 +10,7 @@
   *
   * Because of various subtle race conditions it is critical that a backend
   * hold the correct locks while setting or clearing its MyProc->xid field.
- * See notes in GetSnapshotData.
+ * See notes in src/backend/access/transam/README.
   *
   * The process array now also includes PGPROC structures representing
   * prepared transactions.  The xid and subxids fields of these are valid,
@@ -23,7 +23,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.31 2007/09/07 00:58:56 tgl Exp $
+ *       $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.32 2007/09/07 20:59:26 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
@@ -407,6 +407,7 @@ TransactionIdIsActive(TransactionId xid)
   * Note: we include all currently running xids in the set of considered xids.
   * This ensures that if a just-started xact has not yet set its snapshot,
   * when it does set the snapshot it cannot set xmin less than what we compute.
+ * See notes in src/backend/access/transam/README.
   */
  TransactionId
  GetOldestXmin(bool allDbs, bool ignoreVacuum)
@@ -468,7 +469,7 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
         return result;
  }
  
-/*----------
+/*
   * GetSnapshotData -- returns information about running transactions.
   *
   * The returned snapshot includes xmin (lowest still-running xact ID),
@@ -481,12 +482,13 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
   * This ensures that the set of transactions seen as "running" by the
   * current xact will not change after it takes the snapshot.
   *
- * All running top-level XIDs are included in the snapshot.  We also try
- * to include running subtransaction XIDs, but since PGPROC has only a
- * limited cache area for subxact XIDs, full information may not be
- * available.  If we find any overflowed subxid arrays, we have to mark
- * the snapshot's subxid data as overflowed, and extra work will need to
- * be done to determine what's running (see XidInMVCCSnapshot() in tqual.c).
+ * All running top-level XIDs are included in the snapshot, except for lazy
+ * VACUUM processes.  We also try to include running subtransaction XIDs,
+ * but since PGPROC has only a limited cache area for subxact XIDs, full
+ * information may not be available.  If we find any overflowed subxid arrays,
+ * we have to mark the snapshot's subxid data as overflowed, and extra work
+ * will need to be done to determine what's running (see XidInMVCCSnapshot()
+ * in tqual.c).
   *
   * We also update the following backend-global variables:
   *             TransactionXmin: the oldest xmin of any snapshot in use in the
@@ -497,7 +499,6 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
   *             RecentGlobalXmin: the global xmin (oldest TransactionXmin across all
   *                     running transactions, except those running LAZY VACUUM).  This is
   *                     the same computation done by GetOldestXmin(true, true).
- *----------
   */
  Snapshot
  GetSnapshotData(Snapshot snapshot, bool serializable)
@@ -550,57 +551,16 @@ GetSnapshotData(Snapshot snapshot, bool serializable)
  
         /*
          * It is sufficient to get shared lock on ProcArrayLock, even if we are
-        * computing a serializable snapshot and therefore will be setting
-        * MyProc->xmin.  This is because any two backends that have overlapping
-        * shared holds on ProcArrayLock will certainly compute the same xmin
-        * (since no xact, in particular not the oldest, can exit the set of
-        * running transactions while we hold ProcArrayLock --- see further
-        * discussion just below).      So it doesn't matter whether another backend
-        * concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as
-        * set or not; he'd compute the same xmin for himself either way.
-        * (We are assuming here that xmin can be set and read atomically,
-        * just like xid.)
-        *
-        * There is a corner case in which the above argument doesn't work: if
-        * there isn't any oldest xact, ie, all xids in the array are invalid.
-        * In that case we will compute xmin as the result of ReadNewTransactionId,
-        * and since GetNewTransactionId doesn't take the ProcArrayLock, it's not
-        * so obvious that two backends with overlapping shared locks will get
-        * the same answer.  But GetNewTransactionId is required to store the XID
-        * it assigned into the ProcArray before releasing XidGenLock.  Therefore
-        * the backend that did ReadNewTransactionId later will see that XID in
-        * the array, and will compute the same xmin as the earlier one that saw
-        * no XIDs in the array.
+        * going to set MyProc->xmin.
          */
         LWLockAcquire(ProcArrayLock, LW_SHARED);
  
-       /*--------------------
+       /*
          * Unfortunately, we have to call ReadNewTransactionId() after acquiring
-        * ProcArrayLock above.  It's not good because ReadNewTransactionId() does
-        * LWLockAcquire(XidGenLock), but *necessary*.  We need to be sure that
-        * no transactions exit the set of currently-running transactions
-        * between the time we fetch xmax and the time we finish building our
-        * snapshot.  Otherwise we could have a situation like this:
-        *
-        *              1. Tx Old is running (in Read Committed mode).
-        *              2. Tx S reads new transaction ID into xmax, then
-        *                 is swapped out before acquiring ProcArrayLock.
-        *              3. Tx New gets new transaction ID (>= S' xmax),
-        *                 makes changes and commits.
-        *              4. Tx Old changes some row R changed by Tx New and commits.
-        *              5. Tx S finishes getting its snapshot data.  It sees Tx Old as
-        *                 done, but sees Tx New as still running (since New >= xmax).
-        *
-        * Now S will see R changed by both Tx Old and Tx New, *but* does not
-        * see other changes made by Tx New.  If S is supposed to be in
-        * Serializable mode, this is wrong.
-        *
-        * By locking ProcArrayLock before we read xmax, we ensure that TX Old
-        * cannot exit the set of running transactions seen by Tx S.  Therefore
-        * both Old and New will be seen as still running => no inconsistency.
-        *--------------------
+        * ProcArrayLock.  It's not good because ReadNewTransactionId() does
+        * LWLockAcquire(XidGenLock), but *necessary*.  See discussion in
+        * src/backend/access/transam/README.
          */
-
         xmax = ReadNewTransactionId();
  
         /* initialize xmin calculation with xmax */
@@ -1147,9 +1107,10 @@ XidCacheRemoveRunningXids(TransactionId xid, int nxids, TransactionId *xids)
  
         /*
          * We must hold ProcArrayLock exclusively in order to remove transactions
-        * from the PGPROC array.  (See notes in GetSnapshotData.)      It's possible
-        * this could be relaxed since we know this routine is only used to abort
-        * subtransactions, but pending closer analysis we'd best be conservative.
+        * from the PGPROC array.  (See src/backend/access/transam/README.)  It's
+        * possible this could be relaxed since we know this routine is only used
+        * to abort subtransactions, but pending closer analysis we'd best be
+        * conservative.
          */
         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
author	Tom Lane <tgl@sss.pgh.pa.us>
	Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Fri, 7 Sep 2007 20:59:26 +0000 (20:59 +0000)
src/backend/access/heap/heapam.c		patch \| blob \| history
src/backend/access/transam/README		patch \| blob \| history
src/backend/access/transam/xact.c		patch \| blob \| history
src/backend/commands/copy.c		patch \| blob \| history
src/backend/executor/execMain.c		patch \| blob \| history
src/backend/storage/ipc/procarray.c		patch \| blob \| history