Improve manual's discussion of locking and MVCC.

author Tom Lane <tgl@sss.pgh.pa.us>

Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)

committer Tom Lane <tgl@sss.pgh.pa.us>

Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)
author Tom Lane <tgl@sss.pgh.pa.us>
Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)
committer Tom Lane <tgl@sss.pgh.pa.us>
Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml

index fddeb679c87a1732101e4fc41cf65c53cdaaaaac..7c65bb353052b7bb6376808773e742be79253abe 100644 (file)
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1,5 +1,5 @@
  <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.25 2002/05/30 20:45:18 tgl Exp $
  -->
  
   <chapter id="mvcc">
@@ -43,6 +43,13 @@ $Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere
      with locks acquired for writing data, and so reading never blocks
      writing and writing never blocks reading.
     </para>
+
+   <para>
+    Table- and row-level locking facilities are also available in
+    <productname>PostgreSQL</productname> for applications that cannot
+    adapt easily to MVCC behavior.  However, proper use of MVCC will
+    generally provide better performance than locks.
+   </para>
    </sect1>
  
    <sect1 id="transaction-iso">
@@ -63,7 +70,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere
        </term>
       <listitem>
        <para>
-       A transaction reads data written by concurrent uncommitted transaction.
+       A transaction reads data written by a concurrent uncommitted transaction.
         </para>
        </listitem>
       </varlistentry>
@@ -209,41 +216,76 @@ $Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere
      is the default isolation level in <productname>PostgreSQL</productname>. 
      When a transaction runs on this isolation level,
      a <command>SELECT</command> query sees only data committed before the
-    query began and never sees either uncommitted data or changes committed
+    query began; it never sees either uncommitted data or changes committed
      during query execution by concurrent transactions.  (However, the
      <command>SELECT</command> does see the effects of previous updates
-    executed within this same transaction, even though they are not yet
-    committed.)  Notice that two successive <command>SELECT</command>s can see different data,
-    even though they are within a single transaction, when other transactions
+    executed within its own transaction, even though they are not yet
+    committed.)  In effect, a <command>SELECT</command> query
+    sees a snapshot of the database as of the instant that that query
+    begins to run.  Notice that two successive <command>SELECT</command>s can
+    see different data, even though they are within a single transaction, if
+    other transactions 
      commit changes during execution of the first <command>SELECT</command>.
     </para>
  
     <para>
-    If a target row found by a query while executing an
-    <command>UPDATE</command> statement
-    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
-    has already been updated by a
-    concurrent uncommitted transaction then the second transaction
-    that tries to update this row will wait for the other transaction to
-    commit or rollback. In the case of rollback, the waiting transaction
-    can proceed to change the row. In the case of commit (and if the
-    row still exists; i.e. was not deleted by the other transaction), the
-    query will be re-executed for this row to check that the new row
-    version still satisfies the query search condition. If the new row version
-    satisfies the query search condition then the row will be
-    updated (or deleted or marked for update).  Note that the starting point
-    for the update will be the new row version; moreover, after the update
-    the doubly-updated row is visible to subsequent <command>SELECT</command>s
-    in the current transaction.  Thus, the current transaction is able to see
-    the effects of the other transaction for this specific row.
+    <command>UPDATE</command>, <command>DELETE</command> and <command>SELECT
+    FOR UPDATE</command> commands behave the same as <command>SELECT</command>
+    in terms of searching for target rows: they will only find target rows
+    that were committed as of the query start time.  However, such a target
+    row may have already been updated (or deleted or marked for update) by
+    another concurrent transaction by the time it is found.  In this case, the
+    would-be updater will wait for the first updating transaction to commit or
+    roll back (if it is still in progress).  If the first updater rolls back,
+    then its effects are negated and the second updater can proceed with
+    updating the originally found row.  If the first updater commits, the
+    second updater will ignore the row if the first updater deleted it,
+    otherwise it will attempt to apply its operation to the updated version of
+    the row.  The query search condition (<literal>WHERE</> clause) is
+    re-evaluated to see if the updated version of the row still matches the
+    search condition.  If so, the second updater proceeds with its operation,
+    starting from the updated version of the row.
+   </para>
+
+   <para>
+    Because of the above rule, it is possible for updating queries to see
+    inconsistent snapshots --- they can see the effects of concurrent updating
+    queries that affected the same rows they are trying to update, but they
+    do not see effects of those queries on other rows in the database.
+    This behavior makes Read Committed mode unsuitable for queries that
+    involve complex search conditions.  However, it is just right for simpler
+    cases.  For example, consider updating bank balances with transactions
+    like
+
+<screen>
+BEGIN;
+UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
+UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
+COMMIT;
+</screen>
+
+    If two such transactions concurrently try to change the balance of account
+    12345, we clearly want the second transaction to start from the updated
+    version of the account's row.  Because each query is affecting only a
+    predetermined row, letting it see the updated version of the row does
+    not create any troublesome inconsistency.
     </para>
  
     <para>
-    The partial transaction isolation provided by Read Committed level is
-    adequate for many applications, and this level is fast and simple to use.
+    Since in Read Committed mode each new query starts with a new snapshot
+    that includes all transactions committed up to that instant, subsequent
+    queries in the same transaction will see the effects of the committed
+    concurrent transaction in any case.  The point at issue here is whether
+    or not within a <emphasis>single</> query we see an absolutely consistent
+    view of the database.
+   </para>
+
+   <para>
+    The partial transaction isolation provided by Read Committed mode is
+    adequate for many applications, and this mode is fast and simple to use.
      However, for applications that do complex queries and updates, it may
      be necessary to guarantee a more rigorously consistent view of the
-    database than the Read Committed level provides.
+    database than the Read Committed mode provides.
     </para>
    </sect1>
  
@@ -256,7 +298,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere
     </indexterm>
  
     <para>
-    <firstterm>Serializable</firstterm> provides the highest transaction
+    <firstterm>Serializable</firstterm> provides the strictest transaction
      isolation.  This level emulates serial transaction execution,
      as if transactions had been executed one after another, serially,
      rather than concurrently.  However, applications using this level must
@@ -266,28 +308,33 @@ $Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere
     <para>
      When a transaction is on the serializable level,
      a <command>SELECT</command> query sees only data committed before the
-    transaction began and never sees either uncommitted data or changes
+    transaction began; it never sees either uncommitted data or changes
      committed
      during transaction execution by concurrent transactions.  (However, the
      <command>SELECT</command> does see the effects of previous updates
-    executed within this same transaction, even though they are not yet
+    executed within its own transaction, even though they are not yet
      committed.)  This is different from Read Committed in that the
      <command>SELECT</command>
      sees a snapshot as of the start of the transaction, not as of the start
-    of the current query within the transaction.
+    of the current query within the transaction.  Successive
+    <command>SELECT</command>s within a single transaction always see the same
+    data.
     </para>
  
     <para>
-    If a target row found by a query while executing an
-    <command>UPDATE</command> statement
-    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
-    has already been updated by a
-    concurrent uncommitted transaction then the second transaction
-    that tries to update this row will wait for the other transaction to
-    commit or rollback. In the case of rollback, the waiting transaction
-    can proceed to change the row. In the case of a concurrent
-    transaction commit, a serializable transaction will be rolled back
-    with the message
+    <command>UPDATE</command>, <command>DELETE</command> and <command>SELECT
+    FOR UPDATE</command> commands behave the same as <command>SELECT</command>
+    in terms of searching for target rows: they will only find target rows
+    that were committed as of the transaction start time.  However, such a
+    target
+    row may have already been updated (or deleted or marked for update) by
+    another concurrent transaction by the time it is found.  In this case, the
+    serializable transaction will wait for the first updating transaction to commit or
+    roll back (if it is still in progress).  If the first updater rolls back,
+    then its effects are negated and the serializable transaction can proceed
+    with updating the originally found row.  But if the first updater commits
+    (and actually updated or deleted the row, not just selected it for update)
+    then the serializable transaction will be rolled back with the message
  
  <screen>
  ERROR:  Can't serialize access due to concurrent update
@@ -304,72 +351,29 @@ ERROR:  Can't serialize access due to concurrent update
      previously-committed change as part of its initial view of the database,
      so there is no logical conflict in using the new version of the row
      as the starting point for the new transaction's update.
-    Note that only updating transactions may need to be retried --- read-only
-    transactions never have serialization conflicts.
     </para>
  
     <para>
-    The Serializable transaction level provides a rigorous guarantee that each
-    transaction sees a wholly consistent view of the database.  However,
-    the application has to be prepared to retry transactions when concurrent
-    updates make it impossible to sustain the illusion of serial execution,
-    and the cost of redoing complex transactions may be significant.  So
-    this level is recommended only when update queries contain logic
-    sufficiently complex that they may give wrong answers in the Read Committed
-    level.
-   </para>
-  </sect1>
-
-  <sect1 id="applevel-consistency">
-   <title>Data consistency checks at the application level</title>
-
-   <para>
-    Because readers in <productname>PostgreSQL</productname>
-    don't lock data, regardless of
-    transaction isolation level, data read by one transaction can be
-    overwritten by another concurrent transaction. In other words,
-    if a row is returned by <command>SELECT</command> it doesn't mean that
-    the row still exists at the time it is returned (i.e., sometime after the
-    current transaction began); the row might have been modified or deleted
-    by an already-committed transaction that committed after this one started.
-    Even if the row is still valid <quote>now</quote>, it could be changed or deleted
-    before the current transaction does a commit or rollback.
-   </para>
-
-   <para>
-    Another way to think about it is that each
-    transaction sees a snapshot of the database contents, and concurrently
-    executing transactions may very well see different snapshots.  So the
-    whole concept of <quote>now</quote> is somewhat suspect anyway.  This is not normally
-    a big problem if the client applications are isolated from each other,
-    but if the clients can communicate via channels outside the database
-    then serious confusion may ensue.
+    Note that only updating transactions may need to be retried --- read-only
+    transactions will never have serialization conflicts.
     </para>
  
     <para>
-    To ensure the current existence of a row and protect it against
-    concurrent updates one must use <command>SELECT FOR UPDATE</command> or
-    an appropriate <command>LOCK TABLE</command> statement.
-    (<command>SELECT FOR UPDATE</command> locks just the returned rows against
-    concurrent updates, while <command>LOCK TABLE</command> protects the
-    whole table.)
-    This should be taken into account when porting applications to
-    <productname>PostgreSQL</productname> from other environments.
-
-    <note>
-     <para>
-      Before version 6.5 <productname>PostgreSQL</productname>
-      used read-locks and so the
-      above consideration is also the case
-      when upgrading to 6.5 (or higher) from previous
-      <productname>PostgreSQL</productname> versions.
-     </para>
-    </note>
+    The Serializable mode provides a rigorous guarantee that each
+    transaction sees a wholly consistent view of the database.  However,
+    the application has to be prepared to retry transactions when concurrent
+    updates make it impossible to sustain the illusion of serial execution.
+    Since the cost of redoing complex transactions may be significant,
+    this mode is recommended only when updating transactions contain logic
+    sufficiently complex that they may give wrong answers in Read
+    Committed mode.  Most commonly, Serializable mode is necessary when
+    a transaction performs several successive queries that must see
+    identical views of the database.
     </para>
    </sect1>
  
-  <sect1 id="locking-tables">
-   <title>Locking and Tables</title>
+  <sect1 id="explicit-locking">
+   <title>Explicit Locking</title>
  
     <indexterm>
      <primary>locking</primary>
@@ -377,20 +381,40 @@ ERROR:  Can't serialize access due to concurrent update
  
     <para>
      <productname>PostgreSQL</productname> provides various lock modes
-    to control concurrent access to data in tables.  Users normally
-    need not be concerned about the different lock modes because
+    to control concurrent access to data in tables.  These modes can be
+    used for application-controlled locking in situations where MVCC
+    does not give the desired behavior.  Also, most
      <productname>PostgreSQL</productname> commands automatically
-    acquire locks of appropriate modes to ensure data integrity while
-    permitting an appropriate level of concurrent access.
-    Nevertheless, a user can manually lock a table in any of the
-    available modes using the <command>LOCK TABLE</command> command.
+    acquire locks of appropriate modes to ensure that referenced tables
+    are not dropped or modified in incompatible ways while the command
+    executes.  (For example, <command>ALTER TABLE</> cannot be executed
+    concurrently with other operations on the same table.)
     </para>
  
+  <sect2 id="locking-tables">
+   <title>Table-Level Locks</title>
+
     <para>
      The list below shows the available lock modes and the contexts in
-    which they are used.  Remember that all of these lock modes are
-    table-level locks, even if the name contains the word
+    which they are used automatically by
+    <productname>PostgreSQL</productname>.
+    Remember that all of these lock modes are table-level locks,
+    even if the name contains the word
      <quote>row</quote>.  The names of the lock modes are historical.
+    To some extent the names reflect the typical usage of each lock
+    mode --- but the semantics are all the same.  The only real difference
+    between one lock mode and another is the set of lock modes with
+    which each conflicts.  Two transactions cannot hold locks of conflicting
+    modes on the same table at the same time.  (However, a transaction
+    never conflicts with itself --- for example, it may acquire
+    <literal>ACCESS EXCLUSIVE</literal> lock and later acquire
+    <literal>ACCESS SHARE</literal> lock on the same table.)  Nonconflicting
+    lock modes may be held concurrently by many transactions.  Notice in
+    particular that some lock modes are self-conflicting (for example,
+    <literal>ACCESS EXCLUSIVE</literal> cannot be held by more than one
+    transaction at a time) while others are not self-conflicting (for example,
+    <literal>ACCESS SHARE</literal> can be held by multiple transactions).
+    Once acquired, a lock mode is held till end of transaction.
     </para>
  
       <variablelist>
@@ -401,13 +425,15 @@ ERROR:  Can't serialize access due to concurrent update
         </term>
         <listitem>
         <para>
-        A read-lock mode acquired automatically on tables
-        being queried.
+        Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
+        mode only.
         </para>
  
         <para>
-        Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
-        mode only.
+        The <command>SELECT</command> command acquires a
+        lock of this mode on referenced tables.  In general, any query
+        that only reads a table and does not modify it will acquire
+        this lock mode.
         </para>
         </listitem>
        </varlistentry>
@@ -418,13 +444,15 @@ ERROR:  Can't serialize access due to concurrent update
         </term>
         <listitem>
         <para>
-        The <command>SELECT FOR UPDATE</command> command acquires a
-        lock of this mode.
+        Conflicts with the <literal>EXCLUSIVE</literal> and
+        <literal>ACCESS EXCLUSIVE</literal> lock modes.
         </para>
  
         <para>
-        Conflicts with the <literal>EXCLUSIVE</literal> and
-        <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        The <command>SELECT FOR UPDATE</command> command acquires a
+        lock of this mode on the target table(s) (in addition to
+        <literal>ACCESS SHARE</literal> locks on any other tables
+        that are referenced but not selected <option>FOR UPDATE</option>).
         </para>
         </listitem>
        </varlistentry>
@@ -434,17 +462,20 @@ ERROR:  Can't serialize access due to concurrent update
         <literal>ROW EXCLUSIVE</literal>
         </term>
         <listitem>
-       <para>
-        The commands <command>UPDATE</command>,
-        <command>DELETE</command>, and <command>INSERT</command>
-        automatically acquire this lock mode.
-       </para>
-
         <para>
          Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW
          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal> lock modes.
         </para>
+
+       <para>
+        The commands <command>UPDATE</command>,
+        <command>DELETE</command>, and <command>INSERT</command>
+        acquire this lock mode on the target table (in addition to
+        <literal>ACCESS SHARE</literal> locks on any other referenced
+        tables).  In general, this lock mode will be acquired by any
+        query that modifies the data in a table.
+       </para>
         </listitem>
        </varlistentry>
  
@@ -453,15 +484,17 @@ ERROR:  Can't serialize access due to concurrent update
         <literal>SHARE UPDATE EXCLUSIVE</literal>
         </term>
         <listitem>
-       <para>
-        Acquired by <command>VACUUM</command> (without <option>FULL</option>).
-       </para>
-
         <para>
          Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>,
          <literal>SHARE</literal>, <literal>SHARE ROW
          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        This mode protects a table against
+        concurrent schema changes and VACUUMs.
+       </para>
+
+       <para>
+        Acquired by <command>VACUUM</command> (without <option>FULL</option>).
         </para>
         </listitem>
        </varlistentry>
@@ -471,15 +504,16 @@ ERROR:  Can't serialize access due to concurrent update
         <literal>SHARE</literal>
         </term>
         <listitem>
-       <para>
-        Acquired by <command>CREATE INDEX</command>.
-       </para>
-
         <para>
          Conflicts with the <literal>ROW EXCLUSIVE</literal>,
          <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW
          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        This mode protects a table against concurrent data changes.
+       </para>
+
+       <para>
+        Acquired by <command>CREATE INDEX</command>.
         </para>
         </listitem>
        </varlistentry>
@@ -489,10 +523,6 @@ ERROR:  Can't serialize access due to concurrent update
         <literal>SHARE ROW EXCLUSIVE</literal>
         </term>
         <listitem>
-       <para>
-         This lock mode is not automatically acquired by any command.
-       </para>
-
         <para>
          Conflicts with the <literal>ROW EXCLUSIVE</literal>,
          <literal>SHARE UPDATE EXCLUSIVE</literal>,
@@ -500,24 +530,33 @@ ERROR:  Can't serialize access due to concurrent update
          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal> lock modes.
         </para>
+
+       <para>
+         This lock mode is not automatically acquired by any
+         <productname>PostgreSQL</productname> command.
+       </para>
         </listitem>
        </varlistentry>
  
        <varlistentry>
         <term>
-       <literal>EXCLUSIVE LOCK</literal>
+       <literal>EXCLUSIVE</literal>
         </term>
         <listitem>
-       <para>
-         This lock mode is not automatically acquired by any command.
-       </para>
-
         <para>
          Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW
          EXCLUSIVE</literal>, <literal>SHARE UPDATE
          EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
          ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal> lock modes.
+        This mode allows only concurrent <literal>ACCESS SHARE</literal>,
+        i.e., only reads from the table can proceed in parallel with a
+        transaction holding this lock mode.
+       </para>
+
+       <para>
+         This lock mode is not automatically acquired by any
+         <productname>PostgreSQL</productname> command.
         </para>
         </listitem>
        </varlistentry>
@@ -528,19 +567,21 @@ ERROR:  Can't serialize access due to concurrent update
         </term>
         <listitem>
         <para>
-        Acquired by the <command>ALTER TABLE</command>, <command>DROP
-        TABLE</command>, and <command>VACUUM FULL</command> commands.
-        This is also the default lock mode for <command>LOCK TABLE</command>
-         statements that do not specify a mode explicitly.
-       </para>
-
-       <para>
-        Conflicts with locks of all modes ( <literal>ACCESS
+        Conflicts with locks of all modes (<literal>ACCESS
          SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW
          EXCLUSIVE</literal>, <literal>SHARE UPDATE
          EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
          ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
          <literal>ACCESS EXCLUSIVE</literal>).
+        This mode guarantees that the
+        holder is the only transaction accessing the table in any way.
+       </para>
+
+       <para>
+        Acquired by the <command>ALTER TABLE</command>, <command>DROP
+        TABLE</command>, and <command>VACUUM FULL</command> commands.
+        This is also the default lock mode for <command>LOCK TABLE</command>
+         statements that do not specify a mode explicitly.
         </para>
         </listitem>
        </varlistentry>
@@ -554,17 +595,28 @@ ERROR:  Can't serialize access due to concurrent update
        </para>
       </note>
  
+   </sect2>
+
+   <sect2 id="locking-rows">
+    <title>Row-Level Locks</title>
+
      <para>
       In addition to table-level locks, there are row-level locks.
-     Row-level locks are acquired when rows are being updated (or
-     deleted or marked for update).  Row-level locks don't affect data
-     querying; they block writers to <emphasis>the same row</emphasis>
-     only.  Row-level locks cannot be acquired explicitly by the user.
+     A row-level lock on a specific row is automatically acquired when the
+     row is updated (or deleted or marked for update).  The lock is held
+     until the transaction commits or rolls back.
+     Row-level locks don't affect data
+     querying; they block <emphasis>writers to the same row</emphasis>
+     only.  To acquire a row-level lock on a row without actually
+     modifying the row, select the row with <command>SELECT FOR
+     UPDATE</command>.  Note that once a particular row-level lock is
+     acquired, the transaction may update the row multiple times without
+     fear of conflicts.
      </para>
  
      <para>
       <productname>PostgreSQL</productname> doesn't remember any
-     information about modified rows in memory, so is has no limit to
+     information about modified rows in memory, so it has no limit to
       the number of rows locked at one time.  However, locking a row
       may cause a disk write; thus, for example, <command>SELECT FOR
       UPDATE</command> will modify selected rows to mark them and so
@@ -572,11 +624,135 @@ ERROR:  Can't serialize access due to concurrent update
      </para>
  
      <para>
-    In addition to table and row locks, short-term share/exclusive locks are
-    used to control read/write access to table pages in the shared buffer
-    pool.  These locks are released immediately after a tuple is fetched or
-    updated.  Application writers normally need not be concerned with
-    page-level locks, but we mention them for completeness.
+     In addition to table and row locks, page-level share/exclusive locks are
+     used to control read/write access to table pages in the shared buffer
+     pool.  These locks are released immediately after a tuple is fetched or
+     updated.  Application writers normally need not be concerned with
+     page-level locks, but we mention them for completeness.
+    </para>
+
+   </sect2>
+
+   <sect2 id="locking-deadlocks">
+    <title>Deadlocks</title>
+
+    <para>
+     Use of explicit locking can cause <firstterm>deadlocks</>, wherein
+     two (or more) transactions each hold locks that the other wants.
+     For example, if transaction 1 acquires exclusive lock on table A
+     and then tries to acquire exclusive lock on table B, while transaction
+     2 has already exclusive-locked table B and now wants exclusive lock
+     on table A, then neither one can proceed.
+     <productname>PostgreSQL</productname> automatically detects deadlock
+     situations and resolves them by aborting one of the transactions
+     involved, allowing the other(s) to complete.  (Exactly which transaction
+     will be aborted is difficult to predict, and should not be relied on.)
+    </para>
+
+    <para>
+     The best defense against deadlocks is generally to avoid them by being
+     certain that all applications using a database acquire locks on multiple
+     objects in a consistent order.  One should also ensure that the first
+     lock acquired on an object in a transaction is the highest mode that
+     will be needed for that object.  If it is not feasible to verify this
+     in advance, then deadlocks may be handled on-the-fly by retrying
+     transactions that are aborted due to deadlock.
+    </para>
+
+    <para>
+     So long as no deadlock situation is detected, a transaction seeking
+     either a table-level or row-level lock will wait indefinitely for
+     conflicting locks to be released.  This means it is a bad idea for
+     applications to hold transactions open for long periods of time
+     (e.g., while waiting for user input).
+    </para>
+   </sect2>
+  </sect1>
+
+  <sect1 id="applevel-consistency">
+   <title>Data consistency checks at the application level</title>
+
+   <para>
+    Because readers in <productname>PostgreSQL</productname>
+    don't lock data, regardless of
+    transaction isolation level, data read by one transaction can be
+    overwritten by another concurrent transaction. In other words,
+    if a row is returned by <command>SELECT</command> it doesn't mean that
+    the row is still current at the instant it is returned (i.e., sometime
+    after the current query began).  The row might have been modified or
+    deleted by an already-committed transaction that committed after this one
+    started.
+    Even if the row is still valid <quote>now</quote>, it could be changed or
+    deleted
+    before the current transaction does a commit or rollback.
+   </para>
+
+   <para>
+    Another way to think about it is that each
+    transaction sees a snapshot of the database contents, and concurrently
+    executing transactions may very well see different snapshots.  So the
+    whole concept of <quote>now</quote> is somewhat suspect anyway.
+    This is not normally
+    a big problem if the client applications are isolated from each other,
+    but if the clients can communicate via channels outside the database
+    then serious confusion may ensue.
+   </para>
+
+   <para>
+    To ensure the current validity of a row and protect it against
+    concurrent updates one must use <command>SELECT FOR UPDATE</command> or
+    an appropriate <command>LOCK TABLE</command> statement.
+    (<command>SELECT FOR UPDATE</command> locks just the returned rows against
+    concurrent updates, while <command>LOCK TABLE</command> locks the
+    whole table.)
+    This should be taken into account when porting applications to
+    <productname>PostgreSQL</productname> from other environments.
+
+    <note>
+     <para>
+      Before version 6.5 <productname>PostgreSQL</productname>
+      used read-locks and so the
+      above consideration is also the case
+      when upgrading to 6.5 (or higher) from previous
+      <productname>PostgreSQL</productname> versions.
+     </para>
+    </note>
+   </para>
+
+   <para>
+    Global validity checks require extra thought under MVCC.  For
+    example, a banking application might wish to check that the sum of
+    all credits in one table equals the sum of debits in another table,
+    when both tables are being actively updated.  Comparing the results of two
+    successive SELECT SUM(...) commands will not work reliably under
+    Read Committed mode, since the second query will likely include the results
+    of transactions not counted by the first.  Doing the two sums in a
+    single serializable transaction will give an accurate picture of the
+    effects of transactions that committed before the serializable transaction
+    started --- but one might legitimately wonder whether the answer is still
+    relevant by the time it is delivered.  If the serializable transaction
+    itself applied some changes before trying to make the consistency check,
+    the usefulness of the check becomes even more debatable, since now it
+    includes some but not all post-transaction-start changes.  In such cases
+    a careful person might wish to lock all tables needed for the check,
+    in order to get an indisputable picture of current reality.  A
+    <literal>SHARE</> mode (or higher) lock guarantees that there are no
+    uncommitted changes in the locked table, other than those of the current
+    transaction.
+   </para>
+
+   <para>
+    Note also that if one is
+    relying on explicit locks to prevent concurrent changes, one should use
+    Read Committed mode, or in Serializable mode be careful to obtain the
+    lock(s) before performing queries.  An explicit lock obtained in a
+    serializable transaction guarantees that no other transactions modifying
+    the table are still running --- but if the snapshot seen by the
+    transaction predates obtaining the lock, it may predate some now-committed
+    changes in the table.  A serializable transaction's snapshot is actually
+    frozen at the start of its first query (SELECT/INSERT/UPDATE/DELETE), so
+    it's possible to obtain explicit locks before the snapshot is
+    frozen.
     </para>
    </sect1>
  
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml

index a5191fb20d42169fec27942e2e91087086868c66..e28efe8fad64c6b2c1a201766241a122fac9d91d 100644 (file)
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -1,5 +1,5 @@
  <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/lock.sgml,v 1.32 2002/04/23 02:07:16 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/lock.sgml,v 1.33 2002/05/30 20:45:18 tgl Exp $
  PostgreSQL documentation
  -->
  
@@ -53,12 +53,6 @@ where <replaceable class="PARAMETER">lockmode</replaceable> is one of:
       <varlistentry>
        <term>ACCESS SHARE MODE</term>
        <listitem>
-       <note>
-       <para>
-        This lock mode is acquired automatically over tables being queried.
-       </para>
-       </note>
-
         <para>
         This is the least restrictive lock mode.  It conflicts only with
         ACCESS EXCLUSIVE mode. It is used to protect a table from being
@@ -66,108 +60,114 @@ where <replaceable class="PARAMETER">lockmode</replaceable> is one of:
         <command>DROP TABLE</command> and <command>VACUUM FULL</command>
         commands.
         </para>
+
+       <note>
+       <para>
+        The <command>SELECT</command> command acquires a
+        lock of this mode on referenced tables.  In general, any query
+        that only reads a table and does not modify it will acquire
+        this lock mode.
+       </para>
+       </note>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>ROW SHARE MODE</term>
        <listitem>
+       <para>
+       Conflicts with EXCLUSIVE and ACCESS EXCLUSIVE lock modes.
+       </para>
+
         <note>
         <para>
-        Automatically acquired by <command>SELECT ... FOR UPDATE</command>.
+        The <command>SELECT FOR UPDATE</command> command acquires a
+        lock of this mode on the target table(s) (in addition to
+        <literal>ACCESS SHARE</literal> locks on any other tables
+        that are referenced but not selected <option>FOR UPDATE</option>).
         </para>
         </note>
-
-       <para>
-       Conflicts with EXCLUSIVE and ACCESS EXCLUSIVE lock modes.
-       </para>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>ROW EXCLUSIVE MODE</term>
        <listitem>
+       <para>
+        Conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE and
+        ACCESS EXCLUSIVE modes.
+       </para>
+
         <note>
         <para>
-        Automatically acquired by <command>UPDATE</command>, 
+        The commands <command>UPDATE</command>,
          <command>DELETE</command>, and <command>INSERT</command>
-        statements.
+        acquire this lock mode on the target table (in addition to
+        <literal>ACCESS SHARE</literal> locks on any other referenced
+        tables).  In general, this lock mode will be acquired by any
+        query that modifies the data in a table.
          </para>
         </note>
-
-       <para>
-        Conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE and
-        ACCESS EXCLUSIVE modes.
-       </para>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>SHARE UPDATE EXCLUSIVE MODE</term>
        <listitem>
-       <note>
-       <para>
-        Automatically acquired by <command>VACUUM</command> (without
-        <option>FULL</option>).
-       </para>
-       </note>
-
         <para>
         Conflicts with SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE,
         EXCLUSIVE and
         ACCESS EXCLUSIVE modes. This mode protects a table against
         concurrent schema changes and VACUUMs.
         </para>
-      </listitem>
-     </varlistentry>
  
-     <varlistentry>
-      <term>SHARE MODE</term>
-      <listitem>
         <note>
         <para>
-        Automatically acquired by <command>CREATE INDEX</command>.
-        Share-locks the entire table.
+        Acquired by <command>VACUUM</command> (without
+        <option>FULL</option>).
         </para>
         </note>
+      </listitem>
+     </varlistentry>
  
+     <varlistentry>
+      <term>SHARE MODE</term>
+      <listitem>
         <para>
         Conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE,
         SHARE ROW EXCLUSIVE, EXCLUSIVE and
         ACCESS EXCLUSIVE modes. This mode protects a table against
-       concurrent data updates.
+       concurrent data changes.
+       </para>
+
+       <note>
+       <para>
+        Acquired by <command>CREATE INDEX</command>.
         </para>
+       </note>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>SHARE ROW EXCLUSIVE MODE</term>
        <listitem>
-       <note>
         <para>
-       This is like EXCLUSIVE MODE, but allows ROW SHARE locks
-        by others.
+        Conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE,
+       SHARE ROW EXCLUSIVE, EXCLUSIVE and ACCESS EXCLUSIVE modes.
         </para>
-       </note>
  
+       <note>
         <para>
-        Conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE,
-       SHARE ROW EXCLUSIVE, EXCLUSIVE and ACCESS EXCLUSIVE modes.
+         This lock mode is not automatically acquired by any
+         <productname>PostgreSQL</productname> command.
         </para>
+       </note>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>EXCLUSIVE MODE</term>
        <listitem>
-
-       <note>
-       <para>
-       This mode is yet more restrictive than SHARE ROW EXCLUSIVE.  
-       It blocks all concurrent ROW SHARE/SELECT...FOR UPDATE queries.
-       </para>
-       </note>
-
         <para>
         Conflicts with ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE,
         SHARE, SHARE ROW EXCLUSIVE,
@@ -176,33 +176,33 @@ where <replaceable class="PARAMETER">lockmode</replaceable> is one of:
         from the table can proceed in parallel with a transaction holding
         this lock mode.
         </para>
+
+       <note>
+       <para>
+        This lock mode is not automatically acquired by any
+        <productname>PostgreSQL</productname> command.
+       </para>
+       </note>
        </listitem>
       </varlistentry>
  
       <varlistentry>
        <term>ACCESS EXCLUSIVE MODE</term>
        <listitem>
-       <note>
-       <para>
-        Automatically acquired by <command>ALTER TABLE</command>,
-        <command>DROP TABLE</command>, <command>VACUUM FULL</command>
-        statements.
-        This is the most restrictive lock mode which
-        protects a locked table from any concurrent operations.
-       </para>
-       </note>
+       <para>
+       Conflicts with all lock modes.  This mode guarantees that the
+       holder is the only transaction accessing the table in any way.
+       </para>
  
         <note>
         <para>
-        This lock mode is also acquired by an unqualified
-        <command>LOCK TABLE</command> (i.e., the command without an explicit
-        lock mode option).
+        Acquired by <command>ALTER TABLE</command>,
+        <command>DROP TABLE</command>, and <command>VACUUM FULL</command>
+        statements.
+        This is also the default lock mode for <command>LOCK TABLE</command>
+         statements that do not specify a mode explicitly.
         </para>
         </note>
-
-       <para>
-       Conflicts with all lock modes.
-       </para>
        </listitem>
       </varlistentry>
      </variablelist>
@@ -255,98 +255,64 @@ ERROR <replaceable class="PARAMETER">name</replaceable>: Table does not exist.
    </title>
  
    <para>
-   <command>LOCK TABLE</command> controls concurrent access to a table
-   for the duration of a transaction.
-   <productname>PostgreSQL</productname> always uses the least restrictive
-   lock mode whenever possible. <command>LOCK TABLE</command> 
-   provides for cases when you might need more restrictive locking.
+   <command>LOCK TABLE</command> obtains a table-level lock, waiting if
+   necessary for any conflicting locks to be released.  Once obtained,
+   the lock is held for the remainder of the current transaction.
+   (There is no <command>UNLOCK TABLE</command> command; locks are always
+   released at transaction end.)
    </para>
  
    <para>
-   <acronym>RDBMS</acronym> locking uses the following terminology:
-
-   <variablelist>
-    <varlistentry>
-     <term>EXCLUSIVE</term>
-     <listitem>
-      <para>
-       An exclusive lock prevents other locks of the same type from being
-       granted.  (Note: ROW EXCLUSIVE mode does not follow this naming
-       convention perfectly, since it is shared at the level of the table;
-       it is exclusive only with respect to specific rows that are being
-       updated.)
-      </para>
-     </listitem>
-    </varlistentry>
-
-    <varlistentry>
-     <term>SHARE</term>
-     <listitem>
-      <para>
-      A shared lock allows others to also hold the same type of lock,
-      but prevents the corresponding EXCLUSIVE lock from being granted.
-      </para>
-     </listitem>
-    </varlistentry>
-
-    <varlistentry>
-     <term>ACCESS</term>
-     <listitem>
-      <para>
-       Locks table schema.
-      </para>
-     </listitem>
-    </varlistentry>
-
-    <varlistentry>
-     <term>ROW</term>
-     <listitem>
-      <para>
-       Locks individual rows.
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
+   When acquiring locks automatically for commands that reference tables,
+   <productname>PostgreSQL</productname> always uses the least restrictive
+   lock mode possible. <command>LOCK TABLE</command> 
+   provides for cases when you might need more restrictive locking.
    </para>
  
    <para>
     For example, suppose an application runs a transaction at READ COMMITTED
-   isolation level and needs to ensure the existence of data in a table for
-   the duration of the
+   isolation level and needs to ensure that data in a table remains stable
+   for the duration of the
     transaction. To achieve this you could obtain SHARE lock mode over the
-   table before querying. This will prevent concurrent data changes 
-   and ensure further read operations over the table see data in their 
-   actual current state, because SHARE lock mode conflicts with any ROW
+   table before querying. This will prevent concurrent data changes
+   and ensure subsequent reads of the table see a stable
+   view of committed data, because SHARE lock mode conflicts with the ROW
     EXCLUSIVE lock acquired by writers, and your
     <command>LOCK TABLE <replaceable class="PARAMETER">name</replaceable> IN SHARE MODE</command>
-   statement will wait until any concurrent write operations commit or
-   rollback. Thus, once you obtain the lock, there are no uncommitted
-   writes outstanding.
+   statement will wait until any concurrent holders of ROW EXCLUSIVE mode
+   commit or roll back. Thus, once you obtain the lock, there are no
+   uncommitted writes outstanding; furthermore none can begin until you
+   release the lock.
  
     <note>
      <para>
-     To read data in their actual current state when running a transaction
-     at the SERIALIZABLE isolation level, you have to execute the LOCK TABLE
+     To achieve a similar effect when running a transaction
+     at the SERIALIZABLE isolation level, you have to execute the
+     <command>LOCK TABLE</>
       statement before executing any DML statement.  A serializable
       transaction's view of data will be frozen when its first DML statement
-     begins.
+     begins.  A later <command>LOCK</> will still prevent concurrent writes
+     --- but it
+     won't ensure that what the transaction reads corresponds to the latest
+     committed values.
      </para>
     </note>
    </para>
    
    <para>
-   In addition to the requirements above, if a transaction is going to
-   change data in a table, then SHARE ROW EXCLUSIVE lock mode should
-   be acquired to prevent deadlock conditions when two concurrent
-   transactions attempt to lock the table in SHARE mode and then
-   try to change data in this table, both (implicitly) acquiring 
-   ROW EXCLUSIVE lock mode that conflicts with a concurrent SHARE lock.
+   If a transaction of this sort is going to
+   change the data in the table, then it should use SHARE ROW EXCLUSIVE lock
+   mode instead of SHARE mode.  This ensures that only one transaction of
+   this type runs at a time.  Without this, a deadlock is possible: two
+   transactions might both acquire SHARE mode, and then be unable to also
+   acquire ROW EXCLUSIVE mode to actually perform their updates.  (Note that
+   a transaction's own locks never conflict, so a transaction can acquire
+   ROW EXCLUSIVE mode when it holds SHARE mode --- but not if anyone else
+   holds SHARE mode.)
    </para>
    
    <para>
-   To continue with the deadlock (when two transactions wait for one another)
-   issue raised above, you should follow two general rules to prevent 
-   deadlock conditions:
+   Two general rules may be followed to prevent deadlock conditions:
    </para>
  
    <itemizedlist>  
@@ -366,9 +332,7 @@ ERROR <replaceable class="PARAMETER">name</replaceable>: Table does not exist.
  
     <listitem>
      <para>
-     Transactions should acquire two conflicting lock modes only if
-     one of them is self-conflicting (i.e., may be held by only one
-     transaction at a time). If multiple lock modes are involved,
+     If multiple lock modes are involved for a single object,
       then transactions should always acquire the  most restrictive mode first.
      </para>
     
@@ -380,16 +344,18 @@ ERROR <replaceable class="PARAMETER">name</replaceable>: Table does not exist.
     </listitem>
    </itemizedlist>
  
-  <note>
-   <para>
-    <productname>PostgreSQL</productname> does detect deadlocks and will
-    rollback at least one waiting transaction to resolve the deadlock. 
-   </para>
-  </note>
+  <para>
+   <productname>PostgreSQL</productname> does detect deadlocks and will
+   rollback at least one waiting transaction to resolve the deadlock.
+   If it is not practical to code an application to follow the above rules
+   strictly, an alternative solution is to be prepared to retry transactions
+   when they are aborted by deadlocks.
+  </para>
  
    <para>
-   When locking multiple tables, the command LOCK a, b; is equivalent to LOCK
-   a; LOCK b;. The tables are locked one-by-one in the order specified in the
+   When locking multiple tables, the command <literal>LOCK a, b;</> is
+   equivalent to <literal>LOCK a; LOCK b;</>. The tables are locked one-by-one
+   in the order specified in the
     <command>LOCK</command> command.
    </para>
  
@@ -415,6 +381,62 @@ ERROR <replaceable class="PARAMETER">name</replaceable>: Table does not exist.
      lock will be dropped as soon as it is obtained.
     </para>
  
+  <para>
+   <acronym>RDBMS</acronym> locking uses the following standard terminology:
+
+   <variablelist>
+    <varlistentry>
+     <term>EXCLUSIVE</term>
+     <listitem>
+      <para>
+       An exclusive lock prevents other locks of the same type from being
+       granted.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term>SHARE</term>
+     <listitem>
+      <para>
+      A shared lock allows others to also hold the same type of lock,
+      but prevents the corresponding EXCLUSIVE lock from being granted.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term>ACCESS</term>
+     <listitem>
+      <para>
+       Locks table schema.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term>ROW</term>
+     <listitem>
+      <para>
+       Locks individual rows.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   <productname>PostgreSQL</productname> does not follow this terminology
+   exactly.  <command>LOCK TABLE</> only deals with table-level locks, and
+   so the mode names involving ROW are all misnomers.  These mode names
+   should generally be read as indicating the intention of the user to
+   acquire row-level locks within the locked table.  Also,
+   ROW EXCLUSIVE mode does not follow this naming convention accurately,
+   since it is a sharable table lock.  Keep in mind that all the lock modes
+   have identical semantics so far as <command>LOCK TABLE</> is concerned,
+   differing only in the rules about which modes conflict with which.
+  </para>
+
    </refsect2>
   </refsect1>
    
@@ -424,7 +446,7 @@ ERROR <replaceable class="PARAMETER">name</replaceable>: Table does not exist.
    </title>
  
    <para>
-   Illustrate a SHARE lock on a primary key table when going to perform
+   Obtain a SHARE lock on a primary key table when going to perform
     inserts into a foreign key table:
  
    <programlisting>
author	Tom Lane <tgl@sss.pgh.pa.us>
	Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Thu, 30 May 2002 20:45:18 +0000 (20:45 +0000)
doc/src/sgml/mvcc.sgml		patch \| blob \| history
doc/src/sgml/ref/lock.sgml		patch \| blob \| history