--- /dev/null
+$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.1 2004/08/01 20:57:59 tgl Exp $
+
+The Transaction System
+----------------------
+
+PostgreSQL's transaction system is a three-layer system. The bottom layer
+implements low-level transactions and subtransactions, on top of which rests
+the mainloop's control code, which in turn implements user-visible
+transactions and savepoints.
+
+The middle layer of code is called by postgres.c before and after the
+processing of each query:
+
+ StartTransactionCommand
+ CommitTransactionCommand
+ AbortCurrentTransaction
+
+Meanwhile, the user can alter the system's state by issuing the SQL commands
+BEGIN, COMMIT, ROLLBACK, SAVEPOINT, ROLLBACK TO or RELEASE. The traffic cop
+redirects these calls to the toplevel routines
+
+ BeginTransactionBlock
+ EndTransactionBlock
+ UserAbortTransactionBlock
+ DefineSavepoint
+ RollbackToSavepoint
+ ReleaseSavepoint
+
+respectively. Depending on the current state of the system, these functions
+call low level functions to activate the real transaction system:
+
+ StartTransaction
+ CommitTransaction
+ AbortTransaction
+ CleanupTransaction
+ StartSubTransaction
+ CommitSubTransaction
+ AbortSubTransaction
+ CleanupSubTransaction
+
+Additionally, within a transaction, CommandCounterIncrement is called to
+increment the command counter, which allows future commands to "see" the
+effects of previous commands within the same transaction. Note that this is
+done automatically by CommitTransactionCommand after each query inside a
+transaction block, but some utility functions also call it internally so that
+some operations (usually on the system catalogs) can be seen by later
+operations within the same utility command. For example, DefineRelation calls
+it after creating the heap, so that the new pg_class row is visible and can be
+locked.
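To make the command-counter rule concrete, here is a minimal C sketch under simplifying assumptions: each row remembers the id of the command that inserted it, and a row written by our own transaction is visible only to commands with a higher id. SketchRow, sketch_insert, sketch_row_visible and sketch_command_counter_increment are hypothetical names; real visibility checks involve much more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of per-command visibility. */
typedef uint32_t SketchCommandId;

typedef struct
{
    SketchCommandId cmin;   /* id of the command that inserted this row */
} SketchRow;

/* Id of the command currently executing in our transaction. */
static SketchCommandId current_command_id = 0;

/* Stand-in for CommandCounterIncrement: later commands get a higher id. */
static void sketch_command_counter_increment(void)
{
    current_command_id++;
}

/* Insert a row, stamping it with the current command id. */
static SketchRow sketch_insert(void)
{
    SketchRow row = { current_command_id };
    return row;
}

/* Our own rows are visible only to commands numbered after the inserting
 * command, i.e. once the counter has been incremented. */
static bool sketch_row_visible(const SketchRow *row)
{
    return row->cmin < current_command_id;
}
```

A row checked right after insertion is reported invisible; after one call to sketch_command_counter_increment it becomes visible, which is the effect DefineRelation needs before it can lock the new pg_class row.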
+
+
+For example, consider the following sequence of user commands:
+
+1) BEGIN
+2) SELECT * FROM foo
+3) INSERT INTO foo VALUES (...)
+4) COMMIT
+
+In the main processing loop, this results in the following function call
+sequence:
+
+         /  StartTransactionCommand;
+        /       StartTransaction;
+1) <    ProcessUtility;                 << BEGIN
+        \       BeginTransactionBlock;
+         \  CommitTransactionCommand;
+
+         /  StartTransactionCommand;
+2) <    ProcessQuery;                   << SELECT * FROM foo
+         \  CommitTransactionCommand;
+          \     CommandCounterIncrement;
+
+         /  StartTransactionCommand;
+3) <    ProcessQuery;                   << INSERT INTO foo VALUES (...)
+         \  CommitTransactionCommand;
+          \     CommandCounterIncrement;
+
+         /  StartTransactionCommand;
+        /   ProcessUtility;                 << COMMIT
+4) <            EndTransactionBlock;
+        \   CommitTransactionCommand;
+         \      CommitTransaction;
+
+The point of this example is to demonstrate the need for
+StartTransactionCommand and CommitTransactionCommand to be state smart:
+between the calls to BeginTransactionBlock and EndTransactionBlock they
+should call CommandCounterIncrement, and outside these calls they need to do
+normal start, commit or abort processing.
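The state dependence described above can be sketched as a small state machine. This is a hypothetical simplification, not xact.c: it keeps only five block states (named loosely after the TBLOCK_* values), and the lower-case functions merely record which low-level routine the real code would call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the TBLOCK_* states, heavily simplified. */
typedef enum
{
    SK_DEFAULT,     /* idle: no transaction open */
    SK_STARTED,     /* single-query transaction, no BEGIN */
    SK_BEGIN,       /* BEGIN just processed */
    SK_INPROGRESS,  /* inside a transaction block */
    SK_END          /* COMMIT processed, real commit still pending */
} SketchBlockState;

static SketchBlockState state = SK_DEFAULT;
static char trace[256];     /* names of low-level routines "called" */

static void note(const char *what)
{
    size_t used = strlen(trace);

    /* append "what;" without overrunning the buffer */
    snprintf(trace + used, sizeof(trace) - used, "%s;", what);
}

static void start_transaction_command(void)
{
    if (state == SK_DEFAULT)
    {
        note("StartTransaction");
        state = SK_STARTED;
    }
    /* inside a block there is nothing to start */
}

static void begin_transaction_block(void)     /* user typed BEGIN */
{
    assert(state == SK_STARTED);
    state = SK_BEGIN;
}

static void end_transaction_block(void)       /* user typed COMMIT */
{
    assert(state == SK_INPROGRESS);
    state = SK_END;
}

static void commit_transaction_command(void)
{
    switch (state)
    {
        case SK_STARTED:    /* query outside a block: commit now */
        case SK_END:        /* COMMIT was seen: commit now */
            note("CommitTransaction");
            state = SK_DEFAULT;
            break;
        case SK_BEGIN:      /* the block is now open */
            state = SK_INPROGRESS;
            break;
        case SK_INPROGRESS: /* make this query's effects visible */
            note("CommandCounterIncrement");
            break;
        case SK_DEFAULT:
            break;
    }
}
```

Driving BEGIN, two queries and COMMIT through these functions calls StartTransaction once, CommandCounterIncrement after each query inside the block, and CommitTransaction only in the CommitTransactionCommand that follows COMMIT.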
+
+Furthermore, suppose the "SELECT * FROM foo" caused an abort condition. In
+this case AbortCurrentTransaction is called, and the transaction is put in the
+aborted state. In this state, any user input is ignored except for
+transaction-termination statements and ROLLBACK TO <savepoint> commands.
+
+Transaction aborts can occur in two ways:
+
+1) the system dies from some internal cause (syntax error, etc.)
+2) user types ROLLBACK
+
+The reason we have to distinguish them is illustrated by the following two
+situations:
+
+   case 1                             case 2
+   ------                             ------
+1) user types BEGIN                1) user types BEGIN
+2) user does something             2) user does something
+3) user does not like what         3) system aborts for some reason
+   she sees and types ABORT           (syntax error, etc.)
+
+In case 1, we want to abort the transaction and return to the default state.
+In case 2, there may be more commands coming our way which are part of the
+same transaction block; we have to ignore these commands until we see a COMMIT
+or ROLLBACK.
+
+Internal aborts are handled by AbortCurrentTransaction, while user aborts are
+handled by UserAbortTransactionBlock. Both of them rely on AbortTransaction
+to do all the real work. The only difference is what state we enter after
+AbortTransaction does its work:
+
+* AbortCurrentTransaction leaves us in TBLOCK_ABORT;
+* UserAbortTransactionBlock leaves us in TBLOCK_ENDABORT.
+
+Low-level transaction abort handling is divided into two phases:
+* AbortTransaction executes as soon as we realize the transaction has
+ failed. It should release all shared resources (locks etc) so that we do
+ not delay other backends unnecessarily.
+* CleanupTransaction executes when we finally see a user COMMIT
+ or ROLLBACK command; it cleans things up and gets us out of the transaction
+ internally. In particular, we mustn't destroy TopTransactionContext until
+ this point.
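The difference between the two kinds of abort, and the split between AbortTransaction and CleanupTransaction, can be illustrated with another hypothetical C sketch. It models only the idle state plus three block states; the lower-case functions stand in for the routines named above and just record which phase ran, so this illustrates the described behavior rather than reproducing xact.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of block states for the abort paths. */
typedef enum
{
    SK_DEFAULT,     /* idle */
    SK_INPROGRESS,  /* healthy transaction block */
    SK_ABORT,       /* failed by an error; waiting for COMMIT/ROLLBACK */
    SK_ENDABORT     /* user typed ROLLBACK; cleanup still pending */
} SketchBlockState;

static SketchBlockState state = SK_INPROGRESS;
static char trace[256];     /* which phases have run, in order */

static void note(const char *what)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof(trace) - used, "%s;", what);
}

/* Phase 1: release shared resources (locks etc.) as early as possible. */
static void abort_transaction(void) { note("AbortTransaction"); }

/* Phase 2: final cleanup, once the user has ended the block. */
static void cleanup_transaction(void) { note("CleanupTransaction"); }

/* Internal error inside a block (the AbortCurrentTransaction path). */
static void abort_current_transaction(void)
{
    if (state == SK_INPROGRESS)
    {
        abort_transaction();
        state = SK_ABORT;   /* ignore commands until COMMIT/ROLLBACK */
    }
}

/* User typed ROLLBACK (the UserAbortTransactionBlock path). */
static void user_abort_transaction_block(void)
{
    if (state == SK_INPROGRESS)
        abort_transaction();    /* phase 1 has not run yet */
    if (state == SK_INPROGRESS || state == SK_ABORT)
        state = SK_ENDABORT;
}

/* Called by the main loop at the end of every command. */
static void commit_transaction_command(void)
{
    if (state == SK_ENDABORT)
    {
        cleanup_transaction();
        state = SK_DEFAULT;
    }
    /* in SK_ABORT nothing happens: the block stays failed */
}
```

In both cases AbortTransaction runs exactly once and CleanupTransaction runs only after the user's ROLLBACK; what differs is whether phase 1 happens at the error or at the ROLLBACK.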
+
+Also, note that when a transaction is committed, we don't close it right away.
+Rather, it's put in TBLOCK_END state, which means that the transaction is
+closed only when CommitTransactionCommand is called after the query has
+finished processing. The distinction is subtle but important,
+because it means that control will leave the xact.c code with the transaction
+open, and the main loop will be able to keep processing inside the same
+transaction. So, in a sense, transaction commit is also handled in two
+phases, the first at EndTransactionBlock and the second at
+CommitTransactionCommand (which is where CommitTransaction is actually
+called).
+
+The rest of the code in xact.c consists of routines that support the creation
+and finishing of transactions and subtransactions. For example, AtStart_Memory
+takes care of initializing the memory subsystem at main transaction start.
+
+
+Subtransaction handling
+-----------------------
+
+Subtransactions are implemented using a stack of TransactionState structures,
+each of which has a pointer to its parent transaction's struct. When a new
+subtransaction is to be opened, PushTransaction is called, which creates a new
+TransactionState, with its parent link pointing to the current transaction.
+StartSubTransaction is in charge of initializing the new TransactionState to
+sane values, and properly initializing other subsystems (AtSubStart routines).
+
+When closing a subtransaction, either CommitSubTransaction has to be called
+(if the subtransaction is committing), or AbortSubTransaction and
+CleanupSubTransaction (if it's aborting). In either case, PopTransaction is
+called so the system returns to the parent transaction.
+
+One important point regarding subtransaction handling is that several may need
+to be closed in response to a single user command. That's because savepoints
+have names, and we allow a savepoint to be committed or rolled back by name,
+which is not necessarily the one that was last opened. In the case of
+subtransaction commit this is not a problem, and we close all the involved
+subtransactions right away by calling CommitTransactionToLevel, which in turn
+calls CommitSubTransaction and PopTransaction as many times as needed.
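The parent-linked stack and the way CommitTransactionToLevel unwinds it can be sketched in C as follows. SketchXactState, push_transaction, pop_transaction and commit_transaction_to_level are hypothetical simplifications: the real TransactionState carries much more, and the real routines also do the per-subsystem work (such as the AtSubStart routines).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal TransactionState: just the parent-linked stack. */
typedef struct SketchXactState
{
    int nesting_level;                  /* 1 = top-level transaction */
    struct SketchXactState *parent;     /* NULL for the top level */
} SketchXactState;

static SketchXactState top = { 1, NULL };
static SketchXactState *current = &top;
static int commits = 0;                 /* stands in for CommitSubTransaction */

/* Like PushTransaction + StartSubTransaction: open a child of current. */
static void push_transaction(void)
{
    SketchXactState *s = malloc(sizeof(*s));

    assert(s != NULL);
    s->nesting_level = current->nesting_level + 1;
    s->parent = current;
    current = s;
}

/* Like PopTransaction: return to the parent transaction. */
static void pop_transaction(void)
{
    SketchXactState *s = current;

    assert(s->parent != NULL);          /* never pop the top level */
    current = s->parent;
    free(s);
}

/* Like CommitTransactionToLevel: commit and pop every subtransaction
 * nested at or below `level` (e.g. RELEASE of an older savepoint). */
static void commit_transaction_to_level(int level)
{
    assert(level > 1 && level <= current->nesting_level);
    while (current->nesting_level >= level)
    {
        commits++;
        pop_transaction();
    }
}
```

Releasing the second of three nested savepoints commits and pops two subtransactions in one call, leaving the first savepoint's subtransaction as the current one.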
+
+In the case of subtransaction abort (when the user issues ROLLBACK TO
+<savepoint>), things are not so easy. We have to keep the subtransactions
+open and return control to the main loop. So what RollbackToSavepoint does is
+abort the innermost subtransaction and put it in TBLOCK_SUBENDABORT state, and
+put the rest in TBLOCK_SUBABORT_PENDING state. Then we return control to the
+main loop, which will in turn return control to us by calling
+CommitTransactionCommand. At this point we can close the innermost
+subtransaction and all those marked with the "abort pending" state. When
+that's done, the outermost of the closed subtransactions (the one belonging to
+the named savepoint) is created again, to conform to SQL's definition of
+ROLLBACK TO.
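The deferred cleanup that RollbackToSavepoint relies on can be modeled with an array of per-level states. This hypothetical C sketch mirrors the TBLOCK_SUBENDABORT and TBLOCK_SUBABORT_PENDING marking described above, but tracks nesting levels as plain integers and counts aborts instead of doing any real work; all names in it are illustrative.

```c
#include <assert.h>

/* Hypothetical model: one state per nesting level (1 = top level). */
enum { SK_MAX_LEVELS = 16 };

typedef enum
{
    SK_SUBINPROGRESS,
    SK_SUBENDABORT,         /* innermost: aborted, cleanup pending */
    SK_SUBABORT_PENDING     /* outer ones: abort deferred to end of command */
} SketchSubState;

static SketchSubState level_state[SK_MAX_LEVELS + 1];  /* [1] is unused */
static int depth = 1;       /* current nesting level */
static int aborted = 0;     /* subtransactions aborted so far */

static void define_savepoint(void)
{
    assert(depth < SK_MAX_LEVELS);
    level_state[++depth] = SK_SUBINPROGRESS;
}

/* ROLLBACK TO the savepoint whose subtransaction is at level `target`. */
static void rollback_to_savepoint(int target)
{
    assert(target > 1 && target <= depth);
    aborted++;                                  /* abort innermost now */
    level_state[depth] = SK_SUBENDABORT;
    for (int lvl = depth - 1; lvl >= target; lvl--)
        level_state[lvl] = SK_SUBABORT_PENDING; /* the rest: later */
}

/* What CommitTransactionCommand does once control comes back to us. */
static void commit_transaction_command(void)
{
    if (level_state[depth] != SK_SUBENDABORT)
        return;
    depth--;                                    /* clean up innermost */
    while (depth > 1 && level_state[depth] == SK_SUBABORT_PENDING)
    {
        aborted++;                              /* abort and clean up */
        depth--;
    }
    define_savepoint();     /* recreate the named savepoint's level */
}
```

Rolling back to the first of three nested savepoints aborts all three subtransactions by the end of the command and leaves a fresh subtransaction at the savepoint's level.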
+
+Other subsystems are allowed to start "internal" subtransactions, which are
+handled by BeginInternalSubtransaction. This makes it possible to implement
+exception handling, e.g. in PL/pgSQL. ReleaseCurrentSubTransaction and
+RollbackAndReleaseCurrentSubTransaction allow the subsystem to close such
+subtransactions. The main difference between this and the savepoint/release
+path is that BeginInternalSubtransaction is allowed when no explicit
+transaction block has been established, while DefineSavepoint is not.
+
+
+pg_clog and pg_subtrans
+-----------------------
+
+pg_clog and pg_subtrans are permanent (on-disk) storage of transaction-related
+information. Only a limited number of pages of each are kept in memory, so in
+many cases there is no need to actually read from disk. However, if there's a
+long-running transaction or a backend sitting idle with an open transaction,
+it may be necessary to read and write this information from disk. They also
+allow the information to persist across server restarts.
+
+pg_clog records the commit status for each transaction. A transaction can be
+in progress, committed, aborted, or "sub-committed". This last state means
+that it's a subtransaction that's no longer running, but its parent has not
+updated its state yet (either it is still running, or the backend crashed
+without updating its status). A sub-committed transaction's status will be
+updated again to the final value as soon as the parent commits or aborts, or
+when the parent is detected to be aborted.
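As a rough illustration of how these four states fit in very little space, this hypothetical sketch packs two status bits per transaction, four transactions per byte. clog.c also stores two bits per transaction, but the numeric codes, array size and function names below are assumptions of the sketch, not the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical commit-status map: 2 bits per xid, 4 xids per byte. */
typedef uint32_t SketchXid;

typedef enum
{
    SK_IN_PROGRESS = 0,     /* also what a never-written slot reads as */
    SK_COMMITTED = 1,
    SK_ABORTED = 2,
    SK_SUB_COMMITTED = 3
} SketchXidStatus;

enum { SK_XACTS_PER_BYTE = 4, SK_BITS_PER_XACT = 2, SK_MAX_XIDS = 1024 };

static uint8_t clog[SK_MAX_XIDS / SK_XACTS_PER_BYTE];

static void set_status(SketchXid xid, SketchXidStatus status)
{
    assert(xid < SK_MAX_XIDS);
    unsigned shift = (xid % SK_XACTS_PER_BYTE) * SK_BITS_PER_XACT;
    uint8_t *byte = &clog[xid / SK_XACTS_PER_BYTE];

    /* clear this xid's two bits, then store the new status there */
    *byte = (uint8_t) ((*byte & ~(3u << shift)) | ((unsigned) status << shift));
}

static SketchXidStatus get_status(SketchXid xid)
{
    assert(xid < SK_MAX_XIDS);
    unsigned shift = (xid % SK_XACTS_PER_BYTE) * SK_BITS_PER_XACT;

    return (SketchXidStatus) ((clog[xid / SK_XACTS_PER_BYTE] >> shift) & 3u);
}
```

Overwriting a sub-committed entry with its final status touches only that transaction's two bits, leaving its neighbours in the same byte unchanged.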
+
+Savepoints are implemented using subtransactions. A subtransaction is a
+transaction inside a transaction; it gets its own TransactionId, but its
+commit or abort status depends not only on whether it committed itself, but
+also on whether its parent transaction committed. To implement multiple
+savepoints in a transaction we allow unlimited transaction nesting depth, so
+any particular subtransaction's commit state is dependent on the commit status
+of each and every ancestor transaction.
+
+The "subtransaction parent" (pg_subtrans) mechanism records, for each
+transaction, the TransactionId of its parent transaction. This information is
+stored as soon as the subtransaction is created. Top-level transactions do
+not have a parent, so they leave their pg_subtrans entries set to the default
+value of zero (InvalidTransactionId).
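Combining the two maps gives the rule stated above: a sub-committed transaction's outcome is decided by its ancestors. In this hypothetical sketch, plain arrays stand in for pg_clog and pg_subtrans, and effective_status follows parent links until it reaches a transaction whose status is final or still in progress. The real lookups live in transam.c and subtrans.c and differ in detail.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchXid;
#define SK_INVALID_XID ((SketchXid) 0)  /* plays the role of InvalidTransactionId */

typedef enum
{
    SK_IN_PROGRESS, SK_COMMITTED, SK_ABORTED, SK_SUB_COMMITTED
} SketchXidStatus;

enum { SK_MAX_XIDS = 64 };

static SketchXidStatus status[SK_MAX_XIDS];  /* stand-in for pg_clog */
static SketchXid parent[SK_MAX_XIDS];        /* stand-in for pg_subtrans;
                                              * top level stays 0 */

/* Record a subtransaction's parent as soon as it is created. */
static void record_parent(SketchXid child, SketchXid par)
{
    assert(child < SK_MAX_XIDS && par < SK_MAX_XIDS);
    parent[child] = par;
}

/* A sub-committed xid's fate is that of its ancestors: walk up the
 * parent chain until a non-sub-committed status is found. */
static SketchXidStatus effective_status(SketchXid xid)
{
    assert(xid < SK_MAX_XIDS);
    while (status[xid] == SK_SUB_COMMITTED)
    {
        /* every sub-committed xid had its parent recorded at creation */
        assert(parent[xid] != SK_INVALID_XID);
        xid = parent[xid];
    }
    return status[xid];
}
```

With a chain 12 → 11 → 10 where both subtransactions are sub-committed, xid 12 reads as in progress while 10 runs and as aborted once 10 aborts, even though 12's own entry never changed.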
+
+pg_subtrans is used to check whether the transaction in question is still
+running. The main Xid of a transaction is recorded in the PGPROC struct, but
+since we allow arbitrary nesting of subtransactions, we can't fit all Xids in
+shared memory, so we have to store them on disk. Note, however, that for
+each transaction we keep a "cache" of Xids that are known to be part of the
+transaction tree, so we can skip looking at pg_subtrans unless we know the
+cache has overflowed. See storage/ipc/sinval.c for the gory details.
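The cache-then-pg_subtrans check can be sketched like this. SketchProc, SK_CACHE_SIZE and the helper functions are hypothetical stand-ins for the PGPROC fields and the sinval.c logic. The point is only that a miss in a complete cache is a definite answer, while a miss in an overflowed cache forces a pg_subtrans lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;
#define SK_INVALID_XID ((SketchXid) 0)

/* Hypothetical cache size; deliberately tiny for illustration. */
enum { SK_CACHE_SIZE = 4, SK_MAX_XIDS = 64 };

typedef struct
{
    SketchXid xid;                      /* main (top-level) xid */
    int nxids;                          /* valid entries in subxids[] */
    bool overflowed;                    /* more subxids than fit */
    SketchXid subxids[SK_CACHE_SIZE];
} SketchProc;

static SketchXid subtrans_parent[SK_MAX_XIDS];  /* stand-in for pg_subtrans */
static int subtrans_lookups = 0;                /* slow-path counter */

static void add_subxact(SketchProc *proc, SketchXid sub, SketchXid par)
{
    assert(sub < SK_MAX_XIDS);
    subtrans_parent[sub] = par;         /* always recorded "on disk" */
    if (proc->nxids < SK_CACHE_SIZE)
        proc->subxids[proc->nxids++] = sub;
    else
        proc->overflowed = true;        /* cache is no longer complete */
}

/* Follow parent links to the top-level xid. */
static SketchXid subtrans_topmost(SketchXid xid)
{
    subtrans_lookups++;
    assert(xid < SK_MAX_XIDS);
    while (subtrans_parent[xid] != SK_INVALID_XID)
        xid = subtrans_parent[xid];
    return xid;
}

/* Is xid part of the transaction tree running in proc? */
static bool xid_is_running_in(const SketchProc *proc, SketchXid xid)
{
    if (xid == proc->xid)
        return true;
    for (int i = 0; i < proc->nxids; i++)
        if (proc->subxids[i] == xid)
            return true;
    if (!proc->overflowed)
        return false;                   /* complete cache: definite no */
    return subtrans_topmost(xid) == proc->xid;  /* slow path */
}
```

Cached subtransaction ids are answered from shared memory alone; only ids beyond the cache's capacity, or unrelated ids checked after an overflow, cost a pg_subtrans lookup.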
+
+slru.c is the supporting mechanism for both pg_clog and pg_subtrans. It
+implements the LRU policy for in-memory buffer pages. The high-level routines
+for pg_clog are implemented in transam.c, while the low-level functions are in
+clog.c. pg_subtrans is contained completely in subtrans.c.
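To illustrate the LRU policy mentioned above, here is a hypothetical, much-reduced page buffer in C. It keeps a handful of slots, stamps a slot with an increasing counter on every access, and on a miss fills an empty slot or else replaces the least recently used one. The slot count, struct and function names are this sketch's own; slru.c additionally deals with locking, dirty-page write-back and I/O errors.

```c
#include <assert.h>

/* Hypothetical tiny page buffer with LRU replacement. */
enum { SK_NUM_SLOTS = 3, SK_EMPTY = -1 };

typedef struct
{
    int page_number[SK_NUM_SLOTS];      /* SK_EMPTY if the slot is unused */
    unsigned last_used[SK_NUM_SLOTS];   /* value of clock at last access */
    unsigned clock;                     /* bumped on every access */
    int reads;                          /* simulated disk reads */
} SketchSlru;

static void slru_init(SketchSlru *s)
{
    for (int i = 0; i < SK_NUM_SLOTS; i++)
    {
        s->page_number[i] = SK_EMPTY;
        s->last_used[i] = 0;
    }
    s->clock = 0;
    s->reads = 0;
}

/* Return the slot holding `page`, reading it in on a miss. */
static int slru_read_page(SketchSlru *s, int page)
{
    assert(page >= 0);

    /* Fast path: the page is already buffered. */
    for (int i = 0; i < SK_NUM_SLOTS; i++)
        if (s->page_number[i] == page)
        {
            s->last_used[i] = ++s->clock;
            return i;
        }

    /* Miss: take an empty slot, or else the least recently used one. */
    int victim = 0;
    for (int i = 0; i < SK_NUM_SLOTS; i++)
    {
        if (s->page_number[i] == SK_EMPTY)
        {
            victim = i;
            break;
        }
        if (s->last_used[i] < s->last_used[victim])
            victim = i;
    }
    /* a real implementation would write back a dirty victim here */
    s->page_number[victim] = page;
    s->last_used[victim] = ++s->clock;
    s->reads++;
    return victim;
}
```

Re-reading a buffered page costs no I/O and refreshes its stamp, so a long-running transaction's frequently consulted page tends to stay in memory while idle pages are evicted first.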
* xact.c
* top level transaction system support routines
*
+ * See src/backend/access/transam/README for more information.
+ *
* Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.175 2004/08/01 17:32:13 tgl Exp $
- *
- * NOTES
- * Transaction aborts can now occur two ways:
- *
- * 1) system dies from some internal cause (syntax error, etc..)
- * 2) user types ABORT
- *
- * These two cases used to be treated identically, but now
- * we need to distinguish them. Why? consider the following
- * two situations:
- *
- * case 1 case 2
- * ------ ------
- * 1) user types BEGIN 1) user types BEGIN
- * 2) user does something 2) user does something
- * 3) user does not like what 3) system aborts for some reason
- * she sees and types ABORT
- *
- * In case 1, we want to abort the transaction and return to the
- * default state. In case 2, there may be more commands coming
- * our way which are part of the same transaction block and we have
- * to ignore these commands until we see a COMMIT transaction or
- * ROLLBACK.
- *
- * Internal aborts are now handled by AbortTransactionBlock(), just as
- * they always have been, and user aborts are now handled by
- * UserAbortTransactionBlock(). Both of them rely on AbortTransaction()
- * to do all the real work. The only difference is what state we
- * enter after AbortTransaction() does its work:
- *
- * * AbortTransactionBlock() leaves us in TBLOCK_ABORT and
- * * UserAbortTransactionBlock() leaves us in TBLOCK_ENDABORT
- *
- * Low-level transaction abort handling is divided into two phases:
- * * AbortTransaction() executes as soon as we realize the transaction
- * has failed. It should release all shared resources (locks etc)
- * so that we do not delay other backends unnecessarily.
- * * CleanupTransaction() executes when we finally see a user COMMIT
- * or ROLLBACK command; it cleans things up and gets us out of
- * the transaction internally. In particular, we mustn't destroy
- * TopTransactionContext until this point.
- *
- * NOTES
- * The essential aspects of the transaction system are:
- *
- * o transaction id generation
- * o transaction log updating
- * o memory cleanup
- * o cache invalidation
- * o lock cleanup
- *
- * Hence, the functional division of the transaction code is
- * based on which of the above things need to be done during
- * a start/commit/abort transaction. For instance, the
- * routine AtCommit_Memory() takes care of all the memory
- * cleanup stuff done at commit time.
- *
- * The code is layered as follows:
- *
- * StartTransaction
- * CommitTransaction
- * AbortTransaction
- * CleanupTransaction
- *
- * are provided to do the lower level work like recording
- * the transaction status in the log and doing memory cleanup.
- * above these routines are another set of functions:
- *
- * StartTransactionCommand
- * CommitTransactionCommand
- * AbortCurrentTransaction
- *
- * These are the routines used in the postgres main processing
- * loop. They are sensitive to the current transaction block state
- * and make calls to the lower level routines appropriately.
- *
- * Support for transaction blocks is provided via the functions:
- *
- * BeginTransactionBlock
- * CommitTransactionBlock
- * AbortTransactionBlock
- *
- * These are invoked only in response to a user "BEGIN WORK", "COMMIT",
- * or "ROLLBACK" command. The tricky part about these functions
- * is that they are called within the postgres main loop, in between
- * the StartTransactionCommand() and CommitTransactionCommand().
- *
- * For example, consider the following sequence of user commands:
- *
- * 1) begin
- * 2) select * from foo
- * 3) insert into foo (bar = baz)
- * 4) commit
- *
- * in the main processing loop, this results in the following
- * transaction sequence:
- *
- * / StartTransactionCommand();
- * 1) / ProcessUtility(); << begin
- * \ BeginTransactionBlock();
- * \ CommitTransactionCommand();
- *
- * / StartTransactionCommand();
- * 2) < ProcessQuery(); << select * from foo
- * \ CommitTransactionCommand();
- *
- * / StartTransactionCommand();
- * 3) < ProcessQuery(); << insert into foo (bar = baz)
- * \ CommitTransactionCommand();
- *
- * / StartTransactionCommand();
- * 4) / ProcessUtility(); << commit
- * \ CommitTransactionBlock();
- * \ CommitTransactionCommand();
- *
- * The point of this example is to demonstrate the need for
- * StartTransactionCommand() and CommitTransactionCommand() to
- * be state smart -- they should do nothing in between the calls
- * to BeginTransactionBlock() and EndTransactionBlock() and
- * outside these calls they need to do normal start/commit
- * processing.
- *
- * Furthermore, suppose the "select * from foo" caused an abort
- * condition. We would then want to abort the transaction and
- * ignore all subsequent commands up to the "commit".
- * -cim 3/23/90
+ * $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.176 2004/08/01 20:57:59 tgl Exp $
*
*-------------------------------------------------------------------------
*/