From: Heikki Linnakangas
Date: Sat, 17 May 2014 10:48:52 +0000 (+0300)
Subject: Update README, we don't do post-recovery cleanup actions anymore.
X-Git-Tag: REL9_4_BETA2~170
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=a3655dd4a5cee3917a7d1766e07e36013e7e8835;p=postgresql

Update README, we don't do post-recovery cleanup actions anymore.

transam/README explained how B-tree incomplete splits were tracked and fixed
after recovery, as an example of handling complex actions that need multiple
WAL records, but that's not how it works anymore. Explain the new paradigm.
---
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 3a32471e95..f83526ccc3 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -575,16 +575,21 @@ while holding AccessExclusiveLock on the relation.
 
 Due to all these constraints, complex changes (such as a multilevel index
 insertion) normally need to be described by a series of atomic-action WAL
-records. What do you do if the intermediate states are not self-consistent?
-The answer is that the WAL replay logic has to be able to fix things up.
-In btree indexes, for example, a page split requires insertion of a new key in
-the parent btree level, but for locking reasons this has to be reflected by
-two separate WAL records. The replay code has to remember "unfinished" split
-operations, and match them up to subsequent insertions in the parent level.
-If no matching insert has been found by the time the WAL replay ends, the
-replay code has to do the insertion on its own to restore the index to
-consistency. Such insertions occur after WAL is operational, so they can
-and should write WAL records for the additional generated actions.
+records. The intermediate states must be self-consistent, so that if the
+replay is interrupted between any two actions, the system is fully
+functional. In btree indexes, for example, a page split requires a new page
+to be allocated, and an insertion of a new key in the parent btree level,
+but for locking reasons this has to be reflected by two separate WAL
+records. Replaying the first record, to allocate the new page and move
+tuples to it, sets a flag on the page to indicate that the key has not been
+inserted to the parent yet. Replaying the second record clears the flag.
+This intermediate state is never seen by other backends during normal
+operation, because the lock on the child page is held across the two
+actions, but will be seen if the operation is interrupted before writing
+the second WAL record. The search algorithm works with the intermediate
+state as normal, but if an insertion encounters a page with the
+incomplete-split flag set, it will finish the interrupted split by
+inserting the key to the parent, before proceeding.
 
 Writing Hints
 -------------