From 7809cbf7e65b5f962ca9bd3ad20bcc0ac81855a4 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Tue, 15 May 2001 00:35:50 +0000 Subject: [PATCH] Some badly needed documentation about EvalPlanQual. --- src/backend/executor/README | 99 +++++++++++++++++++++++++++++++++++++ 1 file changed, 99 insertions(+) create mode 100644 src/backend/executor/README diff --git a/src/backend/executor/README b/src/backend/executor/README new file mode 100644 index 0000000000..0a56c3fa6a --- /dev/null +++ b/src/backend/executor/README @@ -0,0 +1,99 @@ +$Header: /cvsroot/pgsql/src/backend/executor/README,v 1.1 2001/05/15 00:35:50 tgl Exp $ + +The Postgres Executor +--------------------- + +The executor processes a tree of "plan nodes". The plan tree is essentially +a demand-pull pipeline of tuple processing operations. Each node, when +called, will produce the next tuple in its output sequence, or NULL if no +more tuples are available. If the node is not a primitive relation-scanning +node, it will have child node(s) that it calls in turn to obtain input +tuples. + +Refinements on this basic model include: + +* Choice of scan direction (forwards or backwards). Caution: this is not +currently well-supported. It works for primitive scan nodes, but not very +well for joins, aggregates, etc. + +* Rescan command to reset a node and make it generate its output sequence +over again. + +* Parameters that can alter a node's results. After adjusting a parameter, +the rescan command must be applied to that node and all nodes above it. +There is a moderately intelligent scheme to avoid rescanning nodes +unnecessarily (for example, Sort does not rescan its input if no parameters +of the input have changed, since it can just reread its stored sorted data). + +The plan tree concept implements SELECT directly: it is only necessary to +deliver the top-level result tuples to the client, or insert them into +another table in the case of INSERT ... SELECT. (INSERT ... VALUES is +handled similarly, but the plan tree is just a Result node with no source +tables.) For UPDATE, the plan tree selects the tuples that need to be +updated (WHERE condition) and delivers a new calculated tuple value for each +such tuple, plus a "junk" (hidden) tuple CTID identifying the target tuple. +The executor's top level then uses this information to update the correct +tuple. DELETE is similar to UPDATE except that only a CTID need be +delivered by the plan tree. + +XXX a great deal more documentation needs to be written here... + + +EvalPlanQual (READ COMMITTED update checking) +--------------------------------------------- + +For simple SELECTs, the executor need only pay attention to tuples that are +valid according to the snapshot seen by the current transaction (ie, they +were inserted by a previously committed transaction, and not deleted by any +previously committed transaction). However, for UPDATE and DELETE it is not +cool to modify or delete a tuple that's been modified by an open or +concurrently-committed transaction. If we are running in SERIALIZABLE +isolation level then we just raise an error when this condition is seen to +occur. In READ COMMITTED isolation level, we must work a lot harder. + +The basic idea in READ COMMITTED mode is to take the modified tuple +committed by the concurrent transaction (after waiting for it to commit, +if need be) and re-evaluate the query qualifications to see if it would +still meet the quals. If so, we regenerate the updated tuple (if we are +doing an UPDATE) from the modified tuple, and finally update/delete the +modified tuple. SELECT FOR UPDATE behaves similarly, except that its action +is just to mark the modified tuple for update by the current transaction. + +To implement this checking, we actually re-run the entire query from scratch +for each modified tuple, but with the scan node that sourced the original +tuple set to return only the modified tuple, not the original tuple or any +of the rest of the relation. If this query returns a tuple, then the +modified tuple passes the quals (and the query output is the suitably +modified update tuple, if we're doing UPDATE). If no tuple is returned, +then the modified tuple fails the quals, so we ignore it and continue the +original query. (This is reasonably efficient for simple queries, but may +be horribly slow for joins. A better design would be nice; one thought for +future investigation is to treat the tuple substitution like a parameter, +so that we can avoid rescanning unrelated nodes.) + +Note a fundamental bogosity of this approach: if the relation containing +the original tuple is being used in a self-join, the other instance(s) of +the relation will be treated as still containing the original tuple, whereas +logical consistency would demand that the modified tuple appear in them too. +But we'd have to actually substitute the modified tuple for the original, +while still returning all the rest of the relation, to ensure consistent +answers. Implementing this correctly is a task for future work. + +In UPDATE/DELETE, only the target relation needs to be handled this way, +so only one special recheck query needs to execute at a time. In SELECT FOR +UPDATE, there may be multiple relations flagged FOR UPDATE, so it's possible +that while we are executing a recheck query for one modified tuple, we will +hit another modified tuple in another relation. In this case we "stack up" +recheck queries: a sub-recheck query is spawned in which both the first and +second modified tuples will be returned as the only components of their +relations. (In event of success, all these modified tuples will be marked +for update.) Again, this isn't necessarily quite the right thing ... but in +simple cases it works. Potentially, recheck queries could get nested to the +depth of the number of FOR UPDATE relations in the query. + +It should be noted also that UPDATE/DELETE expect at most one tuple to +result from the modified query, whereas in the FOR UPDATE case it's possible +for multiple tuples to result (since we could be dealing with a join in +which multiple tuples join to the modified tuple). We want FOR UPDATE to +mark all relevant tuples, so we pass all tuples output by all the stacked +recheck queries back to the executor toplevel for marking. -- 2.40.0