From ba8808b2a668f8f3d1682593915023428f963230 Mon Sep 17 00:00:00 2001 From: "Thomas G. Lockhart" Date: Sat, 13 Feb 1999 03:37:54 +0000 Subject: [PATCH] Substitute great info from Stefan Simkovics' Master's Thesis. Still need to add some acknowledgements at the top of the doc; already have full info in the bibliography but since the original is being spread across the existing docs we should also mention things locally. --- doc/src/sgml/arch-dev.sgml | 4164 +++++++++++++++++++++++++++++++++++- 1 file changed, 4083 insertions(+), 81 deletions(-) diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 076952eb60..c04e47cf0b 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -1,81 +1,4083 @@ - - Architecture - - -<ProductName>Postgres</ProductName> Architectural Concepts - - - Before we continue, you should understand the basic - Postgres system architecture. Understanding how the - parts of Postgres interact will make the next chapter - somewhat clearer. - In database jargon, Postgres uses a simple "process - per-user" client/server model. A Postgres session - consists of the following cooperating UNIX processes (programs): - - - - - A supervisory daemon process (postmaster), - - - - - the user's frontend application (e.g., the psql program), and - - - - - the one or more backend database servers (the postgres process itself). - - - - - - - A single postmaster manages a given collection of - databases on a single host. Such a collection of - databases is called an installation or site. Frontend - applications that wish to access a given database - within an installation make calls to the library. - The library sends user requests over the network to the - postmaster ((a)), which in turn starts a new - backend server process ((b)) - -
-How a connection is established - -
- - and connects the - frontend process to the new server ((c)). From - that point on, the frontend process and the backend - server communicate without intervention by the - postmaster. Hence, the postmaster is always running, waiting - for requests, whereas frontend and backend processes - come and go. The libpq library allows a single - frontend to make multiple connections to backend processes. - However, the frontend application is still a - single-threaded process. Multithreaded frontend/backend - connections are not currently supported in libpq. - One implication of this architecture is that the - postmaster and the backend always run on the same - machine (the database server), while the frontend - application may run anywhere. You should keep this - in mind, - because the files that can be accessed on a client - machine may not be accessible (or may only be accessed - using a different filename) on the database server - machine. - You should also be aware that the postmaster and - postgres servers run with the user-id of the Postgres - "superuser." Note that the Postgres superuser does not - have to be a special user (e.g., a user named - "postgres"). Furthermore, the Postgres superuser - should - definitely not be the UNIX superuser, "root"! In any - case, all files relating to a database should belong to - this Postgres superuser. -
-
-
+ + PostgreSQL from the Developer's Point of View + + + This chapter gives an overview of the internal structure of the + backend of Postgres. + After having read the following sections you + should have an idea of how a query is processed. Don't expect a + detailed description here (I think such a description dealing with + all data structures and functions used within Postgres + would exceed 1000 + pages!). This chapter is intended to help understanding the general + control and data flow within the backend from receiving a query to + sending the results. + + + + The Path of a Query + + + Here we give a short overview of the stages a query has to pass in + order to obtain a result. + + + + + + A connection from an application program to the Postgres + server has to be established. The application program transmits a + query to the server and receives the results sent back by the server. + + + + + + The parser stage checks the query + transmitted by the application + program (client) for correct syntax and creates + a query tree. + + + + + + The rewrite system takes + the query tree created by the parser stage and looks for + any rules (stored in the + system catalogs) to apply to + the querytree and performs the + transformations given in the rule bodies. + One application of the rewrite system is given in the realization of + views. + + + + Whenever a query against a view + (i.e. a virtual table) is made, + the rewrite system rewrites the user's query to + a query that accesses the base tables given in + the view definition instead. + + + + + + The planner/optimizer takes + the (rewritten) querytree and creates a + queryplan that will be the input to the + executor. + + + + It does so by first creating all possible paths + leading to the same result. For example if there is an index on a + relation to be scanned, there are two paths for the + scan. One possibility is a simple sequential scan and the other + possibility is to use the index. Next the cost for the execution of + each plan is estimated and the + cheapest plan is chosen and handed back. + + + + + + The executor recursively steps through + the plan tree and + retrieves tuples in the way represented by the plan. + The executor makes use of the + storage system while scanning + relations, performs sorts and joins, + evaluates qualifications and finally hands back the tuples derived. + + + + + + In the following sections we will cover every of the above listed items + in more detail to give a better understanding on Postgres's internal + control and data structures. + + + + + How Connections are Established + + + Postgres is implemented using a simple "process per-user" + client/server model. In this model there is one client process + connected to exactly one server process. + As we don't know per se + how many connections will be made, we have to use a master process + that spawns a new server process every time a connection is + requested. This master process is called postmaster and + listens at a specified TCP/IP port for incoming connections. Whenever + a request for a connection is detected the postmaster process + spawns a new server process called postgres. The server + tasks (postgres processes) communicate with each other using + semaphores and shared memory + to ensure data integrity + throughout concurrent data access. Figure + \ref{connection} illustrates the interaction of the master process + postmaster the server process postgres and a client + application. + + + + The client process can either be the psql frontend (for + interactive SQL queries) or any user application implemented using + the libpg library. Note that applications implemented using + ecpg + (the Postgres embedded SQL preprocessor for C) + also use this library. + + + + Once a connection is established the client process can send a query + to the backend (server). The query is transmitted using plain text, + i.e. there is no parsing done in the frontend (client). The + server parses the query, creates an execution plan, + executes the plan and returns the retrieved tuples to the client + by transmitting them over the established connection. + + + + + + + + The Parser Stage + + + The parser stage consists of two parts: + + + + + The parser defined in + gram.y and scan.l is + built using the UNIX tools yacc + and lex. + + + + + The transformation process does + modifications and augmentations to the data structures returned by the parser. + + + + + + + Parser + + + The parser has to check the query string (which arrives as + plain ASCII text) for valid syntax. If the syntax is correct a + parse tree is built up and handed back otherwise an error is + returned. For the implementation the well known UNIX + tools lex and yacc + are used. + + + + The lexer is defined in the file + scan.l and is responsible + for recognizing identifiers, + the SQL keywords etc. For + every keyword or identifier that is found, a token + is generated and handed to the parser. + + + + The parser is defined in the file gram.y and consists of a + set of grammar rules and actions + that are executed + whenever a rule is fired. The code of the actions (which + is actually C-code) is used to build up the parse tree. + + + + The file scan.l is transformed to + the C-source file scan.c + using the program lex + and gram.y is transformed to + gram.c using yacc. + After these transformations have taken + place a normal C-compiler can be used to create the + parser. Never make any changes to the generated C-files as they will + be overwritten the next time lex + or yacc is called. + + + + The mentioned transformations and compilations are normally done + automatically using the makefiles + shipped with the Postgres + source distribution. + + + + + + A detailed description of yacc or + the grammar rules given + in gram.y would be beyond the scope of this paper. There are + many books and documents dealing with lex + and yacc. You + should be familiar with yacc before you start to study the + grammar given in gram.y otherwise you won't understand what + happens there. + + + + For a better understanding of the data structures used in Postgres + for the processing of a query we use an example to illustrate the + changes made to these data structures in every stage. + + + + A Simple Select + + + This example contains the following simple query that will be used in + various descriptions and figures throughout the following + sections. The query assumes that the tables given in + The Supplier Database + + have already been defined. + + + select s.sname, se.pno + from supplier s, sells se + where s.sno > 2 and + s.sno = se.sno; + + + + + + Figure \ref{parsetree} shows the parse tree built by the + grammar rules and actions given in gram.y for the query + given in + (without the operator tree for + the where clause which is shown in figure \ref{where_clause} + because there was not enough space to show both data structures in one + figure). + + + + The top node of the tree is a SelectStmt node. For every entry + appearing in the from clause of the SQL query a RangeVar + node is created holding the name of the alias and a pointer to a + RelExpr node holding the name of the relation. All + RangeVar nodes are collected in a list which is attached to the field + fromClause of the SelectStmt node. + + + + For every entry appearing in the select list of the SQL query a + ResTarget node is created holding a pointer to an Attr + node. The Attr node holds the relation name of the entry and + a pointer to a Value node holding the name of the + attribute. + All ResTarget nodes are collected to a list which is + connected to the field targetList of the SelectStmt node. + + + + Figure \ref{where_clause} shows the operator tree built for the + where clause of the SQL query given in example + + which is attached to the field + qual of the SelectStmt node. The top node of the + operator tree is an A_Expr node representing an AND + operation. This node has two successors called lexpr and + rexpr pointing to two subtrees. The subtree attached to + lexpr represents the qualification s.sno > 2 and the one + attached to rexpr represents s.sno = se.sno. For every + attribute an Attr node is created holding the name of the + relation and a pointer to a Value node holding the name of the + attribute. For the constant term appearing in the query a + Const node is created holding the value. + + + + + + + + Transformation Process + + + The transformation process takes the tree handed back by + the parser as input and steps recursively through it. If + a SelectStmt node is found, it is transformed + to a Query + node which will be the top most node of the new data structure. Figure + \ref{transformed} shows the transformed data structure (the part + for the transformed where clause is given in figure + \ref{transformed_where} because there was not enough space to show all + parts in one figure). + + + + Now a check is made, if the relation names in the + FROM clause are known to the system. For every relation name + that is present in the system catalogs a RTE node is + created containing the relation name, the alias name and + the relation id. From now on the relation ids are used to + refer to the relations given in the query. All RTE nodes + are collected in the range table entry list which is connected + to the field rtable of the Query node. If a name of a + relation that is not known to the system is detected in the query an + error will be returned and the query processing will be aborted. + + + + Next it is checked if the attribute names used are + contained in the relations given in the query. For every + attribute} that is found a TLE node is created holding a pointer + to a Resdom node (which holds the name of the column) and a + pointer to a VAR node. There are two important numbers in the + VAR node. The field varno gives the position of the + relation containing the current attribute} in the range + table entry list created above. The field varattno gives the + position of the attribute within the relation. If the name + of an attribute cannot be found an error will be returned and + the query processing will be aborted. + + + + + + + + The <productname>Postgres</productname> Rule System + + + Postgres supports a powerful + rule system for the specification + of views and ambiguous view updates. + Originally the Postgres + rule system consisted of two implementations: + + + + + The first one worked using tuple level processing and was + implemented deep in the executor. The rule system was + called whenever an individual tuple had been accessed. This + implementation was removed in 1995 when the last official release + of the Postgres project was transformed into + Postgres95. + + + + + + The second implementation of the rule system is a technique + called query rewriting. + The rewrite system} is a module + that exists between the parser stage and the + planner/optimizer. This technique is still implemented. + + + + + + + For information on the syntax and creation of rules in the + Postgres system refer to + The PostgreSQL User's Guide. + + + + The Rewrite System + + + The query rewrite system is a module between + the parser stage and the planner/optimizer. It processes the tree handed + back by the parser stage (which represents a user query) and if + there is a rule present that has to be applied to the query it + rewrites the tree to an alternate form. + + + + Techniques To Implement Views + + + Now will sketch the algorithm of the query rewrite system. For + better illustration we show how to implement views using rules + as an example. + + + + Let the following rule be given: + + + create rule view_rule + as on select + to test_view + do instead + select s.sname, p.pname + from supplier s, sells se, part p + where s.sno = se.sno and + p.pno = se.pno; + + + + + The given rule will be fired whenever a select + against the relation test_view is detected. Instead of + selecting the tuples from test_view the select statement + given in the action part of the rule is executed. + + + + Let the following user-query against test_view be given: + + + select sname + from test_view + where sname <> 'Smith'; + + + + + Here is a list of the steps performed by the query rewrite + system whenever a user-query against test_view appears. (The + following listing is a very informal description of the algorithm just + intended for basic understanding. For a detailed description refer + to ). + + + + <literal>test_view</literal> Rewrite + + + Take the query given in the action part of the rule. + + + + + + Adapt the targetlist to meet the number and order of + attributes given in the user-query. + + + + + + Add the qualification given in the where clause of the + user-query to the qualification of the query given in the + action part of the rule. + + + + + + Given the rule definition above, the user-query will be + rewritten to the following form (Note that the rewriting is done on + the internal representation of the user-query handed back by the + parser stage but the derived new data structure will represent the following + query): + + + select s.sname + from supplier s, sells se, part p + where s.sno = se.sno and + p.pno = se.pno and + s.sname <> 'Smith'; + + + + + + + Planner/Optimizer + + + The task of the planner/optimizer is to create an optimal + execution plan. It first combines all possible ways of + scanning and joining + the relations that appear in a + query. All the created paths lead to the same result and it's the + task of the optimizer to estimate the cost of executing each path and + find out which one is the cheapest. + + + + Generating Possible Plans + + + The planner/optimizer decides which plans should be generated + based upon the types of indices defined on the relations appearing in + a query. There is always the possibility of performing a + sequential scan on a relation, so a plan using only + sequential scans is always created. Assume an index is defined on a + relation (for example a B-tree index) and a query contains the + restriction + relation.attribute OPR constant. If + relation.attribute happens to match the key of the B-tree + index and OPR is anything but '<>' another plan is created using + the B-tree index to scan the relation. If there are further indices + present and the restrictions in the query happen to match a key of an + index further plans will be considered. + + + + After all feasible plans have been found for scanning single + relations, plans for joining relations are created. The + planner/optimizer considers only joins between every two relations for + which there exists a corresponding join clause (i.e. for which a + restriction like where rel1.attr1=rel2.attr2 exists) in the + where qualification. All possible plans are generated for every + join pair considered by the planner/optimizer. The three possible join + strategies are: + + + + + nested iteration join: The right relation is scanned + once for every tuple found in the left relation. This strategy + is easy to implement but can be very time consuming. + + + + + + merge sort join: Each relation is sorted on the join + attributes before the join starts. Then the two relations are + merged together taking into account that both relations are + ordered on the join attributes. This kind of join is more + attractive because every relation has to be scanned only once. + + + + + + hash join: the right relation is first hashed on its + join attributes. Next the left relation is scanned and the + appropriate values of every tuple found are used as hash keys to + locate the tuples in the right relation. + + + + + + + + Data Structure of the Plan + + + Here we will give a little description of the nodes appearing in the + plan. Figure \ref{plan} shows the plan produced for the query in + example \ref{simple_select}. + + + + The top node of the plan is a MergeJoin node which has two + successors, one attached to the field lefttree and the second + attached to the field righttree. Each of the subnodes represents + one relation of the join. As mentioned above a merge sort + join requires each relation to be sorted. That's why we find + a Sort node in each subplan. The additional qualification given + in the query (s.sno > 2) is pushed down as far as possible and is + attached to the qpqual field of the leaf SeqScan node of + the corresponding subplan. + + + + The list attached to the field mergeclauses of the + MergeJoin node contains information about the join attributes. + The values 65000 and 65001 + for the varno fields in the + VAR nodes appearing in the mergeclauses list (and also in the + targetlist) mean that not the tuples of the current node should be + considered but the tuples of the next "deeper" nodes (i.e. the top + nodes of the subplans) should be used instead. + + + + Note that every Sort and SeqScan node appearing in figure + \ref{plan} has got a targetlist but because there was not enough space + only the one for the MergeJoin node could be drawn. + + + + Another task performed by the planner/optimizer is fixing the + operator ids in the Expr + and Oper nodes. As + mentioned earlier, Postgres supports a variety of different data + types and even user defined types can be used. To be able to maintain + the huge amount of functions and operators it is necessary to store + them in a system table. Each function and operator gets a unique + operator id. According to the types of the attributes used + within the qualifications etc., the appropriate operator ids + have to be used. + + + + + + Executor + + + The executor takes the plan handed back by the + planner/optimizer and starts processing the top node. In the case of + our example (the query given in example \ref{simple_select}) the top + node is a MergeJoin node. + + + + Before any merge can be done two tuples have to be fetched (one from + each subplan). So the executor recursively calls itself to + process the subplans (it starts with the subplan attached to + lefttree). The new top node (the top node of the left subplan) is a + SeqScan node and again a tuple has to be fetched before the node + itself can be processed. The executor calls itself recursively + another time for the subplan attached to lefttree of the + SeqScan node. + + + + Now the new top node is a Sort node. As a sort has to be done on + the whole relation, the executor starts fetching tuples + from the Sort node's subplan and sorts them into a temporary + relation (in memory or a file) when the Sort node is visited for + the first time. (Further examinations of the Sort node will + always return just one tuple from the sorted temporary + relation.) + + + + Every time the processing of the Sort node needs a new tuple the + executor is recursively called for the SeqScan node + attached as subplan. The relation (internally referenced by the + value given in the scanrelid field) is scanned for the next + tuple. If the tuple satisfies the qualification given by the tree + attached to qpqual it is handed back, otherwise the next tuple + is fetched until the qualification is satisfied. If the last tuple of + the relation has been processed a NULL pointer is + returned. + + + + After a tuple has been handed back by the lefttree of the + MergeJoin the righttree is processed in the same way. If both + tuples are present the executor processes the MergeJoin + node. Whenever a new tuple from one of the subplans is needed a + recursive call to the executor is performed to obtain it. If a + joined tuple could be created it is handed back and one complete + processing of the plan tree has finished. + + + + Now the described steps are performed once for every tuple, until a + NULL pointer is returned for the processing of the + MergeJoin node, indicating that we are finished. + + + + + + + + + -- 2.40.0