-<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.29 2007/01/31 20:56:16 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.30 2007/07/21 04:02:41 tgl Exp $ -->
<chapter id="overview">
<title>Overview of PostgreSQL Internals</title>
can be executed would take an excessive amount of time and memory
space. In particular, this occurs when executing queries
involving large numbers of join operations. In order to determine
- a reasonable (not optimal) query plan in a reasonable amount of
- time, <productname>PostgreSQL</productname> uses a <xref
- linkend="geqo" endterm="geqo-title">.
+ a reasonable (not necessarily optimal) query plan in a reasonable amount
+ of time, <productname>PostgreSQL</productname> uses a <xref
+ linkend="geqo" endterm="geqo-title"> when the number of joins
+ exceeds a threshold (see <xref linkend="guc-geqo-threshold">).
</para>
</note>
the index's <firstterm>operator class</>, another plan is created using
the B-tree index to scan the relation. If there are further indexes
present and the restrictions in the query happen to match a key of an
- index further plans will be considered.
+ index, further plans will be considered. Index scan plans are also
+ generated for indexes that have a sort ordering that can match the
+ query's <literal>ORDER BY</> clause (if any), or a sort ordering that
+ might be useful for merge joining (see below).
</para>
<para>
- After all feasible plans have been found for scanning single relations,
- plans for joining relations are created. The planner/optimizer
- preferentially considers joins between any two relations for which there
- exist a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for
- which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
- exists). Join pairs with no join clause are considered only when there
- is no other choice, that is, a particular relation has no available
- join clauses to any other relation. All possible plans are generated for
- every join pair considered
- by the planner/optimizer. The three possible join strategies are:
+ If the query requires joining two or more relations,
+ plans for joining relations are considered
+ after all feasible plans have been found for scanning single relations.
+ The three available join strategies are:
<itemizedlist>
<listitem>
cheapest one.
</para>
+ <para>
+ If the query uses fewer than <xref linkend="guc-geqo-threshold">
+ relations, a near-exhaustive search is conducted to find the best
+ join sequence. The planner preferentially considers joins between any
+ two relations for which there exists a corresponding join clause in the
+ <literal>WHERE</literal> qualification (i.e., for
+ which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
+ exists). Join pairs with no join clause are considered only when there
+ is no other choice, that is, a particular relation has no available
+ join clauses to any other relation. All possible plans are generated for
+ every join pair considered by the planner, and the one that is
+ (estimated to be) the cheapest is chosen.
+ </para>
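<para>
The preference for clause-connected joins described above can be sketched as follows. This is an illustrative simplification, not the planner's actual code: it enumerates only left-deep join orders, it treats a clauseless join as allowed only when no remaining relation has a join clause to the relations joined so far, and the names <literal>candidate_orders</literal>, <literal>cheapest_order</literal>, and <literal>cost_fn</literal> are hypothetical.
</para>

```python
def has_join_clause(joined, rel, clauses):
    """True if some already-joined relation shares a join clause with rel."""
    return any(frozenset((prev, rel)) in clauses for prev in joined)


def candidate_orders(relations, clauses):
    """Yield left-deep join orders, preferring joins backed by a join clause.

    `clauses` is a set of frozenset pairs, one per join clause such as
    rel1.attr1 = rel2.attr2.  A clauseless (cross product) join is tried
    only when no remaining relation is connected to the current prefix.
    """
    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        connected = [r for r in remaining if has_join_clause(prefix, r, clauses)]
        # Fall back to every remaining relation only if nothing is connected.
        for rel in connected or remaining:
            yield from extend(prefix + [rel], [r for r in remaining if r != rel])

    for first in relations:
        yield from extend([first], [r for r in relations if r != first])


def cheapest_order(relations, clauses, cost_fn):
    """Pick the candidate order with the lowest estimated cost (assumed cost_fn)."""
    return min(candidate_orders(relations, clauses), key=cost_fn)


if __name__ == "__main__":
    clauses = {frozenset(("a", "b")), frozenset(("b", "c"))}
    for order in candidate_orders(["a", "b", "c"], clauses):
        print(order)
```

<para>
With join clauses only between a/b and b/c, the order (a, c, b) is never generated, because joining a to c first would require a cross product while a clause-backed alternative exists.
</para>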
+
+ <para>
+ When the number of relations reaches or exceeds
+ <varname>geqo_threshold</varname>, the join
+ sequences considered are determined by heuristics, as described
+ in <xref linkend="geqo">. In all other respects the planning process is the same.
+ </para>
+
<para>
The finished plan tree consists of sequential or index scans of
the base relations, plus nested-loop, merge, or hash join nodes as
-<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.39 2007/02/16 03:50:29 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.40 2007/07/21 04:02:41 tgl Exp $ -->
<chapter id="geqo">
<chapterinfo>
<productname>PostgreSQL</productname> optimizer.
</para>
- <para>
- Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
- algorithm.
- </para>
-
<para>
Specific characteristics of the <acronym>GEQO</acronym>
implementation in <productname>PostgreSQL</productname>
</itemizedlist>
</para>
+ <para>
+ Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's
+ Genitor algorithm.
+ </para>
+
<para>
The <acronym>GEQO</acronym> module allows
the <productname>PostgreSQL</productname> query optimizer to
non-exhaustive search.
</para>
+ <sect2>
+ <title>Generating Possible Plans with <acronym>GEQO</acronym></title>
+
+ <para>
+ The <acronym>GEQO</acronym> planning process uses the standard planner
+ code to generate plans for scans of individual relations. Then join
+ plans are developed using the genetic approach. As shown above, each
+ candidate join plan is represented by a sequence in which to join
+ the base relations. In the initial stage, the <acronym>GEQO</acronym>
+ code simply generates some possible join sequences at random. For each
+ join sequence considered, the standard planner code is invoked to
+ estimate the cost of performing the query using that join sequence.
+ (For each step of the join sequence, all three possible join strategies
+ are considered, and all the initially-determined relation scan plans
+ are available. The estimated cost is the cheapest of these
+ possibilities.) Join sequences with lower estimated cost are considered
+ <quote>more fit</> than those with higher cost. The genetic algorithm
+ discards the least fit candidates. Then new candidates are generated
+ by combining genes of more-fit candidates &mdash; that is, by using
+ randomly-chosen portions of known low-cost join sequences to create
+ new sequences for consideration. This process is repeated until a
+ preset number of join sequences have been considered; then the best
+ one found at any time during the search is used to generate the finished
+ plan.
+ </para>
+
+ <para>
+ This process is inherently nondeterministic, because of the randomized
+ choices made during both the initial population selection and subsequent
+ <quote>mutation</> of the best candidates. Hence different plans may
+ be selected from one run to the next, resulting in varying run time
+ and varying output row order.
+ </para>
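<para>
The search loop described in this section can be sketched roughly as follows. This is a toy model under stated assumptions, not the <acronym>GEQO</acronym> module's code: <literal>sequence_cost</literal> is a made-up cost function (sum of intermediate result sizes), <literal>recombine</literal> is a simple order-preserving crossover chosen for brevity, no separate mutation step is modeled, and all names and parameters are hypothetical.
</para>

```python
import random


def sequence_cost(seq, rows, selectivity):
    """Toy cost of a left-deep join sequence: sum of intermediate result sizes."""
    size = float(rows[seq[0]])
    total = 0.0
    for i in range(1, len(seq)):
        sel = 1.0
        for prev in seq[:i]:
            sel *= selectivity.get(frozenset((prev, seq[i])), 1.0)
        size = size * rows[seq[i]] * sel
        total += size
    return total


def recombine(a, b, rng):
    """Keep a random slice of parent a in place; fill the rest in b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    kept = a[i:j]
    rest = [rel for rel in b if rel not in kept]
    return rest[:i] + kept + rest[i:]


def geqo_search(relations, cost_fn, pool_size=8, evaluations=200, seed=None):
    """Return (cost, sequence) for the best join sequence found."""
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):          # initial population: random sequences
        seq = list(relations)
        rng.shuffle(seq)
        pool.append((cost_fn(seq), seq))
    pool.sort(key=lambda entry: entry[0])
    considered = pool_size
    while considered < evaluations:     # stop after a preset number of sequences
        fitter = pool[: max(2, pool_size // 2)]
        (_, p1), (_, p2) = rng.sample(fitter, 2)
        child = recombine(p1, p2, rng)
        pool.append((cost_fn(child), child))
        pool.sort(key=lambda entry: entry[0])
        pool.pop()                      # discard the least fit candidate
        considered += 1
    return pool[0]                      # the best entry is never discarded


if __name__ == "__main__":
    rows = {"a": 1000, "b": 100, "c": 10, "d": 10000}
    sel = {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.1,
           frozenset(("c", "d")): 0.001}
    print(geqo_search(list(rows), lambda s: sequence_cost(s, rows, sel)))
```

<para>
Leaving <literal>seed</literal> unset lets repeated runs start from different random populations, so they may settle on different sequences. That mirrors the nondeterminism noted above; passing a fixed seed makes this sketch repeatable.
</para>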
+
+ </sect2>
+
<sect2 id="geqo-future">
<title>Future Implementation Tasks for
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>
</itemizedlist>
</para>
+ <para>
+ In the current implementation, the fitness of each candidate join
+ sequence is estimated by running the standard planner's join selection
+ and cost estimation code from scratch. To the extent that different
+ candidates use similar sub-sequences of joins, a great deal of work
+ will be repeated. This could be made significantly faster by retaining
+ cost estimates for sub-joins. The problem is to avoid expending
+ unreasonable amounts of memory on retaining that state.
+ </para>
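<para>
One way to picture the trade-off is a bounded cache of sub-join estimates keyed by join-sequence prefix. The sketch below is only an illustration of that idea under a toy cost model. It is not a proposal taken from the source, and the class <literal>PrefixCostCache</literal>, its <literal>max_entries</literal> limit, and the least-recently-used eviction policy are all assumptions.
</para>

```python
from collections import OrderedDict


class PrefixCostCache:
    """Toy cost estimator that reuses estimates for shared sequence prefixes.

    The cache maps a prefix (tuple of relation names) to the pair
    (intermediate result size, accumulated cost).  It holds at most
    max_entries prefixes and evicts the least recently used one.
    """

    def __init__(self, rows, selectivity, max_entries=1024):
        self.rows = rows
        self.selectivity = selectivity
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, prefix, state):
        self._cache[prefix] = state
        self._cache.move_to_end(prefix)
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)   # evict least recently used

    def cost(self, seq):
        seq = tuple(seq)
        # Find the longest prefix whose estimate is already cached.
        k = len(seq)
        while k > 1 and seq[:k] not in self._cache:
            k -= 1
        if k > 1:
            size, total = self._cache[seq[:k]]
            self._cache.move_to_end(seq[:k])
            self.hits += 1
        else:
            k = 1
            size, total = float(self.rows[seq[0]]), 0.0
            self.misses += 1
        # Estimate only the joins beyond the cached prefix.
        for i in range(k, len(seq)):
            sel = 1.0
            for prev in seq[:i]:
                sel *= self.selectivity.get(frozenset((prev, seq[i])), 1.0)
            size = size * self.rows[seq[i]] * sel
            total += size
            self._store(seq[: i + 1], (size, total))
        return total
```

<para>
A smaller <literal>max_entries</literal> bounds memory at the price of more repeated work, which is the tension described above.
</para>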
+
<para>
At a more basic level, it is not clear that solving query optimization
with a GA algorithm designed for TSP is appropriate. In the TSP case,