granicus.if.org Git - postgresql/blob - doc/src/sgml/geqo.sgml

   1 <!--
   2 $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.32 2005/04/12 03:16:50 tgl Exp $
   3 Genetic Optimizer
   4 -->
   5
   6  <chapter id="geqo">
   7   <chapterinfo>
   8    <author>
   9     <firstname>Martin</firstname>
  10     <surname>Utesch</surname>
  11     <affiliation>
  12      <orgname>
  13       University of Mining and Technology
  14      </orgname>
  15      <orgdiv>
  16       Institute of Automatic Control
  17      </orgdiv>
  18      <address>
  19       <city>
  20        Freiberg
  21       </city>
  22       <country>
  23        Germany
  24       </country>
  25      </address>
  26     </affiliation>
  27    </author>
  28    <date>1997-10-02</date>
  29   </chapterinfo>
  30
  31   <title id="geqo-title">Genetic Query Optimizer</title>
  32
  33   <para>
  34    <note>
  35     <title>Author</title>
  36     <para>
  37      Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
  38      for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
  39     </para>
  40    </note>
  41   </para>
  42
  43   <sect1 id="geqo-intro">
  44    <title>Query Handling as a Complex Optimization Problem</title>
  45
  46    <para>
  47     Among all relational operators the most difficult one to process
  48     and optimize is the <firstterm>join</firstterm>. The number of
  49     alternative plans to answer a query grows exponentially with the
  50     number of joins included in it. Further optimization effort is
  51     caused by the support of a variety of <firstterm>join
  52     methods</firstterm> (e.g., nested loop, hash join, merge join in
  53     <productname>PostgreSQL</productname>) to process individual joins
  54     and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
  55     B-tree, hash in <productname>PostgreSQL</productname>) as access
  56     paths for relations.
  57    </para>
  58
  59    <para>
  60     The current <productname>PostgreSQL</productname> optimizer
  61     implementation performs a <firstterm>near-exhaustive
  62     search</firstterm> over the space of alternative strategies. This
  63     algorithm, first introduced in the <quote>System R</quote>
  64     database, produces a near-optimal join order, but can take an
  65     enormous amount of time and memory space when the number of joins
  66     in the query grows large. This makes the ordinary
  67     <productname>PostgreSQL</productname> query optimizer
  68     inappropriate for queries that join a large number of tables.
  69    </para>
  70
  71    <para>
  72     The Institute of Automatic Control at the University of Mining and
  73     Technology, in Freiberg, Germany, encountered the described problems as its
  74     folks wanted to take the <productname>PostgreSQL</productname> DBMS as the backend for a decision
  75     support knowledge based system for the maintenance of an electrical
  76     power grid. The DBMS needed to handle large join queries for the
  77     inference machine of the knowledge based system.
  78    </para>
  79
  80    <para>
  81     Performance difficulties in exploring the space of possible query
  82     plans created the demand for a new optimization technique to be developed.
  83    </para>
  84
  85    <para>
  86     In the following we describe the implementation of a
  87     <firstterm>Genetic Algorithm</firstterm> to solve the join
  88     ordering problem in a manner that is efficient for queries
  89     involving large numbers of joins.
  90    </para>
  91   </sect1>
  92
  93   <sect1 id="geqo-intro2">
  94    <title>Genetic Algorithms</title>
  95
  96    <para>
  97     The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
  98     operates through
  99     nondeterministic, randomized search. The set of possible solutions for the
 100     optimization problem is considered as a
 101     <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
 102     The degree of adaptation of an individual to its environment is specified
 103     by its <firstterm>fitness</firstterm>.
 104    </para>
 105
 106    <para>
 107     The coordinates of an individual in the search space are represented
 108     by <firstterm>chromosomes</firstterm>, in essence a set of character
 109     strings. A <firstterm>gene</firstterm> is a
 110     subsection of a chromosome which encodes the value of a single parameter
 111     being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
 112     <firstterm>integer</firstterm>.
 113    </para>
 114
 115    <para>
 116     Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
 117     <firstterm>mutation</firstterm>, and
 118     <firstterm>selection</firstterm> new generations of search points are found
 119     that show a higher average fitness than their ancestors.
 120    </para>
 121
 122    <para>
 123     According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
 124     strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
 125     problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
 126     non-random (better than random).
 127    </para>
 128
 129    <figure id="geqo-diagram">
 130     <title>Structured Diagram of a Genetic Algorithm</title>
 131
 132     <informaltable frame="none">
 133      <tgroup cols="2">
 134       <tbody>
 135        <row>
 136         <entry>P(t)</entry>
 137         <entry>generation of ancestors at a time t</entry>
 138        </row>
 139
 140        <row>
 141         <entry>P''(t)</entry>
 142         <entry>generation of descendants at a time t</entry>
 143        </row>
 144       </tbody>
 145      </tgroup>
 146     </informaltable>
 147
 148 <literallayout class="monospaced">
 149 +=========================================+
 150 |&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;  Algorithm GA  &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;|
 151 +=========================================+
 152 | INITIALIZE t := 0                       |
 153 +=========================================+
 154 | INITIALIZE P(t)                         |
 155 +=========================================+
 156 | evaluate FITNESS of P(t)                |
 157 +=========================================+
 158 | while not STOPPING CRITERION do         |
 159 |   +-------------------------------------+
 160 |   | P'(t)  := RECOMBINATION{P(t)}       |
 161 |   +-------------------------------------+
 162 |   | P''(t) := MUTATION{P'(t)}           |
 163 |   +-------------------------------------+
 164 |   | P(t+1) := SELECTION{P''(t) + P(t)}  |
 165 |   +-------------------------------------+
 166 |   | evaluate FITNESS of P''(t)          |
 167 |   +-------------------------------------+
 168 |   | t := t + 1                          |
 169 +===+=====================================+
 170 </literallayout>
 171    </figure>
 172   </sect1>
 173
 174   <sect1 id="geqo-pg-intro">
 175    <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
 176
 177    <para>
 178     The <acronym>GEQO</acronym> module approaches the query
 179     optimization problem as though it were the well-known traveling salesman
 180     problem (<acronym>TSP</acronym>).
 181     Possible query plans are encoded as integer strings. Each string
 182     represents the join order from one relation of the query to the next.
 183     For example, the join tree
 184 <literallayout class="monospaced">
 185    /\
 186   /\ 2
 187  /\ 3
 188 4  1
 189 </literallayout>
 190     is encoded by the integer string '4-1-3-2',
 191     which means, first join relation '4' and '1', then '3', and
 192     then '2', where 1, 2, 3, 4 are relation IDs within the
 193     <productname>PostgreSQL</productname> optimizer.
 194    </para>
 195
 196    <para>
 197     Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
 198     algorithm.
 199    </para>
 200
 201    <para>
 202     Specific characteristics of the <acronym>GEQO</acronym>
 203     implementation in <productname>PostgreSQL</productname>
 204     are:
 205
 206     <itemizedlist spacing="compact" mark="bullet">
 207      <listitem>
 208       <para>
 209        Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
 210        individuals in a population, not whole-generational replacement)
 211        allows fast convergence towards improved query plans. This is
 212        essential for query handling with reasonable time;
 213       </para>
 214      </listitem>
 215
 216      <listitem>
 217       <para>
 218        Usage of <firstterm>edge recombination crossover</firstterm>
 219        which is especially suited to keep edge losses low for the
 220        solution of the <acronym>TSP</acronym> by means of a
 221        <acronym>GA</acronym>;
 222       </para>
 223      </listitem>
 224
 225      <listitem>
 226       <para>
 227        Mutation as genetic operator is deprecated so that no repair
 228        mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
 229       </para>
 230      </listitem>
 231     </itemizedlist>
 232    </para>
 233
 234    <para>
 235     The <acronym>GEQO</acronym> module allows
 236     the <productname>PostgreSQL</productname> query optimizer to
 237     support large join queries effectively through
 238     non-exhaustive search.
 239    </para>
 240
 241   <sect2 id="geqo-future">
 242    <title>Future Implementation Tasks for
 243     <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
 244
 245      <para>
 246       Work is still needed to improve the genetic algorithm parameter
 247       settings.
 248       In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
 249       routines
 250       <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
 251       we have to find a compromise for the parameter settings
 252       to satisfy two competing demands:
 253       <itemizedlist spacing="compact">
 254        <listitem>
 255         <para>
 256          Optimality of the query plan
 257         </para>
 258        </listitem>
 259        <listitem>
 260         <para>
 261          Computing time
 262         </para>
 263        </listitem>
 264       </itemizedlist>
 265      </para>
 266
 267      <para>
 268       At a more basic level, it is not clear that solving query optimization
 269       with a GA algorithm designed for TSP is appropriate.  In the TSP case,
 270       the cost associated with any substring (partial tour) is independent
 271       of the rest of the tour, but this is certainly not true for query
 272       optimization.  Thus it is questionable whether edge recombination
 273       crossover is the most effective mutation procedure.
 274      </para>
 275
 276    </sect2>
 277   </sect1>
 278
 279  <sect1 id="geqo-biblio">
 280   <title>Further Reading</title>
 281
 282   <para>
 283    The following resources contain additional information about
 284    genetic algorithms:
 285
 286    <itemizedlist>
 287     <listitem>
 288      <para>
 289       <ulink url="http://surf.de.uu.net/encore/www/">
 290       The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
 291       url="news://comp.ai.genetic"></ulink>)
 292      </para>
 293     </listitem>
 294
 295     <listitem>
 296      <para>
 297       <ulink url="http://www.red3d.com/cwr/evolve.html">
 298       Evolutionary Computation and its application to art and design</ulink>, by
 299       Craig Reynolds
 300      </para>
 301     </listitem>
 302
 303     <listitem>
 304      <para>
 305       <xref linkend="ELMA04">
 306      </para>
 307     </listitem>
 308
 309     <listitem>
 310      <para>
 311       <xref linkend="FONG">
 312      </para>
 313     </listitem>
 314    </itemizedlist>
 315   </para>
 316
 317  </sect1>
 318 </chapter>
 319
 320 <!-- Keep this comment at the end of the file
 321 Local variables:
 322 mode:sgml
 323 sgml-omittag:nil
 324 sgml-shorttag:t
 325 sgml-minimize-attributes:nil
 326 sgml-always-quote-attributes:t
 327 sgml-indent-step:1
 328 sgml-indent-data:t
 329 sgml-parent-document:nil
 330 sgml-default-dtd-file:"./reference.ced"
 331 sgml-exposed-tags:nil
 332 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 333 sgml-local-ecat-files:nil
 334 End:
 335 -->