granicus.if.org Git - postgresql/blob - doc/src/sgml/geqo.sgml

   1 <!--
   2 $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.35 2006/01/22 03:56:58 neilc Exp $
   3 Genetic Optimizer
   4 -->
   5
   6  <chapter id="geqo">
   7   <chapterinfo>
   8    <author>
   9     <firstname>Martin</firstname>
  10     <surname>Utesch</surname>
  11     <affiliation>
  12      <orgname>
  13       University of Mining and Technology
  14      </orgname>
  15      <orgdiv>
  16       Institute of Automatic Control
  17      </orgdiv>
  18      <address>
  19       <city>
  20        Freiberg
  21       </city>
  22       <country>
  23        Germany
  24       </country>
  25      </address>
  26     </affiliation>
  27    </author>
  28    <date>1997-10-02</date>
  29   </chapterinfo>
  30
  31   <title id="geqo-title">Genetic Query Optimizer</title>
  32
  33   <para>
  34    <note>
  35     <title>Author</title>
  36     <para>
  37      Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
  38      for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
  39     </para>
  40    </note>
  41   </para>
  42
  43   <sect1 id="geqo-intro">
  44    <title>Query Handling as a Complex Optimization Problem</title>
  45
  46    <para>
  47     Among all relational operators the most difficult one to process
  48     and optimize is the <firstterm>join</firstterm>. The number of
  49     possible query plans grows exponentially with the
  50     number of joins in the query. Further optimization effort is
  51     caused by the support of a variety of <firstterm>join
  52     methods</firstterm> (e.g., nested loop, hash join, merge join in
  53     <productname>PostgreSQL</productname>) to process individual joins
  54     and a diversity of <firstterm>indexes</firstterm> (e.g.,
  55     B-tree, hash, GiST in <productname>PostgreSQL</productname>) as access
  56     paths for relations.
  57    </para>
  58
  59    <para>
  60     The normal <productname>PostgreSQL</productname> query optimizer
  61     performs a <firstterm>near-exhaustive search</firstterm> over the
  62     space of alternative strategies. This algorithm, first introduced
  63     in IBM's System R database, produces a near-optimal join order,
  64     but can take an enormous amount of time and memory space when the
  65     number of joins in the query grows large. This makes the ordinary
  66     <productname>PostgreSQL</productname> query optimizer
  67     inappropriate for queries that join a large number of tables.
  68    </para>
  69
  70    <para>
  71     The Institute of Automatic Control at the University of Mining and
  72     Technology, in Freiberg, Germany, encountered some problems when
  73     it wanted to use <productname>PostgreSQL</productname> as the
  74     backend for a decision support knowledge based system for the
  75     maintenance of an electrical power grid. The DBMS needed to handle
  76     large join queries for the inference machine of the knowledge
  77     based system. The number of joins in these queries made using the
  78     normal query optimizer infeasible.
  79    </para>
  80
  81    <para>
  82     In the following we describe the implementation of a
  83     <firstterm>genetic algorithm</firstterm> to solve the join
  84     ordering problem in a manner that is efficient for queries
  85     involving large numbers of joins.
  86    </para>
  87   </sect1>
  88
  89   <sect1 id="geqo-intro2">
  90    <title>Genetic Algorithms</title>
  91
  92    <para>
  93     The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
  94     operates through
  95     nondeterministic, randomized search. The set of possible solutions for the
  96     optimization problem is considered as a
  97     <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
  98     The degree of adaptation of an individual to its environment is specified
  99     by its <firstterm>fitness</firstterm>.
 100    </para>
 101
 102    <para>
 103     The coordinates of an individual in the search space are represented
 104     by <firstterm>chromosomes</firstterm>, in essence a set of character
 105     strings. A <firstterm>gene</firstterm> is a
 106     subsection of a chromosome which encodes the value of a single parameter
 107     being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
 108     <firstterm>integer</firstterm>.
 109    </para>
 110
 111    <para>
 112     Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
 113     <firstterm>mutation</firstterm>, and
 114     <firstterm>selection</firstterm> new generations of search points are found
 115     that show a higher average fitness than their ancestors.
 116    </para>
 117
 118    <para>
 119     According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
 120     strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
 121     problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
 122     non-random (better than random).
 123    </para>
 124
 125    <figure id="geqo-diagram">
 126     <title>Structured Diagram of a Genetic Algorithm</title>
 127
 128     <informaltable frame="none">
 129      <tgroup cols="2">
 130       <tbody>
 131        <row>
 132         <entry>P(t)</entry>
 133         <entry>generation of ancestors at a time t</entry>
 134        </row>
 135
 136        <row>
 137         <entry>P''(t)</entry>
 138         <entry>generation of descendants at a time t</entry>
 139        </row>
 140       </tbody>
 141      </tgroup>
 142     </informaltable>
 143
 144 <literallayout class="monospaced">
 145 +=========================================+
 146 |&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;  Algorithm GA  &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;|
 147 +=========================================+
 148 | INITIALIZE t := 0                       |
 149 +=========================================+
 150 | INITIALIZE P(t)                         |
 151 +=========================================+
 152 | evaluate FITNESS of P(t)                |
 153 +=========================================+
 154 | while not STOPPING CRITERION do         |
 155 |   +-------------------------------------+
 156 |   | P'(t)  := RECOMBINATION{P(t)}       |
 157 |   +-------------------------------------+
 158 |   | P''(t) := MUTATION{P'(t)}           |
 159 |   +-------------------------------------+
 160 |   | P(t+1) := SELECTION{P''(t) + P(t)}  |
 161 |   +-------------------------------------+
 162 |   | evaluate FITNESS of P''(t)          |
 163 |   +-------------------------------------+
 164 |   | t := t + 1                          |
 165 +===+=====================================+
 166 </literallayout>
 167    </figure>
 168   </sect1>
 169
 170   <sect1 id="geqo-pg-intro">
 171    <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
 172
 173    <para>
 174     The <acronym>GEQO</acronym> module approaches the query
 175     optimization problem as though it were the well-known traveling salesman
 176     problem (<acronym>TSP</acronym>).
 177     Possible query plans are encoded as integer strings. Each string
 178     represents the join order from one relation of the query to the next.
 179     For example, the join tree
 180 <literallayout class="monospaced">
 181    /\
 182   /\ 2
 183  /\ 3
 184 4  1
 185 </literallayout>
 186     is encoded by the integer string '4-1-3-2',
 187     which means, first join relation '4' and '1', then '3', and
 188     then '2', where 1, 2, 3, 4 are relation IDs within the
 189     <productname>PostgreSQL</productname> optimizer.
 190    </para>
 191
 192    <para>
 193     Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
 194     algorithm.
 195    </para>
 196
 197    <para>
 198     Specific characteristics of the <acronym>GEQO</acronym>
 199     implementation in <productname>PostgreSQL</productname>
 200     are:
 201
 202     <itemizedlist spacing="compact" mark="bullet">
 203      <listitem>
 204       <para>
 205        Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
 206        individuals in a population, not whole-generational replacement)
 207        allows fast convergence towards improved query plans. This is
 208        essential for query handling with reasonable time;
 209       </para>
 210      </listitem>
 211
 212      <listitem>
 213       <para>
 214        Usage of <firstterm>edge recombination crossover</firstterm>
 215        which is especially suited to keep edge losses low for the
 216        solution of the <acronym>TSP</acronym> by means of a
 217        <acronym>GA</acronym>;
 218       </para>
 219      </listitem>
 220
 221      <listitem>
 222       <para>
 223        Mutation as genetic operator is deprecated so that no repair
 224        mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
 225       </para>
 226      </listitem>
 227     </itemizedlist>
 228    </para>
 229
 230    <para>
 231     The <acronym>GEQO</acronym> module allows
 232     the <productname>PostgreSQL</productname> query optimizer to
 233     support large join queries effectively through
 234     non-exhaustive search.
 235    </para>
 236
 237   <sect2 id="geqo-future">
 238    <title>Future Implementation Tasks for
 239     <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
 240
 241      <para>
 242       Work is still needed to improve the genetic algorithm parameter
 243       settings.
 244       In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
 245       routines
 246       <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
 247       we have to find a compromise for the parameter settings
 248       to satisfy two competing demands:
 249       <itemizedlist spacing="compact">
 250        <listitem>
 251         <para>
 252          Optimality of the query plan
 253         </para>
 254        </listitem>
 255        <listitem>
 256         <para>
 257          Computing time
 258         </para>
 259        </listitem>
 260       </itemizedlist>
 261      </para>
 262
 263      <para>
 264       At a more basic level, it is not clear that solving query optimization
 265       with a GA algorithm designed for TSP is appropriate.  In the TSP case,
 266       the cost associated with any substring (partial tour) is independent
 267       of the rest of the tour, but this is certainly not true for query
 268       optimization.  Thus it is questionable whether edge recombination
 269       crossover is the most effective mutation procedure.
 270      </para>
 271
 272    </sect2>
 273   </sect1>
 274
 275  <sect1 id="geqo-biblio">
 276   <title>Further Reading</title>
 277
 278   <para>
 279    The following resources contain additional information about
 280    genetic algorithms:
 281
 282    <itemizedlist>
 283     <listitem>
 284      <para>
 285       <ulink url="http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/www/location.htm">
 286       The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
 287       url="news://comp.ai.genetic"></ulink>)
 288      </para>
 289     </listitem>
 290
 291     <listitem>
 292      <para>
 293       <ulink url="http://www.red3d.com/cwr/evolve.html">
 294       Evolutionary Computation and its application to art and design</ulink>, by
 295       Craig Reynolds
 296      </para>
 297     </listitem>
 298
 299     <listitem>
 300      <para>
 301       <xref linkend="ELMA04">
 302      </para>
 303     </listitem>
 304
 305     <listitem>
 306      <para>
 307       <xref linkend="FONG">
 308      </para>
 309     </listitem>
 310    </itemizedlist>
 311   </para>
 312
 313  </sect1>
 314 </chapter>
 315
 316 <!-- Keep this comment at the end of the file
 317 Local variables:
 318 mode:sgml
 319 sgml-omittag:nil
 320 sgml-shorttag:t
 321 sgml-minimize-attributes:nil
 322 sgml-always-quote-attributes:t
 323 sgml-indent-step:1
 324 sgml-indent-data:t
 325 sgml-parent-document:nil
 326 sgml-default-dtd-file:"./reference.ced"
 327 sgml-exposed-tags:nil
 328 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 329 sgml-local-ecat-files:nil
 330 End:
 331 -->