2 $Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.22 2002/01/08 18:03:26 petere Exp $
9 <firstname>Martin</firstname>
10 <surname>Utesch</surname>
13 University of Mining and Technology
16 Institute of Automatic Control
28 <date>1997-10-02</date>
31 <title>Genetic Query Optimization</title>
37 Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
38 for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
43 <sect1 id="geqo-intro">
44 <title>Query Handling as a Complex Optimization Problem</title>
47 Among all relational operators the most difficult one to process and
48 optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
49 grows exponentially with the number of joins included in it. Further
50 optimization effort is caused by the support of a variety of
51 <firstterm>join methods</firstterm>
52 (e.g., nested loop, hash join, merge join in <productname>PostgreSQL</productname>) to
53 process individual joins and a diversity of
54 <firstterm>indexes</firstterm> (e.g., R-tree,
55 B-tree, hash in <productname>PostgreSQL</productname>) as access paths for relations.
59 The current <productname>PostgreSQL</productname> optimizer
60 implementation performs a <firstterm>near-exhaustive search</firstterm>
61 over the space of alternative strategies. This query
62 optimization technique is inadequate to support database application
63 domains that involve the need for extensive queries, such as artificial
68 The Institute of Automatic Control at the University of Mining and
69 Technology, in Freiberg, Germany, encountered the described problems as its
70 folks wanted to take the <productname>PostgreSQL</productname> DBMS as the backend for a decision
71 support knowledge based system for the maintenance of an electrical
72 power grid. The DBMS needed to handle large join queries for the
73 inference machine of the knowledge based system.
77 Performance difficulties in exploring the space of possible query
78 plans created the demand for a new optimization technique being developed.
82 In the following we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
83 as an option for the database query optimization problem.
87 <sect1 id="geqo-intro2">
88 <title>Genetic Algorithms</title>
91 The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
93 determined, randomized search. The set of possible solutions for the
94 optimization problem is considered as a
95 <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
96 The degree of adaption of an individual to its environment is specified
97 by its <firstterm>fitness</firstterm>.
101 The coordinates of an individual in the search space are represented
102 by <firstterm>chromosomes</firstterm>, in essence a set of character
103 strings. A <firstterm>gene</firstterm> is a
104 subsection of a chromosome which encodes the value of a single parameter
105 being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
106 <firstterm>integer</firstterm>.
110 Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
111 <firstterm>mutation</firstterm>, and
112 <firstterm>selection</firstterm> new generations of search points are found
113 that show a higher average fitness than their ancestors.
117 According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
118 strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
119 problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
120 non-random (better than random).
123 <figure id="geqo-diagram">
124 <title>Structured Diagram of a Genetic Algorithm</title>
126 <informaltable frame="none">
131 <entry>generation of ancestors at a time t</entry>
135 <entry>P''(t)</entry>
136 <entry>generation of descendants at a time t</entry>
142 <literallayout class="monospaced">
143 +=========================================+
144 |>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
145 +=========================================+
146 | INITIALIZE t := 0 |
147 +=========================================+
149 +=========================================+
150 | evalute FITNESS of P(t) |
151 +=========================================+
152 | while not STOPPING CRITERION do |
153 | +-------------------------------------+
154 | | P'(t) := RECOMBINATION{P(t)} |
155 | +-------------------------------------+
156 | | P''(t) := MUTATION{P'(t)} |
157 | +-------------------------------------+
158 | | P(t+1) := SELECTION{P''(t) + P(t)} |
159 | +-------------------------------------+
160 | | evalute FITNESS of P''(t) |
161 | +-------------------------------------+
163 +===+=====================================+
168 <sect1 id="geqo-pg-intro">
169 <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
172 The <acronym>GEQO</acronym> module is intended for the solution of the query
173 optimization problem similar to a traveling salesman problem (<acronym>TSP</acronym>).
174 Possible query plans are encoded as integer strings. Each string
175 represents the join order from one relation of the query to the next.
176 E. g., the query tree
177 <literallayout class="monospaced">
183 is encoded by the integer string '4-1-3-2',
184 which means, first join relation '4' and '1', then '3', and
185 then '2', where 1, 2, 3, 4 are relids within the
186 <productname>PostgreSQL</productname> optimizer.
190 Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
195 Specific characteristics of the <acronym>GEQO</acronym>
196 implementation in <productname>PostgreSQL</productname>
199 <itemizedlist spacing="compact" mark="bullet">
202 Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
203 individuals in a population, not whole-generational replacement)
204 allows fast convergence towards improved query plans. This is
205 essential for query handling with reasonable time;
211 Usage of <firstterm>edge recombination crossover</firstterm> which is
213 to keep edge losses low for the solution of the
214 <acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
220 Mutation as genetic operator is deprecated so that no repair
221 mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
228 The <acronym>GEQO</acronym> module allows
229 the <productname>PostgreSQL</productname> query optimizer to
230 support large join queries effectively through
231 non-exhaustive search.
234 <sect2 id="geqo-future">
235 <title>Future Implementation Tasks for
236 <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
239 Work is still needed to improve the genetic algorithm parameter
241 In file <filename>backend/optimizer/geqo/geqo_params.c</filename>, routines
242 <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
243 we have to find a compromise for the parameter settings
244 to satisfy two competing demands:
245 <itemizedlist spacing="compact">
248 Optimality of the query plan
262 <sect1 id="geqo-biblio">
263 <title>Further Readings</title>
266 The following resources contain additional information about
272 <ulink url="http://surf.de.uu.net/encore/www/">The Hitch-Hiker's
273 Guide to Evolutionary Computation</ulink> (FAQ for <ulink
274 url="news://comp.ai.genetic">comp.ai.genetic</ulink>)
280 <ulink url="http://www.red3d.com/cwr/evolve.html">Evolutionary
281 Computation and its application to art and design</ulink> by
288 <xref linkend="ELMA99">
294 <xref linkend="FONG">
303 <!-- Keep this comment at the end of the file
308 sgml-minimize-attributes:nil
309 sgml-always-quote-attributes:t
312 sgml-parent-document:nil
313 sgml-default-dtd-file:"./reference.ced"
314 sgml-exposed-tags:nil
315 sgml-local-catalogs:("/usr/lib/sgml/catalog")
316 sgml-local-ecat-files:nil