2 $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.35 2006/01/22 03:56:58 neilc Exp $
9 <firstname>Martin</firstname>
10 <surname>Utesch</surname>
13 University of Mining and Technology
16 Institute of Automatic Control
28 <date>1997-10-02</date>
31 <title id="geqo-title">Genetic Query Optimizer</title>
37 Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
38 for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
43 <sect1 id="geqo-intro">
44 <title>Query Handling as a Complex Optimization Problem</title>
47 Among all relational operators the most difficult one to process
48 and optimize is the <firstterm>join</firstterm>. The number of
49 possible query plans grows exponentially with the
50 number of joins in the query. Further optimization effort is
51 caused by the support of a variety of <firstterm>join
52 methods</firstterm> (e.g., nested loop, hash join, merge join in
53 <productname>PostgreSQL</productname>) to process individual joins
54 and a diversity of <firstterm>indexes</firstterm> (e.g.,
55 B-tree, hash, GiST in <productname>PostgreSQL</productname>) as access
60 The normal <productname>PostgreSQL</productname> query optimizer
61 performs a <firstterm>near-exhaustive search</firstterm> over the
62 space of alternative strategies. This algorithm, first introduced
63 in IBM's System R database, produces a near-optimal join order,
64 but can take an enormous amount of time and memory space when the
65 number of joins in the query grows large. This makes the ordinary
66 <productname>PostgreSQL</productname> query optimizer
67 inappropriate for queries that join a large number of tables.
71 The Institute of Automatic Control at the University of Mining and
72 Technology, in Freiberg, Germany, encountered some problems when
73 it wanted to use <productname>PostgreSQL</productname> as the
74 backend for a decision support knowledge based system for the
75 maintenance of an electrical power grid. The DBMS needed to handle
76 large join queries for the inference machine of the knowledge
77 based system. The number of joins in these queries made using the
78 normal query optimizer infeasible.
82 In the following we describe the implementation of a
83 <firstterm>genetic algorithm</firstterm> to solve the join
84 ordering problem in a manner that is efficient for queries
85 involving large numbers of joins.
89 <sect1 id="geqo-intro2">
90 <title>Genetic Algorithms</title>
93 The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
95 nondeterministic, randomized search. The set of possible solutions for the
96 optimization problem is considered as a
97 <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
98 The degree of adaptation of an individual to its environment is specified
99 by its <firstterm>fitness</firstterm>.
103 The coordinates of an individual in the search space are represented
104 by <firstterm>chromosomes</firstterm>, in essence a set of character
105 strings. A <firstterm>gene</firstterm> is a
106 subsection of a chromosome which encodes the value of a single parameter
107 being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
108 <firstterm>integer</firstterm>.
112 Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
113 <firstterm>mutation</firstterm>, and
114 <firstterm>selection</firstterm> new generations of search points are found
115 that show a higher average fitness than their ancestors.
119 According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
120 strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
121 problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
122 non-random (better than random).
125 <figure id="geqo-diagram">
126 <title>Structured Diagram of a Genetic Algorithm</title>
128 <informaltable frame="none">
133 <entry>generation of ancestors at a time t</entry>
137 <entry>P''(t)</entry>
138 <entry>generation of descendants at a time t</entry>
144 <literallayout class="monospaced">
145 +=========================================+
146 |>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
147 +=========================================+
148 | INITIALIZE t := 0 |
149 +=========================================+
151 +=========================================+
152 | evaluate FITNESS of P(t) |
153 +=========================================+
154 | while not STOPPING CRITERION do |
155 | +-------------------------------------+
156 | | P'(t) := RECOMBINATION{P(t)} |
157 | +-------------------------------------+
158 | | P''(t) := MUTATION{P'(t)} |
159 | +-------------------------------------+
160 | | P(t+1) := SELECTION{P''(t) + P(t)} |
161 | +-------------------------------------+
162 | | evaluate FITNESS of P''(t) |
163 | +-------------------------------------+
165 +===+=====================================+
170 <sect1 id="geqo-pg-intro">
171 <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
174 The <acronym>GEQO</acronym> module approaches the query
175 optimization problem as though it were the well-known traveling salesman
176 problem (<acronym>TSP</acronym>).
177 Possible query plans are encoded as integer strings. Each string
178 represents the join order from one relation of the query to the next.
179 For example, the join tree
180 <literallayout class="monospaced">
186 is encoded by the integer string '4-1-3-2',
187 which means, first join relation '4' and '1', then '3', and
188 then '2', where 1, 2, 3, 4 are relation IDs within the
189 <productname>PostgreSQL</productname> optimizer.
193 Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
198 Specific characteristics of the <acronym>GEQO</acronym>
199 implementation in <productname>PostgreSQL</productname>
202 <itemizedlist spacing="compact" mark="bullet">
205 Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
206 individuals in a population, not whole-generational replacement)
207 allows fast convergence towards improved query plans. This is
208 essential for query handling with reasonable time;
214 Usage of <firstterm>edge recombination crossover</firstterm>
215 which is especially suited to keep edge losses low for the
216 solution of the <acronym>TSP</acronym> by means of a
217 <acronym>GA</acronym>;
223 Mutation as genetic operator is deprecated so that no repair
224 mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
231 The <acronym>GEQO</acronym> module allows
232 the <productname>PostgreSQL</productname> query optimizer to
233 support large join queries effectively through
234 non-exhaustive search.
237 <sect2 id="geqo-future">
238 <title>Future Implementation Tasks for
239 <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
242 Work is still needed to improve the genetic algorithm parameter
244 In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
246 <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
247 we have to find a compromise for the parameter settings
248 to satisfy two competing demands:
249 <itemizedlist spacing="compact">
252 Optimality of the query plan
264 At a more basic level, it is not clear that solving query optimization
265 with a GA algorithm designed for TSP is appropriate. In the TSP case,
266 the cost associated with any substring (partial tour) is independent
267 of the rest of the tour, but this is certainly not true for query
268 optimization. Thus it is questionable whether edge recombination
269 crossover is the most effective mutation procedure.
275 <sect1 id="geqo-biblio">
276 <title>Further Reading</title>
279 The following resources contain additional information about
285 <ulink url="http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/www/location.htm">
286 The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
287 url="news://comp.ai.genetic"></ulink>)
293 <ulink url="http://www.red3d.com/cwr/evolve.html">
294 Evolutionary Computation and its application to art and design</ulink>, by
301 <xref linkend="ELMA04">
307 <xref linkend="FONG">
316 <!-- Keep this comment at the end of the file
321 sgml-minimize-attributes:nil
322 sgml-always-quote-attributes:t
325 sgml-parent-document:nil
326 sgml-default-dtd-file:"./reference.ced"
327 sgml-exposed-tags:nil
328 sgml-local-catalogs:("/usr/lib/sgml/catalog")
329 sgml-local-ecat-files:nil