1 <!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.38 2006/09/16 00:30:14 momjian Exp $ -->
6 <firstname>Martin</firstname>
7 <surname>Utesch</surname>
10 University of Mining and Technology
13 Institute of Automatic Control
25 <date>1997-10-02</date>
28 <title id="geqo-title">Genetic Query Optimizer</title>
34 Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
35 for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
40 <sect1 id="geqo-intro">
41 <title>Query Handling as a Complex Optimization Problem</title>
44 Among all relational operators the most difficult one to process
45 and optimize is the <firstterm>join</firstterm>. The number of
46 possible query plans grows exponentially with the
47 number of joins in the query. Further optimization effort is
48 caused by the support of a variety of <firstterm>join
49 methods</firstterm> (e.g., nested loop, hash join, merge join in
50 <productname>PostgreSQL</productname>) to process individual joins
51 and a diversity of <firstterm>indexes</firstterm> (e.g.,
52 B-tree, hash, GiST and GIN in <productname>PostgreSQL</productname>) as
53 access paths for relations.
57 The normal <productname>PostgreSQL</productname> query optimizer
58 performs a <firstterm>near-exhaustive search</firstterm> over the
59 space of alternative strategies. This algorithm, first introduced
60 in IBM's System R database, produces a near-optimal join order,
61 but can take an enormous amount of time and memory space when the
62 number of joins in the query grows large. This makes the ordinary
63 <productname>PostgreSQL</productname> query optimizer
64 inappropriate for queries that join a large number of tables.
68 The Institute of Automatic Control at the University of Mining and
69 Technology, in Freiberg, Germany, encountered some problems when
70 it wanted to use <productname>PostgreSQL</productname> as the
71 backend for a decision support knowledge based system for the
72 maintenance of an electrical power grid. The DBMS needed to handle
73 large join queries for the inference machine of the knowledge
74 based system. The number of joins in these queries made using the
75 normal query optimizer infeasible.
79 In the following we describe the implementation of a
80 <firstterm>genetic algorithm</firstterm> to solve the join
81 ordering problem in a manner that is efficient for queries
82 involving large numbers of joins.
86 <sect1 id="geqo-intro2">
87 <title>Genetic Algorithms</title>
90 The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
92 nondeterministic, randomized search. The set of possible solutions for the
93 optimization problem is considered as a
94 <firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
95 The degree of adaptation of an individual to its environment is specified
96 by its <firstterm>fitness</firstterm>.
100 The coordinates of an individual in the search space are represented
101 by <firstterm>chromosomes</firstterm>, in essence a set of character
102 strings. A <firstterm>gene</firstterm> is a
103 subsection of a chromosome which encodes the value of a single parameter
104 being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
105 <firstterm>integer</firstterm>.
109 Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
110 <firstterm>mutation</firstterm>, and
111 <firstterm>selection</firstterm> new generations of search points are found
112 that show a higher average fitness than their ancestors.
116 According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
117 strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
118 problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
119 non-random (better than random).
122 <figure id="geqo-diagram">
123 <title>Structured Diagram of a Genetic Algorithm</title>
125 <informaltable frame="none">
130 <entry>generation of ancestors at a time t</entry>
134 <entry>P''(t)</entry>
135 <entry>generation of descendants at a time t</entry>
141 <literallayout class="monospaced">
142 +=========================================+
143 |>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
144 +=========================================+
145 | INITIALIZE t := 0 |
146 +=========================================+
148 +=========================================+
149 | evaluate FITNESS of P(t) |
150 +=========================================+
151 | while not STOPPING CRITERION do |
152 | +-------------------------------------+
153 | | P'(t) := RECOMBINATION{P(t)} |
154 | +-------------------------------------+
155 | | P''(t) := MUTATION{P'(t)} |
156 | +-------------------------------------+
157 | | P(t+1) := SELECTION{P''(t) + P(t)} |
158 | +-------------------------------------+
159 | | evaluate FITNESS of P''(t) |
160 | +-------------------------------------+
162 +===+=====================================+
167 <sect1 id="geqo-pg-intro">
168 <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
171 The <acronym>GEQO</acronym> module approaches the query
172 optimization problem as though it were the well-known traveling salesman
173 problem (<acronym>TSP</acronym>).
174 Possible query plans are encoded as integer strings. Each string
175 represents the join order from one relation of the query to the next.
176 For example, the join tree
177 <literallayout class="monospaced">
183 is encoded by the integer string '4-1-3-2',
184 which means, first join relation '4' and '1', then '3', and
185 then '2', where 1, 2, 3, 4 are relation IDs within the
186 <productname>PostgreSQL</productname> optimizer.
190 Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
195 Specific characteristics of the <acronym>GEQO</acronym>
196 implementation in <productname>PostgreSQL</productname>
199 <itemizedlist spacing="compact" mark="bullet">
202 Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
203 individuals in a population, not whole-generational replacement)
204 allows fast convergence towards improved query plans. This is
205 essential for query handling with reasonable time;
211 Usage of <firstterm>edge recombination crossover</firstterm>
212 which is especially suited to keep edge losses low for the
213 solution of the <acronym>TSP</acronym> by means of a
214 <acronym>GA</acronym>;
220 Mutation as genetic operator is deprecated so that no repair
221 mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
228 The <acronym>GEQO</acronym> module allows
229 the <productname>PostgreSQL</productname> query optimizer to
230 support large join queries effectively through
231 non-exhaustive search.
234 <sect2 id="geqo-future">
235 <title>Future Implementation Tasks for
236 <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
239 Work is still needed to improve the genetic algorithm parameter
241 In file <filename>src/backend/optimizer/geqo/geqo_main.c</filename>,
243 <function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
244 we have to find a compromise for the parameter settings
245 to satisfy two competing demands:
246 <itemizedlist spacing="compact">
249 Optimality of the query plan
261 At a more basic level, it is not clear that solving query optimization
262 with a GA algorithm designed for TSP is appropriate. In the TSP case,
263 the cost associated with any substring (partial tour) is independent
264 of the rest of the tour, but this is certainly not true for query
265 optimization. Thus it is questionable whether edge recombination
266 crossover is the most effective mutation procedure.
272 <sect1 id="geqo-biblio">
273 <title>Further Reading</title>
276 The following resources contain additional information about
282 <ulink url="http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/www/location.htm">
283 The Hitch-Hiker's Guide to Evolutionary Computation</ulink>, (FAQ for <ulink
284 url="news://comp.ai.genetic"></ulink>)
290 <ulink url="http://www.red3d.com/cwr/evolve.html">
291 Evolutionary Computation and its application to art and design</ulink>, by
298 <xref linkend="ELMA04">
304 <xref linkend="FONG">