granicus.if.org Git - postgresql/blob - src/backend/optimizer/README

   1 Summary
   2 -------
   3
   4 These directories take the Query structure returned by the parser, and
   5 generate a plan used by the executor.  The /plan directory generates the
   6 actual output plan, the /path code generates all possible ways to join the
   7 tables, and /prep handles various preprocessing steps for special cases.
   8 /util is utility stuff.  /geqo is the separate "genetic optimization" planner
   9 --- it does a semi-random search through the join tree space, rather than
  10 exhaustively considering all possible join trees.  (But each join considered
  11 by /geqo is given to /path to create paths for, so we consider all possible
  12 implementation paths for each specific join pair even in GEQO mode.)
  13
  14
  15 Paths and Join Pairs
  16 --------------------
  17
  18 During the planning/optimizing process, we build "Path" trees representing
  19 the different ways of doing a query.  We select the cheapest Path that
  20 generates the desired relation and turn it into a Plan to pass to the
  21 executor.  (There is pretty much a one-to-one correspondence between the
  22 Path and Plan trees, but Path nodes omit info that won't be needed during
  23 planning, and include info needed for planning that won't be needed by the
  24 executor.)
  25
  26 The optimizer builds a RelOptInfo structure for each base relation used in
  27 the query.  Base rels are either primitive tables, or subquery subselects
  28 that are planned via a separate recursive invocation of the planner.  A
  29 RelOptInfo is also built for each join relation that is considered during
  30 planning.  A join rel is simply a combination of base rels.  There is only
  31 one join RelOptInfo for any given set of baserels --- for example, the join
  32 {A B C} is represented by the same RelOptInfo no matter whether we build it
  33 by joining A and B first and then adding C, or joining B and C first and
  34 then adding A, etc.  These different means of building the joinrel are
  35 represented as Paths.  For each RelOptInfo we build a list of Paths that
  36 represent plausible ways to implement the scan or join of that relation.
  37 Once we've considered all the plausible Paths for a rel, we select the one
  38 that is cheapest according to the planner's cost estimates.  The final plan
  39 is derived from the cheapest Path for the RelOptInfo that includes all the
  40 base rels of the query.
  41
  42 Possible Paths for a primitive table relation include plain old sequential
  43 scan, plus index scans for any indexes that exist on the table, plus bitmap
  44 index scans using one or more indexes.  A subquery base relation just has
  45 one Path, a "SubqueryScan" path (which links to the subplan that was built
  46 by a recursive invocation of the planner).  Likewise a function-RTE base
  47 relation has only one possible Path.
  48
  49 Joins always occur using two RelOptInfos.  One is outer, the other inner.
  50 Outers drive lookups of values in the inner.  In a nested loop, lookups of
  51 values in the inner occur by scanning the inner path once per outer tuple
  52 to find each matching inner row.  In a mergejoin, inner and outer rows are
  53 ordered, and are accessed in order, so only one scan is required to perform
  54 the entire join: both inner and outer paths are scanned in-sync.  (There's
  55 not a lot of difference between inner and outer in a mergejoin...)  In a
  56 hashjoin, the inner is scanned first and all its rows are entered in a
  57 hashtable, then the outer is scanned and for each row we lookup the join
  58 key in the hashtable.
  59
  60 A Path for a join relation is actually a tree structure, with the top
  61 Path node representing the join method.  It has left and right subpaths
  62 that represent the scan or join methods used for the two input relations.
  63
  64
  65 Join Tree Construction
  66 ----------------------
  67
  68 The optimizer generates optimal query plans by doing a more-or-less
  69 exhaustive search through the ways of executing the query.  The best Path
  70 tree is found by a recursive process:
  71
  72 1) Take each base relation in the query, and make a RelOptInfo structure
  73 for it.  Find each potentially useful way of accessing the relation,
  74 including sequential and index scans, and make a Path representing that
  75 way.  All the Paths made for a given relation are placed in its
  76 RelOptInfo.pathlist.  (Actually, we discard Paths that are obviously
  77 inferior alternatives before they ever get into the pathlist --- what
  78 ends up in the pathlist is the cheapest way of generating each potentially
  79 useful sort ordering of the relation.)  Also create a RelOptInfo.joininfo
  80 list including all the join clauses that involve this relation.  For
  81 example, the WHERE clause "tab1.col1 = tab2.col1" generates entries in
  82 both tab1 and tab2's joininfo lists.
  83
  84 If we have only a single base relation in the query, we are done.
  85 Otherwise we have to figure out how to join the base relations into a
  86 single join relation.
  87
  88 2) Normally, any explicit JOIN clauses are "flattened" so that we just
  89 have a list of relations to join.  However, FULL OUTER JOIN clauses are
  90 never flattened, and other kinds of JOIN might not be either, if the
  91 flattening process is stopped by join_collapse_limit or from_collapse_limit
  92 restrictions.  Therefore, we end up with a planning problem that contains
  93 both lists of relations to be joined in any order, and JOIN nodes that
  94 force a particular join order.  For each un-flattened JOIN node, we join
  95 exactly that pair of relations (after recursively planning their inputs,
  96 if the inputs aren't single base relations).  We generate a Path for each
  97 feasible join method, and select the cheapest Path.  Note that the JOIN
  98 clause structure determines the join Path structure, but it doesn't
  99 constrain the join implementation method at each join (nestloop, merge,
 100 hash), nor does it say which rel is considered outer or inner at each
 101 join.  We consider all these possibilities in building Paths.
 102
 103 3) At the top level of the FROM clause we will have a list of relations
 104 that are either base rels or joinrels constructed per un-flattened JOIN
 105 directives.  (This is also the situation, recursively, when we can flatten
 106 sub-joins underneath an un-flattenable JOIN into a list of relations to
 107 join.)  We can join these rels together in any order the planner sees fit.
 108 The standard (non-GEQO) planner does this as follows:
 109
 110 Consider joining each RelOptInfo to each other RelOptInfo specified in its
 111 RelOptInfo.joininfo, and generate a Path for each possible join method for
 112 each such pair.  (If we have a RelOptInfo with no join clauses, we have no
 113 choice but to generate a clauseless Cartesian-product join; so we consider
 114 joining that rel to each other available rel.  But in the presence of join
 115 clauses we will only consider joins that use available join clauses.)
 116
 117 If we only had two relations in the FROM list, we are done: we just pick
 118 the cheapest path for the join RelOptInfo.  If we had more than two, we now
 119 need to consider ways of joining join RelOptInfos to each other to make
 120 join RelOptInfos that represent more than two FROM items.
 121
 122 The join tree is constructed using a "dynamic programming" algorithm:
 123 in the first pass (already described) we consider ways to create join rels
 124 representing exactly two FROM items.  The second pass considers ways
 125 to make join rels that represent exactly three FROM items; the next pass,
 126 four items, etc.  The last pass considers how to make the final join
 127 relation that includes all FROM items --- obviously there can be only one
 128 join rel at this top level, whereas there can be more than one join rel
 129 at lower levels.  At each level we use joins that follow available join
 130 clauses, if possible, just as described for the first level.
 131
 132 For example:
 133
 134     SELECT  *
 135     FROM    tab1, tab2, tab3, tab4
 136     WHERE   tab1.col = tab2.col AND
 137         tab2.col = tab3.col AND
 138         tab3.col = tab4.col
 139
 140     Tables 1, 2, 3, and 4 are joined as:
 141     {1 2},{2 3},{3 4}
 142     {1 2 3},{2 3 4}
 143     {1 2 3 4}
 144     (other possibilities will be excluded for lack of join clauses)
 145
 146     SELECT  *
 147     FROM    tab1, tab2, tab3, tab4
 148     WHERE   tab1.col = tab2.col AND
 149         tab1.col = tab3.col AND
 150         tab1.col = tab4.col
 151
 152     Tables 1, 2, 3, and 4 are joined as:
 153     {1 2},{1 3},{1 4}
 154     {1 2 3},{1 3 4},{1 2 4}
 155     {1 2 3 4}
 156
 157 We consider left-handed plans (the outer rel of an upper join is a joinrel,
 158 but the inner is always a single FROM item); right-handed plans (outer rel
 159 is always a single item); and bushy plans (both inner and outer can be
 160 joins themselves).  For example, when building {1 2 3 4} we consider
 161 joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
 162 {1 2} to {3 4} (bushy), among other choices.  Although the jointree
 163 scanning code produces these potential join combinations one at a time,
 164 all the ways to produce the same set of joined base rels will share the
 165 same RelOptInfo, so the paths produced from different join combinations
 166 that produce equivalent joinrels will compete in add_path().
 167
 168 Once we have built the final join rel, we use either the cheapest path
 169 for it or the cheapest path with the desired ordering (if that's cheaper
 170 than applying a sort to the cheapest other path).
 171
 172 If the query contains one-sided outer joins (LEFT or RIGHT joins), or
 173 "IN (sub-select)" WHERE clauses that were converted to joins, then some of
 174 the possible join orders may be illegal.  These are excluded by having
 175 make_join_rel consult side lists of outer joins and IN joins to see
 176 whether a proposed join is illegal.  (The same consultation allows it
 177 to see which join style should be applied for a valid join, ie,
 178 JOIN_INNER, JOIN_LEFT, etc.)
 179
 180
 181 Valid OUTER JOIN optimizations
 182 ------------------------------
 183
 184 The planner's treatment of outer join reordering is based on the following
 185 identities:
 186
 187 1.      (A leftjoin B on (Pab)) innerjoin C on (Pac)
 188         = (A innerjoin C on (Pac)) leftjoin B on (Pab)
 189
 190 where Pac is a predicate referencing A and C, etc (in this case, clearly
 191 Pac cannot reference B, or the transformation is nonsensical).
 192
 193 2.      (A leftjoin B on (Pab)) leftjoin C on (Pac)
 194         = (A leftjoin C on (Pac)) leftjoin B on (Pab)
 195
 196 3.      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 197         = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 198
 199 Identity 3 only holds if predicate Pbc must fail for all-null B rows
 200 (that is, Pbc is strict for at least one column of B).  If Pbc is not
 201 strict, the first form might produce some rows with nonnull C columns
 202 where the second form would make those entries null.
 203
 204 RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
 205 tables, so the same identities work for right joins.  Only FULL JOIN
 206 cannot be re-ordered at all.
 207
 208 An example of a case that does *not* work is moving an innerjoin into or
 209 out of the nullable side of an outer join:
 210
 211         A leftjoin (B join C on (Pbc)) on (Pab)
 212         != (A leftjoin B on (Pab)) join C on (Pbc)
 213
 214 FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
 215 translating the jointree to "joinlist" representation.  LEFT and RIGHT
 216 JOIN nodes are normally collapsed so that they participate fully in the
 217 join order search.  To avoid generating illegal join orders, the planner
 218 creates an OuterJoinInfo node for each outer join, and make_join_rel
 219 checks this list to decide if a proposed join is legal.
 220
 221 What we store in OuterJoinInfo nodes are the minimum sets of Relids
 222 required on each side of the join to form the outer join.  Note that
 223 these are minimums; there's no explicit maximum, since joining other
 224 rels to the OJ's syntactic rels may be legal.  Per identities 1 and 2,
 225 non-FULL joins can be freely associated into the lefthand side of an
 226 OJ, but in general they can't be associated into the righthand side.
 227 So the restriction enforced by make_join_rel is that a proposed join
 228 can't join across a RHS boundary (ie, join anything inside the RHS
 229 to anything else) unless the join validly implements some outer join.
 230 (To support use of identity 3, we have to allow cases where an apparent
 231 violation of a lower OJ's RHS is committed while forming an upper OJ.
 232 If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
 233 set must be expanded to include the whole of the lower OJ, thereby
 234 preventing it from being formed before the lower OJ is.)
 235
 236
 237 Pulling up subqueries
 238 ---------------------
 239
 240 As we described above, a subquery appearing in the range table is planned
 241 independently and treated as a "black box" during planning of the outer
 242 query.  This is necessary when the subquery uses features such as
 243 aggregates, GROUP, or DISTINCT.  But if the subquery is just a simple
 244 scan or join, treating the subquery as a black box may produce a poor plan
 245 compared to considering it as part of the entire plan search space.
 246 Therefore, at the start of the planning process the planner looks for
 247 simple subqueries and pulls them up into the main query's jointree.
 248
 249 Pulling up a subquery may result in FROM-list joins appearing below the top
 250 of the join tree.  Each FROM-list is planned using the dynamic-programming
 251 search method described above.
 252
 253 If pulling up a subquery produces a FROM-list as a direct child of another
 254 FROM-list, then we can merge the two FROM-lists together.  Once that's
 255 done, the subquery is an absolutely integral part of the outer query and
 256 will not constrain the join tree search space at all.  However, that could
 257 result in unpleasant growth of planning time, since the dynamic-programming
 258 search has runtime exponential in the number of FROM-items considered.
 259 Therefore, we don't merge FROM-lists if the result would have too many
 260 FROM-items in one list.
 261
 262
 263 Optimizer Functions
 264 -------------------
 265
 266 The primary entry point is planner().
 267
 268 planner()
 269  set up for recursive handling of subqueries
 270  do final cleanup after planning.
 271 -subquery_planner()
 272  pull up subqueries from rangetable, if possible
 273  canonicalize qual
 274      Attempt to simplify WHERE clause to the most useful form; this includes
 275      flattening nested AND/ORs and detecting clauses that are duplicated in
 276      different branches of an OR.
 277  simplify constant expressions
 278  process sublinks
 279  convert Vars of outer query levels into Params
 280 --grouping_planner()
 281   preprocess target list for non-SELECT queries
 282   handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
 283         ORDER BY, DISTINCT, LIMIT
 284 --query_planner()
 285    pull out constant quals, which can be used to gate execution of the
 286         whole plan (if any are found, we make a top-level Result node
 287         to do the gating)
 288    make list of base relations used in query
 289    split up the qual into restrictions (a=1) and joins (b=c)
 290    find qual clauses that enable merge and hash joins
 291 ----make_one_rel()
 292      set_base_rel_pathlist()
 293       find scan and all index paths for each base relation
 294       find selectivity of columns used in joins
 295 -----make_one_rel_by_joins()
 296       jump to geqo if needed
 297       else call make_rels_by_joins() for each level of join tree needed
 298       make_rels_by_joins():
 299         For each joinrel of the prior level, do make_rels_by_clause_joins()
 300         if it has join clauses, or make_rels_by_clauseless_joins() if not.
 301         Also generate "bushy plan" joins between joinrels of lower levels.
 302       Back at make_one_rel_by_joins(), apply set_cheapest() to extract the
 303       cheapest path for each newly constructed joinrel.
 304       Loop back if this wasn't the top join level.
 305    Back at query_planner:
 306     put back any constant quals by adding a Result node
 307  Back at grouping_planner:
 308  do grouping(GROUP)
 309  do aggregates
 310  make unique(DISTINCT)
 311  make sort(ORDER BY)
 312  make limit(LIMIT/OFFSET)
 313
 314
 315 Optimizer Data Structures
 316 -------------------------
 317
 318 PlannerInfo     - global information for planning a particular Query
 319
 320 RelOptInfo      - a relation or joined relations
 321
 322  RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"
 323                   (note the same structure is used for restriction and
 324                    join clauses)
 325
 326  Path           - every way to generate a RelOptInfo(sequential,index,joins)
 327   SeqScan       - a plain Path node with pathtype = T_SeqScan
 328   IndexPath     - index scans
 329   BitmapHeapPath - top of a bitmapped index scan
 330   TidPath       - scan by CTID
 331   AppendPath    - append multiple subpaths together
 332   ResultPath    - a Result plan node (used for FROM-less SELECT)
 333   MaterialPath  - a Material plan node
 334   UniquePath    - remove duplicate rows
 335   NestPath      - nested-loop joins
 336   MergePath     - merge joins
 337   HashPath      - hash joins
 338
 339  PathKeys       - a data structure representing the ordering of a path
 340
 341 The optimizer spends a good deal of its time worrying about the ordering
 342 of the tuples returned by a path.  The reason this is useful is that by
 343 knowing the sort ordering of a path, we may be able to use that path as
 344 the left or right input of a mergejoin and avoid an explicit sort step.
 345 Nestloops and hash joins don't really care what the order of their inputs
 346 is, but mergejoin needs suitably ordered inputs.  Therefore, all paths
 347 generated during the optimization process are marked with their sort order
 348 (to the extent that it is known) for possible use by a higher-level merge.
 349
 350 It is also possible to avoid an explicit sort step to implement a user's
 351 ORDER BY clause if the final path has the right ordering already, so the
 352 sort ordering is of interest even at the top level.  query_planner() will
 353 look for the cheapest path with a sort order matching the desired order,
 354 and grouping_planner() will compare its cost to the cost of using the
 355 cheapest-overall path and doing an explicit sort.
 356
 357 When we are generating paths for a particular RelOptInfo, we discard a path
 358 if it is more expensive than another known path that has the same or better
 359 sort order.  We will never discard a path that is the only known way to
 360 achieve a given sort order (without an explicit sort, that is).  In this
 361 way, the next level up will have the maximum freedom to build mergejoins
 362 without sorting, since it can pick from any of the paths retained for its
 363 inputs.
 364
 365
 366 PathKeys
 367 --------
 368
 369 The PathKeys data structure represents what is known about the sort order
 370 of a particular Path.
 371
 372 Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
 373 the sort order of the result generated by the Path.  The n'th sublist
 374 represents the n'th sort key of the result.
 375
 376 In single/base relation RelOptInfo's, the Paths represent various ways
 377 of scanning the relation and the resulting ordering of the tuples.
 378 Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 379 Index scans have Path.pathkeys that represent the chosen index's ordering,
 380 if any.  A single-key index would create a pathkey with a single sublist,
 381 e.g. ( (tab1.indexkey1/sortop1) ).  A multi-key index generates a sublist
 382 per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
 383 shows major sort by indexkey1 (ordering by sortop1) and minor sort by
 384 indexkey2 with sortop2.
 385
 386 Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
 387 we can say nothing about the overall order of its result.  Also, an
 388 indexscan on an unordered type of index generates NIL pathkeys.  However,
 389 we can always create a pathkey by doing an explicit sort.  The pathkeys
 390 for a Sort plan's output just represent the sort key fields and the
 391 ordering operators used.
 392
 393 Things get more interesting when we consider joins.  Suppose we do a
 394 mergejoin between A and B using the mergeclause A.X = B.Y.  The output
 395 of the mergejoin is sorted by X --- but it is also sorted by Y.  We
 396 represent this fact by listing both keys in a single pathkey sublist:
 397 ( (A.X/xsortop B.Y/ysortop) ).  This pathkey asserts that the major
 398 sort order of the Path can be taken to be *either* A.X or B.Y.
 399 They are equal, so they are both primary sort keys.  By doing this,
 400 we allow future joins to use either var as a pre-sorted key, so upper
 401 Mergejoins may be able to avoid having to re-sort the Path.  This is
 402 why pathkeys is a List of Lists.
 403
 404 We keep a sortop associated with each PathKeyItem because cross-data-type
 405 mergejoins are possible; for example int4 = int8 is mergejoinable.
 406 In this case we need to remember that the left var is ordered by int4lt
 407 while the right var is ordered by int8lt.  So the different members of
 408 each sublist could have different sortops.
 409
 410 Note that while the order of the top list is meaningful (primary vs.
 411 secondary sort key), the order of each sublist is arbitrary.  Each sublist
 412 should be regarded as a set of equivalent keys, with no significance
 413 to the list order.
 414
 415 With a little further thought, it becomes apparent that pathkeys for
 416 joins need not only come from mergejoins.  For example, if we do a
 417 nestloop join between outer relation A and inner relation B, then any
 418 pathkeys relevant to A are still valid for the join result: we have
 419 not altered the order of the tuples from A.  Even more interesting,
 420 if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
 421 and A.X was a pathkey for the outer relation A, then we can assert that
 422 B.Y is a pathkey for the join result; X was ordered before and still is,
 423 and the joined values of Y are equal to the joined values of X, so Y
 424 must now be ordered too.  This is true even though we used neither an
 425 explicit sort nor a mergejoin on Y.
 426
 427 More generally, whenever we have an equijoin clause A.X = B.Y and a
 428 pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
 429 relation the pathkey is for, *no matter how we formed the join*.  It works
 430 as long as the clause has been applied at some point while forming the
 431 join relation.  (In the current implementation, we always apply qual
 432 clauses as soon as possible, ie, as far down in the plan tree as possible.
 433 So we can treat the pathkeys as equivalent everywhere.  The exception is
 434 when the relations A and B are joined inside the nullable side of an
 435 OUTER JOIN and the equijoin clause comes from above the OUTER JOIN.  In this
 436 case we cannot apply the qual as soon as A and B are joined, so we do not
 437 consider the pathkeys to be equivalent.  This could be improved if we wanted
 438 to go to the trouble of making pathkey equivalence be context-dependent,
 439 but that seems much more complex than it's worth.)
 440
 441 In short, then: when producing the pathkeys for a merge or nestloop join,
 442 we can keep all of the keys of the outer path, since the ordering of the
 443 outer path will be preserved in the result.  Furthermore, we can add to
 444 each pathkey sublist any inner vars that are equijoined to any of the
 445 outer vars in the sublist; this works regardless of whether we are
 446 implementing the join using that equijoin clause as a mergeclause,
 447 or merely enforcing the clause after-the-fact as a qpqual filter.
 448
 449 Although Hashjoins also work only with equijoin operators, it is *not*
 450 safe to consider the output of a Hashjoin to be sorted in any particular
 451 order --- not even the outer path's order.  This is true because the
 452 executor might have to split the join into multiple batches.  Therefore
 453 a Hashjoin is always given NIL pathkeys.  (Also, we need to use only
 454 mergejoinable operators when deducing which inner vars are now sorted,
 455 because a mergejoin operator tells us which left- and right-datatype
 456 sortops can be considered equivalent, whereas a hashjoin operator
 457 doesn't imply anything about sort order.)
 458
 459 Pathkeys are also useful to represent an ordering that we wish to achieve,
 460 since they are easily compared to the pathkeys of a potential candidate
 461 path.  So, SortClause lists are turned into pathkeys lists for use inside
 462 the optimizer.
 463
 464 OK, now for how it *really* works:
 465
 466 We did implement pathkeys just as described above, and found that the
 467 planner spent a huge amount of time comparing pathkeys, because the
 468 representation of pathkeys as unordered lists made it expensive to decide
 469 whether two were equal or not.  So, we've modified the representation
 470 as described next.
 471
 472 If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
 473 during planner startup, we can construct lists of equivalent pathkey items
 474 for the query.  There could be more than two items per equivalence set;
 475 for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
 476 equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
 477 Any pathkey item that belongs to an equivalence set implies that all the
 478 other items in its set apply to the relation too, or at least all the ones
 479 that are for fields present in the relation.  (Some of the items in the
 480 set might be for as-yet-unjoined relations.)  Furthermore, any multi-item
 481 pathkey sublist that appears at any stage of planning the query *must* be
 482 a subset of one or another of these equivalence sets; there's no way we'd
 483 have put two items in the same pathkey sublist unless they were equijoined
 484 in WHERE.
 485
 486 Now suppose that we allow a pathkey sublist to contain pathkey items for
 487 vars that are not yet part of the pathkey's relation.  This introduces
 488 no logical difficulty, because such items can easily be seen to be
 489 irrelevant; we just mandate that they be ignored.  But having allowed
 490 this, we can declare (by fiat) that any multiple-item pathkey sublist
 491 must be "equal()" to the appropriate equivalence set.  In effect,
 492 whenever we make a pathkey sublist that mentions any var appearing in an
 493 equivalence set, we instantly add all the other vars equivalenced to it,
 494 whether they appear yet in the pathkey's relation or not.  And we also
 495 mandate that the pathkey sublist appear in the same order as the
 496 equivalence set it comes from.
 497
 498 In fact, we can go even further, and say that the canonical representation
 499 of a pathkey sublist is a pointer directly to the relevant equivalence set,
 500 which is kept in a list of pathkey equivalence sets for the query.  Then
 501 pathkey sublist comparison reduces to pointer-equality checking!  To do this
 502 we also have to add single-element pathkey sublists to the query's list of
 503 equivalence sets, but that's a small price to pay.
 504
 505 By the way, it's OK and even useful for us to build equivalence sets
 506 that mention multiple vars from the same relation.  For example, if
 507 we have WHERE A.X = A.Y and we are scanning A using an index on X,
 508 we can legitimately conclude that the path is sorted by Y as well;
 509 and this could be handy if Y is the variable used in other join clauses
 510 or ORDER BY.  So, any WHERE clause with a mergejoinable operator can
 511 contribute to an equivalence set, even if it's not a join clause.
 512
 513 As sketched so far, equijoin operators allow us to conclude that
 514 A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
 515 datatypes are involved.  What is not immediately obvious is that to use
 516 the "canonical pathkey" representation, we *must* make this deduction.
 517 An example (from a real bug in Postgres 7.0) is a mergejoin for a query
 518 like
 519         SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
 520 The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
 521 (ie, both appear in the same canonical pathkey set).  If we sort t1
 522 and then apply a mergejoin, we *must* filter the t1 tuples using the
 523 implied qualification f1 = f2, because otherwise the output of the sort
 524 will be ordered by f1 or f2 (whichever we sort on) but not both.  The
 525 merge will then fail since (depending on which qual clause it applies
 526 first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
 527 actual output of the sort has neither of these orderings.  The best fix
 528 for this is to generate all the implied equality constraints for each
 529 equijoin set and add these clauses to the query's qualification list.
 530 In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
 531 clause.  The constraint will be applied as a qpqual to the output of the
 532 scan on t1, resulting in sort output that is indeed ordered by both vars.
 533 This approach provides more information to the selectivity estimation
 534 code than it would otherwise have, and reduces the number of tuples
 535 processed in join stages, so it's a win to make these deductions even
 536 if we weren't forced to.
 537
 538 When we generate implied equality constraints, we may find ourselves
 539 adding redundant clauses to specific relations.  For example, consider
 540         SELECT * FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c;
 541 We will generate the implied clause t1.a = t3.c and add it to the tree.
 542 This is good since it allows us to consider joining t1 and t3 directly,
 543 which we otherwise wouldn't do.  But when we reach the stage of joining
 544 all three relations, we will have redundant join clauses --- eg, if we
 545 join t1 and t2 first, then the path that joins (t1 t2) to t3 will have
 546 both t2.b = t3.c and t1.a = t3.c as restriction clauses.  This is bad;
 547 not only is evaluation of the extra clause useless work at runtime,
 548 but the selectivity estimator routines will underestimate the number
 549 of tuples produced since they won't know that the two clauses are
 550 perfectly redundant.  We fix this by detecting and removing redundant
 551 clauses as the restriction clause list is built for each join.  (We
 552 can't do it sooner, since which clauses are redundant will vary depending
 553 on the join order.)
 554
 555 Yet another implication of all this is that mergejoinable operators
 556 must form closed equivalence sets.  For example, if "int2 = int4"
 557 and "int4 = int8" are both marked mergejoinable, then there had better
 558 be a mergejoinable "int2 = int8" operator as well.  Otherwise, when
 559 we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
 560 while trying to create a representation of the implied clause
 561 int2var = int8var.
 562
 563 An additional refinement we can make is to insist that canonical pathkey
 564 lists (sort orderings) do not mention the same pathkey set more than once.
 565 For example, a pathkey list ((A) (B) (A)) is redundant --- the second
 566 occurrence of (A) does not change the ordering, since the data must already
 567 be sorted by A.  Although a user probably wouldn't write ORDER BY A,B,A
 568 directly, such redundancies are more probable once equijoin equivalences
 569 have been considered.  Also, the system is likely to generate redundant
 570 pathkey lists when computing the sort ordering needed for a mergejoin.  By
 571 eliminating the redundancy, we save time and improve planning, since the
 572 planner will more easily recognize equivalent orderings as being equivalent.
 573
 574 Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
 575 coding of our optimizer, he is available to field questions about
 576 optimizer topics.
 577
 578 -- bjm & tgl
 579