granicus.if.org Git - postgresql/blob - src/backend/optimizer/README

   1 src/backend/optimizer/README
   2
   3 Optimizer
   4 =========
   5
   6 These directories take the Query structure returned by the parser, and
   7 generate a plan used by the executor.  The /plan directory generates the
   8 actual output plan, the /path code generates all possible ways to join the
   9 tables, and /prep handles various preprocessing steps for special cases.
  10 /util is utility stuff.  /geqo is the separate "genetic optimization" planner
  11 --- it does a semi-random search through the join tree space, rather than
  12 exhaustively considering all possible join trees.  (But each join considered
  13 by /geqo is given to /path to create paths for, so we consider all possible
  14 implementation paths for each specific join pair even in GEQO mode.)
  15
  16
  17 Paths and Join Pairs
  18 --------------------
  19
  20 During the planning/optimizing process, we build "Path" trees representing
  21 the different ways of doing a query.  We select the cheapest Path that
  22 generates the desired relation and turn it into a Plan to pass to the
  23 executor.  (There is pretty nearly a one-to-one correspondence between the
  24 Path and Plan trees, but Path nodes omit info that won't be needed during
  25 planning, and include info needed for planning that won't be needed by the
  26 executor.)
  27
  28 The optimizer builds a RelOptInfo structure for each base relation used in
  29 the query.  Base rels are either primitive tables, or subquery subselects
  30 that are planned via a separate recursive invocation of the planner.  A
  31 RelOptInfo is also built for each join relation that is considered during
  32 planning.  A join rel is simply a combination of base rels.  There is only
  33 one join RelOptInfo for any given set of baserels --- for example, the join
  34 {A B C} is represented by the same RelOptInfo no matter whether we build it
  35 by joining A and B first and then adding C, or joining B and C first and
  36 then adding A, etc.  These different means of building the joinrel are
  37 represented as Paths.  For each RelOptInfo we build a list of Paths that
  38 represent plausible ways to implement the scan or join of that relation.
  39 Once we've considered all the plausible Paths for a rel, we select the one
  40 that is cheapest according to the planner's cost estimates.  The final plan
  41 is derived from the cheapest Path for the RelOptInfo that includes all the
  42 base rels of the query.
  43
  44 Possible Paths for a primitive table relation include plain old sequential
  45 scan, plus index scans for any indexes that exist on the table, plus bitmap
  46 index scans using one or more indexes.  Specialized RTE types, such as
  47 function RTEs, may have only one possible Path.
  48
  49 Joins always occur using two RelOptInfos.  One is outer, the other inner.
  50 Outers drive lookups of values in the inner.  In a nested loop, lookups of
  51 values in the inner occur by scanning the inner path once per outer tuple
  52 to find each matching inner row.  In a mergejoin, inner and outer rows are
  53 ordered, and are accessed in order, so only one scan is required to perform
  54 the entire join: both inner and outer paths are scanned in-sync.  (There's
  55 not a lot of difference between inner and outer in a mergejoin...)  In a
  56 hashjoin, the inner is scanned first and all its rows are entered in a
  57 hashtable, then the outer is scanned and for each row we lookup the join
  58 key in the hashtable.
  59
  60 A Path for a join relation is actually a tree structure, with the topmost
  61 Path node representing the last-applied join method.  It has left and right
  62 subpaths that represent the scan or join methods used for the two input
  63 relations.
  64
  65
  66 Join Tree Construction
  67 ----------------------
  68
  69 The optimizer generates optimal query plans by doing a more-or-less
  70 exhaustive search through the ways of executing the query.  The best Path
  71 tree is found by a recursive process:
  72
  73 1) Take each base relation in the query, and make a RelOptInfo structure
  74 for it.  Find each potentially useful way of accessing the relation,
  75 including sequential and index scans, and make Paths representing those
  76 ways.  All the Paths made for a given relation are placed in its
  77 RelOptInfo.pathlist.  (Actually, we discard Paths that are obviously
  78 inferior alternatives before they ever get into the pathlist --- what
  79 ends up in the pathlist is the cheapest way of generating each potentially
  80 useful sort ordering and parameterization of the relation.)  Also create a
  81 RelOptInfo.joininfo list including all the join clauses that involve this
  82 relation.  For example, the WHERE clause "tab1.col1 = tab2.col1" generates
  83 entries in both tab1 and tab2's joininfo lists.
  84
  85 If we have only a single base relation in the query, we are done.
  86 Otherwise we have to figure out how to join the base relations into a
  87 single join relation.
  88
  89 2) Normally, any explicit JOIN clauses are "flattened" so that we just
  90 have a list of relations to join.  However, FULL OUTER JOIN clauses are
  91 never flattened, and other kinds of JOIN might not be either, if the
  92 flattening process is stopped by join_collapse_limit or from_collapse_limit
  93 restrictions.  Therefore, we end up with a planning problem that contains
  94 lists of relations to be joined in any order, where any individual item
  95 might be a sub-list that has to be joined together before we can consider
  96 joining it to its siblings.  We process these sub-problems recursively,
  97 bottom up.  Note that the join list structure constrains the possible join
  98 orders, but it doesn't constrain the join implementation method at each
  99 join (nestloop, merge, hash), nor does it say which rel is considered outer
 100 or inner at each join.  We consider all these possibilities in building
 101 Paths. We generate a Path for each feasible join method, and select the
 102 cheapest Path.
 103
 104 For each planning problem, therefore, we will have a list of relations
 105 that are either base rels or joinrels constructed per sub-join-lists.
 106 We can join these rels together in any order the planner sees fit.
 107 The standard (non-GEQO) planner does this as follows:
 108
 109 Consider joining each RelOptInfo to each other RelOptInfo for which there
 110 is a usable joinclause, and generate a Path for each possible join method
 111 for each such pair.  (If we have a RelOptInfo with no join clauses, we have
 112 no choice but to generate a clauseless Cartesian-product join; so we
 113 consider joining that rel to each other available rel.  But in the presence
 114 of join clauses we will only consider joins that use available join
 115 clauses.  Note that join-order restrictions induced by outer joins and
 116 IN/EXISTS clauses are also checked, to ensure that we find a workable join
 117 order in cases where those restrictions force a clauseless join to be done.)
 118
 119 If we only had two relations in the list, we are done: we just pick
 120 the cheapest path for the join RelOptInfo.  If we had more than two, we now
 121 need to consider ways of joining join RelOptInfos to each other to make
 122 join RelOptInfos that represent more than two list items.
 123
 124 The join tree is constructed using a "dynamic programming" algorithm:
 125 in the first pass (already described) we consider ways to create join rels
 126 representing exactly two list items.  The second pass considers ways
 127 to make join rels that represent exactly three list items; the next pass,
 128 four items, etc.  The last pass considers how to make the final join
 129 relation that includes all list items --- obviously there can be only one
 130 join rel at this top level, whereas there can be more than one join rel
 131 at lower levels.  At each level we use joins that follow available join
 132 clauses, if possible, just as described for the first level.
 133
 134 For example:
 135
 136     SELECT  *
 137     FROM    tab1, tab2, tab3, tab4
 138     WHERE   tab1.col = tab2.col AND
 139         tab2.col = tab3.col AND
 140         tab3.col = tab4.col
 141
 142     Tables 1, 2, 3, and 4 are joined as:
 143     {1 2},{2 3},{3 4}
 144     {1 2 3},{2 3 4}
 145     {1 2 3 4}
 146     (other possibilities will be excluded for lack of join clauses)
 147
 148     SELECT  *
 149     FROM    tab1, tab2, tab3, tab4
 150     WHERE   tab1.col = tab2.col AND
 151         tab1.col = tab3.col AND
 152         tab1.col = tab4.col
 153
 154     Tables 1, 2, 3, and 4 are joined as:
 155     {1 2},{1 3},{1 4}
 156     {1 2 3},{1 3 4},{1 2 4}
 157     {1 2 3 4}
 158
 159 We consider left-handed plans (the outer rel of an upper join is a joinrel,
 160 but the inner is always a single list item); right-handed plans (outer rel
 161 is always a single item); and bushy plans (both inner and outer can be
 162 joins themselves).  For example, when building {1 2 3 4} we consider
 163 joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
 164 {1 2} to {3 4} (bushy), among other choices.  Although the jointree
 165 scanning code produces these potential join combinations one at a time,
 166 all the ways to produce the same set of joined base rels will share the
 167 same RelOptInfo, so the paths produced from different join combinations
 168 that produce equivalent joinrels will compete in add_path().
 169
 170 The dynamic-programming approach has an important property that's not
 171 immediately obvious: we will finish constructing all paths for a given
 172 relation before we construct any paths for relations containing that rel.
 173 This means that we can reliably identify the "cheapest path" for each rel
 174 before higher-level relations need to know that.  Also, we can safely
 175 discard a path when we find that another path for the same rel is better,
 176 without worrying that maybe there is already a reference to that path in
 177 some higher-level join path.  Without this, memory management for paths
 178 would be much more complicated.
 179
 180 Once we have built the final join rel, we use either the cheapest path
 181 for it or the cheapest path with the desired ordering (if that's cheaper
 182 than applying a sort to the cheapest other path).
 183
 184 If the query contains one-sided outer joins (LEFT or RIGHT joins), or
 185 IN or EXISTS WHERE clauses that were converted to semijoins or antijoins,
 186 then some of the possible join orders may be illegal.  These are excluded
 187 by having join_is_legal consult a side list of such "special" joins to see
 188 whether a proposed join is illegal.  (The same consultation allows it to
 189 see which join style should be applied for a valid join, ie, JOIN_INNER,
 190 JOIN_LEFT, etc.)
 191
 192
 193 Valid OUTER JOIN Optimizations
 194 ------------------------------
 195
 196 The planner's treatment of outer join reordering is based on the following
 197 identities:
 198
 199 1.      (A leftjoin B on (Pab)) innerjoin C on (Pac)
 200         = (A innerjoin C on (Pac)) leftjoin B on (Pab)
 201
 202 where Pac is a predicate referencing A and C, etc (in this case, clearly
 203 Pac cannot reference B, or the transformation is nonsensical).
 204
 205 2.      (A leftjoin B on (Pab)) leftjoin C on (Pac)
 206         = (A leftjoin C on (Pac)) leftjoin B on (Pab)
 207
 208 3.      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 209         = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 210
 211 Identity 3 only holds if predicate Pbc must fail for all-null B rows
 212 (that is, Pbc is strict for at least one column of B).  If Pbc is not
 213 strict, the first form might produce some rows with nonnull C columns
 214 where the second form would make those entries null.
 215
 216 RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
 217 tables, so the same identities work for right joins.
 218
 219 An example of a case that does *not* work is moving an innerjoin into or
 220 out of the nullable side of an outer join:
 221
 222         A leftjoin (B join C on (Pbc)) on (Pab)
 223         != (A leftjoin B on (Pab)) join C on (Pbc)
 224
 225 SEMI joins work a little bit differently.  A semijoin can be reassociated
 226 into or out of the lefthand side of another semijoin, left join, or
 227 antijoin, but not into or out of the righthand side.  Likewise, an inner
 228 join, left join, or antijoin can be reassociated into or out of the
 229 lefthand side of a semijoin, but not into or out of the righthand side.
 230
 231 ANTI joins work approximately like LEFT joins, except that identity 3
 232 fails if the join to C is an antijoin (even if Pbc is strict, and in
 233 both the cases where the other join is a leftjoin and where it is an
 234 antijoin).  So we can't reorder antijoins into or out of the RHS of a
 235 leftjoin or antijoin, even if the relevant clause is strict.
 236
 237 The current code does not attempt to re-order FULL JOINs at all.
 238 FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
 239 translating the jointree to "joinlist" representation.  Other types of
 240 JOIN nodes are normally collapsed so that they participate fully in the
 241 join order search.  To avoid generating illegal join orders, the planner
 242 creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
 243 checks this list to decide if a proposed join is legal.
 244
 245 What we store in SpecialJoinInfo nodes are the minimum sets of Relids
 246 required on each side of the join to form the outer join.  Note that
 247 these are minimums; there's no explicit maximum, since joining other
 248 rels to the OJ's syntactic rels may be legal.  Per identities 1 and 2,
 249 non-FULL joins can be freely associated into the lefthand side of an
 250 OJ, but in some cases they can't be associated into the righthand side.
 251 So the restriction enforced by join_is_legal is that a proposed join
 252 can't join a rel within or partly within an RHS boundary to one outside
 253 the boundary, unless the proposed join is a LEFT join that can associate
 254 into the SpecialJoinInfo's RHS using identity 3.
 255
 256 The use of minimum Relid sets has some pitfalls; consider a query like
 257         A leftjoin (B leftjoin (C innerjoin D) on (Pbcd)) on Pa
 258 where Pa doesn't mention B/C/D at all.  In this case a naive computation
 259 would give the upper leftjoin's min LHS as {A} and min RHS as {C,D} (since
 260 we know that the innerjoin can't associate out of the leftjoin's RHS, and
 261 enforce that by including its relids in the leftjoin's min RHS).  And the
 262 lower leftjoin has min LHS of {B} and min RHS of {C,D}.  Given such
 263 information, join_is_legal would think it's okay to associate the upper
 264 join into the lower join's RHS, transforming the query to
 265         B leftjoin (A leftjoin (C innerjoin D) on Pa) on (Pbcd)
 266 which yields totally wrong answers.  We prevent that by forcing the min RHS
 267 for the upper join to include B.  This is perhaps overly restrictive, but
 268 such cases don't arise often so it's not clear that it's worth developing a
 269 more complicated system.
 270
 271
 272 Pulling Up Subqueries
 273 ---------------------
 274
 275 As we described above, a subquery appearing in the range table is planned
 276 independently and treated as a "black box" during planning of the outer
 277 query.  This is necessary when the subquery uses features such as
 278 aggregates, GROUP, or DISTINCT.  But if the subquery is just a simple
 279 scan or join, treating the subquery as a black box may produce a poor plan
 280 compared to considering it as part of the entire plan search space.
 281 Therefore, at the start of the planning process the planner looks for
 282 simple subqueries and pulls them up into the main query's jointree.
 283
 284 Pulling up a subquery may result in FROM-list joins appearing below the top
 285 of the join tree.  Each FROM-list is planned using the dynamic-programming
 286 search method described above.
 287
 288 If pulling up a subquery produces a FROM-list as a direct child of another
 289 FROM-list, then we can merge the two FROM-lists together.  Once that's
 290 done, the subquery is an absolutely integral part of the outer query and
 291 will not constrain the join tree search space at all.  However, that could
 292 result in unpleasant growth of planning time, since the dynamic-programming
 293 search has runtime exponential in the number of FROM-items considered.
 294 Therefore, we don't merge FROM-lists if the result would have too many
 295 FROM-items in one list.
 296
 297
 298 Optimizer Functions
 299 -------------------
 300
 301 The primary entry point is planner().
 302
 303 planner()
 304 set up for recursive handling of subqueries
 305 -subquery_planner()
 306  pull up sublinks and subqueries from rangetable, if possible
 307  canonicalize qual
 308      Attempt to simplify WHERE clause to the most useful form; this includes
 309      flattening nested AND/ORs and detecting clauses that are duplicated in
 310      different branches of an OR.
 311  simplify constant expressions
 312  process sublinks
 313  convert Vars of outer query levels into Params
 314 --grouping_planner()
 315   preprocess target list for non-SELECT queries
 316   handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
 317         ORDER BY, DISTINCT, LIMIT
 318 --query_planner()
 319    make list of base relations used in query
 320    split up the qual into restrictions (a=1) and joins (b=c)
 321    find qual clauses that enable merge and hash joins
 322 ----make_one_rel()
 323      set_base_rel_pathlist()
 324       find seqscan and all index paths for each base relation
 325       find selectivity of columns used in joins
 326      make_rel_from_joinlist()
 327       hand off join subproblems to a plugin, GEQO, or standard_join_search()
 328 -----standard_join_search()
 329       call join_search_one_level() for each level of join tree needed
 330       join_search_one_level():
 331         For each joinrel of the prior level, do make_rels_by_clause_joins()
 332         if it has join clauses, or make_rels_by_clauseless_joins() if not.
 333         Also generate "bushy plan" joins between joinrels of lower levels.
 334       Back at standard_join_search(), generate gather paths if needed for
 335       each newly constructed joinrel, then apply set_cheapest() to extract
 336       the cheapest path for it.
 337       Loop back if this wasn't the top join level.
 338   Back at grouping_planner:
 339   do grouping (GROUP BY) and aggregation
 340   do window functions
 341   make unique (DISTINCT)
 342   do sorting (ORDER BY)
 343   do limit (LIMIT/OFFSET)
 344 Back at planner():
 345 convert finished Path tree into a Plan tree
 346 do final cleanup after planning
 347
 348
 349 Optimizer Data Structures
 350 -------------------------
 351
 352 PlannerGlobal   - global information for a single planner invocation
 353
 354 PlannerInfo     - information for planning a particular Query (we make
 355                   a separate PlannerInfo node for each sub-Query)
 356
 357 RelOptInfo      - a relation or joined relations
 358
 359  RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"
 360                   (note the same structure is used for restriction and
 361                    join clauses)
 362
 363  Path           - every way to generate a RelOptInfo(sequential,index,joins)
 364   SeqScan       - represents a sequential scan plan
 365   IndexPath     - index scan
 366   BitmapHeapPath - top of a bitmapped index scan
 367   TidPath       - scan by CTID
 368   SubqueryScanPath - scan a subquery-in-FROM
 369   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
 370   CustomPath    - for custom scan providers
 371   AppendPath    - append multiple subpaths together
 372   MergeAppendPath - merge multiple subpaths, preserving their common sort order
 373   ResultPath    - a childless Result plan node (used for FROM-less SELECT)
 374   MaterialPath  - a Material plan node
 375   UniquePath    - remove duplicate rows (either by hashing or sorting)
 376   GatherPath    - collect the results of parallel workers
 377   ProjectionPath - a Result plan node with child (used for projection)
 378   SortPath      - a Sort plan node applied to some sub-path
 379   GroupPath     - a Group plan node applied to some sub-path
 380   UpperUniquePath - a Unique plan node applied to some sub-path
 381   AggPath       - an Agg plan node applied to some sub-path
 382   GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
 383   MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
 384   WindowAggPath - a WindowAgg plan node applied to some sub-path
 385   SetOpPath     - a SetOp plan node applied to some sub-path
 386   RecursiveUnionPath - a RecursiveUnion plan node applied to two sub-paths
 387   LockRowsPath  - a LockRows plan node applied to some sub-path
 388   ModifyTablePath - a ModifyTable plan node applied to some sub-path(s)
 389   LimitPath     - a Limit plan node applied to some sub-path
 390   NestPath      - nested-loop joins
 391   MergePath     - merge joins
 392   HashPath      - hash joins
 393
 394  EquivalenceClass - a data structure representing a set of values known equal
 395
 396  PathKey        - a data structure representing the sort ordering of a path
 397
 398 The optimizer spends a good deal of its time worrying about the ordering
 399 of the tuples returned by a path.  The reason this is useful is that by
 400 knowing the sort ordering of a path, we may be able to use that path as
 401 the left or right input of a mergejoin and avoid an explicit sort step.
 402 Nestloops and hash joins don't really care what the order of their inputs
 403 is, but mergejoin needs suitably ordered inputs.  Therefore, all paths
 404 generated during the optimization process are marked with their sort order
 405 (to the extent that it is known) for possible use by a higher-level merge.
 406
 407 It is also possible to avoid an explicit sort step to implement a user's
 408 ORDER BY clause if the final path has the right ordering already, so the
 409 sort ordering is of interest even at the top level.  grouping_planner() will
 410 look for the cheapest path with a sort order matching the desired order,
 411 then compare its cost to the cost of using the cheapest-overall path and
 412 doing an explicit sort on that.
 413
 414 When we are generating paths for a particular RelOptInfo, we discard a path
 415 if it is more expensive than another known path that has the same or better
 416 sort order.  We will never discard a path that is the only known way to
 417 achieve a given sort order (without an explicit sort, that is).  In this
 418 way, the next level up will have the maximum freedom to build mergejoins
 419 without sorting, since it can pick from any of the paths retained for its
 420 inputs.
 421
 422
 423 EquivalenceClasses
 424 ------------------
 425
 426 During the deconstruct_jointree() scan of the query's qual clauses, we look
 427 for mergejoinable equality clauses A = B whose applicability is not delayed
 428 by an outer join; these are called "equivalence clauses".  When we find
 429 one, we create an EquivalenceClass containing the expressions A and B to
 430 record this knowledge.  If we later find another equivalence clause B = C,
 431 we add C to the existing EquivalenceClass for {A B}; this may require
 432 merging two existing EquivalenceClasses.  At the end of the scan, we have
 433 sets of values that are known all transitively equal to each other.  We can
 434 therefore use a comparison of any pair of the values as a restriction or
 435 join clause (when these values are available at the scan or join, of
 436 course); furthermore, we need test only one such comparison, not all of
 437 them.  Therefore, equivalence clauses are removed from the standard qual
 438 distribution process.  Instead, when preparing a restriction or join clause
 439 list, we examine each EquivalenceClass to see if it can contribute a
 440 clause, and if so we select an appropriate pair of values to compare.  For
 441 example, if we are trying to join A's relation to C's, we can generate the
 442 clause A = C, even though this appeared nowhere explicitly in the original
 443 query.  This may allow us to explore join paths that otherwise would have
 444 been rejected as requiring Cartesian-product joins.
 445
 446 Sometimes an EquivalenceClass may contain a pseudo-constant expression
 447 (i.e., one not containing Vars or Aggs of the current query level, nor
 448 volatile functions).  In this case we do not follow the policy of
 449 dynamically generating join clauses: instead, we dynamically generate
 450 restriction clauses "var = const" wherever one of the variable members of
 451 the class can first be computed.  For example, if we have A = B and B = 42,
 452 we effectively generate the restriction clauses A = 42 and B = 42, and then
 453 we need not bother with explicitly testing the join clause A = B when the
 454 relations are joined.  In effect, all the class members can be tested at
 455 relation-scan level and there's never a need for join tests.
 456
 457 The precise technical interpretation of an EquivalenceClass is that it
 458 asserts that at any plan node where more than one of its member values
 459 can be computed, output rows in which the values are not all equal may
 460 be discarded without affecting the query result.  (We require all levels
 461 of the plan to enforce EquivalenceClasses, hence a join need not recheck
 462 equality of values that were computable by one of its children.)  For an
 463 ordinary EquivalenceClass that is "valid everywhere", we can further infer
 464 that the values are all non-null, because all mergejoinable operators are
 465 strict.  However, we also allow equivalence clauses that appear below the
 466 nullable side of an outer join to form EquivalenceClasses; for these
 467 classes, the interpretation is that either all the values are equal, or
 468 all (except pseudo-constants) have gone to null.  (This requires a
 469 limitation that non-constant members be strict, else they might not go
 470 to null when the other members do.)  Consider for example
 471
 472         SELECT *
 473           FROM a LEFT JOIN
 474                (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
 475                ON a.x = ss.y
 476           WHERE a.x = 42;
 477
 478 We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
 479 apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
 480 clauses from forming EquivalenceClasses is exactly that we want to be able
 481 to push any derived clauses as far down as possible.)  But once above the
 482 outer join it's no longer necessarily the case that b.y = 10, and thus we
 483 cannot use such EquivalenceClasses to conclude that sorting is unnecessary
 484 (see discussion of PathKeys below).
 485
 486 In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
 487 equivalence clause because its applicability to b is delayed by the outer
 488 join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
 489 But since we see that a.x has been equated to 42 above the outer join, we
 490 are able to form a below-outer-join class {b.y 42}; this restriction can be
 491 added because no b/c row not having b.y = 42 can contribute to the result
 492 of the outer join, and so we need not compute such rows.  Now this class
 493 will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
 494 which lets the planner deduce that the b/c join need not be computed at all
 495 because none of its rows can contribute to the outer join.  (This gets
 496 implemented as a gating Result filter, since more usually the potential
 497 contradiction involves Param values rather than just Consts, and thus has
 498 to be checked at runtime.)
 499
 500 To aid in determining the sort ordering(s) that can work with a mergejoin,
 501 we mark each mergejoinable clause with the EquivalenceClasses of its left
 502 and right inputs.  For an equivalence clause, these are of course the same
 503 EquivalenceClass.  For a non-equivalence mergejoinable clause (such as an
 504 outer-join qualification), we generate two separate EquivalenceClasses for
 505 the left and right inputs.  This may result in creating single-item
 506 equivalence "classes", though of course these are still subject to merging
 507 if other equivalence clauses are later found to bear on the same
 508 expressions.
 509
 510 Another way that we may form a single-item EquivalenceClass is in creation
 511 of a PathKey to represent a desired sort order (see below).  This is a bit
 512 different from the above cases because such an EquivalenceClass might
 513 contain an aggregate function or volatile expression.  (A clause containing
 514 a volatile function will never be considered mergejoinable, even if its top
 515 operator is mergejoinable, so there is no way for a volatile expression to
 516 get into EquivalenceClasses otherwise.  Aggregates are disallowed in WHERE
 517 altogether, so will never be found in a mergejoinable clause.)  This is just
 518 a convenience to maintain a uniform PathKey representation: such an
 519 EquivalenceClass will never be merged with any other.  Note in particular
 520 that a single-item EquivalenceClass {a.x} is *not* meant to imply an
 521 assertion that a.x = a.x; the practical effect of this is that a.x could
 522 be NULL.
 523
 524 An EquivalenceClass also contains a list of btree opfamily OIDs, which
 525 determines what the equalities it represents actually "mean".  All the
 526 equivalence clauses that contribute to an EquivalenceClass must have
 527 equality operators that belong to the same set of opfamilies.  (Note: most
 528 of the time, a particular equality operator belongs to only one family, but
 529 it's possible that it belongs to more than one.  We keep track of all the
 530 families to ensure that we can make use of an index belonging to any one of
 531 the families for mergejoin purposes.)
 532
 533 An EquivalenceClass can contain "em_is_child" members, which are copies
 534 of members that contain appendrel parent relation Vars, transposed to
 535 contain the equivalent child-relation variables or expressions.  These
 536 members are *not* full-fledged members of the EquivalenceClass and do not
 537 affect the class's overall properties at all.  They are kept only to
 538 simplify matching of child-relation expressions to EquivalenceClasses.
 539 Most operations on EquivalenceClasses should ignore child members.
 540
 541
 542 PathKeys
 543 --------
 544
 545 The PathKeys data structure represents what is known about the sort order
 546 of the tuples generated by a particular Path.  A path's pathkeys field is a
 547 list of PathKey nodes, where the n'th item represents the n'th sort key of
 548 the result.  Each PathKey contains these fields:
 549
 550         * a reference to an EquivalenceClass
 551         * a btree opfamily OID (must match one of those in the EC)
 552         * a sort direction (ascending or descending)
 553         * a nulls-first-or-last flag
 554
 555 The EquivalenceClass represents the value being sorted on.  Since the
 556 various members of an EquivalenceClass are known equal according to the
 557 opfamily, we can consider a path sorted by any one of them to be sorted by
 558 any other too; this is what justifies referencing the whole
 559 EquivalenceClass rather than just one member of it.
 560
 561 In single/base relation RelOptInfo's, the Paths represent various ways
 562 of scanning the relation and the resulting ordering of the tuples.
 563 Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 564 Index scans have Path.pathkeys that represent the chosen index's ordering,
 565 if any.  A single-key index would create a single-PathKey list, while a
 566 multi-column index generates a list with one element per index column.
 567 (Actually, since an index can be scanned either forward or backward, there
 568 are two possible sort orders and two possible PathKey lists it can
 569 generate.)
 570
 571 Note that a bitmap scan has NIL pathkeys since we can say nothing about
 572 the overall order of its result.  Also, an indexscan on an unordered type
 573 of index generates NIL pathkeys.  However, we can always create a pathkey
 574 by doing an explicit sort.  The pathkeys for a Sort plan's output just
 575 represent the sort key fields and the ordering operators used.
 576
 577 Things get more interesting when we consider joins.  Suppose we do a
 578 mergejoin between A and B using the mergeclause A.X = B.Y.  The output
 579 of the mergejoin is sorted by X --- but it is also sorted by Y.  Again,
 580 this can be represented by a PathKey referencing an EquivalenceClass
 581 containing both X and Y.
 582
 583 With a little further thought, it becomes apparent that nestloop joins
 584 can also produce sorted output.  For example, if we do a nestloop join
 585 between outer relation A and inner relation B, then any pathkeys relevant
 586 to A are still valid for the join result: we have not altered the order of
 587 the tuples from A.  Even more interesting, if there was an equivalence clause
 588 A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
 589 that B.Y is a pathkey for the join result; X was ordered before and still
 590 is, and the joined values of Y are equal to the joined values of X, so Y
 591 must now be ordered too.  This is true even though we used neither an
 592 explicit sort nor a mergejoin on Y.  (Note: hash joins cannot be counted
 593 on to preserve the order of their outer relation, because the executor
 594 might decide to "batch" the join, so we always set pathkeys to NIL for
 595 a hashjoin path.)  Exception: a RIGHT or FULL join doesn't preserve the
 596 ordering of its outer relation, because it might insert nulls at random
 597 points in the ordering.
 598
 599 In general, we can justify using EquivalenceClasses as the basis for
 600 pathkeys because, whenever we scan a relation containing multiple
 601 EquivalenceClass members or join two relations each containing
 602 EquivalenceClass members, we apply restriction or join clauses derived from
 603 the EquivalenceClass.  This guarantees that any two values listed in the
 604 EquivalenceClass are in fact equal in all tuples emitted by the scan or
 605 join, and therefore that if the tuples are sorted by one of the values,
 606 they can be considered sorted by any other as well.  It does not matter
 607 whether the test clause is used as a mergeclause, or merely enforced
 608 after-the-fact as a qpqual filter.
 609
 610 Note that there is no particular difficulty in labeling a path's sort
 611 order with a PathKey referencing an EquivalenceClass that contains
 612 variables not yet joined into the path's output.  We can simply ignore
 613 such entries as not being relevant (yet).  This makes it possible to
 614 use the same EquivalenceClasses throughout the join planning process.
 615 In fact, by being careful not to generate multiple identical PathKey
 616 objects, we can reduce comparison of EquivalenceClasses and PathKeys
 617 to simple pointer comparison, which is a huge savings because add_path
 618 has to make a large number of PathKey comparisons in deciding whether
 619 competing Paths are equivalently sorted.
 620
 621 Pathkeys are also useful to represent an ordering that we wish to achieve,
 622 since they are easily compared to the pathkeys of a potential candidate
 623 path.  So, SortGroupClause lists are turned into pathkeys lists for use
 624 inside the optimizer.
 625
 626 An additional refinement we can make is to insist that canonical pathkey
 627 lists (sort orderings) do not mention the same EquivalenceClass more than
 628 once.  For example, in all these cases the second sort column is redundant,
 629 because it cannot distinguish values that are the same according to the
 630 first sort column:
 631         SELECT ... ORDER BY x, x
 632         SELECT ... ORDER BY x, x DESC
 633         SELECT ... WHERE x = y ORDER BY x, y
 634 Although a user probably wouldn't write "ORDER BY x,x" directly, such
 635 redundancies are more probable once equivalence classes have been
 636 considered.  Also, the system may generate redundant pathkey lists when
 637 computing the sort ordering needed for a mergejoin.  By eliminating the
 638 redundancy, we save time and improve planning, since the planner will more
 639 easily recognize equivalent orderings as being equivalent.
 640
 641 Another interesting property is that if the underlying EquivalenceClass
 642 contains a constant and is not below an outer join, then the pathkey is
 643 completely redundant and need not be sorted by at all!  Every row must
 644 contain the same constant value, so there's no need to sort.  (If the EC is
 645 below an outer join, we still have to sort, since some of the rows might
 646 have gone to null and others not.  In this case we must be careful to pick
 647 a non-const member to sort by.  The assumption that all the non-const
 648 members go to null at the same plan level is critical here, else they might
 649 not produce the same sort order.)  This might seem pointless because users
 650 are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
 651 recognize when particular index columns are irrelevant to the sort order:
 652 if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
 653 produces correctly ordered data without a sort step.  We used to have very
 654 ugly ad-hoc code to recognize that in limited contexts, but discarding
 655 constant ECs from pathkeys makes it happen cleanly and automatically.
 656
 657 You might object that a below-outer-join EquivalenceClass doesn't always
 658 represent the same values at every level of the join tree, and so using
 659 it to uniquely identify a sort order is dubious.  This is true, but we
 660 can avoid dealing with the fact explicitly because we always consider that
 661 an outer join destroys any ordering of its nullable inputs.  Thus, even
 662 if a path was sorted by {a.x} below an outer join, we'll re-sort if that
 663 sort ordering was important; and so using the same PathKey for both sort
 664 orderings doesn't create any real problem.
 665
 666
 667 Order of processing for EquivalenceClasses and PathKeys
 668 -------------------------------------------------------
 669
 670 As alluded to above, there is a specific sequence of phases in the
 671 processing of EquivalenceClasses and PathKeys during planning.  During the
 672 initial scanning of the query's quals (deconstruct_jointree followed by
 673 reconsider_outer_join_clauses), we construct EquivalenceClasses based on
 674 mergejoinable clauses found in the quals.  At the end of this process,
 675 we know all we can know about equivalence of different variables, so
 676 subsequently there will be no further merging of EquivalenceClasses.
 677 At that point it is possible to consider the EquivalenceClasses as
 678 "canonical" and build canonical PathKeys that reference them.  At this
 679 time we construct PathKeys for the query's ORDER BY and related clauses.
 680 (Any ordering expressions that do not appear elsewhere will result in
 681 the creation of new EquivalenceClasses, but this cannot result in merging
 682 existing classes, so canonical-ness is not lost.)
 683
 684 Because all the EquivalenceClasses are known before we begin path
 685 generation, we can use them as a guide to which indexes are of interest:
 686 if an index's column is not mentioned in any EquivalenceClass then that
 687 index's sort order cannot possibly be helpful for the query.  This allows
 688 short-circuiting of much of the processing of create_index_paths() for
 689 irrelevant indexes.
 690
 691 There are some cases where planner.c constructs additional
 692 EquivalenceClasses and PathKeys after query_planner has completed.
 693 In these cases, the extra ECs/PKs are needed to represent sort orders
 694 that were not considered during query_planner.  Such situations should be
 695 minimized since it is impossible for query_planner to return a plan
 696 producing such a sort order, meaning an explicit sort will always be needed.
 697 Currently this happens only for queries involving multiple window functions
 698 with different orderings, for which extra sorts are needed anyway.
 699
 700
 701 Parameterized Paths
 702 -------------------
 703
 704 The naive way to join two relations using a clause like WHERE A.X = B.Y
 705 is to generate a nestloop plan like this:
 706
 707         NestLoop
 708                 Filter: A.X = B.Y
 709                 -> Seq Scan on A
 710                 -> Seq Scan on B
 711
 712 We can make this better by using a merge or hash join, but it still
 713 requires scanning all of both input relations.  If A is very small and B is
 714 very large, but there is an index on B.Y, it can be enormously better to do
 715 something like this:
 716
 717         NestLoop
 718                 -> Seq Scan on A
 719                 -> Index Scan using B_Y_IDX on B
 720                         Index Condition: B.Y = A.X
 721
 722 Here, we are expecting that for each row scanned from A, the nestloop
 723 plan node will pass down the current value of A.X into the scan of B.
 724 That allows the indexscan to treat A.X as a constant for any one
 725 invocation, and thereby use it as an index key.  This is the only plan type
 726 that can avoid fetching all of B, and for small numbers of rows coming from
 727 A, that will dominate every other consideration.  (As A gets larger, this
 728 gets less attractive, and eventually a merge or hash join will win instead.
 729 So we have to cost out all the alternatives to decide what to do.)
 730
 731 It can be useful for the parameter value to be passed down through
 732 intermediate layers of joins, for example:
 733
 734         NestLoop
 735                 -> Seq Scan on A
 736                 Hash Join
 737                         Join Condition: B.Y = C.W
 738                         -> Seq Scan on B
 739                         -> Index Scan using C_Z_IDX on C
 740                                 Index Condition: C.Z = A.X
 741
 742 If all joins are plain inner joins then this is usually unnecessary,
 743 because it's possible to reorder the joins so that a parameter is used
 744 immediately below the nestloop node that provides it.  But in the
 745 presence of outer joins, such join reordering may not be possible.
 746
 747 Also, the bottom-level scan might require parameters from more than one
 748 other relation.  In principle we could join the other relations first
 749 so that all the parameters are supplied from a single nestloop level.
 750 But if those other relations have no join clause in common (which is
 751 common in star-schema queries for instance), the planner won't consider
 752 joining them directly to each other.  In such a case we need to be able
 753 to create a plan like
 754
 755     NestLoop
 756         -> Seq Scan on SmallTable1 A
 757         NestLoop
 758             -> Seq Scan on SmallTable2 B
 759             NestLoop
 760                 -> Index Scan using XYIndex on LargeTable C
 761                       Index Condition: C.X = A.AID and C.Y = B.BID
 762
 763 so we should be willing to pass down A.AID through a join even though
 764 there is no join order constraint forcing the plan to look like this.
 765
 766 Before version 9.2, Postgres used ad-hoc methods for planning and
 767 executing nestloop queries of this kind, and those methods could not
 768 handle passing parameters down through multiple join levels.
 769
 770 To plan such queries, we now use a notion of a "parameterized path",
 771 which is a path that makes use of a join clause to a relation that's not
 772 scanned by the path.  In the example two above, we would construct a
 773 path representing the possibility of doing this:
 774
 775         -> Index Scan using C_Z_IDX on C
 776                 Index Condition: C.Z = A.X
 777
 778 This path will be marked as being parameterized by relation A.  (Note that
 779 this is only one of the possible access paths for C; we'd still have a
 780 plain unparameterized seqscan, and perhaps other possibilities.)  The
 781 parameterization marker does not prevent joining the path to B, so one of
 782 the paths generated for the joinrel {B C} will represent
 783
 784         Hash Join
 785                 Join Condition: B.Y = C.W
 786                 -> Seq Scan on B
 787                 -> Index Scan using C_Z_IDX on C
 788                         Index Condition: C.Z = A.X
 789
 790 This path is still marked as being parameterized by A.  When we attempt to
 791 join {B C} to A to form the complete join tree, such a path can only be
 792 used as the inner side of a nestloop join: it will be ignored for other
 793 possible join types.  So we will form a join path representing the query
 794 plan shown above, and it will compete in the usual way with paths built
 795 from non-parameterized scans.
 796
 797 While all ordinary paths for a particular relation generate the same set
 798 of rows (since they must all apply the same set of restriction clauses),
 799 parameterized paths typically generate fewer rows than less-parameterized
 800 paths, since they have additional clauses to work with.  This means we
 801 must consider the number of rows generated as an additional figure of
 802 merit.  A path that costs more than another, but generates fewer rows,
 803 must be kept since the smaller number of rows might save work at some
 804 intermediate join level.  (It would not save anything if joined
 805 immediately to the source of the parameters.)
 806
 807 To keep cost estimation rules relatively simple, we make an implementation
 808 restriction that all paths for a given relation of the same parameterization
 809 (i.e., the same set of outer relations supplying parameters) must have the
 810 same rowcount estimate.  This is justified by insisting that each such path
 811 apply *all* join clauses that are available with the named outer relations.
 812 Different paths might, for instance, choose different join clauses to use
 813 as index clauses; but they must then apply any other join clauses available
 814 from the same outer relations as filter conditions, so that the set of rows
 815 returned is held constant.  This restriction doesn't degrade the quality of
 816 the finished plan: it amounts to saying that we should always push down
 817 movable join clauses to the lowest possible evaluation level, which is a
 818 good thing anyway.  The restriction is useful in particular to support
 819 pre-filtering of join paths in add_path_precheck.  Without this rule we
 820 could never reject a parameterized path in advance of computing its rowcount
 821 estimate, which would greatly reduce the value of the pre-filter mechanism.
 822
 823 To limit planning time, we have to avoid generating an unreasonably large
 824 number of parameterized paths.  We do this by only generating parameterized
 825 relation scan paths for index scans, and then only for indexes for which
 826 suitable join clauses are available.  There are also heuristics in join
 827 planning that try to limit the number of parameterized paths considered.
 828
 829 In particular, there's been a deliberate policy decision to favor hash
 830 joins over merge joins for parameterized join steps (those occurring below
 831 a nestloop that provides parameters to the lower join's inputs).  While we
 832 do not ignore merge joins entirely, joinpath.c does not fully explore the
 833 space of potential merge joins with parameterized inputs.  Also, add_path
 834 treats parameterized paths as having no pathkeys, so that they compete
 835 only on cost and rowcount; they don't get preference for producing a
 836 special sort order.  This creates additional bias against merge joins,
 837 since we might discard a path that could have been useful for performing
 838 a merge without an explicit sort step.  Since a parameterized path must
 839 ultimately be used on the inside of a nestloop, where its sort order is
 840 uninteresting, these choices do not affect any requirement for the final
 841 output order of a query --- they only make it harder to use a merge join
 842 at a lower level.  The savings in planning work justifies that.
 843
 844 Similarly, parameterized paths do not normally get preference in add_path
 845 for having cheap startup cost; that's seldom of much value when on the
 846 inside of a nestloop, so it seems not worth keeping extra paths solely for
 847 that.  An exception occurs for parameterized paths for the RHS relation of
 848 a SEMI or ANTI join: in those cases, we can stop the inner scan after the
 849 first match, so it's primarily startup not total cost that we care about.
 850
 851
 852 LATERAL subqueries
 853 ------------------
 854
 855 As of 9.3 we support SQL-standard LATERAL references from subqueries in
 856 FROM (and also functions in FROM).  The planner implements these by
 857 generating parameterized paths for any RTE that contains lateral
 858 references.  In such cases, *all* paths for that relation will be
 859 parameterized by at least the set of relations used in its lateral
 860 references.  (And in turn, join relations including such a subquery might
 861 not have any unparameterized paths.)  All the other comments made above for
 862 parameterized paths still apply, though; in particular, each such path is
 863 still expected to enforce any join clauses that can be pushed down to it,
 864 so that all paths of the same parameterization have the same rowcount.
 865
 866 We also allow LATERAL subqueries to be flattened (pulled up into the parent
 867 query) by the optimizer, but only when this does not introduce lateral
 868 references into JOIN/ON quals that would refer to relations outside the
 869 lowest outer join at/above that qual.  The semantics of such a qual would
 870 be unclear.  Note that even with this restriction, pullup of a LATERAL
 871 subquery can result in creating PlaceHolderVars that contain lateral
 872 references to relations outside their syntactic scope.  We still evaluate
 873 such PHVs at their syntactic location or lower, but the presence of such a
 874 PHV in the quals or targetlist of a plan node requires that node to appear
 875 on the inside of a nestloop join relative to the rel(s) supplying the
 876 lateral reference.  (Perhaps now that that stuff works, we could relax the
 877 pullup restriction?)
 878
 879
 880 Post scan/join planning
 881 -----------------------
 882
 883 So far we have discussed only scan/join planning, that is, implementation
 884 of the FROM and WHERE clauses of a SQL query.  But the planner must also
 885 determine how to deal with GROUP BY, aggregation, and other higher-level
 886 features of queries; and in many cases there are multiple ways to do these
 887 steps and thus opportunities for optimization choices.  These steps, like
 888 scan/join planning, are handled by constructing Paths representing the
 889 different ways to do a step, then choosing the cheapest Path.
 890
 891 Since all Paths require a RelOptInfo as "parent", we create RelOptInfos
 892 representing the outputs of these upper-level processing steps.  These
 893 RelOptInfos are mostly dummy, but their pathlist lists hold all the Paths
 894 considered useful for each step.  Currently, we may create these types of
 895 additional RelOptInfos during upper-level planning:
 896
 897 UPPERREL_SETOP          result of UNION/INTERSECT/EXCEPT, if any
 898 UPPERREL_GROUP_AGG      result of grouping/aggregation, if any
 899 UPPERREL_WINDOW         result of window functions, if any
 900 UPPERREL_DISTINCT       result of "SELECT DISTINCT", if any
 901 UPPERREL_ORDERED        result of ORDER BY, if any
 902 UPPERREL_FINAL          result of any remaining top-level actions
 903
 904 UPPERREL_FINAL is used to represent any final processing steps, currently
 905 LockRows (SELECT FOR UPDATE), LIMIT/OFFSET, and ModifyTable.  There is no
 906 flexibility about the order in which these steps are done, and thus no need
 907 to subdivide this stage more finely.
 908
 909 These "upper relations" are identified by the UPPERREL enum values shown
 910 above, plus a relids set, which allows there to be more than one upperrel
 911 of the same kind.  We use NULL for the relids if there's no need for more
 912 than one upperrel of the same kind.  Currently, in fact, the relids set
 913 is vestigial because it's always NULL, but that's expected to change in
 914 the future.  For example, in planning set operations, we might need the
 915 relids to denote which subset of the leaf SELECTs has been combined in a
 916 particular group of Paths that are competing with each other.
 917
 918 The result of subquery_planner() is always returned as a set of Paths
 919 stored in the UPPERREL_FINAL rel with NULL relids.  The other types of
 920 upperrels are created only if needed for the particular query.
 921
 922 The upper-relation infrastructure is designed so that things will work
 923 properly if a particular upper relation is created and Paths are added
 924 to it sooner than would normally happen.  This allows, for example,
 925 for an FDW's GetForeignPaths function to insert a Path representing
 926 remote aggregation into the UPPERREL_GROUP_AGG upperrel, if it notices
 927 that the query represents an aggregation that could be done entirely on
 928 the foreign server.  That Path will then compete with Paths representing
 929 local aggregation on a regular scan of the foreign table, once the core
 930 planner reaches the point of considering aggregation.  (In practice,
 931 it will usually be more convenient for FDWs to detect such cases in a
 932 GetForeignUpperPaths callback; but that still represents injecting a
 933 Path before the core code has touched the corresponding upperrel.)
 934
 935
 936 Parallel Query and Partial Paths
 937 --------------------------------
 938
 939 Parallel query involves dividing up the work that needs to be performed
 940 either by an entire query or some portion of the query in such a way that
 941 some of that work can be done by one or more worker processes, which are
 942 called parallel workers.  Parallel workers are a subtype of dynamic
 943 background workers; see src/backend/access/transam/README.parallel for a
 944 fuller description.  Academic literature on parallel query suggests that
 945 that parallel execution strategies can be divided into essentially two
 946 categories: pipelined parallelism, where the execution of the query is
 947 divided into multiple stages and each stage is handled by a separate
 948 process; and partitioning parallelism, where the data is split between
 949 multiple processes and each process handles a subset of it.  The
 950 literature, however, suggests that gains from pipeline parallelism are
 951 often very limited due to the difficulty of avoiding pipeline stalls.
 952 Consequently, we do not currently attempt to generate query plans that
 953 use this technique.
 954
 955 Instead, we focus on partitioning parallelism, which does not require
 956 that the underlying table be partitioned.  It only requires that (1)
 957 there is some method of dividing the data from at least one of the base
 958 tables involved in the relation across multiple processes, (2) allowing
 959 each process to handle its own portion of the data, and then (3)
 960 collecting the results.  Requirements (2) and (3) is satisfied by the
 961 executor node Gather, which launches any number of worker processes and
 962 executes its single child plan in all of them (and perhaps in the leader
 963 also, if the children aren't generating enough data to keep the leader
 964 busy).  Requirement (1) is handled by the SeqScan node: when invoked
 965 with parallel_aware = true, this node will, in effect, partition the
 966 table on a block by block basis, returning a subset of the tuples from
 967 the relation in each worker where that SeqScan is executed.  A similar
 968 scheme could be (and probably should be) implemented for bitmap heap
 969 scans.
 970
 971 Just as we do for non-parallel access methods, we build Paths to
 972 represent access strategies that can be used in a parallel plan.  These
 973 are, in essence, the same strategies that are available in the
 974 non-parallel plan, but there is an important difference: a path that
 975 will run beneath a Gather node returns only a subset of the query
 976 results in each worker, not all of them.  To form a path that can
 977 actually be executed, the (rather large) cost of the Gather node must be
 978 accounted for.  For this reason among others, paths intended to run
 979 beneath a Gather node - which we call "partial" paths since they return
 980 only a subset of the results in each worker - must be kept separate from
 981 ordinary paths (see RelOptInfo's partial_pathlist and the function
 982 add_partial_path).
 983
 984 One of the keys to making parallel query effective is to run as much of
 985 the query in parallel as possible.  Therefore, we expect it to generally
 986 be desirable to postpone the Gather stage until as near to the top of the
 987 plan as possible.  Expanding the range of cases in which more work can be
 988 pushed below the Gather (and costing them accurately) is likely to keep us
 989 busy for a long time to come.