--- /dev/null
+<!--
+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.1 2000/12/16 02:29:36 tgl Exp $
+-->
+
+ <chapter id="performance-tips">
+ <title>Performance Tips</title>
+
+ <para>
+ Query performance can be affected by many things. Some of these can
+ be manipulated by the user, while others are fundamental to the underlying
+ design of the system. This chapter provides some hints about understanding
+ and tuning <productname>Postgres</productname> performance.
+ </para>
+
+ <sect1 id="using-explain">
+ <title>Using <command>EXPLAIN</command></title>
+
+ <note>
+ <title>Author</title>
+ <para>
+ Written by Tom Lane, from e-mail dated 2000-03-27.
+ </para>
+ </note>
+
+ <para>
+ <productname>Postgres</productname> devises a <firstterm>query
+ plan</firstterm> for each query it is given. Choosing the right
+ plan to match the query structure and the properties of the data
+ is absolutely critical for good performance. You can use the
+ <command>EXPLAIN</command> command to see what query plan the system
+ creates for any query. Unfortunately,
+ plan-reading is an art that deserves a tutorial, and I haven't
+ had time to write one. Here is a quick &amp; dirty explanation.
+ </para>
+
+ <para>
+ The numbers that are currently quoted by EXPLAIN are:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Estimated start-up cost (time expended before the output scan can
+ start, e.g., time to do the sorting in a Sort node).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Estimated total cost (if all tuples are retrieved, which they may not
+ be --- a query with a LIMIT will stop short of paying the total cost,
+ for example).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Estimated number of rows output by this plan node (again, without
+ regard for any LIMIT).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Estimated average width (in bytes) of rows output by this plan
+ node.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The costs are measured in units of disk page fetches. (CPU effort
+ estimates are converted into disk-page units using some
+ fairly arbitrary fudge-factors. If you want to experiment with these
+ factors, see the list of run-time configuration parameters in the
+ <citetitle>Administrator's Guide</citetitle>.)
+ </para>
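+
+ <para>
+ For example, you can examine and change these factors on the fly.
+ As a sketch (RANDOM_PAGE_COST is one such run-time parameter; lowering
+ it, as below, is just an illustration of the kind of experiment you
+ might try if you expect most of your data to be cached, not a
+ recommended setting):
+
+ <programlisting>
+show random_page_cost;
+set random_page_cost = 2;
+ </programlisting>
+ </para>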
+
+ <para>
+ It's important to note that the cost of an upper-level node includes
+ the cost of all its child nodes. It's also important to realize that
+ the cost only reflects things that the planner/optimizer cares about.
+ In particular, the cost does not consider the time spent transmitting
+ result tuples to the frontend --- which could be a pretty dominant
+ factor in the true elapsed time, but the planner ignores it because
+ it cannot change it by altering the plan. (Every correct plan will
+ output the same tuple set, we trust.)
+ </para>
+
+ <para>
+ Rows output is a little tricky because it is <emphasis>not</emphasis> the
+ number of rows
+ processed or scanned by the plan node --- it is usually less, reflecting the
+ estimated selectivity of any WHERE-clause constraints that are being
+ applied at this node. Ideally the top-level rows estimate will
+ approximate the number of rows actually returned, updated, or deleted
+ by the query (again, without considering the effects of LIMIT).
+ </para>
+
+ <para>
+ Average width is pretty bogus because the planner really doesn't have
+ any idea of the average length of variable-length columns. I'm thinking
+ about improving that in the future, but it may not be worth the trouble,
+ because the width isn't used for very much.
+ </para>
+
+ <para>
+ Here are some examples (using the regress test database after a
+ vacuum analyze, and almost-7.0 sources):
+
+ <programlisting>
+regression=# explain select * from tenk1;
+NOTICE: QUERY PLAN:
+
+Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148)
+ </programlisting>
+ </para>
+
+ <para>
+ This is about as straightforward as it gets. If you do
+
+ <programlisting>
+select * from pg_class where relname = 'tenk1';
+ </programlisting>
+
+ you'll find out that tenk1 has 233 disk
+ pages and 10000 tuples. So the cost is estimated at 233 block
+ reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost, which is
+ currently 0.01 (try <command>show cpu_tuple_cost</command>).
+ </para>
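+
+ <para>
+ To see just the numbers the estimate is built from, you can select the
+ relevant pg_class columns directly:
+
+ <programlisting>
+select relpages, reltuples from pg_class where relname = 'tenk1';
+ </programlisting>
+
+ The arithmetic is then 233 * 1.0 + 10000 * 0.01 = 333.00, which is
+ exactly the total cost shown by EXPLAIN above.
+ </para>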
+
+ <para>
+ Now let's modify the query to add a qualification clause:
+
+ <programlisting>
+regression=# explain select * from tenk1 where unique1 < 1000;
+NOTICE: QUERY PLAN:
+
+Seq Scan on tenk1 (cost=0.00..358.00 rows=1000 width=148)
+ </programlisting>
+
+ The estimate of output rows has gone down because of the WHERE clause.
+ (This estimate is uncannily accurate because tenk1 is a particularly
+ simple case --- the unique1 column has 10000 distinct values ranging
+ from 0 to 9999, so the estimator's linear interpolation between min and
+ max column values is dead-on.) However, the scan will still have to
+ visit all 10000 rows, so the cost hasn't decreased; in fact it has gone
+ up a bit to reflect the extra CPU time spent checking the WHERE
+ condition.
+ </para>
+
+ <para>
+ Modify the query to restrict the qualification even more:
+
+ <programlisting>
+regression=# explain select * from tenk1 where unique1 < 100;
+NOTICE: QUERY PLAN:
+
+Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.35 rows=100 width=148)
+ </programlisting>
+
+ and you will see that if we make the WHERE condition selective
+ enough, the planner will
+ eventually decide that an indexscan is cheaper than a sequential scan.
+ This plan will only have to visit 100 tuples because of the index,
+ so it wins despite the fact that each individual fetch is expensive.
+ </para>
+
+ <para>
+ Add another condition to the qualification:
+
+ <programlisting>
+regression=# explain select * from tenk1 where unique1 < 100 and
+regression-# stringu1 = 'xxx';
+NOTICE: QUERY PLAN:
+
+Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.60 rows=1 width=148)
+ </programlisting>
+
+ The added clause "stringu1 = 'xxx'" reduces the output-rows estimate,
+ but we still have to visit the same set of tuples, so the cost barely
+ changes; it rises slightly (from 89.35 to 89.60) to reflect the extra
+ CPU time spent checking the additional condition.
+ </para>
+
+ <para>
+ Let's try joining two tables, using the fields we have been discussing:
+
+ <programlisting>
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
+regression-# and t1.unique2 = t2.unique2;
+NOTICE: QUERY PLAN:
+
+Nested Loop (cost=0.00..144.07 rows=100 width=296)
+ -> Index Scan using tenk1_unique1 on tenk1 t1
+ (cost=0.00..89.35 rows=100 width=148)
+ -> Index Scan using tenk2_unique2 on tenk2 t2
+ (cost=0.00..0.53 rows=1 width=148)
+ </programlisting>
+ </para>
+
+ <para>
+ In this nested-loop join, the outer scan is the same indexscan we had
+ in the example before last, and so its cost and row count are the same
+ because we are applying the "unique1 < 100" WHERE clause at that node.
+ The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't
+ affect the outer scan's row count. For the inner scan, the
+ current
+ outer-scan tuple's unique2 value is plugged into the inner indexscan
+ to produce an indexqual like
+ "t2.unique2 = <replaceable>constant</replaceable>". So we get the
+ same inner-scan plan and costs that we'd get from, say, "explain select
+ * from tenk2 where unique2 = 42". The loop node's costs are then set
+ on the basis of the outer scan's cost, plus one repetition of the
+ inner scan for each outer tuple (100 * 0.53, here), plus a little CPU
+ time for join processing.
+ </para>
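+
+ <para>
+ Spelled out, that arithmetic is: 89.35 for the outer scan, plus
+ 100 * 0.53 = 53.00 for the repetitions of the inner scan, gives 142.35;
+ the remaining 144.07 - 142.35 = 1.72 is the CPU charge for the join
+ processing itself.
+ </para>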
+
+ <para>
+ In this example the loop's output row count is the same as the product
+ of the two scans' row counts, but that's not true in general, because
+ there can be WHERE clauses that mention both relations and
+ so can only be applied at the join point, not to either input scan.
+ For example, if we added "WHERE ... AND t1.hundred < t2.hundred",
+ that'd decrease the output row count of the join node, but not change
+ either input scan.
+ </para>
+
+ <para>
+ One way to look at variant plans is to force the planner to disregard
+ whatever strategy it thought was the winner, using the enable/disable
+ flags for each plan type. (This is a crude tool, but useful. See
+ also <xref linkend="explicit-joins">.)
+
+ <programlisting>
+regression=# set enable_nestloop = off;
+SET VARIABLE
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
+regression-# and t1.unique2 = t2.unique2;
+NOTICE: QUERY PLAN:
+
+Hash Join (cost=89.60..574.10 rows=100 width=296)
+ -> Seq Scan on tenk2 t2
+ (cost=0.00..333.00 rows=10000 width=148)
+ -> Hash (cost=89.35..89.35 rows=100 width=148)
+ -> Index Scan using tenk1_unique1 on tenk1 t1
+ (cost=0.00..89.35 rows=100 width=148)
+ </programlisting>
+
+ This plan proposes to extract the 100 interesting rows of tenk1
+ using ye same olde indexscan, stash them into an in-memory hash table,
+ and then do a sequential scan of tenk2, probing into the hash table
+ for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple.
+ The cost to read tenk1 and set up the hash table is entirely start-up
+ cost for the hash join, since we won't get any tuples out until we can
+ start reading tenk2. The total time estimate for the join also
+ includes a pretty hefty charge for CPU time to probe the hash table
+ 10000 times. Note, however, that we are NOT charging 10000 times 89.35;
+ the hash table setup is only done once in this plan type.
+ </para>
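+
+ <para>
+ When you are done experimenting, remember to turn the flag back on,
+ or the planner will keep avoiding nested loops for the rest of your
+ session:
+
+ <programlisting>
+regression=# set enable_nestloop = on;
+SET VARIABLE
+ </programlisting>
+ </para>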
+ </sect1>
+
+ <sect1 id="explicit-joins">
+ <title>Controlling the Planner with Explicit JOINs</title>
+
+ <para>
+ Beginning with <productname>Postgres</productname> 7.1 it is possible
+ to control the query planner to some extent by using explicit JOIN
+ syntax. To see why this matters, we first need some background.
+ </para>
+
+ <para>
+ In a simple join query, such as
+ <programlisting>
+SELECT * FROM a,b,c WHERE a.id = b.id AND b.ref = c.id;
+ </programlisting>
+ the planner is free to join the given tables in any order. For example,
+ it could generate a query plan that joins A to B, using the WHERE clause
+ a.id = b.id, and then joins C to this joined table, using the other
+ WHERE clause. Or it could join B to C and then join A to that result.
+ Or it could join A to C and then join them with B --- but that would
+ be inefficient, since the full Cartesian product of A and C would have
+ to be formed, there being no applicable WHERE clause to allow optimization
+ of the join.
+ (All joins in the <productname>Postgres</productname> executor happen
+ between two input tables, so it's necessary to build up the result in one
+ or another of these fashions.) The important point is that these different
+ join possibilities give semantically equivalent results but may have hugely
+ different execution costs. Therefore, the planner will explore all of them
+ to try to find the most efficient query plan.
+ </para>
+
+ <para>
+ When a query only involves two or three tables, there aren't many join
+ orders to worry about. But the number of possible join orders grows
+ exponentially as the number of tables expands. Beyond ten or so input
+ tables it's no longer practical to do an exhaustive search of all the
+ possibilities, and even for six or seven tables planning may take an
+ annoyingly long time. When there are too many input tables, the
+ <productname>Postgres</productname> planner will switch from exhaustive
+ search to a <firstterm>genetic</firstterm> probabilistic search
+ through a limited number of possibilities. (The switchover threshold is
+ set by the GEQO_THRESHOLD run-time
+ parameter described in the <citetitle>Administrator's Guide</citetitle>.)
+ The genetic search takes less time, but it won't
+ necessarily find the best possible plan.
+ </para>
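+
+ <para>
+ For instance, to make sure that an eleven-table query is still planned
+ exhaustively, you could raise the threshold for your session (a sketch;
+ the default value and details are in the
+ <citetitle>Administrator's Guide</citetitle>):
+
+ <programlisting>
+set geqo_threshold = 12;
+ </programlisting>
+ </para>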
+
+ <para>
+ When the query involves outer joins, the planner has much less freedom
+ than it does for plain (inner) joins. For example, consider
+ <programlisting>
+SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
+ </programlisting>
+ Although this query's restrictions are superficially similar to the
+ previous example, the semantics are different because a row must be
+ emitted for each row of A that has no matching row in the join of B and C.
+ Therefore the planner has no choice of join order here: it must join
+ B to C and then join A to that result. Accordingly, this query takes
+ less time to plan than the previous query.
+ </para>
+
+ <para>
+ In <productname>Postgres</productname> 7.1, the planner treats all
+ explicit JOIN syntaxes as constraining the join order, even though
+ it is not logically necessary to make such a constraint for inner
+ joins. Therefore, although all of these queries give the same result:
+ <programlisting>
+SELECT * FROM a,b,c WHERE a.id = b.id AND b.ref = c.id;
+SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
+SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
+ </programlisting>
+ the second and third take less time to plan than the first. This effect
+ is not worth worrying about for only three tables, but it can be a
+ lifesaver with many tables.
+ </para>
+
+ <para>
+ You do not need to constrain the join order completely in order to
+ cut search time, because it's OK to use JOIN operators in a plain
+ FROM list. For example,
+ <programlisting>
+SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
+ </programlisting>
+ forces the planner to join A to B before joining them to other tables,
+ but doesn't constrain its choices otherwise. In this example, the
+ number of possible join orders is reduced by a factor of 5.
+ </para>
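+
+ <para>
+ To see where the factor of 5 comes from: five tables can be listed in
+ 5! = 120 orders, but with A and B fused into a single unit only
+ 4! = 24 orders remain, and 120/24 = 5.
+ </para>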
+
+ <para>
+ If you have a mix of outer and inner joins in a complex query, you
+ might not want to constrain the planner's search for a good ordering
+ of inner joins inside an outer join. You can't do that directly in the
+ JOIN syntax, but you can get around the syntactic limitation by using
+ views. For example,
+ <programlisting>
+CREATE VIEW v1 AS SELECT ... FROM a, b, c WHERE ...;
+SELECT * FROM d LEFT JOIN v1 ON (...);
+ </programlisting>
+ Here, joining D must be the last step in the query plan, but the
+ planner is free to consider various join orders for A,B,C.
+ </para>
+
+ <para>
+ Constraining the planner's search in this way is a useful technique
+ both for reducing planning time and for directing the planner to a
+ good query plan. If the planner chooses a bad join order by default,
+ you can force it to choose a better order via JOIN syntax --- assuming
+ that you know of a better order, that is. Experimentation is recommended.
+ </para>
+ </sect1>
+
+ <sect1 id="populate">
+ <title>Populating a Database</title>
+
+ <para>
+ One may need to insert a large amount of data when first
+ populating a database. Here are some tips and techniques for making that as
+ efficient as possible.
+ </para>
+
+ <sect2 id="disable-autocommit">
+ <title>Disable Auto-commit</title>
+
+ <para>
+ Turn off auto-commit and just do one commit at
+ the end. Otherwise <productname>Postgres</productname> must do a
+ lot of work for each record
+ added. In general, when you are doing bulk inserts, you want
+ to turn off some of the database features to gain speed.
+ </para>
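+
+ <para>
+ In <application>psql</application>, which normally commits each
+ statement as it is entered, this means wrapping the whole load in an
+ explicit transaction block (the table and values here are just
+ placeholders):
+
+ <programlisting>
+BEGIN;
+INSERT INTO mytable VALUES (1, 'one');
+INSERT INTO mytable VALUES (2, 'two');
+-- ... many more inserts ...
+COMMIT;
+ </programlisting>
+ </para>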
+ </sect2>
+
+ <sect2 id="populate-copy-from">
+ <title>Use COPY FROM</title>
+
+ <para>
+ Use <command>COPY FROM STDIN</command> to load all the records in one
+ command, instead
+ of a series of INSERT commands. This greatly reduces parsing and
+ planning overhead. If you do this, then it's not necessary to fool
+ around with autocommit.
+ </para>
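+
+ <para>
+ A minimal sketch, again with a placeholder table (by default COPY
+ expects one line per row, with columns separated by tab characters,
+ and a line containing only "\." ends the data):
+
+ <programlisting>
+COPY mytable FROM stdin;
+1	one
+2	two
+\.
+ </programlisting>
+ </para>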
+ </sect2>
+
+ <sect2 id="populate-rm-indices">
+ <title>Remove Indices</title>
+
+ <para>
+ If you are loading a freshly created table, the fastest way is to
+ create the table, bulk-load with COPY, then create any indexes needed
+ for the table. Creating an index on pre-existing data is quicker than
+ updating it incrementally as each record is loaded.
+ </para>
+
+ <para>
+ If you are augmenting an existing table, you can <command>DROP
+ INDEX</command>, load the table, then recreate the index. Of
+ course, the database performance for other users may be adversely
+ affected during the time that the index is missing.
+ </para>
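+
+ <para>
+ For example (the index, table, and column names are placeholders):
+
+ <programlisting>
+DROP INDEX mytable_col_idx;
+-- bulk-load the table here, e.g. with COPY
+CREATE INDEX mytable_col_idx ON mytable (col);
+ </programlisting>
+ </para>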
+ </sect2>
+ </sect1>
+
+ </chapter>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode:sgml
+sgml-omittag:nil
+sgml-shorttag:t
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-default-dtd-file:"./reference.ced"
+sgml-exposed-tags:nil
+sgml-local-catalogs:("/usr/lib/sgml/catalog")
+sgml-local-ecat-files:nil
+End:
+-->
+++ /dev/null
-<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/Attic/plan.sgml,v 2.5 2000/09/29 20:21:34 petere Exp $
--->
-
- <chapter id="understand-performance">
- <title>Understanding Performance</title>
-
- <para>
- Query performance can be affected by many things. Some of these can
- be manipulated by the user, while others are fundamental to the underlying
- design of the system.
- </para>
-
- <para>
- Some performance issues, such as index creation and bulk data
- loading, are covered elsewhere. This chapter will discuss the
- <command>EXPLAIN</command> command, and will show how the details
- of a query can affect the query plan, and hence overall
- performance.
- </para>
-
- <sect1 id="using-explain">
- <title>Using <command>EXPLAIN</command></title>
-
- <note>
- <title>Author</title>
- <para>
- Written by Tom Lane, from e-mail dated 2000-03-27.
- </para>
- </note>
-
- <para>
- Plan-reading is an art that deserves a tutorial, and I haven't
- had time to write one. Here is some quick & dirty explanation.
- </para>
-
- <para>
- The numbers that are currently quoted by EXPLAIN are:
-
- <itemizedlist>
- <listitem>
- <para>
- Estimated start-up cost (time expended before output scan can start,
- eg, time to do the sorting in a SORT node).
- </para>
- </listitem>
-
- <listitem>
- <para>
- Estimated total cost (if all tuples are retrieved, which they may not
- be --- LIMIT will stop short of paying the total cost, for
- example).
- </para>
- </listitem>
-
- <listitem>
- <para>
- Estimated number of rows output by this plan node.
- </para>
- </listitem>
-
- <listitem>
- <para>
- Estimated average width (in bytes) of rows output by this plan
- node.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- The costs are measured in units of disk page fetches. (CPU effort
- estimates are converted into disk-page units using some
- fairly arbitrary fudge-factors. See the <command>SET</command>
- reference page if you want to experiment with these.)
- It's important to note that the cost of an upper-level node includes
- the cost of all its child nodes. It's also important to realize that
- the cost only reflects things that the planner/optimizer cares about.
- In particular, the cost does not consider the time spent transmitting
- result tuples to the frontend --- which could be a pretty dominant
- factor in the true elapsed time, but the planner ignores it because
- it cannot change it by altering the plan. (Every correct plan will
- output the same tuple set, we trust.)
- </para>
-
- <para>
- Rows output is a little tricky because it is <emphasis>not</emphasis> the number of rows
- processed/scanned by the query --- it is usually less, reflecting the
- estimated selectivity of any WHERE-clause constraints that are being
- applied at this node.
- </para>
-
- <para>
- Average width is pretty bogus because the thing really doesn't have
- any idea of the average length of variable-length columns. I'm thinking
- about improving that in the future, but it may not be worth the trouble,
- because the width isn't used for very much.
- </para>
-
- <para>
- Here are some examples (using the regress test database after a
- vacuum analyze, and almost-7.0 sources):
-
- <programlisting>
-regression=# explain select * from tenk1;
-NOTICE: QUERY PLAN:
-
-Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148)
- </programlisting>
- </para>
-
- <para>
- This is about as straightforward as it gets. If you do
-
- <programlisting>
-select * from pg_class where relname = 'tenk1';
- </programlisting>
-
- you'll find out that tenk1 has 233 disk
- pages and 10000 tuples. So the cost is estimated at 233 block
- reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost which is
- currently 0.01 (try <command>show cpu_tuple_cost</command>).
- </para>
-
- <para>
- Now let's modify the query to add a qualification clause:
-
- <programlisting>
-regression=# explain select * from tenk1 where unique1 < 1000;
-NOTICE: QUERY PLAN:
-
-Seq Scan on tenk1 (cost=0.00..358.00 rows=1000 width=148)
- </programlisting>
-
- The estimate of output rows has gone down because of the WHERE clause.
- (The uncannily accurate estimate is just because tenk1 is a particularly
- simple case --- the unique1 column has 10000 distinct values ranging
- from 0 to 9999, so the estimator's linear interpolation between min and
- max column values is dead-on.) However, the scan will still have to
- visit all 10000 rows, so the cost hasn't decreased; in fact it has gone
- up a bit to reflect the extra CPU time spent checking the WHERE
- condition.
- </para>
-
- <para>
- Modify the query to restrict the qualification even more:
-
- <programlisting>
-regression=# explain select * from tenk1 where unique1 < 100;
-NOTICE: QUERY PLAN:
-
-Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.35 rows=100 width=148)
- </programlisting>
-
- and you will see that if we make the WHERE condition selective
- enough, the planner will
- eventually decide that an indexscan is cheaper than a sequential scan.
- This plan will only have to visit 100 tuples because of the index,
- so it wins despite the fact that each individual fetch is expensive.
- </para>
-
- <para>
- Add another condition to the qualification:
-
- <programlisting>
-regression=# explain select * from tenk1 where unique1 < 100 and
-regression-# stringu1 = 'xxx';
-NOTICE: QUERY PLAN:
-
-Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.60 rows=1 width=148)
- </programlisting>
-
- The added clause "stringu1 = 'xxx'" reduces the output-rows estimate,
- but not the cost because we still have to visit the same set of tuples.
- </para>
-
- <para>
- Let's try joining two tables, using the fields we have been discussing:
-
- <programlisting>
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
-regression-# and t1.unique2 = t2.unique2;
-NOTICE: QUERY PLAN:
-
-Nested Loop (cost=0.00..144.07 rows=100 width=296)
- -> Index Scan using tenk1_unique1 on tenk1 t1
- (cost=0.00..89.35 rows=100 width=148)
- -> Index Scan using tenk2_unique2 on tenk2 t2
- (cost=0.00..0.53 rows=1 width=148)
- </programlisting>
- </para>
-
- <para>
- In this nested-loop join, the outer scan is the same indexscan we had
- in the example before last, and so its cost and row count are the same
- because we are applying the "unique1 < 100" WHERE clause at that node.
- The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't
- affect the outer scan's row count. For the inner scan, the
- current
- outer-scan tuple's unique2 value is plugged into the inner indexscan
- to produce an indexqual like
- "t2.unique2 = <replaceable>constant</replaceable>". So we get the
- same inner-scan plan and costs that we'd get from, say, "explain select
- * from tenk2 where unique2 = 42". The loop node's costs are then set
- on the basis of the outer scan's cost, plus one repetition of the
- inner scan for each outer tuple (100 * 0.53, here), plus a little CPU
- time for join processing.
- </para>
-
- <para>
- In this example the loop's output row count is the same as the product
- of the two scans' row counts, but that's not true in general, because
- in general you can have WHERE clauses that mention both relations and
- so can only be applied at the join point, not to either input scan.
- For example, if we added "WHERE ... AND t1.hundred < t2.hundred",
- that'd decrease the output row count of the join node, but not change
- either input scan.
- </para>
-
- <para>
- We can look at variant plans by forcing the planner to disregard
- whatever strategy it thought was the winner (a pretty crude tool,
- but it's what we've got at the moment):
-
- <programlisting>
-regression=# set enable_nestloop = off;
-SET VARIABLE
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
-regression-# and t1.unique2 = t2.unique2;
-NOTICE: QUERY PLAN:
-
-Hash Join (cost=89.60..574.10 rows=100 width=296)
- -> Seq Scan on tenk2 t2
- (cost=0.00..333.00 rows=10000 width=148)
- -> Hash (cost=89.35..89.35 rows=100 width=148)
- -> Index Scan using tenk1_unique1 on tenk1 t1
- (cost=0.00..89.35 rows=100 width=148)
- </programlisting>
-
- This plan proposes to extract the 100 interesting rows of tenk1
- using ye same olde indexscan, stash them into an in-memory hash table,
- and then do a sequential scan of tenk2, probing into the hash table
- for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple.
- The cost to read tenk1 and set up the hash table is entirely start-up
- cost for the hash join, since we won't get any tuples out until we can
- start reading tenk2. The total time estimate for the join also
- includes a pretty hefty charge for CPU time to probe the hash table
- 10000 times. Note, however, that we are NOT charging 10000 times 89.35;
- the hash table setup is only done once in this plan type.
- </para>
- </sect1>
- </chapter>
-
-<!-- Keep this comment at the end of the file
-Local variables:
-mode:sgml
-sgml-omittag:nil
-sgml-shorttag:t
-sgml-minimize-attributes:nil
-sgml-always-quote-attributes:t
-sgml-indent-step:1
-sgml-indent-data:t
-sgml-parent-document:nil
-sgml-default-dtd-file:"./reference.ced"
-sgml-exposed-tags:nil
-sgml-local-catalogs:("/usr/lib/sgml/catalog")
-sgml-local-ecat-files:nil
-End:
--->