-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.45 2003/09/30 03:22:33 tgl Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.46 2003/11/06 22:21:47 tgl Exp $ -->
<chapter id="indexes">
<title id="indexes-title">Indexes</title>
than a sequential table scan. But you may have to run the
<command>ANALYZE</command> command regularly to update
statistics to allow the query planner to make educated decisions.
- Also read <xref linkend="performance-tips"> for information about
+ See <xref linkend="performance-tips"> for information about
how to find out whether an index is used and when and why the
planner may choose <emphasis>not</emphasis> to use an index.
</para>
<para>
<productname>PostgreSQL</productname> provides several index types:
- B-tree, R-tree, GiST, and Hash. Each index type is more appropriate for
- a particular query type because of the algorithm it uses.
+ B-tree, R-tree, GiST, and Hash. Each index type uses a different
+ algorithm that is best suited to different types of queries.
<indexterm>
<primary>index</primary>
<secondary>B-tree</secondary>
<primary>B-tree</primary>
<see>index</see>
</indexterm>
- By
- default, the <command>CREATE INDEX</command> command will create a
- B-tree index, which fits the most common situations. In
+ By default, the <command>CREATE INDEX</command> command will create a
+ B-tree index, which fits the most common situations. B-trees can
+ handle equality and range queries on data that can be sorted into
+ some ordering. In
particular, the <productname>PostgreSQL</productname> query planner
will consider using a B-tree index whenever an indexed column is
involved in a comparison using one of these operators:
<primary>R-tree</primary>
<see>index</see>
</indexterm>
- R-tree indexes are especially suited for spatial data. To create
+ R-tree indexes are suited for queries on spatial data. To create
an R-tree index, use a command of the form
<synopsis>
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING RTREE (<replaceable>column</replaceable>);
<primary>hash</primary>
<see>index</see>
</indexterm>
+ Hash indexes can only handle simple equality comparisons.
The query planner will consider using a hash index whenever an
indexed column is involved in a comparison using the
<literal>=</literal> operator. The following command is used to
<note>
<para>
Testing has shown <productname>PostgreSQL</productname>'s hash
- indexes to be similar or slower than B-tree indexes, and the
- index size and build time for hash indexes is much worse. Hash
- indexes also suffer poor performance under high concurrency. For
+ indexes to perform no better than B-tree indexes, and the
+ index size and build time for hash indexes are much worse. For
these reasons, hash index use is presently discouraged.
</para>
</note>
</para>
<para>
- The B-tree index is an implementation of Lehman-Yao
+ The B-tree index method is an implementation of Lehman-Yao
high-concurrency B-trees. The R-tree index method implements
standard R-trees using Guttman's quadratic split algorithm. The
- hash index is an implementation of Litwin's linear hashing. We
+ hash index method is an implementation of Litwin's linear hashing. We
mention the algorithms used solely to indicate that all of these
index methods are fully dynamic and do not have to be optimized
periodically (as is the case with, for example, static hash methods).
name varchar
);
</programlisting>
- (Say, you keep your <filename class="directory">/dev</filename>
+ (say, you keep your <filename class="directory">/dev</filename>
directory in a database...) and you frequently make queries like
<programlisting>
SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> AND minor = <replaceable>constant</replaceable>;
<literal>a</literal> and <literal>b</literal>, or in queries
involving only <literal>a</literal>, but not in other combinations.
(In a query involving <literal>a</literal> and <literal>c</literal>
- the planner might choose to use the index for
- <literal>a</literal> only and treat <literal>c</literal> like an
+ the planner could choose to use the index for
+ <literal>a</literal>, while treating <literal>c</literal> like an
ordinary unindexed column.) Of course, each column must be used with
operators appropriate to the index type; clauses that involve other
operators will not be considered.
<para>
When an index is declared unique, multiple table rows with equal
indexed values will not be allowed. Null values are not considered
- equal.
+ equal. A multicolumn unique index will only reject cases where all
+ of the indexed columns are equal in two rows.
</para>
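+ <para>
+ For example, in the following sketch (the table <literal>pairs</> and
+ its index are hypothetical names chosen for illustration), a two-column
+ unique index behaves as follows. It accepts rows that repeat a value in
+ one column. It also accepts repeated entries that contain a null. It
+ rejects only a row whose indexed values all equal those of an existing
+ row:
+<programlisting>
+CREATE TABLE pairs (a integer, b integer);
+CREATE UNIQUE INDEX pairs_a_b_key ON pairs (a, b);
+INSERT INTO pairs VALUES (1, 1);
+INSERT INTO pairs VALUES (1, 2);     -- accepted: b differs from the existing row
+INSERT INTO pairs VALUES (1, NULL);
+INSERT INTO pairs VALUES (1, NULL);  -- accepted: null values are not considered equal
+INSERT INTO pairs VALUES (1, 1);     -- rejected: a and b both equal an existing row
+</programlisting>
+ </para>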
<para>
- <productname>PostgreSQL</productname> automatically creates unique
- indexes when a table is declared with a unique constraint or a
- primary key, on the columns that make up the primary key or unique
- columns (a multicolumn index, if appropriate), to enforce that
- constraint. A unique index can be added to a table at any later
- time, to add a unique constraint.
+ <productname>PostgreSQL</productname> automatically creates a unique
+ index when a unique constraint or a primary key is defined for a table.
+ The index covers the columns that make up the primary key or unique
+ columns (a multicolumn index, if appropriate), and is the mechanism
+ that enforces the constraint.
</para>
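+ <para>
+ As a sketch (the table and column names are hypothetical), the
+ following definition leads to two unique indexes being created, one
+ for the primary key column and one for the unique column, without any
+ explicit <command>CREATE INDEX</command> command:
+<programlisting>
+CREATE TABLE products (
+    product_no integer PRIMARY KEY,  -- backed by an automatically created unique index
+    code       text UNIQUE           -- likewise backed by its own unique index
+);
+</programlisting>
+ </para>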
<note>
<literal>ALTER TABLE ... ADD CONSTRAINT</literal>. The use of
indexes to enforce unique constraints could be considered an
implementation detail that should not be accessed directly.
+ One should, however, be aware that there's no need to manually
+ create indexes on unique columns; doing so would just duplicate
+ the automatically-created index.
</para>
</note>
</sect1>
</programlisting>
</para>
+ <para>
+ If we were to declare this index <literal>UNIQUE</>, it would prevent
+ creation of rows whose <literal>col1</> values differ only in case,
+ as well as rows whose <literal>col1</> values are actually identical.
+ Thus, indexes on expressions can be used to enforce constraints that
+ are not definable as simple unique constraints.
+ </para>
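+ <para>
+ A minimal sketch of such a constraint follows. It uses a hypothetical
+ table <literal>users</> with a text column <literal>login</>, and
+ enforces case-insensitive uniqueness of that column:
+<programlisting>
+CREATE TABLE users (login text);
+CREATE UNIQUE INDEX users_lower_login_key ON users (lower(login));
+INSERT INTO users VALUES ('alice');
+INSERT INTO users VALUES ('Alice');  -- rejected: lower('Alice') equals lower('alice')
+INSERT INTO users VALUES ('alice');  -- rejected: identical value
+</programlisting>
+ </para>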
+
<para>
As another example, if one often does queries like this:
<programlisting>
In practice the default operator class for the column's data type is
usually sufficient. The main point of having operator classes is
that for some data types, there could be more than one meaningful
- ordering. For example, we might want to sort a complex-number data
+ index behavior. For example, we might want to sort a complex-number data
type either by absolute value or by real part. We could do this by
defining two operator classes for the data type and then selecting
the proper class when making an index.
There are also some built-in operator classes besides the default ones:
<itemizedlist>
- <listitem>
- <para>
- The operator classes <literal>box_ops</literal> and
- <literal>bigbox_ops</literal> both support R-tree indexes on the
- <type>box</type> data type. The difference between them is
- that <literal>bigbox_ops</literal> scales box coordinates down,
- to avoid floating-point exceptions from doing multiplication,
- addition, and subtraction on very large floating-point
- coordinates. If the field on which your rectangles lie is about
- 20 000 square units or larger, you should use
- <literal>bigbox_ops</literal>.
- </para>
- </listitem>
-
<listitem>
<para>
The operator classes <literal>text_pattern_ops</literal>,
create, it would probably be too slow to be of any real use.)
The system can recognize simple inequality implications, for example
<quote>x < 1</quote> implies <quote>x < 2</quote>; otherwise
- the predicate condition must exactly match the query's <literal>WHERE</> condition
+ the predicate condition must exactly match part of the query's
+ <literal>WHERE</> condition
or the index will not be recognized to be usable.
</para>
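+ <para>
+ To illustrate these rules, here is a sketch with a hypothetical table
+ <literal>t</> and a partial index on it. Whether the planner actually
+ chooses the index for a query where it is usable still depends on its
+ cost estimates:
+<programlisting>
+CREATE TABLE t (x integer, y integer);
+CREATE INDEX t_y_small_x_idx ON t (y) WHERE x &lt; 2;
+-- The predicate matches one of the AND-ed WHERE conditions exactly:
+-- the index is usable.
+SELECT * FROM t WHERE y = 42 AND x &lt; 2;
+-- x &lt; 1 implies x &lt; 2: the index is usable.
+SELECT * FROM t WHERE y = 42 AND x &lt; 1;
+-- x &lt; 3 does not imply x &lt; 2: the index is not usable.
+SELECT * FROM t WHERE y = 42 AND x &lt; 3;
+</programlisting>
+ </para>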
maintenance and tuning, it is still important to check
which indexes are actually used by the real-life query workload.
Examining index usage for an individual query is done with the
- <command>EXPLAIN</> command; its application for this purpose is
+ <xref linkend="sql-explain" endterm="sql-explain-title">
+ command; its application for this purpose is
illustrated in <xref linkend="using-explain">.
It is also possible to gather overall statistics about index usage
in a running server, as described in <xref linkend="monitoring-stats">.
<itemizedlist>
<listitem>
<para>
- Always run <command>ANALYZE</command> first. This command
+ Always run <xref linkend="sql-analyze" endterm="sql-analyze-title">
+ first. This command
collects statistics about the distribution of the values in the
table. This information is required to guess the number of rows
returned by a query, which is needed by the planner to assign
run-time parameters (described in <xref linkend="runtime-config">).
An inaccurate selectivity estimate is due to
insufficient statistics. It may be possible to help this by
- tuning the statistics-gathering parameters (see <command>ALTER
- TABLE</command> reference).
+ tuning the statistics-gathering parameters (see
+ <xref linkend="sql-altertable" endterm="sql-altertable-title">).
</para>
<para>