Create a "sort support" interface API for faster sorting.

[postgresql] / doc / src / sgml / xindex.sgml
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml

index 5b747fe60339d09a55617b762e32644877bf0a49..f3a9783e5c905b1a6ba6c9ddea0a716551837f26 100644 (file)
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -1,500 +1,1081 @@
-<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.15 2001/05/17 21:50:17 petere Exp $
-Postgres documentation
--->
+<!-- doc/src/sgml/xindex.sgml -->
  
- <chapter id="xindex">
-  <title>Interfacing Extensions To Indexes</title>
+<sect1 id="xindex">
+ <title>Interfacing Extensions To Indexes</title>
+
+ <indexterm zone="xindex">
+  <primary>index</primary>
+  <secondary>for user-defined data type</secondary>
+ </indexterm>
  
    <para>
-   The procedures described thus far let you define a new type, new
-   functions and new operators.  However, we cannot yet define a secondary
-   index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
-   hash access method) over a new type or its operators.
+   The procedures described thus far let you define new types, new
+   functions, and new operators. However, we cannot yet define an
+   index on a column of a new data type.  To do this, we must define an
+   <firstterm>operator class</> for the new data type.  Later in this
+   section, we will illustrate this concept in an example: a new
+   operator class for the B-tree index method that stores and sorts
+   complex numbers in ascending absolute value order.
    </para>
  
    <para>
-   Look back at
-   <xref linkend="EXTEND-CATALOGS">.
-   The right half shows the  catalogs  that we must modify in order to tell
-   <productname>Postgres</productname> how to use a user-defined type and/or
-   user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
-    pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
-   Unfortunately, there is no simple command to do this.  We will demonstrate
-   how to modify these catalogs through a running example:  a  new  operator
-   class for the <acronym>B-tree</acronym> access method that stores and
-   sorts complex numbers in ascending absolute value order.
+   Operator classes can be grouped into <firstterm>operator families</>
+   to show the relationships between semantically compatible classes.
+   When only a single data type is involved, an operator class is sufficient,
+   so we'll focus on that case first and then return to operator families.
    </para>
  
+ <sect2 id="xindex-opclass">
+  <title>Index Methods and Operator Classes</title>
+
    <para>
-   The <filename>pg_am</filename> table contains one row for every user
-   defined access method.  Support for the heap access method is built into
-   <productname>Postgres</productname>, but every other access method is
-   described here.  The schema is
+   The <classname>pg_am</classname> table contains one row for every
+   index method (internally known as access method).  Support for
+   regular access to tables is built into
+   <productname>PostgreSQL</productname>, but all index methods are
+   described in <classname>pg_am</classname>.  It is possible to add a
+   new index method by defining the required interface routines and
+   then creating a row in <classname>pg_am</classname> &mdash; but that is
+   beyond the scope of this chapter (see <xref linkend="indexam">).
+  </para>
  
-   <table tocentry="1">
-    <title>Index Schema</title>
+  <para>
+   The routines for an index method do not directly know anything
+   about the data types that the index method will operate on.
+   Instead, an <firstterm>operator
+   class</><indexterm><primary>operator class</></indexterm>
+   identifies the set of operations that the index method needs to use
+   to work with a particular data type.  Operator classes are so
+   called because one thing they specify is the set of
+   <literal>WHERE</>-clause operators that can be used with an index
+   (i.e., can be converted into an index-scan qualification).  An
+   operator class can also specify some <firstterm>support
+   procedures</> that are needed by the internal operations of the
+   index method, but do not directly correspond to any
+   <literal>WHERE</>-clause operator that can be used with the index.
+  </para>
  
+  <para>
+   It is possible to define multiple operator classes for the same
+   data type and index method.  By doing this, multiple
+   sets of indexing semantics can be defined for a single data type.
+   For example, a B-tree index requires a sort ordering to be defined
+   for each data type it works on.
+   It might be useful for a complex-number data type
+   to have one B-tree operator class that sorts the data by complex
+   absolute value, another that sorts by real part, and so on.
+   Typically, one of the operator classes will be deemed most commonly
+   useful and will be marked as the default operator class for that
+   data type and index method.
+  </para>
+
+  <para>
+   The same operator class name
+   can be used for several different index methods (for example, both B-tree
+   and hash index methods have operator classes named
+   <literal>int4_ops</literal>), but each such class is an independent
+   entity and must be defined separately.
+  </para>
+ </sect2>
+
+ <sect2 id="xindex-strategies">
+  <title>Index Method Strategies</title>
+
+  <para>
+   The operators associated with an operator class are identified by
+   <quote>strategy numbers</>, which serve to identify the semantics of
+   each operator within the context of its operator class.
+   For example, B-trees impose a strict ordering on keys, lesser to greater,
+   and so operators like <quote>less than</> and <quote>greater than or equal
+   to</> are interesting with respect to a B-tree.
+   Because
+   <productname>PostgreSQL</productname> allows the user to define operators,
+   <productname>PostgreSQL</productname> cannot look at the name of an operator
+   (e.g., <literal>&lt;</> or <literal>&gt;=</>) and tell what kind of
+   comparison it is.  Instead, the index method defines a set of
+   <quote>strategies</>, which can be thought of as generalized operators.
+   Each operator class specifies which actual operator corresponds to each
+   strategy for a particular data type and interpretation of the index
+   semantics.
+  </para>
+
+  <para>
+   The B-tree index method defines five strategies, shown in <xref
+   linkend="xindex-btree-strat-table">.
+  </para>
+
+   <table tocentry="1" id="xindex-btree-strat-table">
+    <title>B-tree Strategies</title>
      <tgroup cols="2">
       <thead>
        <row>
-       <entry>Column</entry>
-       <entry>Description</entry>
+       <entry>Operation</entry>
+       <entry>Strategy Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>less than</entry>
+       <entry>1</entry>
+      </row>
+      <row>
+       <entry>less than or equal</entry>
+       <entry>2</entry>
+      </row>
+      <row>
+       <entry>equal</entry>
+       <entry>3</entry>
+      </row>
+      <row>
+       <entry>greater than or equal</entry>
+       <entry>4</entry>
+      </row>
+      <row>
+       <entry>greater than</entry>
+       <entry>5</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  <para>
+   Hash indexes support only equality comparisons, and so they use only one
+   strategy, shown in <xref linkend="xindex-hash-strat-table">.
+  </para>
+
+   <table tocentry="1" id="xindex-hash-strat-table">
+    <title>Hash Strategies</title>
+    <tgroup cols="2">
+     <thead>
+      <row>
+       <entry>Operation</entry>
+       <entry>Strategy Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>equal</entry>
+       <entry>1</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  <para>
+   GiST indexes are more flexible: they do not have a fixed set of
+   strategies at all.  Instead, the <quote>consistency</> support routine
+   of each particular GiST operator class interprets the strategy numbers
+   however it likes.  As an example, several of the built-in GiST index
+   operator classes index two-dimensional geometric objects, providing
+   the <quote>R-tree</> strategies shown in
+   <xref linkend="xindex-rtree-strat-table">.  Four of these are true
+   two-dimensional tests (overlaps, same, contains, contained by);
+   four of them consider only the X direction; and the other four
+   provide the same tests in the Y direction.
+  </para>
+
+   <table tocentry="1" id="xindex-rtree-strat-table">
+    <title>GiST Two-Dimensional <quote>R-tree</> Strategies</title>
+    <tgroup cols="2">
+     <thead>
+      <row>
+       <entry>Operation</entry>
+       <entry>Strategy Number</entry>
        </row>
       </thead>
       <tbody>
        <row>
-       <entry>amname</entry>
-       <entry>name of the access method</entry>
+       <entry>strictly left of</entry>
+       <entry>1</entry>
        </row>
        <row>
-       <entry>amowner</entry>
-       <entry>user id of the owner</entry>
+       <entry>does not extend to right of</entry>
+       <entry>2</entry>
+      </row>
+      <row>
+       <entry>overlaps</entry>
+       <entry>3</entry>
        </row>
        <row>
-       <entry>amstrategies</entry>
-       <entry>number of strategies for this access method (see below)</entry>
+       <entry>does not extend to left of</entry>
+       <entry>4</entry>
+      </row>
+      <row>
+       <entry>strictly right of</entry>
+       <entry>5</entry>
        </row>
        <row>
-       <entry>amsupport</entry>
-       <entry>number of support routines for this access method (see below)</entry>
+       <entry>same</entry>
+       <entry>6</entry>
        </row>
        <row>
-       <entry>amorderstrategy</entry>
-       <entry>zero if the index offers no sort order, otherwise the strategy
-        number of the strategy operator that describes the sort order</entry>
+       <entry>contains</entry>
+       <entry>7</entry>
        </row>
        <row>
-       <entry>amgettuple</entry>
+       <entry>contained by</entry>
+       <entry>8</entry>
        </row>
        <row>
-       <entry>aminsert</entry>
+       <entry>does not extend above</entry>
+       <entry>9</entry>
        </row>
        <row>
-       <entry>...</entry>
-       <entry>procedure  identifiers  for  interface routines to the access
-       method.  For example, regproc ids for opening,  closing,  and
-       getting rows from the access method appear here.</entry>
+       <entry>strictly below</entry>
+       <entry>10</entry>
+      </row>
+      <row>
+       <entry>strictly above</entry>
+       <entry>11</entry>
+      </row>
+      <row>
+       <entry>does not extend below</entry>
+       <entry>12</entry>
        </row>
       </tbody>
      </tgroup>
     </table>
+
+  <para>
+   GIN indexes are similar to GiST indexes in flexibility: they don't have a
+   fixed set of strategies. Instead the support routines of each operator
+   class interpret the strategy numbers according to the operator class's
+   definition. As an example, the strategy numbers used by the built-in
+   operator classes for arrays are
+   shown in <xref linkend="xindex-gin-array-strat-table">.
    </para>
  
+   <table tocentry="1" id="xindex-gin-array-strat-table">
+    <title>GIN Array Strategies</title>
+    <tgroup cols="2">
+     <thead>
+      <row>
+       <entry>Operation</entry>
+       <entry>Strategy Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>overlap</entry>
+       <entry>1</entry>
+      </row>
+      <row>
+       <entry>contains</entry>
+       <entry>2</entry>
+      </row>
+      <row>
+       <entry>is contained by</entry>
+       <entry>3</entry>
+      </row>
+      <row>
+       <entry>equal</entry>
+       <entry>4</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
    <para>
-   The <acronym>object ID</acronym> of the row in
-   <filename>pg_am</filename> is used as a foreign key in a lot of other
-   tables.  You  do not  need to  add a new rows to this table; all that
-   you are interested in is the <acronym>object ID</acronym> of the access
-   method row you want to extend:
+   Notice that all the operators listed above return Boolean values.  In
+   practice, all operators defined as index method search operators must
+   return type <type>boolean</type>, since they must appear at the top
+   level of a <literal>WHERE</> clause to be used with an index.
+   (Some index access methods also support <firstterm>ordering operators</>,
+   which typically don't return Boolean values; that feature is discussed
+   in <xref linkend="xindex-ordering-ops">.)
+  </para>
+ </sect2>
  
-   <programlisting>
-SELECT oid FROM pg_am WHERE amname = 'btree';
+ <sect2 id="xindex-support">
+  <title>Index Method Support Routines</title>
  
- oid
------
- 403
-(1 row)
-   </programlisting>
+  <para>
+   Strategies aren't usually enough information for the system to figure
+   out how to use an index.  In practice, the index methods require
+   additional support routines in order to work. For example, the B-tree
+   index method must be able to compare two keys and determine whether one
+   is greater than, equal to, or less than the other.  Similarly, the
+   hash index method must be able to compute hash codes for key values.
+   These operations do not correspond to operators used in qualifications in
+   SQL commands;  they are administrative routines used by
+   the index methods, internally.
+  </para>
  
-   We will use that <command>SELECT</command> in a <command>WHERE</command>
-   clause later.
+  <para>
+   Just as with strategies, the operator class identifies which specific
+   functions should play each of these roles for a given data type and
+   semantic interpretation.  The index method defines the set
+   of functions it needs, and the operator class identifies the correct
+   functions to use by assigning them to the <quote>support function numbers</>
+   specified by the index method.
    </para>
  
    <para>
-   The <filename>amstrategies</filename> column exists to standardize
-   comparisons across data types.  For example, <acronym>B-tree</acronym>s
-   impose a strict ordering on keys, lesser to greater.  Since
-   <productname>Postgres</productname> allows the user to define operators,
-   <productname>Postgres</productname> cannot look at the name of an operator
-   (e.g., "&gt;" or "&lt;") and tell what kind of comparison it is.  In fact,
-   some  access methods don't impose any ordering at all.  For example,
-   <acronym>R-tree</acronym>s express a rectangle-containment relationship,
-   whereas a hashed data structure expresses only bitwise similarity based
-   on the value of a hash function.  <productname>Postgres</productname>
-   needs some consistent way of taking a qualification in your query,
-   looking at the operator and then deciding if a usable index exists.  This
-   implies that <productname>Postgres</productname> needs to know, for
-   example, that the  "&lt;="  and  "&gt;" operators partition a
-   <acronym>B-tree</acronym>.  <productname>Postgres</productname>
-   uses strategies to express these relationships  between
-   operators and the way they can be used to scan indexes.
+   B-trees require a single support function, and allow a second one to be
+   supplied at the operator class author's option, as shown in <xref
+   linkend="xindex-btree-support-table">.
    </para>
  
+   <table tocentry="1" id="xindex-btree-support-table">
+    <title>B-tree Support Functions</title>
+    <tgroup cols="2">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Support Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        Compare two keys and return an integer less than zero, zero, or
+        greater than zero, indicating whether the first key is less than,
+        equal to, or greater than the second
+       </entry>
+       <entry>1</entry>
+      </row>
+      <row>
+       <entry>
+        Return the addresses of C-callable sort support function(s),
+        as documented in <filename>utils/sortsupport.h</> (optional)
+       </entry>
+       <entry>2</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
    <para>
-   Defining a new set of strategies is beyond the scope of this discussion,
-   but we'll explain how <acronym>B-tree</acronym> strategies work because
-   you'll need to know that to add a new operator class. In the
-   <filename>pg_am</filename> table, the amstrategies column is the
-   number of strategies defined for this access method. For
-   <acronym>B-tree</acronym>s, this number is 5.  These strategies
-   correspond to
+   Hash indexes require one support function, shown in <xref
+   linkend="xindex-hash-support-table">.
+  </para>
  
-   <table tocentry="1">
-    <title>B-tree Strategies</title>
-    <titleabbrev>B-tree</titleabbrev>
+   <table tocentry="1" id="xindex-hash-support-table">
+    <title>Hash Support Functions</title>
      <tgroup cols="2">
       <thead>
        <row>
-       <entry>Operation</entry>
-       <entry>Index</entry>
+       <entry>Function</entry>
+       <entry>Support Number</entry>
        </row>
       </thead>
       <tbody>
        <row>
-       <entry>less than</entry>
+       <entry>Compute the hash value for a key</entry>
         <entry>1</entry>
        </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  <para>
+   GiST indexes require seven support functions, with an optional eighth, as
+   shown in <xref linkend="xindex-gist-support-table">.
+   (For more information see <xref linkend="GiST">.)
+  </para>
+
+   <table tocentry="1" id="xindex-gist-support-table">
+    <title>GiST Support Functions</title>
+    <tgroup cols="3">
+     <thead>
        <row>
-       <entry>less than or equal</entry>
+       <entry>Function</entry>
+       <entry>Description</entry>
+       <entry>Support Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry><function>consistent</></entry>
+       <entry>determine whether key satisfies the
+        query qualifier</entry>
+       <entry>1</entry>
+      </row>
+      <row>
+       <entry><function>union</></entry>
+       <entry>compute union of a set of keys</entry>
         <entry>2</entry>
        </row>
        <row>
-       <entry>equal</entry>
+       <entry><function>compress</></entry>
+       <entry>compute a compressed representation of a key or value
+        to be indexed</entry>
         <entry>3</entry>
        </row>
        <row>
-       <entry>greater than or equal</entry>
+       <entry><function>decompress</></entry>
+       <entry>compute a decompressed representation of a
+        compressed key</entry>
         <entry>4</entry>
        </row>
        <row>
-       <entry>greater than</entry>
+       <entry><function>penalty</></entry>
+       <entry>compute penalty for inserting new key into subtree
+       with given subtree's key</entry>
         <entry>5</entry>
        </row>
+      <row>
+       <entry><function>picksplit</></entry>
+       <entry>determine which entries of a page are to be moved
+       to the new page and compute the union keys for resulting pages</entry>
+       <entry>6</entry>
+      </row>
+      <row>
+       <entry><function>equal</></entry>
+       <entry>compare two keys and return true if they are equal</entry>
+       <entry>7</entry>
+      </row>
+      <row>
+       <entry><function>distance</></entry>
+       <entry>determine distance from key to query value (optional)</entry>
+       <entry>8</entry>
+      </row>
       </tbody>
      </tgroup>
     </table>
+
+  <para>
+   GIN indexes require four support functions, with an optional fifth, as
+   shown in <xref linkend="xindex-gin-support-table">.
+   (For more information see <xref linkend="GIN">.)
    </para>
  
+   <table tocentry="1" id="xindex-gin-support-table">
+    <title>GIN Support Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Description</entry>
+       <entry>Support Number</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry><function>compare</></entry>
+       <entry>
+        compare two keys and return an integer less than zero, zero,
+        or greater than zero, indicating whether the first key is less than,
+        equal to, or greater than the second
+       </entry>
+       <entry>1</entry>
+      </row>
+      <row>
+       <entry><function>extractValue</></entry>
+       <entry>extract keys from a value to be indexed</entry>
+       <entry>2</entry>
+      </row>
+      <row>
+       <entry><function>extractQuery</></entry>
+       <entry>extract keys from a query condition</entry>
+       <entry>3</entry>
+      </row>
+      <row>
+       <entry><function>consistent</></entry>
+       <entry>determine whether value matches query condition</entry>
+       <entry>4</entry>
+      </row>
+      <row>
+       <entry><function>comparePartial</></entry>
+       <entry>
+        compare partial key from
+        query and key from index, and return an integer less than zero, zero,
+        or greater than zero, indicating whether GIN should ignore this index
+        entry, treat the entry as a match, or stop the index scan (optional)
+       </entry>
+       <entry>5</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
    <para>
-   The idea is that you'll need to add procedures corresponding to the
-   comparisons above to the <filename>pg_amop</filename> relation (see below).
-   The access method code can use these strategy numbers, regardless of data
-   type, to figure out how to partition the <acronym>B-tree</acronym>,
-   compute selectivity, and so on.  Don't worry about the details of adding
-   procedures yet; just understand that there must be a set of these
-   procedures for <filename>int2, int4, oid,</filename> and every other
-   data type on which a <acronym>B-tree</acronym> can operate.
+   Unlike search operators, support functions return whichever data
+   type the particular index method expects; for example in the case
+   of the comparison function for B-trees, a signed integer.  The number
+   and types of the arguments to each support function are likewise
+   dependent on the index method.  For B-tree and hash the comparison and
+   hashing support functions
+   take the same input data types as do the operators included in the operator
+   class, but this is not the case for most GIN and GiST support functions.
    </para>
+ </sect2>
+
+ <sect2 id="xindex-example">
+  <title>An Example</title>
  
    <para>
-   Sometimes, strategies aren't enough information for the system to figure
-   out how to use an index.  Some access methods require other support
-   routines in order to work. For example, the <acronym>B-tree</acronym>
-   access method must be able to compare two keys and determine whether one
-   is greater than, equal to, or less than the other.  Similarly, the
-   <acronym>R-tree</acronym> access method must be able to compute
-   intersections,  unions, and sizes of rectangles.  These
-   operations do not correspond to user qualifications in
-   SQL queries;  they are administrative routines used by
-   the access methods, internally.
+   Now that we have seen the ideas, here is the promised example of
+   creating a new operator class.
+   (You can find a working copy of this example in
+   <filename>src/tutorial/complex.c</filename> and
+   <filename>src/tutorial/complex.sql</filename> in the source
+   distribution.)
+   The operator class encapsulates
+   operators that sort complex numbers in absolute value order, so we
+   choose the name <literal>complex_abs_ops</literal>.  First, we need
+   a set of operators.  The procedure for defining operators was
+   discussed in <xref linkend="xoper">.  For an operator class on
+   B-trees, the operators we require are:
+
+   <itemizedlist spacing="compact">
+    <listitem><simpara>absolute-value less-than (strategy 1)</></>
+    <listitem><simpara>absolute-value less-than-or-equal (strategy 2)</></>
+    <listitem><simpara>absolute-value equal (strategy 3)</></>
+    <listitem><simpara>absolute-value greater-than-or-equal (strategy 4)</></>
+    <listitem><simpara>absolute-value greater-than (strategy 5)</></>
+   </itemizedlist>
    </para>
  
    <para>
-   In order to manage diverse support routines consistently across all
-   <productname>Postgres</productname> access methods,
-   <filename>pg_am</filename> includes a column called
-   <filename>amsupport</filename>.  This column records the number of
-   support routines used by an access method.  For <acronym>B-tree</acronym>s,
-   this number is one -- the routine to take two keys and return -1, 0, or
-   +1, depending on whether the first key is less than, equal
-   to, or greater than the second.
+   The least error-prone way to define a related set of comparison operators
+   is to write the B-tree comparison support function first, and then write the
+   other functions as one-line wrappers around the support function.  This
+   reduces the odds of getting inconsistent results for corner cases.
+   Following this approach, we first write:
  
-   <note>
+<programlisting><![CDATA[
+#define Mag(c)  ((c)->x*(c)->x + (c)->y*(c)->y)
+
+static int
+complex_abs_cmp_internal(Complex *a, Complex *b)
+{
+    double      amag = Mag(a),
+                bmag = Mag(b);
+
+    if (amag < bmag)
+        return -1;
+    if (amag > bmag)
+        return 1;
+    return 0;
+}
+]]>
+</programlisting>
+
+   Now the less-than function looks like:
+
+<programlisting><![CDATA[
+PG_FUNCTION_INFO_V1(complex_abs_lt);
+
+Datum
+complex_abs_lt(PG_FUNCTION_ARGS)
+{
+    Complex    *a = (Complex *) PG_GETARG_POINTER(0);
+    Complex    *b = (Complex *) PG_GETARG_POINTER(1);
+
+    PG_RETURN_BOOL(complex_abs_cmp_internal(a, b) < 0);
+}
+]]>
+</programlisting>
+
+   The other four functions differ only in how they compare the internal
+   function's result to zero.
+  </para>
+
+  <para>
+   Next we declare the functions and the operators based on the functions
+   to SQL:
+
+<programlisting>
+CREATE FUNCTION complex_abs_lt(complex, complex) RETURNS bool
+    AS '<replaceable>filename</replaceable>', 'complex_abs_lt'
+    LANGUAGE C IMMUTABLE STRICT;
+
+CREATE OPERATOR &lt; (
+   leftarg = complex, rightarg = complex, procedure = complex_abs_lt,
+   commutator = &gt; , negator = &gt;= ,
+   restrict = scalarltsel, join = scalarltjoinsel
+);
+</programlisting>
+   It is important to specify the correct commutator and negator operators,
+   as well as suitable restriction and join selectivity
+   functions, otherwise the optimizer will be unable to make effective
+   use of the index.  Note that the less-than, equal, and
+   greater-than cases should use different selectivity functions.
+  </para>
+
+  <para>
+   Other things worth noting are happening here:
+
+  <itemizedlist>
+   <listitem>
      <para>
-     Strictly  speaking, this routine can return a negative
-     number (&lt; 0), 0, or a non-zero positive number (&gt; 0).
+     There can only be one operator named, say, <literal>=</literal>
+     and taking type <type>complex</type> for both operands.  In this
+     case we don't have any other operator <literal>=</literal> for
+     <type>complex</type>, but if we were building a practical data
+     type we'd probably want <literal>=</literal> to be the ordinary
+     equality operation for complex numbers (and not the equality of
+     the absolute values).  In that case, we'd need to use some other
+     operator name for <function>complex_abs_eq</>.
      </para>
-   </note>
+   </listitem>
+
+   <listitem>
+    <para>
+     Although <productname>PostgreSQL</productname> can cope with
+     functions having the same SQL name as long as they have different
+     argument data types, C can only cope with one global function
+     having a given name.  So we shouldn't name the C function
+     something simple like <filename>abs_eq</filename>.  Usually it's
+     a good practice to include the data type name in the C function
+     name, so as not to conflict with functions for other data types.
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     We could have made the SQL name
+     of the function <filename>abs_eq</filename>, relying on
+     <productname>PostgreSQL</productname> to distinguish it by
+     argument data types from any other SQL function of the same name.
+     To keep the example simple, we make the function have the same
+     names at the C level and SQL level.
+    </para>
+   </listitem>
+  </itemizedlist>
    </para>
  
    <para>
-   The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
-   is just the number
-   of strategies defined for the access method in question.  The procedures
-   for less than, less equal, and so on don't appear in
-   <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
-   is just the number of support routines required by  the  access
-   method.  The actual routines are listed elsewhere.
+   The next step is the registration of the support routine required
+   by B-trees.  The example C code that implements this is in the same
+   file that contains the operator functions.  This is how we declare
+   the function:
+
+<programlisting>
+CREATE FUNCTION complex_abs_cmp(complex, complex)
+    RETURNS integer
+    AS '<replaceable>filename</replaceable>'
+    LANGUAGE C IMMUTABLE STRICT;
+</programlisting>
    </para>
  
    <para>
-   By the way, the <filename>amorderstrategy</filename> entry tells whether
-   the access method supports ordered scan.  Zero means it doesn't; if it
-   does, <filename>amorderstrategy</filename> is the number of the strategy
-   routine that corresponds to the ordering operator.  For example, btree
-   has <filename>amorderstrategy</filename> = 1 which is its
-   "less than" strategy number.
+   Now that we have the required operators and support routine,
+   we can finally create the operator class:
+
+<programlisting><![CDATA[
+CREATE OPERATOR CLASS complex_abs_ops
+    DEFAULT FOR TYPE complex USING btree AS
+        OPERATOR        1       < ,
+        OPERATOR        2       <= ,
+        OPERATOR        3       = ,
+        OPERATOR        4       >= ,
+        OPERATOR        5       > ,
+        FUNCTION        1       complex_abs_cmp(complex, complex);
+]]>
+</programlisting>
    </para>
  
    <para>
-   The next table of interest is <filename>pg_opclass</filename>.  This table
-   exists only to associate an operator class name and perhaps a default type
-   with an operator class oid. Some existing opclasses are <filename>int2_ops,
-   int4_ops,</filename> and <filename>oid_ops</filename>.  You need to add a
-   row with your opclass name (for example,
-   <filename>complex_abs_ops</filename>) to
-   <filename>pg_opclass</filename>.  The <filename>oid</filename> of
-   this row will be a foreign key in other tables, notably
-   <filename>pg_amop</filename>.
+   And we're done!  It should now be possible to create
+   and use B-tree indexes on <type>complex</type> columns.
+  </para>
  
-   <programlisting>
-INSERT INTO pg_opclass (opcname, opcdeftype)
-    SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex';
+  <para>
+   We could have written the operator entries more verbosely, as in:
+<programlisting>
+        OPERATOR        1       &lt; (complex, complex) ,
+</programlisting>
+   but there is no need to do so when the operators take the same data type
+   we are defining the operator class for.
+  </para>
  
-SELECT oid, opcname, opcdeftype
-    FROM pg_opclass
-    WHERE opcname = 'complex_abs_ops';
+  <para>
+   The above example assumes that you want to make this new operator class the
+   default B-tree operator class for the <type>complex</type> data type.
+   If you don't, just leave out the word <literal>DEFAULT</>.
+  </para>
+ </sect2>
  
-  oid   |     opcname     | opcdeftype
---------+-----------------+------------
- 277975 | complex_abs_ops |     277946
-(1 row)
-   </programlisting>
+ <sect2 id="xindex-opfamily">
+  <title>Operator Classes and Operator Families</title>
  
-   Note that the oid for your <filename>pg_opclass</filename> row will
-   be different!  Don't worry about this though.  We'll get this number
-   from the system later just like we got the oid of the type here.
+  <para>
+   So far we have implicitly assumed that an operator class deals with
+   only one data type.  While there certainly can be only one data type in
+   a particular index column, it is often useful to index operations that
+   compare an indexed column to a value of a different data type.  Also,
+   if there is use for a cross-data-type operator in connection with an
+   operator class, it is often the case that the other data type has a
+   related operator class of its own.  It is helpful to make the connections
+   between related classes explicit, because this can aid the planner in
+   optimizing SQL queries (particularly for B-tree operator classes, since
+   the planner contains a great deal of knowledge about how to work with them).
    </para>
  
    <para>
-   The above example assumes that you want to make this new opclass the
-   default index opclass for the <filename>complex</filename> datatype.
-   If you don't, just insert zero into <filename>opcdeftype</filename>,
-   rather than inserting the datatype's oid:
-
-   <programlisting>
-INSERT INTO pg_opclass (opcname, opcdeftype) VALUES ('complex_abs_ops', 0);
-   </programlisting>
+   To handle these needs, <productname>PostgreSQL</productname>
+   uses the concept of an <firstterm>operator
+   family</><indexterm><primary>operator family</></indexterm>.
+   An operator family contains one or more operator classes, and can also
+   contain indexable operators and corresponding support functions that
+   belong to the family as a whole but not to any single class within the
+   family.  We say that such operators and functions are <quote>loose</>
+   within the family, as opposed to being bound into a specific class.
+   Typically each operator class contains single-data-type operators
+   while cross-data-type operators are loose in the family.
+  </para>
  
+  <para>
+   All the operators and functions in an operator family must have compatible
+   semantics, where the compatibility requirements are set by the index
+   method.  You might therefore wonder why bother to single out particular
+   subsets of the family as operator classes; and indeed for many purposes
+   the class divisions are irrelevant and the family is the only interesting
+   grouping.  The reason for defining operator classes is that they specify
+   how much of the family is needed to support any particular index.
+   If there is an index using an operator class, then that operator class
+   cannot be dropped without dropping the index &mdash; but other parts of
+   the operator family, namely other operator classes and loose operators,
+   could be dropped.  Thus, an operator class should be specified to contain
+   the minimum set of operators and functions that are reasonably needed
+   to work with an index on a specific data type, and then related but
+   non-essential operators can be added as loose members of the operator
+   family.
    </para>
  
    <para>
-   So now we have an access method and an operator  class.
-   We  still  need  a  set of operators.  The procedure for
-   defining operators was discussed earlier in  this  manual.
-   For  the  <filename>complex_abs_ops</filename>  operator  class on Btrees,
-   the operators we require are:
+   As an example, <productname>PostgreSQL</productname> has a built-in
+   B-tree operator family <literal>integer_ops</>, which includes operator
+   classes <literal>int8_ops</>, <literal>int4_ops</>, and
+   <literal>int2_ops</> for indexes on <type>bigint</> (<type>int8</>),
+   <type>integer</> (<type>int4</>), and <type>smallint</> (<type>int2</>)
+   columns respectively.  The family also contains cross-data-type comparison
+   operators allowing any two of these types to be compared, so that an index
+   on one of these types can be searched using a comparison value of another
+   type.  The family could be duplicated by these definitions:
+
+<programlisting><![CDATA[
+CREATE OPERATOR FAMILY integer_ops USING btree;
+
+CREATE OPERATOR CLASS int8_ops
+DEFAULT FOR TYPE int8 USING btree FAMILY integer_ops AS
+  -- standard int8 comparisons
+  OPERATOR 1 < ,
+  OPERATOR 2 <= ,
+  OPERATOR 3 = ,
+  OPERATOR 4 >= ,
+  OPERATOR 5 > ,
+  FUNCTION 1 btint8cmp(int8, int8) ,
+  FUNCTION 2 btint8sortsupport(internal) ;
+
+CREATE OPERATOR CLASS int4_ops
+DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
+  -- standard int4 comparisons
+  OPERATOR 1 < ,
+  OPERATOR 2 <= ,
+  OPERATOR 3 = ,
+  OPERATOR 4 >= ,
+  OPERATOR 5 > ,
+  FUNCTION 1 btint4cmp(int4, int4) ,
+  FUNCTION 2 btint4sortsupport(internal) ;
+
+CREATE OPERATOR CLASS int2_ops
+DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
+  -- standard int2 comparisons
+  OPERATOR 1 < ,
+  OPERATOR 2 <= ,
+  OPERATOR 3 = ,
+  OPERATOR 4 >= ,
+  OPERATOR 5 > ,
+  FUNCTION 1 btint2cmp(int2, int2) ,
+  FUNCTION 2 btint2sortsupport(internal) ;
+
+ALTER OPERATOR FAMILY integer_ops USING btree ADD
+  -- cross-type comparisons int8 vs int2
+  OPERATOR 1 < (int8, int2) ,
+  OPERATOR 2 <= (int8, int2) ,
+  OPERATOR 3 = (int8, int2) ,
+  OPERATOR 4 >= (int8, int2) ,
+  OPERATOR 5 > (int8, int2) ,
+  FUNCTION 1 btint82cmp(int8, int2) ,
+
+  -- cross-type comparisons int8 vs int4
+  OPERATOR 1 < (int8, int4) ,
+  OPERATOR 2 <= (int8, int4) ,
+  OPERATOR 3 = (int8, int4) ,
+  OPERATOR 4 >= (int8, int4) ,
+  OPERATOR 5 > (int8, int4) ,
+  FUNCTION 1 btint84cmp(int8, int4) ,
+
+  -- cross-type comparisons int4 vs int2
+  OPERATOR 1 < (int4, int2) ,
+  OPERATOR 2 <= (int4, int2) ,
+  OPERATOR 3 = (int4, int2) ,
+  OPERATOR 4 >= (int4, int2) ,
+  OPERATOR 5 > (int4, int2) ,
+  FUNCTION 1 btint42cmp(int4, int2) ,
  
-   <programlisting>
-        absolute value less-than
-        absolute value less-than-or-equal
-        absolute value equal
-        absolute value greater-than-or-equal
-        absolute value greater-than
-   </programlisting>
+  -- cross-type comparisons int4 vs int8
+  OPERATOR 1 < (int4, int8) ,
+  OPERATOR 2 <= (int4, int8) ,
+  OPERATOR 3 = (int4, int8) ,
+  OPERATOR 4 >= (int4, int8) ,
+  OPERATOR 5 > (int4, int8) ,
+  FUNCTION 1 btint48cmp(int4, int8) ,
+
+  -- cross-type comparisons int2 vs int8
+  OPERATOR 1 < (int2, int8) ,
+  OPERATOR 2 <= (int2, int8) ,
+  OPERATOR 3 = (int2, int8) ,
+  OPERATOR 4 >= (int2, int8) ,
+  OPERATOR 5 > (int2, int8) ,
+  FUNCTION 1 btint28cmp(int2, int8) ,
+
+  -- cross-type comparisons int2 vs int4
+  OPERATOR 1 < (int2, int4) ,
+  OPERATOR 2 <= (int2, int4) ,
+  OPERATOR 3 = (int2, int4) ,
+  OPERATOR 4 >= (int2, int4) ,
+  OPERATOR 5 > (int2, int4) ,
+  FUNCTION 1 btint24cmp(int2, int4) ;
+]]>
+</programlisting>
+
+   Notice that this definition <quote>overloads</> the operator strategy and
+   support function numbers: each number occurs multiple times within the
+   family.  This is allowed so long as each instance of a
+   particular number has distinct input data types.  The instances that have
+   both input types equal to an operator class's input type are the
+   primary operators and support functions for that operator class,
+   and in most cases should be declared as part of the operator class rather
+   than as loose members of the family.
+  </para>
+
+  <para>
+   In a B-tree operator family, all the operators in the family must sort
+   compatibly, meaning that the transitive laws hold across all the data types
+   supported by the family: <quote>if A = B and B = C, then A = C</>,
+   and <quote>if A &lt; B and B &lt; C, then A &lt; C</>.  Moreover, implicit
+   or binary coercion casts between types represented in the operator family
+   must not change the associated sort ordering.  For each
+   operator in the family there must be a support function having the same
+   two input data types as the operator.  It is recommended that a family be
+   complete, i.e., for each combination of data types, all operators are
+   included.  Each operator class should include just the non-cross-type
+   operators and support function for its data type.
    </para>
  
    <para>
-   Suppose the code that implements the functions  defined
-   is stored in the file
-   <filename>PGROOT/src/tutorial/complex.c</filename>
+   To build a multiple-data-type hash operator family, compatible hash
+   support functions must be created for each data type supported by the
+   family.  Here compatibility means that the functions are guaranteed to
+   return the same hash code for any two values that are considered equal
+   by the family's equality operators, even when the values are of different
+   types.  This is usually difficult to accomplish when the types have
+   different physical representations, but it can be done in some cases.
+   Furthermore, casting a value from one data type represented in the operator
+   family to another data type also represented in the operator family via
+   an implicit or binary coercion cast must not change the computed hash value.
+   Notice that there is only one support function per data type, not one
+   per equality operator.  It is recommended that a family be complete, i.e.,
+   provide an equality operator for each combination of data types.
+   Each operator class should include just the non-cross-type equality
+   operator and the support function for its data type.
    </para>
  
    <para>
-   Part of the C code looks like this: (note that we will only show the
-   equality operator for the rest of the examples.  The other four
-   operators are very similar.  Refer to <filename>complex.c</filename>
-   or <filename>complex.source</filename> for the details.)
+   GIN and GiST indexes do not have any explicit notion of cross-data-type
+   operations.  The set of operators supported is just whatever the primary
+   support functions for a given operator class can handle.
+  </para>
+
+  <note>
+   <para>
+    Prior to <productname>PostgreSQL</productname> 8.3, there was no concept
+    of operator families, and so any cross-data-type operators intended to be
+    used with an index had to be bound directly into the index's operator
+    class.  While this approach still works, it is deprecated because it
+    makes an index's dependencies too broad, and because the planner can
+    handle cross-data-type comparisons more effectively when both data types
+    have operators in the same operator family.
+   </para>
+  </note>
+ </sect2>
  
-   <programlisting>
-#define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
+ <sect2 id="xindex-opclass-dependencies">
+  <title>System Dependencies on Operator Classes</title>
  
-         bool
-         complex_abs_eq(Complex *a, Complex *b)
-         {
-             double amag = Mag(a), bmag = Mag(b);
-             return (amag==bmag);
-         }
-   </programlisting>
+   <indexterm>
+    <primary>ordering operator</primary>
+   </indexterm>
+
+  <para>
+   <productname>PostgreSQL</productname> uses operator classes to infer the
+   properties of operators in more ways than just whether they can be used
+   with indexes.  Therefore, you might want to create operator classes
+   even if you have no intention of indexing any columns of your data type.
    </para>
  
    <para>
-   We make the function known to Postgres like this:
-   <programlisting>
-CREATE FUNCTION complex_abs_eq(complex, complex)
-              RETURNS bool
-              AS 'PGROOT/tutorial/obj/complex.so'
-              LANGUAGE 'c';
-   </programlisting>
+   In particular, there are SQL features such as <literal>ORDER BY</> and
+   <literal>DISTINCT</> that require comparison and sorting of values.
+   To implement these features on a user-defined data type,
+   <productname>PostgreSQL</productname> looks for the default B-tree operator
+   class for the data type.  The <quote>equals</> member of this operator
+   class defines the system's notion of equality of values for
+   <literal>GROUP BY</> and <literal>DISTINCT</>, and the sort ordering
+   imposed by the operator class defines the default <literal>ORDER BY</>
+   ordering.
    </para>
  
    <para>
-   There are some important things that are happening here.
+   Comparison of arrays of user-defined types also relies on the semantics
+   defined by the default B-tree operator class.
    </para>
  
    <para>
-   First, note that operators for less-than, less-than-or-equal, equal,
-   greater-than-or-equal, and greater-than for <filename>complex</filename>
-   are being defined.  We can only have one operator named, say, = and
-   taking type <filename>complex</filename> for both operands.  In this case
-   we don't have any other operator = for <filename>complex</filename>,
-   but if we were building a practical datatype we'd probably want = to
-   be the ordinary equality operation for complex numbers.  In that case,
-   we'd need to use some other operator name for complex_abs_eq.
+   If there is no default B-tree operator class for a data type, the system
+   will look for a default hash operator class.  But since that kind of
+   operator class only provides equality, in practice it is only enough
+   to support array equality.
    </para>
  
    <para>
-   Second, although Postgres can cope with operators having
-   the same name as long as they have different input datatypes, C can only
-   cope with one global routine having a given name, period.  So we shouldn't
-   name the C function something simple like <filename>abs_eq</filename>.
-   Usually it's a good practice to include the datatype name in the C
-   function name, so as not to conflict with functions for other datatypes.
+   When there is no default operator class for a data type, you will get
+   errors like <quote>could not identify an ordering operator</> if you
+   try to use these SQL features with the data type.
    </para>
  
+   <note>
+    <para>
+     In <productname>PostgreSQL</productname> versions before 7.4,
+     sorting and grouping operations would implicitly use operators named
+     <literal>=</>, <literal>&lt;</>, and <literal>&gt;</>.  The new
+     behavior of relying on default operator classes avoids having to make
+     any assumption about the behavior of operators with particular names.
+    </para>
+   </note>
+
    <para>
-   Third, we could have made the Postgres name of the function
-   <filename>abs_eq</filename>, relying on Postgres to distinguish it
-   by input datatypes from any other Postgres function of the same name.
-   To keep the example simple, we make the function have the same names
-   at the C level and Postgres level.
+   Another important point is that an operator that
+   appears in a hash operator family is a candidate for hash joins,
+   hash aggregation, and related optimizations.  The hash operator family
+   is essential here since it identifies the hash function(s) to use.
    </para>
+ </sect2>
+
+ <sect2 id="xindex-ordering-ops">
+  <title>Ordering Operators</title>
  
    <para>
-   Finally, note that these operator functions return Boolean values.
-   The access methods rely on this fact.  (On the other
-   hand, the support function returns whatever the particular access method
-   expects -- in this case, a signed integer.) The final routine in the
-   file is the "support routine" mentioned when we discussed the amsupport
-   column of the <filename>pg_am</filename> table.  We will use this
-   later on.  For now, ignore it.
+   Some index access methods (currently, only GiST) support the concept of
+   <firstterm>ordering operators</>.  What we have been discussing so far
+   are <firstterm>search operators</>.  A search operator is one for which
+   the index can be searched to find all rows satisfying
+   <literal>WHERE</>
+   <replaceable>indexed_column</>
+   <replaceable>operator</>
+   <replaceable>constant</>.
+   Note that nothing is promised about the order in which the matching rows
+   will be returned.  In contrast, an ordering operator does not restrict the
+   set of rows that can be returned, but instead determines their order.
+   An ordering operator is one for which the index can be scanned to return
+   rows in the order represented by
+   <literal>ORDER BY</>
+   <replaceable>indexed_column</>
+   <replaceable>operator</>
+   <replaceable>constant</>.
+   The reason for defining ordering operators that way is that it supports
+   nearest-neighbor searches, if the operator is one that measures distance.
+   For example, a query like
+<programlisting><![CDATA[
+SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;
+]]>
+</programlisting>
+   finds the ten places closest to a given target point.  A GiST index
+   on the location column can do this efficiently because
+   <literal>&lt;-&gt;</> is an ordering operator.
    </para>
  
    <para>
-   Now we are ready to define the operators:
+   While search operators have to return Boolean results, ordering operators
+   usually return some other type, such as float or numeric for distances.
+   This type is normally not the same as the data type being indexed.
+   To avoid hard-wiring assumptions about the behavior of different data
+   types, the definition of an ordering operator is required to name
+   a B-tree operator family that specifies the sort ordering of the result
+   data type.  As was stated in the previous section, B-tree operator families
+   define <productname>PostgreSQL</productname>'s notion of ordering, so
+   this is a natural representation.  Since the point <literal>&lt;-&gt;</>
+   operator returns <type>float8</>, it could be specified in an operator
+   class creation command like this:
+<programlisting><![CDATA[
+OPERATOR 15    <-> (point, point) FOR ORDER BY float_ops
+]]>
+</programlisting>
+   where <literal>float_ops</> is the built-in operator family that includes
+   operations on <type>float8</>.  This declaration states that the index
+   is able to return rows in order of increasing values of the
+   <literal>&lt;-&gt;</> operator.
+  </para>
+ </sect2>
+
+ <sect2 id="xindex-opclass-features">
+  <title>Special Features of Operator Classes</title>
  
-   <programlisting>
-CREATE OPERATOR = (
-     leftarg = complex, rightarg = complex,
-     procedure = complex_abs_eq,
-     restrict = eqsel, join = eqjoinsel
-         )
-   </programlisting>
+  <para>
+   There are two special features of operator classes that we have
+   not discussed yet, mainly because they are not useful
+   with the most commonly used index methods.
+  </para>
  
-   The important
-   things here are the procedure names (which are the <acronym>C</acronym>
-   functions defined above) and the restriction and join selectivity
-   functions.  You should just use the selectivity functions used in
-   the example (see <filename>complex.source</filename>).
-   Note that there
-   are different such functions for the less-than, equal, and greater-than
-   cases.  These must be supplied, or the optimizer will be unable to
-   make effective use of the index.
+  <para>
+   Normally, declaring an operator as a member of an operator class
+   (or family) means that the index method can retrieve exactly the set of rows
+   that satisfy a <literal>WHERE</> condition using the operator.  For example:
+<programlisting>
+SELECT * FROM table WHERE integer_column &lt; 4;
+</programlisting>
+   can be satisfied exactly by a B-tree index on the integer column.
+   But there are cases where an index is useful as an inexact guide to
+   the matching rows.  For example, if a GiST index stores only bounding boxes
+   for geometric objects, then it cannot exactly satisfy a <literal>WHERE</>
+   condition that tests overlap between nonrectangular objects such as
+   polygons.  Yet we could use the index to find objects whose bounding
+   box overlaps the bounding box of the target object, and then do the
+   exact overlap test only on the objects found by the index.  If this
+   scenario applies, the index is said to be <quote>lossy</> for the
+   operator.  Lossy index searches are implemented by having the index
+   method return a <firstterm>recheck</> flag when a row might or might
+   not really satisfy the query condition.  The core system will then
+   test the original query condition on the retrieved row to see whether
+   it should be returned as a valid match.  This approach works if
+   the index is guaranteed to return all the required rows, plus perhaps
+   some additional rows, which can be eliminated by performing the original
+   operator invocation.  The index methods that support lossy searches
+   (currently, GiST and GIN) allow the support functions of individual
+   operator classes to set the recheck flag, and so this is essentially an
+   operator-class feature.
    </para>
  
    <para>
-   The next step is to add entries for these operators to
-   the <filename>pg_amop</filename> relation.  To do this,
-   we'll need the <filename>oid</filename>s of the operators we just
-   defined.  We'll look up the names of all the operators that take
-   two <filename>complex</filename>es, and pick ours out:
-   
-   <programlisting>
-    SELECT o.oid AS opoid, o.oprname
-     INTO TABLE complex_ops_tmp
-     FROM pg_operator o, pg_type t
-     WHERE o.oprleft = t.oid and o.oprright = t.oid
-      and t.typname = 'complex';
-
- opoid  | oprname
---------+---------
- 277963 | +
- 277970 | &lt;
- 277971 | &lt;=
- 277972 | =
- 277973 | &gt;=
- 277974 | &gt;
-(6 rows)
-   </programlisting>
+   Consider again the situation where we are storing in the index only
+   the bounding box of a complex object such as a polygon.  In this
+   case there's not much value in storing the whole polygon in the index
+   entry &mdash; we might as well store just a simpler object of type
+   <type>box</>.  This situation is expressed by the <literal>STORAGE</>
+   option in <command>CREATE OPERATOR CLASS</>: we'd write something like:
+
+<programlisting>
+CREATE OPERATOR CLASS polygon_ops
+    DEFAULT FOR TYPE polygon USING gist AS
+        ...
+        STORAGE box;
+</programlisting>
+
+   At present, only the GiST and GIN index methods support a
+   <literal>STORAGE</> type that's different from the column data type.
+   The GiST <function>compress</> and <function>decompress</> support
+   routines must deal with data-type conversion when <literal>STORAGE</>
+   is used.  In GIN, the <literal>STORAGE</> type identifies the type of
+   the <quote>key</> values, which normally is different from the type
+   of the indexed column &mdash; for example, an operator class for
+   integer-array columns might have keys that are just integers.  The
+   GIN <function>extractValue</> and <function>extractQuery</> support
+   routines are responsible for extracting keys from indexed values.
+  </para>
+ </sect2>
  
-   (Again, some of your <filename>oid</filename> numbers will almost
-   certainly be different.)  The operators we are interested in are those
-   with <filename>oid</filename>s 277970 through 277974.  The values you
-   get will probably be different, and you should substitute them for the
-   values below.  We will do this with a select statement.
-  </para>
-
-  <para>
-   Now we are ready to update <filename>pg_amop</filename> with our new
-   operator class.  The most important thing in this entire discussion
-   is that the operators are ordered, from less than through greater
-   than, in <filename>pg_amop</filename>.  We add the rows we need:
-
-   <programlisting>
-    INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
-        SELECT am.oid, opcl.oid, c.opoid, 1
-        FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
-        WHERE amname = 'btree' AND
-            opcname = 'complex_abs_ops' AND
-            c.oprname = '&lt;';
-   </programlisting>
-
-   Now do this for the other operators substituting for the "1" in the
-   third line above and the "&lt;" in the last line.  Note the order:
-   "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
-   than or equal" is 4, and "greater than" is 5.
-  </para>
-
-  <para>
-   The next step is registration of the "support routine" previously
-   described in our discussion of <filename>pg_am</filename>.  The
-   <filename>oid</filename> of this support routine is stored in the
-   <filename>pg_amproc</filename> table, keyed by the access method
-   <filename>oid</filename> and the operator class <filename>oid</filename>.
-   First, we need to register the function in
-   <productname>Postgres</productname> (recall that we put the
-   <acronym>C</acronym> code that implements this routine in the bottom of
-   the file in which we implemented the operator routines):
-
-   <programlisting>
-    CREATE FUNCTION complex_abs_cmp(complex, complex)
-     RETURNS int4
-     AS 'PGROOT/tutorial/obj/complex.so'
-     LANGUAGE 'c';
-
-    SELECT oid, proname FROM pg_proc
-     WHERE proname = 'complex_abs_cmp';
-
-  oid   |     proname
---------+-----------------
- 277997 | complex_abs_cmp
-(1 row)
-   </programlisting>
-
-   (Again, your <filename>oid</filename> number will probably be different.)
-   We can add the new row as follows:
-
-   <programlisting>
-    INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
-        SELECT a.oid, b.oid, c.oid, 1
-            FROM pg_am a, pg_opclass b, pg_proc c
-            WHERE a.amname = 'btree' AND
-                b.opcname = 'complex_abs_ops' AND
-                c.proname = 'complex_abs_cmp';
-   </programlisting>
-  </para>
-
-  <para>
-   And we're done!  (Whew.)  It should now be possible to create
-   and use btree indexes on <filename>complex</filename> columns.
-  </para>
-
- </chapter>
-
-<!-- Keep this comment at the end of the file
-Local variables:
-mode:sgml
-sgml-omittag:nil
-sgml-shorttag:t
-sgml-minimize-attributes:nil
-sgml-always-quote-attributes:t
-sgml-indent-step:1
-sgml-indent-data:t
-sgml-parent-document:nil
-sgml-default-dtd-file:"./reference.ced"
-sgml-exposed-tags:nil
-sgml-local-catalogs:("/usr/lib/sgml/catalog")
-sgml-local-ecat-files:nil
-End:
--->
+</sect1>