Docs updates for cross-type hashing.

author Tom Lane <tgl@sss.pgh.pa.us>

Tue, 6 Feb 2007 04:38:31 +0000 (04:38 +0000)

committer Tom Lane <tgl@sss.pgh.pa.us>

Tue, 6 Feb 2007 04:38:31 +0000 (04:38 +0000)
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 7c68bfc10095ab520d8cf1a7f992be22ed333abf..84421de8bb76adad33386f6aba150dcf28011f3a 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.58 2007/02/01 00:28:18 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.59 2007/02/06 04:38:31 tgl Exp $ -->
  
  <sect1 id="xindex">
   <title>Interfacing Extensions To Indexes</title>
@@ -139,7 +139,7 @@
     </table>
  
    <para>
-   Hash indexes express only bitwise equality, and so they use only one
+   Hash indexes support only equality comparisons, and so they use only one
     strategy, shown in <xref linkend="xindex-hash-strat-table">.
    </para>
  
@@ -162,7 +162,7 @@
     </table>
  
    <para>
-   GiST indexes are even more flexible: they do not have a fixed set of
+   GiST indexes are more flexible: they do not have a fixed set of
     strategies at all.  Instead, the <quote>consistency</> support routine
     of each particular GiST operator class interprets the strategy numbers
     however it likes.  As an example, several of the built-in GiST index
@@ -802,14 +802,23 @@ ALTER OPERATOR FAMILY integer_ops USING btree ADD
     operator in the family there must be a support function having the same
     two input data types as the operator.  It is recommended that a family be
     complete, i.e., for each combination of data types, all operators are
-   included.  An operator class should include just the non-cross-type
+   included.  Each operator class should include just the non-cross-type
     operators and support function for its data type.
    </para>
  
    <para>
-   At this writing, hash indexes do not support cross-type operations,
-   and so there is little use for a hash operator family larger than one
-   operator class.  This is expected to be relaxed in the future.
+   To build a multiple-data-type hash operator family, compatible hash
+   support functions must be created for each data type supported by the
+   family.  Here compatibility means that the functions are guaranteed to
+   return the same hash code for any two values that are considered equal
+   by the family's equality operators, even when the values are of different
+   types.  This is usually difficult to accomplish when the types have
+   different physical representations, but it can be done in some cases.
+   Notice that there is only one support function per data type, not one
+   per equality operator.  It is recommended that a family be complete, i.e.,
+   provide an equality operator for each combination of data types.
+   Each operator class should include just the non-cross-type equality
+   operator and the support function for its data type.
    </para>
  
    <para>
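The new xindex.sgml paragraph requires the per-type hash support functions of a multiple-data-type hash family to return the same hash code for values that the family's equality operators consider equal, with only one support function per data type. The Python sketch below illustrates that property for a hypothetical integer family covering int2, int4 and int8. The mixer `mix32` and the functions `hash_int2`, `hash_int4` and `hash_int8` are illustrative assumptions, not PostgreSQL's actual hash code. The int8 function folds its high half into its low half, which is similar in spirit to PostgreSQL's hashint8.

```python
"""Hypothetical per-type hash support functions for a multi-type integer
hash operator family (int2, int4, int8). Equal values must hash alike."""

MASK32 = 0xFFFFFFFF


def mix32(k: int) -> int:
    """Stand-in 32-bit mixer; an assumption, not PostgreSQL's hash_uint32."""
    k &= MASK32
    k ^= k >> 16
    k = (k * 0x7FEB352D) & MASK32
    k ^= k >> 15
    k = (k * 0x846CA68B) & MASK32
    k ^= k >> 16
    return k


def hash_int2(v: int) -> int:
    # Reduce to the 32-bit two's-complement pattern (what widening an int2
    # to int4 produces) and hash it exactly as hash_int4 does.
    return mix32(v & MASK32)


def hash_int4(v: int) -> int:
    return mix32(v & MASK32)


def hash_int8(v: int) -> int:
    # Fold the high half into the low half. For any value that fits in
    # 32 bits, the high half is all zeros (v >= 0) or all ones (v < 0).
    # XORing in hi or ~hi respectively therefore leaves the low half
    # unchanged, and the result equals hash_int4(v).
    lo = v & MASK32
    hi = (v >> 32) & MASK32
    lo ^= hi if v >= 0 else (~hi & MASK32)
    return mix32(lo)


# There is one support function per data type, yet equal values agree across types.
print(hash_int2(-42) == hash_int4(-42) == hash_int8(-42))  # True
```

Unequal values can still collide. For example, `hash_int8(2**40) == hash_int4(256)`. A hash index or hash join tolerates this because it rechecks candidate matches with the equality operator. What it cannot tolerate is equal values hashing differently.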
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index fdfa48a212205d9a1e1450a58cf86784e0534f1f..0a13ce7c4357908f0535c2492823ccde1c67d79f 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/xoper.sgml,v 1.41 2007/02/01 19:10:24 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/xoper.sgml,v 1.42 2007/02/06 04:38:31 tgl Exp $ -->
  
   <sect1 id="xoper">
    <title>User-Defined Operators</title>
@@ -85,7 +85,7 @@ SELECT (a + b) AS c FROM test_complex;
      appropriate, because they can make for considerable speedups in execution
      of queries that use the operator.  But if you provide them, you must be
      sure that they are right!  Incorrect use of an optimization clause can
-    result in server process crashes, subtly wrong output, or other Bad Things.
+    result in slow queries, subtly wrong output, or other Bad Things.
      You can always leave out an optimization clause if you are not sure
      about it; the only consequence is that queries might run slower than
      they need to.
@@ -326,8 +326,8 @@ table1.column1 OP table2.column2
       The <literal>HASHES</literal> clause, if present, tells the system that
       it is permissible to use the hash join method for a join based on this
       operator.  <literal>HASHES</> only makes sense for a binary operator that
-     returns <literal>boolean</>, and in practice the operator had better be
-     equality for some data type.
+     returns <literal>boolean</>, and in practice the operator must represent
+     equality for some data type or pair of data types.
      </para>
  
      <para>
@@ -337,7 +337,13 @@ table1.column1 OP table2.column2
       join will never compare them at all, implicitly assuming that the
       result of the join operator must be false.  So it never makes sense
       to specify <literal>HASHES</literal> for operators that do not represent
-     some form of equality.
+     some form of equality.  In most cases it is only practical to support
+     hashing for operators that take the same data type on both sides.
+     However, sometimes it is possible to design compatible hash functions
+     for two or more datatypes; that is, functions that will generate the
+     same hash codes for <quote>equal</> values, even though the values
+     have different representations.  For example, it's fairly simple
+     to arrange this property when hashing integers of different widths.
      </para>
  
      <para>
@@ -346,9 +352,9 @@ table1.column1 OP table2.column2
       the operator, since of course the referencing operator family couldn't
       exist yet.  But attempts to use the operator in hash joins will fail
       at run time if no such operator family exists.  The system needs the
-     operator family to find the data-type-specific hash function for the
-     operator's input data type.  Of course, you must also create a suitable
-     hash function before you can create the operator family.
+     operator family to find the data-type-specific hash function(s) for the
+     operator's input data type(s).  Of course, you must also create suitable
+     hash functions before you can create the operator family.
      </para>
  
      <para>
@@ -366,6 +372,17 @@ table1.column1 OP table2.column2
       to ensure it generates the same hash value as positive zero.
      </para>
  
+    <para>
+     A hash-joinable operator must have a commutator (itself if the two
+     operand data types are the same, or a related equality operator
+     if they are different) that appears in the same operator family.
+     If this is not the case, planner errors might occur when the operator
+     is used.  Also, it is a good idea (but not strictly required) for
+     a hash operator family that supports multiple datatypes to provide
+     equality operators for every combination of the datatypes; this
+     allows better optimization.
+    </para>
+
      <note>
      <para>
       The function underlying a hash-joinable operator must be marked
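The xoper.sgml changes describe two requirements for a cross-type hash join:

* Equal values of the two types must hash alike. Otherwise, matching rows are never compared.
* The operator needs a commutator in the same operator family.

The following Python model of a hash join between an int4 column and an int8 column illustrates both requirements under simplifying assumptions. The names `hash_join`, `int48eq`, `int84eq` and `naive_hash_int8`, and the `mix32` mixer, are hypothetical. PostgreSQL's executor is far more involved than this model. In the model, the choice of which input gets hashed fixes the operand order of the equality test, which is one way to see why the commutator is needed.

```python
"""Toy hash join between an int4 column and an int8 column (hypothetical model)."""

from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

MASK32 = 0xFFFFFFFF


def mix32(k: int) -> int:
    """Stand-in bijective 32-bit mixer; an assumption, not PostgreSQL's hash."""
    k &= MASK32
    k ^= k >> 16
    k = (k * 0x7FEB352D) & MASK32
    k ^= k >> 15
    k = (k * 0x846CA68B) & MASK32
    k ^= k >> 16
    return k


def hash_int4(v: int) -> int:
    return mix32(v & MASK32)


def hash_int8(v: int) -> int:
    # Compatible with hash_int4: folding the high half makes in-range values hash alike.
    lo, hi = v & MASK32, (v >> 32) & MASK32
    return mix32(lo ^ (hi if v >= 0 else ~hi & MASK32))


def naive_hash_int8(v: int) -> int:
    # Incompatible: an int8 hash written without regard to hash_int4 (here,
    # with its own seed), so equal int4 and int8 values never hash alike.
    return mix32((v & MASK32) ^ ((v >> 32) & MASK32) ^ 0x9E3779B9)


def int48eq(a4: int, b8: int) -> bool:
    """Models the operator int4 = int8."""
    return a4 == b8


def int84eq(a8: int, b4: int) -> bool:
    """Models int8 = int4, the commutator of int48eq."""
    return a8 == b4


def hash_join(
    build: Iterable[int],
    build_hash: Callable[[int], int],
    probe: Iterable[int],
    probe_hash: Callable[[int], int],
    eq: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Hash every build row, then probe. eq(probe_row, build_row) runs only
    for rows whose hash codes match, so rows that hash differently are
    silently treated as unequal."""
    table = defaultdict(list)
    for b in build:
        table[build_hash(b)].append(b)
    matches = []
    for p in probe:
        for b in table.get(probe_hash(p), []):
            if eq(p, b):
                matches.append((p, b))
    return matches


int4_col = [1, 2, 3, -7]
int8_col = [2, 3, 4, -7, 2**40]

# Hash the int4 side. Each probe row is an int8, so "int4_col = int8_col"
# must be evaluated as int8 = int4, i.e. with the commutator.
print(hash_join(int4_col, hash_int4, int8_col, hash_int8, int84eq))  # [(2, 2), (3, 3), (-7, -7)]
# Hash the int8 side. Now the original int4 = int8 operator fits directly.
print(hash_join(int8_col, hash_int8, int4_col, hash_int4, int48eq))  # [(2, 2), (3, 3), (-7, -7)]
# With an incompatible int8 hash, equal values land in different buckets.
print(hash_join(int4_col, hash_int4, int8_col, naive_hash_int8, int84eq))  # []
```

The last call returns no rows even though three pairs are equal. This is the silent wrong answer the documentation warns about, rather than an error.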