2 $Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.29 2002/09/21 18:32:54 petere Exp $
3 PostgreSQL documentation
7 <title>Interfacing Extensions To Indexes</title>
9 <sect1 id="xindex-intro">
10 <title>Introduction</title>
13 The procedures described thus far let you define new types, new
14 functions, and new operators. However, we cannot yet define a
15 secondary index (such as a B-tree, R-tree, or hash access method)
16 over a new type, nor associate operators of a new type with secondary
18 To do these things, we must define an <firstterm>operator class</>
19 for the new data type. We will describe operator classes in the
20 context of a running example: a new operator
21 class for the B-tree access method that stores and
22 sorts complex numbers in ascending absolute value order.
27 Prior to <productname>PostgreSQL</productname> release 7.3, it was
28 necessary to make manual additions to
29 <classname>pg_amop</>, <classname>pg_amproc</>, and
30 <classname>pg_opclass</> in order to create a user-defined
31 operator class. That approach is now deprecated in favor of
32 using <command>CREATE OPERATOR CLASS</>, which is a much simpler
33 and less error-prone way of creating the necessary catalog entries.
38 <sect1 id="xindex-am">
39 <title>Access Methods and Operator Classes</title>
42 The <classname>pg_am</classname> table contains one row for every
43 index access method. Support for access to regular tables is
44 built into <productname>PostgreSQL</productname>, but all index access
45 methods are described in <classname>pg_am</classname>. It is possible
46 to add a new index access method by defining the required interface
47 routines and then creating a row in <classname>pg_am</classname> ---
48 but that is far beyond the scope of this chapter.
52 The routines for an index access method do not directly know anything
53 about the data types the access method will operate on. Instead, an
54 <firstterm>operator class</> identifies the set of operations that the
55 access method needs to be able to use to work with a particular data type.
56 Operator classes are so called because one thing they specify is the set
57 of WHERE-clause operators that can be used with an index (ie, can be
58 converted into an index scan qualification). An operator class may also
59 specify some <firstterm>support procedures</> that are needed by the
60 internal operations of the index access method, but do not directly
61 correspond to any WHERE-clause operator that can be used with the index.
65 It is possible to define multiple operator classes for the same
66 input data type and index access method. By doing this, multiple
67 sets of indexing semantics can be defined for a single data type.
68 For example, a B-tree index requires a sort ordering to be defined
69 for each data type it works on.
70 It might be useful for a complex-number data type
71 to have one B-tree operator class that sorts the data by complex
72 absolute value, another that sorts by real part, and so on.
73 Typically one of the operator classes will be deemed most commonly
74 useful and will be marked as the default operator class for that
75 data type and index access method.
79 The same operator class name
80 can be used for several different access methods (for example, both B-tree
81 and hash access methods have operator classes named
82 <literal>oid_ops</literal>), but each such class is an independent
83 entity and must be defined separately.
87 <sect1 id="xindex-strategies">
88 <title>Access Method Strategies</title>
91 The operators associated with an operator class are identified by
92 <quote>strategy numbers</>, which serve to identify the semantics of
93 each operator within the context of its operator class.
94 For example, B-trees impose a strict ordering on keys, lesser to greater,
95 and so operators like <quote>less than</> and <quote>greater than or equal
96 to</> are interesting with respect to a B-tree.
98 <productname>PostgreSQL</productname> allows the user to define operators,
99 <productname>PostgreSQL</productname> cannot look at the name of an operator
100 (e.g., <literal><</> or <literal>>=</>) and tell what kind of
101 comparison it is. Instead, the index access method defines a set of
102 <quote>strategies</>, which can be thought of as generalized operators.
103 Each operator class shows which actual operator corresponds to each
104 strategy for a particular data type and interpretation of the index
109 B-tree indexes define 5 strategies, as shown in <xref
110 linkend="xindex-btree-strat-table">.
113 <table tocentry="1" id="xindex-btree-strat-table">
114 <title>B-tree Strategies</title>
115 <titleabbrev>B-tree</titleabbrev>
119 <entry>Operation</entry>
120 <entry>Strategy Number</entry>
125 <entry>less than</entry>
129 <entry>less than or equal</entry>
137 <entry>greater than or equal</entry>
141 <entry>greater than</entry>
149 Hash indexes express only bitwise similarity, and so they define only 1
150 strategy, as shown in <xref linkend="xindex-hash-strat-table">.
153 <table tocentry="1" id="xindex-hash-strat-table">
154 <title>Hash Strategies</title>
155 <titleabbrev>Hash</titleabbrev>
159 <entry>Operation</entry>
160 <entry>Strategy Number</entry>
173 R-tree indexes express rectangle-containment relationships.
174 They define 8 strategies, as shown in <xref linkend="xindex-rtree-strat-table">.
177 <table tocentry="1" id="xindex-rtree-strat-table">
178 <title>R-tree Strategies</title>
179 <titleabbrev>R-tree</titleabbrev>
183 <entry>Operation</entry>
184 <entry>Strategy Number</entry>
189 <entry>left of</entry>
193 <entry>left of or overlapping</entry>
197 <entry>overlapping</entry>
201 <entry>right of or overlapping</entry>
205 <entry>right of</entry>
213 <entry>contains</entry>
217 <entry>contained by</entry>
225 GiST indexes are even more flexible: they do not have a fixed set of
226 strategies at all. Instead, the <quote>consistency</> support routine
227 of a particular GiST operator class interprets the strategy numbers
232 By the way, the <structfield>amorderstrategy</structfield> column
233 in <classname>pg_am</> tells whether
234 the access method supports ordered scan. Zero means it doesn't; if it
235 does, <structfield>amorderstrategy</structfield> is the strategy
236 number that corresponds to the ordering operator. For example, B-tree
237 has <structfield>amorderstrategy</structfield> = 1, which is its
238 <quote>less than</quote> strategy number.
242 In short, an operator class must specify a set of operators that express
243 each of these semantic ideas for the operator class's data type.
247 <sect1 id="xindex-support">
248 <title>Access Method Support Routines</title>
251 Strategies aren't usually enough information for the system to figure
252 out how to use an index. In practice, the access methods require
253 additional support routines in order to work. For example, the B-tree
254 access method must be able to compare two keys and determine whether one
255 is greater than, equal to, or less than the other. Similarly, the
256 R-tree access method must be able to compute
257 intersections, unions, and sizes of rectangles. These
258 operations do not correspond to operators used in qualifications in
259 SQL queries; they are administrative routines used by
260 the access methods, internally.
264 Just as with operators, the operator class identifies which specific
265 functions should play each of these roles for a given data type and
266 semantic interpretation. The index access method specifies the set
267 of functions it needs, and the operator class identifies the correct
268 functions to use by assigning <quote>support function numbers</> to them.
272 B-trees require a single support function, as shown in <xref
273 linkend="xindex-btree-support-table">.
276 <table tocentry="1" id="xindex-btree-support-table">
277 <title>B-tree Support Functions</title>
278 <titleabbrev>B-tree</titleabbrev>
282 <entry>Function</entry>
283 <entry>Support Number</entry>
289 Compare two keys and return an integer less than zero, zero, or
290 greater than zero, indicating whether the first key is less than, equal to,
291 or greater than the second.
300 Hash indexes likewise require one support function, as shown in <xref
301 linkend="xindex-hash-support-table">.
304 <table tocentry="1" id="xindex-hash-support-table">
305 <title>Hash Support Functions</title>
306 <titleabbrev>Hash</titleabbrev>
310 <entry>Function</entry>
311 <entry>Support Number</entry>
316 <entry>Compute the hash value for a key</entry>
324 R-tree indexes require three support functions,
325 as shown in <xref linkend="xindex-rtree-support-table">.
328 <table tocentry="1" id="xindex-rtree-support-table">
329 <title>R-tree Support Functions</title>
330 <titleabbrev>R-tree</titleabbrev>
334 <entry>Function</entry>
335 <entry>Support Number</entry>
344 <entry>intersection</entry>
356 GiST indexes require seven support functions,
357 as shown in <xref linkend="xindex-gist-support-table">.
360 <table tocentry="1" id="xindex-gist-support-table">
361 <title>GiST Support Functions</title>
362 <titleabbrev>GiST</titleabbrev>
366 <entry>Function</entry>
367 <entry>Support Number</entry>
372 <entry>consistent</entry>
380 <entry>compress</entry>
384 <entry>decompress</entry>
388 <entry>penalty</entry>
392 <entry>picksplit</entry>
405 <sect1 id="xindex-operators">
406 <title>Creating the Operators and Support Routines</title>
409 Now that we have seen the ideas, here is the promised example
410 of creating a new operator class. First, we need a set of operators.
412 defining operators was discussed in <xref linkend="xoper">.
413 For the <literal>complex_abs_ops</literal> operator class on B-trees,
414 the operators we require are:
416 <itemizedlist spacing="compact">
417 <listitem><simpara>absolute-value less-than (strategy 1)</></>
418 <listitem><simpara>absolute-value less-than-or-equal (strategy 2)</></>
419 <listitem><simpara>absolute-value equal (strategy 3)</></>
420 <listitem><simpara>absolute-value greater-than-or-equal (strategy 4)</></>
421 <listitem><simpara>absolute-value greater-than (strategy 5)</></>
426 Suppose the code that implements these functions
427 is stored in the file
428 <filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.c</filename>,
429 which we have compiled into
430 <filename><replaceable>PGROOT</replaceable>/src/tutorial/complex.so</filename>.
431 Part of the C code looks like this:
434 #define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
437 complex_abs_eq(Complex *a, Complex *b)
439 double amag = Mag(a), bmag = Mag(b);
443 (Note that we will only show the equality operator in this text.
444 The other four operators are very similar. Refer to
445 <filename>complex.c</filename> or
446 <filename>complex.source</filename> for the details.)
450 We make the function known to <productname>PostgreSQL</productname> like this:
452 CREATE FUNCTION complex_abs_eq(complex, complex) RETURNS boolean
453 AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
459 There are some important things that are happening here:
464 First, note that operators for less-than, less-than-or-equal, equal,
465 greater-than-or-equal, and greater-than for <filename>complex</filename>
466 are being defined. We can only have one operator named, say, = and
467 taking type <filename>complex</filename> for both operands. In this case
468 we don't have any other operator = for <filename>complex</filename>,
469 but if we were building a practical data type we'd probably want = to
470 be the ordinary equality operation for complex numbers. In that case,
471 we'd need to use some other operator name for <function>complex_abs_eq</>.
477 Second, although <productname>PostgreSQL</productname> can cope with operators having
478 the same name as long as they have different input data types, C can only
479 cope with one global routine having a given name, period. So we shouldn't
480 name the C function something simple like <filename>abs_eq</filename>.
481 Usually it's a good practice to include the data type name in the C
482 function name, so as not to conflict with functions for other data types.
488 Third, we could have made the <productname>PostgreSQL</productname> name of the function
489 <filename>abs_eq</filename>, relying on <productname>PostgreSQL</productname> to distinguish it
490 by input data types from any other <productname>PostgreSQL</productname> function of the same name.
491 To keep the example simple, we make the function have the same names
492 at the C level and <productname>PostgreSQL</productname> level.
498 Finally, note that these operator functions return Boolean values.
499 In practice, all operators defined as index access method
500 strategies must return type <type>boolean</type>, since they must
501 appear at the top level of a <literal>WHERE</> clause to be used with an index.
502 (On the other hand, support functions return whatever the
503 particular access method expects -- in the case of the comparison
504 function for B-trees, a signed integer.)
511 Now we are ready to define the operators:
515 leftarg = complex, rightarg = complex,
516 procedure = complex_abs_eq,
517 restrict = eqsel, join = eqjoinsel
522 things here are the procedure names (which are the C
523 functions defined above) and the restriction and join selectivity
524 functions. You should just use the selectivity functions used in
525 the example (see <filename>complex.source</filename>).
527 are different such functions for the less-than, equal, and greater-than
528 cases. These must be supplied or the optimizer will be unable to
529 make effective use of the index.
533 The next step is the registration of the comparison <quote>support
534 routine</quote> required by B-trees. The C code that implements this
535 is in the same file that contains the operator procedures:
538 CREATE FUNCTION complex_abs_cmp(complex, complex)
540 AS '<replaceable>PGROOT</replaceable>/src/tutorial/complex'
546 <sect1 id="xindex-opclass">
547 <title>Creating the Operator Class</title>
550 Now that we have the required operators and support routine,
551 we can finally create the operator class:
554 CREATE OPERATOR CLASS complex_abs_ops
555 DEFAULT FOR TYPE complex USING btree AS
561 FUNCTION 1 complex_abs_cmp(complex, complex);
566 And we're done! (Whew.) It should now be possible to create
567 and use B-tree indexes on <type>complex</type> columns.
571 We could have written the operator entries more verbosely, as in
573 OPERATOR 1 < (complex, complex) ,
575 but there is no need to do so when the operators take the same data type
576 we are defining the operator class for.
580 The above example assumes that you want to make this new operator class the
581 default B-tree operator class for the <type>complex</type> data type.
582 If you don't, just leave out the word <literal>DEFAULT</>.
586 <sect1 id="xindex-opclass-features">
587 <title>Special Features of Operator Classes</title>
590 There are two special features of operator classes that we have
591 not discussed yet, mainly because they are not very useful
592 with the default B-tree index access method.
596 Normally, declaring an operator as a member of an operator class means
597 that the index access method can retrieve exactly the set of rows
598 that satisfy a WHERE condition using the operator. For example,
600 SELECT * FROM table WHERE integer_column < 4;
602 can be satisfied exactly by a B-tree index on the integer column.
603 But there are cases where an index is useful as an inexact guide to
604 the matching rows. For example, if an R-tree index stores only
605 bounding boxes for objects, then it cannot exactly satisfy a WHERE
606 condition that tests overlap between nonrectangular objects such as
607 polygons. Yet we could use the index to find objects whose bounding
608 box overlaps the bounding box of the target object, and then do the
609 exact overlap test only on the objects found by the index. If this
610 scenario applies, the index is said to be <quote>lossy</> for the
611 operator, and we add <literal>RECHECK</> to the <literal>OPERATOR</> clause
612 in the <command>CREATE OPERATOR CLASS</> command.
613 <literal>RECHECK</> is valid if the index is guaranteed to return
614 all the required tuples, plus perhaps some additional tuples, which
615 can be eliminated by performing the original operator comparison.
619 Consider again the situation where we are storing in the index only
620 the bounding box of a complex object such as a polygon. In this
621 case there's not much value in storing the whole polygon in the index
622 entry --- we may as well store just a simpler object of type
623 <literal>box</>. This situation is expressed by the <literal>STORAGE</>
624 option in <command>CREATE OPERATOR CLASS</>: we'd write something like
627 CREATE OPERATOR CLASS polygon_ops
628 DEFAULT FOR TYPE polygon USING gist AS
633 At present, only the GiST access method supports a
634 <literal>STORAGE</> type that's different from the column data type.
635 The GiST <literal>compress</> and <literal>decompress</> support
636 routines must deal with data-type conversion when <literal>STORAGE</>
643 <!-- Keep this comment at the end of the file
648 sgml-minimize-attributes:nil
649 sgml-always-quote-attributes:t
652 sgml-parent-document:nil
653 sgml-default-dtd-file:"./reference.ced"
654 sgml-exposed-tags:nil
655 sgml-local-catalogs:("/usr/lib/sgml/catalog")
656 sgml-local-ecat-files:nil