granicus.if.org Git - postgresql/blob - doc/src/sgml/xindex.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.21 2001/11/21 06:09:45 thomas Exp $
   3 PostgreSQL documentation
   4 -->
   5
   6  <chapter id="xindex">
   7   <title>Interfacing Extensions To Indexes</title>
   8
   9   <para>
  10    The procedures described thus far let you define a new type, new
  11    functions and new operators.  However, we cannot yet define a secondary
  12    index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
  13    hash access method) over a new type or its operators.
  14   </para>
  15
  16   <para>
  17    Look back at
  18    <xref linkend="EXTEND-CATALOGS">.
  19    The right half shows the  catalogs  that we must modify in order to tell
  20    <productname>PostgreSQL</productname> how to use a user-defined type and/or
  21    user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
  22     pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
  23    Unfortunately, there is no simple command to do this.  We will demonstrate
  24    how to modify these catalogs through a running example:  a  new  operator
  25    class for the <acronym>B-tree</acronym> access method that stores and
  26    sorts complex numbers in ascending absolute value order.
  27   </para>
  28
  29   <para>
  30    The <filename>pg_am</filename> table contains one row for every index
  31    access method.  Support for the heap access method is built into
  32    <productname>PostgreSQL</productname>, but every other access method is
  33    described in <filename>pg_am</filename>.  The schema is
  34
  35    <table tocentry="1">
  36     <title>Index Access Method Schema</title>
  37
  38     <tgroup cols="2">
  39      <thead>
  40       <row>
  41        <entry>Column</entry>
  42        <entry>Description</entry>
  43       </row>
  44      </thead>
  45      <tbody>
  46       <row>
  47        <entry>amname</entry>
  48        <entry>name of the access method</entry>
  49       </row>
  50       <row>
  51        <entry>amowner</entry>
  52        <entry>user id of the owner</entry>
  53       </row>
  54       <row>
  55        <entry>amstrategies</entry>
  56        <entry>number of strategies for this access method (see below)</entry>
  57       </row>
  58       <row>
  59        <entry>amsupport</entry>
  60        <entry>number of support routines for this access method (see below)</entry>
  61       </row>
  62       <row>
  63        <entry>amorderstrategy</entry>
  64        <entry>zero if the index offers no sort order, otherwise the strategy
  65         number of the strategy operator that describes the sort order</entry>
  66       </row>
  67       <row>
  68        <entry>amcanunique</entry>
  69        <entry>does AM support UNIQUE indexes?</entry>
  70       </row>
  71       <row>
  72        <entry>amcanmulticol</entry>
  73        <entry>does AM support multicolumn indexes?</entry>
  74       </row>
  75       <row>
  76        <entry>amindexnulls</entry>
  77        <entry>does AM support NULL index entries?</entry>
  78       </row>
  79       <row>
  80        <entry>amconcurrent</entry>
  81        <entry>does AM support concurrent updates?</entry>
  82       </row>
  83       <row>
  84        <entry>amgettuple</entry>
  85       </row>
  86       <row>
  87        <entry>aminsert</entry>
  88       </row>
  89       <row>
  90        <entry>...</entry>
  91        <entry>procedure  identifiers  for  interface routines to the access
  92         method.  For example, regproc ids for opening,  closing,  and
  93         getting rows from the access method appear here.</entry>
  94       </row>
  95      </tbody>
  96     </tgroup>
  97    </table>
  98   </para>
  99
 100   <para>
 101    The <acronym>object ID</acronym> of the row in
 102    <filename>pg_am</filename> is used as a foreign key in a lot of other
 103    tables.  You  do not  need to  add a new row to this table; all that
 104    you are interested in is the <acronym>object ID</acronym> of the access
 105    method you want to extend:
 106
 107    <programlisting>
 108 SELECT oid FROM pg_am WHERE amname = 'btree';
 109
 110  oid
 111 -----
 112  403
 113 (1 row)
 114    </programlisting>
 115
 116    We will use that <command>SELECT</command> in a <command>WHERE</command>
 117    clause later.
 118   </para>
 119
 120   <para>
 121    The <filename>amstrategies</filename> column exists to standardize
 122    comparisons across data types.  For example, <acronym>B-tree</acronym>s
 123    impose a strict ordering on keys, lesser to greater.  Since
 124    <productname>PostgreSQL</productname> allows the user to define operators,
 125    <productname>PostgreSQL</productname> cannot look at the name of an operator
 126    (e.g., <literal>&gt;</> or <literal>&lt;</>) and tell what kind of comparison it is.  In fact,
 127    some  access methods don't impose any ordering at all.  For example,
 128    <acronym>R-tree</acronym>s express a rectangle-containment relationship,
 129    whereas a hashed data structure expresses only bitwise similarity based
 130    on the value of a hash function.  <productname>PostgreSQL</productname>
 131    needs some consistent way of taking a qualification in your query,
 132    looking at the operator and then deciding if a usable index exists.  This
 133    implies that <productname>PostgreSQL</productname> needs to know, for
 134    example, that the  <literal>&lt;=</>  and  <literal>&gt;</> operators partition a
 135    <acronym>B-tree</acronym>.  <productname>PostgreSQL</productname>
 136    uses strategies to express these relationships  between
 137    operators and the way they can be used to scan indexes.
 138   </para>
 139
 140   <para>
 141    Defining a new set of strategies is beyond the scope of this discussion,
 142    but we'll explain how <acronym>B-tree</acronym> strategies work because
 143    you'll need to know that to add a new B-tree operator class. In the
 144    <filename>pg_am</filename> table, the amstrategies column is the
 145    number of strategies defined for this access method. For
 146    <acronym>B-tree</acronym>s, this number is 5.  These strategies
 147    correspond to
 148
 149    <table tocentry="1">
 150     <title>B-tree Strategies</title>
 151     <titleabbrev>B-tree</titleabbrev>
 152     <tgroup cols="2">
 153      <thead>
 154       <row>
 155        <entry>Operation</entry>
 156        <entry>Index</entry>
 157       </row>
 158      </thead>
 159      <tbody>
 160       <row>
 161        <entry>less than</entry>
 162        <entry>1</entry>
 163       </row>
 164       <row>
 165        <entry>less than or equal</entry>
 166        <entry>2</entry>
 167       </row>
 168       <row>
 169        <entry>equal</entry>
 170        <entry>3</entry>
 171       </row>
 172       <row>
 173        <entry>greater than or equal</entry>
 174        <entry>4</entry>
 175       </row>
 176       <row>
 177        <entry>greater than</entry>
 178        <entry>5</entry>
 179       </row>
 180      </tbody>
 181     </tgroup>
 182    </table>
 183   </para>
 184
 185   <para>
 186    The idea is that you'll need to add operators corresponding to the
 187    comparisons above to the <filename>pg_amop</filename> relation (see below).
 188    The access method code can use these strategy numbers, regardless of data
 189    type, to figure out how to partition the <acronym>B-tree</acronym>,
 190    compute selectivity, and so on.  Don't worry about the details of adding
 191    operators yet; just understand that there must be a set of these
 192    operators for <filename>int2, int4, oid,</filename> and every other
 193    data type on which a <acronym>B-tree</acronym> can operate.
 194   </para>
 195
 196   <para>
 197    Sometimes, strategies aren't enough information for the system to figure
 198    out how to use an index.  Some access methods require additional support
 199    routines in order to work. For example, the <acronym>B-tree</acronym>
 200    access method must be able to compare two keys and determine whether one
 201    is greater than, equal to, or less than the other.  Similarly, the
 202    <acronym>R-tree</acronym> access method must be able to compute
 203    intersections,  unions, and sizes of rectangles.  These
 204    operations do not correspond to operators used in qualifications in
 205    SQL queries;  they are administrative routines used by
 206    the access methods, internally.
 207   </para>
 208
 209   <para>
 210    In order to manage diverse support routines consistently across all
 211    <productname>PostgreSQL</productname> access methods,
 212    <filename>pg_am</filename> includes a column called
 213    <filename>amsupport</filename>.  This column records the number of
 214    support routines used by an access method.  For <acronym>B-tree</acronym>s,
 215    this number is one -- the routine to take two keys and return -1, 0, or
 216    +1, depending on whether the first key is less than, equal
 217    to, or greater than the second.
 218
 219    <note>
 220     <para>
 221      Strictly  speaking, this routine can return a negative
 222      number (&lt; 0), zero, or a non-zero positive number (&gt; 0).
 223     </para>
 224    </note>
 225   </para>
 226
 227   <para>
 228    The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
 229    is just the number
 230    of strategies defined for the access method in question.  The operators
 231    for less than, less equal, and so on don't appear in
 232    <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
 233    is just the number of support routines required by  the  access
 234    method.  The actual routines are listed elsewhere.
 235   </para>
 236
 237   <para>
 238    By the way, the <filename>amorderstrategy</filename> entry tells whether
 239    the access method supports ordered scan.  Zero means it doesn't; if it
 240    does, <filename>amorderstrategy</filename> is the number of the strategy
 241    routine that corresponds to the ordering operator.  For example, B-tree
 242    has <filename>amorderstrategy</filename> = 1 which is its
 243    <quote>less than</quote> strategy number.
 244   </para>
 245
 246   <para>
 247    The next table of interest is <filename>pg_opclass</filename>.  This table
 248    defines operator class names and input data types for each of the operator
 249    classes supported by a given index access method.  The same class name
 250    can be used for several different access methods (for example, both B-tree
 251    and hash access methods have operator classes named
 252    <filename>oid_ops</filename>), but a separate
 253    <filename>pg_opclass</filename> row must appear for each access method.
 254    The <filename>oid</filename> of the <filename>pg_opclass</filename> row is
 255    used as a foreign
 256    key in other tables to associate specific operators and support routines
 257    with the operator class.
 258   </para>
 259
 260   <para>
 261    You need to add a row with your opclass name (for example,
 262    <filename>complex_abs_ops</filename>) to
 263    <filename>pg_opclass</filename>:
 264
 265    <programlisting>
 266 INSERT INTO pg_opclass (opcamid, opcname, opcintype, opcdefault, opckeytype)
 267     VALUES (
 268         (SELECT oid FROM pg_am WHERE amname = 'btree'),
 269         'complex_abs_ops',
 270         (SELECT oid FROM pg_type WHERE typname = 'complex'),
 271         true,
 272         0);
 273
 274 SELECT oid, *
 275     FROM pg_opclass
 276     WHERE opcname = 'complex_abs_ops';
 277
 278   oid   | opcamid |     opcname     | opcintype | opcdefault | opckeytype
 279 --------+---------+-----------------+-----------+------------+------------
 280  277975 |     403 | complex_abs_ops |    277946 | t          |          0
 281 (1 row)
 282    </programlisting>
 283
 284    Note that the oid for your <filename>pg_opclass</filename> row will
 285    be different!  Don't worry about this though.  We'll get this number
 286    from the system later just like we got the oid of the type here.
 287   </para>
 288
 289   <para>
 290    The above example assumes that you want to make this new opclass the
 291    default B-tree opclass for the <filename>complex</filename> data type.
 292    If you don't, just set <filename>opcdefault</filename> to false instead.
 293    <filename>opckeytype</filename> is not described here; it should always
 294    be zero for B-tree opclasses.
 295   </para>
 296
 297   <para>
 298    So now we have an access method and an operator  class.
 299    We  still  need  a  set of operators.  The procedure for
 300    defining operators was discussed earlier in  this  manual.
 301    For  the  <filename>complex_abs_ops</filename>  operator  class on B-trees,
 302    the operators we require are:
 303
 304    <programlisting>
 305         absolute value less-than
 306         absolute value less-than-or-equal
 307         absolute value equal
 308         absolute value greater-than-or-equal
 309         absolute value greater-than
 310    </programlisting>
 311   </para>
 312
 313   <para>
 314    Suppose the code that implements these functions
 315    is stored in the file
 316    <replaceable>PGROOT</replaceable><filename>/tutorial/complex.c</filename>,
 317    which we have compiled into
 318    <replaceable>PGROOT</replaceable><filename>/tutorial/complex.so</filename>.
 319   </para>
 320
 321   <para>
 322    Part of the C code looks like this: (note that we will only show the
 323    equality operator for the rest of the examples.  The other four
 324    operators are very similar.  Refer to <filename>complex.c</filename>
 325    or <filename>complex.source</filename> for the details.)
 326
 327    <programlisting>
 328 #define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
 329
 330          bool
 331          complex_abs_eq(Complex *a, Complex *b)
 332          {
 333              double amag = Mag(a), bmag = Mag(b);
 334              return (amag==bmag);
 335          }
 336    </programlisting>
 337   </para>
 338
 339   <para>
 340    We make the function known to <productname>PostgreSQL</productname> like this:
 341    <programlisting>
 342 CREATE FUNCTION complex_abs_eq(complex, complex)
 343               RETURNS bool
 344               AS '<replaceable>PGROOT</replaceable>/tutorial/complex'
 345               LANGUAGE C;
 346    </programlisting>
 347   </para>
 348
 349   <para>
 350    There are some important things that are happening here.
 351   </para>
 352
 353   <para>
 354    First, note that operators for less-than, less-than-or-equal, equal,
 355    greater-than-or-equal, and greater-than for <filename>complex</filename>
 356    are being defined.  We can only have one operator named, say, = and
 357    taking type <filename>complex</filename> for both operands.  In this case
 358    we don't have any other operator = for <filename>complex</filename>,
 359    but if we were building a practical data type we'd probably want = to
 360    be the ordinary equality operation for complex numbers.  In that case,
 361    we'd need to use some other operator name for complex_abs_eq.
 362   </para>
 363
 364   <para>
 365    Second, although <productname>PostgreSQL</productname> can cope with operators having
 366    the same name as long as they have different input data types, C can only
 367    cope with one global routine having a given name, period.  So we shouldn't
 368    name the C function something simple like <filename>abs_eq</filename>.
 369    Usually it's a good practice to include the data type name in the C
 370    function name, so as not to conflict with functions for other data types.
 371   </para>
 372
 373   <para>
 374    Third, we could have made the <productname>PostgreSQL</productname> name of the function
 375    <filename>abs_eq</filename>, relying on <productname>PostgreSQL</productname> to distinguish it
 376    by input data types from any other <productname>PostgreSQL</productname> function of the same name.
 377    To keep the example simple, we make the function have the same names
 378    at the C level and <productname>PostgreSQL</productname> level.
 379   </para>
 380
 381   <para>
 382    Finally, note that these operator functions return Boolean values.
 383    In practice, all operators defined as index access method strategies
 384    must return Boolean, since they must appear at the top level of a WHERE
 385    clause to be used with an index.
 386    (On the other
 387    hand, the support function returns whatever the particular access method
 388    expects -- in this case, a signed integer.)
 389   </para>
 390
 391   <para>
 392    The final routine in the
 393    file is the <quote>support routine</quote> mentioned when we discussed the amsupport
 394    column of the <filename>pg_am</filename> table.  We will use this
 395    later on.  For now, ignore it.
 396   </para>
 397
 398   <para>
 399    Now we are ready to define the operators:
 400
 401    <programlisting>
 402 CREATE OPERATOR = (
 403      leftarg = complex, rightarg = complex,
 404      procedure = complex_abs_eq,
 405      restrict = eqsel, join = eqjoinsel
 406          );
 407    </programlisting>
 408
 409    The important
 410    things here are the procedure names (which are the <acronym>C</acronym>
 411    functions defined above) and the restriction and join selectivity
 412    functions.  You should just use the selectivity functions used in
 413    the example (see <filename>complex.source</filename>).
 414    Note that there
 415    are different such functions for the less-than, equal, and greater-than
 416    cases.  These must be supplied, or the optimizer will be unable to
 417    make effective use of the index.
 418   </para>
 419
 420   <para>
 421    The next step is to add entries for these operators to
 422    the <filename>pg_amop</filename> relation.  To do this,
 423    we'll need the <filename>oid</filename>s of the operators we just
 424    defined.  We'll look up the names of all the operators that take
 425    two <filename>complex</filename>es, and pick ours out:
 426
 427    <programlisting>
 428     SELECT o.oid AS opoid, o.oprname
 429      INTO TEMP TABLE complex_ops_tmp
 430      FROM pg_operator o, pg_type t
 431      WHERE o.oprleft = t.oid and o.oprright = t.oid
 432       and t.typname = 'complex';
 433
 434  opoid  | oprname
 435 --------+---------
 436  277963 | +
 437  277970 | &lt;
 438  277971 | &lt;=
 439  277972 | =
 440  277973 | &gt;=
 441  277974 | &gt;
 442 (6 rows)
 443    </programlisting>
 444
 445    (Again, some of your <filename>oid</filename> numbers will almost
 446    certainly be different.)  The operators we are interested in are those
 447    with <filename>oid</filename>s 277970 through 277974.  The values you
 448    get will probably be different, and you should substitute them for the
 449    values below.  We will do this with a select statement.
 450   </para>
 451
 452   <para>
 453    Now we are ready to insert entries into <filename>pg_amop</filename> for
 454    our new operator class.  These entries must associate the correct
 455    B-tree strategy numbers with each of the operators we need.
 456    The command to insert the less-than operator looks like:
 457
 458    <programlisting>
 459     INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
 460         SELECT opcl.oid, 1, false, c.opoid
 461         FROM pg_opclass opcl, complex_ops_tmp c
 462         WHERE
 463             opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
 464             opcname = 'complex_abs_ops' AND
 465             c.oprname = '&lt;';
 466    </programlisting>
 467
 468    Now do this for the other operators substituting for the <literal>1</> in the
 469    second line above and the <literal>&lt;</> in the last line.  Note the order:
 470    <quote>less than</> is 1, <quote>less than or equal</> is 2,
 471    <quote>equal</> is 3, <quote>greater than or equal</quote> is 4, and
 472    <quote>greater than</quote> is 5.
 473   </para>
 474
 475   <para>
 476    The field <filename>amopreqcheck</filename> is not discussed here; it
 477    should always be false for B-tree operators.
 478   </para>
 479
 480   <para>
 481    The final step is registration of the <quote>support routine</quote> previously
 482    described in our discussion of <filename>pg_am</filename>.  The
 483    <filename>oid</filename> of this support routine is stored in the
 484    <filename>pg_amproc</filename> table, keyed by the operator class
 485    <filename>oid</filename> and the support routine number.
 486    First, we need to register the function in
 487    <productname>PostgreSQL</productname> (recall that we put the
 488    <acronym>C</acronym> code that implements this routine in the bottom of
 489    the file in which we implemented the operator routines):
 490
 491    <programlisting>
 492     CREATE FUNCTION complex_abs_cmp(complex, complex)
 493      RETURNS int4
 494      AS '<replaceable>PGROOT</replaceable>/tutorial/complex'
 495      LANGUAGE C;
 496
 497     SELECT oid, proname FROM pg_proc
 498      WHERE proname = 'complex_abs_cmp';
 499
 500   oid   |     proname
 501 --------+-----------------
 502  277997 | complex_abs_cmp
 503 (1 row)
 504    </programlisting>
 505
 506    (Again, your <filename>oid</filename> number will probably be different.)
 507    We can add the new row as follows:
 508
 509    <programlisting>
 510     INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
 511         SELECT opcl.oid, 1, p.oid
 512         FROM pg_opclass opcl, pg_proc p
 513         WHERE
 514             opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
 515             opcname = 'complex_abs_ops' AND
 516             p.proname = 'complex_abs_cmp';
 517    </programlisting>
 518   </para>
 519
 520   <para>
 521    And we're done!  (Whew.)  It should now be possible to create
 522    and use B-tree indexes on <filename>complex</filename> columns.
 523   </para>
 524
 525  </chapter>
 526
 527 <!-- Keep this comment at the end of the file
 528 Local variables:
 529 mode:sgml
 530 sgml-omittag:nil
 531 sgml-shorttag:t
 532 sgml-minimize-attributes:nil
 533 sgml-always-quote-attributes:t
 534 sgml-indent-step:1
 535 sgml-indent-data:t
 536 sgml-parent-document:nil
 537 sgml-default-dtd-file:"./reference.ced"
 538 sgml-exposed-tags:nil
 539 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 540 sgml-local-ecat-files:nil
 541 End:
 542 -->