granicus.if.org Git - postgresql/blob - doc/src/sgml/xindex.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.18 2001/09/10 21:58:47 petere Exp $
   3 Postgres documentation
   4 -->
   5
   6  <chapter id="xindex">
   7   <title>Interfacing Extensions To Indexes</title>
   8
   9   <para>
  10    The procedures described thus far let you define a new type, new
  11    functions and new operators.  However, we cannot yet define a secondary
  12    index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
  13    hash access method) over a new type or its operators.
  14   </para>
  15
  16   <para>
  17    Look back at
  18    <xref linkend="EXTEND-CATALOGS">.
  19    The right half shows the  catalogs  that we must modify in order to tell
  20    <productname>Postgres</productname> how to use a user-defined type and/or
  21    user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
  22     pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
  23    Unfortunately, there is no simple command to do this.  We will demonstrate
  24    how to modify these catalogs through a running example:  a  new  operator
  25    class for the <acronym>B-tree</acronym> access method that stores and
  26    sorts complex numbers in ascending absolute value order.
  27   </para>
  28
  29   <para>
  30    The <filename>pg_am</filename> table contains one row for every index
  31    access method.  Support for the heap access method is built into
  32    <productname>Postgres</productname>, but every other access method is
  33    described in <filename>pg_am</filename>.  The schema is
  34
  35    <table tocentry="1">
  36     <title>Index Access Method Schema</title>
  37
  38     <tgroup cols="2">
  39      <thead>
  40       <row>
  41        <entry>Column</entry>
  42        <entry>Description</entry>
  43       </row>
  44      </thead>
  45      <tbody>
  46       <row>
  47        <entry>amname</entry>
  48        <entry>name of the access method</entry>
  49       </row>
  50       <row>
  51        <entry>amowner</entry>
  52        <entry>user id of the owner</entry>
  53       </row>
  54       <row>
  55        <entry>amstrategies</entry>
  56        <entry>number of strategies for this access method (see below)</entry>
  57       </row>
  58       <row>
  59        <entry>amsupport</entry>
  60        <entry>number of support routines for this access method (see below)</entry>
  61       </row>
  62       <row>
  63        <entry>amorderstrategy</entry>
  64        <entry>zero if the index offers no sort order, otherwise the strategy
  65         number of the strategy operator that describes the sort order</entry>
  66       </row>
  67       <row>
  68        <entry>amcanunique</entry>
  69        <entry>does AM support UNIQUE indexes?</entry>
  70       </row>
  71       <row>
  72        <entry>amcanmulticol</entry>
  73        <entry>does AM support multicolumn indexes?</entry>
  74       </row>
  75       <row>
  76        <entry>amindexnulls</entry>
  77        <entry>does AM support NULL index entries?</entry>
  78       </row>
  79       <row>
  80        <entry>amconcurrent</entry>
  81        <entry>does AM support concurrent updates?</entry>
  82       </row>
  83       <row>
  84        <entry>amgettuple</entry>
  85       </row>
  86       <row>
  87        <entry>aminsert</entry>
  88       </row>
  89       <row>
  90        <entry>...</entry>
  91        <entry>procedure  identifiers  for  interface routines to the access
  92         method.  For example, regproc ids for opening,  closing,  and
  93         getting rows from the access method appear here.</entry>
  94       </row>
  95      </tbody>
  96     </tgroup>
  97    </table>
  98   </para>
  99
 100   <para>
 101    The <acronym>object ID</acronym> of the row in
 102    <filename>pg_am</filename> is used as a foreign key in a lot of other
 103    tables.  You  do not  need to  add a new row to this table; all that
 104    you are interested in is the <acronym>object ID</acronym> of the access
 105    method you want to extend:
 106
 107    <programlisting>
 108 SELECT oid FROM pg_am WHERE amname = 'btree';
 109
 110  oid
 111 -----
 112  403
 113 (1 row)
 114    </programlisting>
 115
 116    We will use that <command>SELECT</command> in a <command>WHERE</command>
 117    clause later.
 118   </para>
 119
 120   <para>
 121    The <filename>amstrategies</filename> column exists to standardize
 122    comparisons across data types.  For example, <acronym>B-tree</acronym>s
 123    impose a strict ordering on keys, lesser to greater.  Since
 124    <productname>Postgres</productname> allows the user to define operators,
 125    <productname>Postgres</productname> cannot look at the name of an operator
 126    (e.g., "&gt;" or "&lt;") and tell what kind of comparison it is.  In fact,
 127    some  access methods don't impose any ordering at all.  For example,
 128    <acronym>R-tree</acronym>s express a rectangle-containment relationship,
 129    whereas a hashed data structure expresses only bitwise similarity based
 130    on the value of a hash function.  <productname>Postgres</productname>
 131    needs some consistent way of taking a qualification in your query,
 132    looking at the operator and then deciding if a usable index exists.  This
 133    implies that <productname>Postgres</productname> needs to know, for
 134    example, that the  "&lt;="  and  "&gt;" operators partition a
 135    <acronym>B-tree</acronym>.  <productname>Postgres</productname>
 136    uses strategies to express these relationships  between
 137    operators and the way they can be used to scan indexes.
 138   </para>
 139
 140   <para>
 141    Defining a new set of strategies is beyond the scope of this discussion,
 142    but we'll explain how <acronym>B-tree</acronym> strategies work because
 143    you'll need to know that to add a new B-tree operator class. In the
 144    <filename>pg_am</filename> table, the amstrategies column is the
 145    number of strategies defined for this access method. For
 146    <acronym>B-tree</acronym>s, this number is 5.  These strategies
 147    correspond to
 148
 149    <table tocentry="1">
 150     <title>B-tree Strategies</title>
 151     <titleabbrev>B-tree</titleabbrev>
 152     <tgroup cols="2">
 153      <thead>
 154       <row>
 155        <entry>Operation</entry>
 156        <entry>Index</entry>
 157       </row>
 158      </thead>
 159      <tbody>
 160       <row>
 161        <entry>less than</entry>
 162        <entry>1</entry>
 163       </row>
 164       <row>
 165        <entry>less than or equal</entry>
 166        <entry>2</entry>
 167       </row>
 168       <row>
 169        <entry>equal</entry>
 170        <entry>3</entry>
 171       </row>
 172       <row>
 173        <entry>greater than or equal</entry>
 174        <entry>4</entry>
 175       </row>
 176       <row>
 177        <entry>greater than</entry>
 178        <entry>5</entry>
 179       </row>
 180      </tbody>
 181     </tgroup>
 182    </table>
 183   </para>
 184
 185   <para>
 186    The idea is that you'll need to add operators corresponding to the
 187    comparisons above to the <filename>pg_amop</filename> relation (see below).
 188    The access method code can use these strategy numbers, regardless of data
 189    type, to figure out how to partition the <acronym>B-tree</acronym>,
 190    compute selectivity, and so on.  Don't worry about the details of adding
 191    operators yet; just understand that there must be a set of these
 192    operators for <filename>int2, int4, oid,</filename> and every other
 193    data type on which a <acronym>B-tree</acronym> can operate.
 194   </para>
 195
 196   <para>
 197    Sometimes, strategies aren't enough information for the system to figure
 198    out how to use an index.  Some access methods require additional support
 199    routines in order to work. For example, the <acronym>B-tree</acronym>
 200    access method must be able to compare two keys and determine whether one
 201    is greater than, equal to, or less than the other.  Similarly, the
 202    <acronym>R-tree</acronym> access method must be able to compute
 203    intersections,  unions, and sizes of rectangles.  These
 204    operations do not correspond to operators used in qualifications in
 205    SQL queries;  they are administrative routines used by
 206    the access methods, internally.
 207   </para>
 208
 209   <para>
 210    In order to manage diverse support routines consistently across all
 211    <productname>Postgres</productname> access methods,
 212    <filename>pg_am</filename> includes a column called
 213    <filename>amsupport</filename>.  This column records the number of
 214    support routines used by an access method.  For <acronym>B-tree</acronym>s,
 215    this number is one -- the routine to take two keys and return -1, 0, or
 216    +1, depending on whether the first key is less than, equal
 217    to, or greater than the second.
 218
 219    <note>
 220     <para>
 221      Strictly  speaking, this routine can return a negative
 222      number (&lt; 0), zero, or a non-zero positive number (&gt; 0).
 223     </para>
 224    </note>
 225   </para>
 226
 227   <para>
 228    The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
 229    is just the number
 230    of strategies defined for the access method in question.  The operators
 231    for less than, less equal, and so on don't appear in
 232    <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
 233    is just the number of support routines required by  the  access
 234    method.  The actual routines are listed elsewhere.
 235   </para>
 236
 237   <para>
 238    By the way, the <filename>amorderstrategy</filename> entry tells whether
 239    the access method supports ordered scan.  Zero means it doesn't; if it
 240    does, <filename>amorderstrategy</filename> is the number of the strategy
 241    routine that corresponds to the ordering operator.  For example, B-tree
 242    has <filename>amorderstrategy</filename> = 1 which is its
 243    "less than" strategy number.
 244   </para>
 245
 246   <para>
 247    The next table of interest is <filename>pg_opclass</filename>.  This table
 248    defines operator class names and input data types for each of the operator
 249    classes supported by a given index access method.  The same class name
 250    can be used for several different access methods (for example, both B-tree
 251    and hash access methods have operator classes named
 252    <filename>oid_ops</filename>), but a separate
 253    <filename>pg_opclass</filename> row must appear for each access method.
 254    The <filename>oid</filename> of the <filename>pg_opclass</filename> row is
 255    used as a foreign
 256    key in other tables to associate specific operators and support routines
 257    with the operator class.
 258   </para>
 259
 260   <para>
 261    You need to add a row with your opclass name (for example,
 262    <filename>complex_abs_ops</filename>) to
 263    <filename>pg_opclass</filename>:
 264
 265    <programlisting>
 266 INSERT INTO pg_opclass (opcamid, opcname, opcintype, opcdefault, opckeytype)
 267     VALUES (
 268         (SELECT oid FROM pg_am WHERE amname = 'btree'),
 269         'complex_abs_ops',
 270         (SELECT oid FROM pg_type WHERE typname = 'complex'),
 271         true,
 272         0);
 273
 274 SELECT oid, *
 275     FROM pg_opclass
 276     WHERE opcname = 'complex_abs_ops';
 277
 278   oid   | opcamid |     opcname     | opcintype | opcdefault | opckeytype
 279 --------+---------+-----------------+-----------+------------+------------
 280  277975 |     403 | complex_abs_ops |    277946 | t          |          0
 281 (1 row)
 282    </programlisting>
 283
 284    Note that the oid for your <filename>pg_opclass</filename> row will
 285    be different!  Don't worry about this though.  We'll get this number
 286    from the system later just like we got the oid of the type here.
 287   </para>
 288
 289   <para>
 290    The above example assumes that you want to make this new opclass the
 291    default B-tree opclass for the <filename>complex</filename> data type.
 292    If you don't, just set <filename>opcdefault</filename> to false instead.
 293    <filename>opckeytype</filename> is not described here; it should always
 294    be zero for B-tree opclasses.
 295   </para>
 296
 297   <para>
 298    So now we have an access method and an operator  class.
 299    We  still  need  a  set of operators.  The procedure for
 300    defining operators was discussed earlier in  this  manual.
 301    For  the  <filename>complex_abs_ops</filename>  operator  class on B-trees,
 302    the operators we require are:
 303
 304    <programlisting>
 305         absolute value less-than
 306         absolute value less-than-or-equal
 307         absolute value equal
 308         absolute value greater-than-or-equal
 309         absolute value greater-than
 310    </programlisting>
 311   </para>
 312
 313   <para>
 314    Suppose the code that implements these functions
 315    is stored in the file
 316    <filename>PGROOT/src/tutorial/complex.c</filename>.
 317   </para>
 318
 319   <para>
 320    Part of the C code looks like this: (note that we will only show the
 321    equality operator for the rest of the examples.  The other four
 322    operators are very similar.  Refer to <filename>complex.c</filename>
 323    or <filename>complex.source</filename> for the details.)
 324
 325    <programlisting>
 326 #define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
 327
 328          bool
 329          complex_abs_eq(Complex *a, Complex *b)
 330          {
 331              double amag = Mag(a), bmag = Mag(b);
 332              return (amag==bmag);
 333          }
 334    </programlisting>
 335   </para>
 336
 337   <para>
 338    We make the function known to Postgres like this:
 339    <programlisting>
 340 CREATE FUNCTION complex_abs_eq(complex, complex)
 341               RETURNS bool
 342               AS 'PGROOT/tutorial/obj/complex.so'
 343               LANGUAGE 'c';
 344    </programlisting>
 345   </para>
 346
 347   <para>
 348    There are some important things that are happening here.
 349   </para>
 350
 351   <para>
 352    First, note that operators for less-than, less-than-or-equal, equal,
 353    greater-than-or-equal, and greater-than for <filename>complex</filename>
 354    are being defined.  We can only have one operator named, say, = and
 355    taking type <filename>complex</filename> for both operands.  In this case
 356    we don't have any other operator = for <filename>complex</filename>,
 357    but if we were building a practical data type we'd probably want = to
 358    be the ordinary equality operation for complex numbers.  In that case,
 359    we'd need to use some other operator name for complex_abs_eq.
 360   </para>
 361
 362   <para>
 363    Second, although Postgres can cope with operators having
 364    the same name as long as they have different input data types, C can only
 365    cope with one global routine having a given name, period.  So we shouldn't
 366    name the C function something simple like <filename>abs_eq</filename>.
 367    Usually it's a good practice to include the data type name in the C
 368    function name, so as not to conflict with functions for other data types.
 369   </para>
 370
 371   <para>
 372    Third, we could have made the Postgres name of the function
 373    <filename>abs_eq</filename>, relying on Postgres to distinguish it
 374    by input data types from any other Postgres function of the same name.
 375    To keep the example simple, we make the function have the same names
 376    at the C level and Postgres level.
 377   </para>
 378
 379   <para>
 380    Finally, note that these operator functions return Boolean values.
 381    In practice, all operators defined as index access method strategies
 382    must return Boolean, since they must appear at the top level of a WHERE
 383    clause to be used with an index.
 384    (On the other
 385    hand, the support function returns whatever the particular access method
 386    expects -- in this case, a signed integer.)
 387   </para>
 388
 389   <para>
 390    The final routine in the
 391    file is the "support routine" mentioned when we discussed the amsupport
 392    column of the <filename>pg_am</filename> table.  We will use this
 393    later on.  For now, ignore it.
 394   </para>
 395
 396   <para>
 397    Now we are ready to define the operators:
 398
 399    <programlisting>
 400 CREATE OPERATOR = (
 401      leftarg = complex, rightarg = complex,
 402      procedure = complex_abs_eq,
 403      restrict = eqsel, join = eqjoinsel
 404          );
 405    </programlisting>
 406
 407    The important
 408    things here are the procedure names (which are the <acronym>C</acronym>
 409    functions defined above) and the restriction and join selectivity
 410    functions.  You should just use the selectivity functions used in
 411    the example (see <filename>complex.source</filename>).
 412    Note that there
 413    are different such functions for the less-than, equal, and greater-than
 414    cases.  These must be supplied, or the optimizer will be unable to
 415    make effective use of the index.
 416   </para>
 417
 418   <para>
 419    The next step is to add entries for these operators to
 420    the <filename>pg_amop</filename> relation.  To do this,
 421    we'll need the <filename>oid</filename>s of the operators we just
 422    defined.  We'll look up the names of all the operators that take
 423    two <filename>complex</filename>es, and pick ours out:
 424
 425    <programlisting>
 426     SELECT o.oid AS opoid, o.oprname
 427      INTO TEMP TABLE complex_ops_tmp
 428      FROM pg_operator o, pg_type t
 429      WHERE o.oprleft = t.oid and o.oprright = t.oid
 430       and t.typname = 'complex';
 431
 432  opoid  | oprname
 433 --------+---------
 434  277963 | +
 435  277970 | &lt;
 436  277971 | &lt;=
 437  277972 | =
 438  277973 | &gt;=
 439  277974 | &gt;
 440 (6 rows)
 441    </programlisting>
 442
 443    (Again, some of your <filename>oid</filename> numbers will almost
 444    certainly be different.)  The operators we are interested in are those
 445    with <filename>oid</filename>s 277970 through 277974.  The values you
 446    get will probably be different, and you should substitute them for the
 447    values below.  We will do this with a select statement.
 448   </para>
 449
 450   <para>
 451    Now we are ready to insert entries into <filename>pg_amop</filename> for
 452    our new operator class.  These entries must associate the correct
 453    B-tree strategy numbers with each of the operators we need.
 454    The command to insert the less-than operator looks like:
 455
 456    <programlisting>
 457     INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
 458         SELECT opcl.oid, 1, false, c.opoid
 459         FROM pg_opclass opcl, complex_ops_tmp c
 460         WHERE
 461             opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
 462             opcname = 'complex_abs_ops' AND
 463             c.oprname = '&lt;';
 464    </programlisting>
 465
 466    Now do this for the other operators substituting for the "1" in the
 467    second line above and the "&lt;" in the last line.  Note the order:
 468    "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
 469    than or equal" is 4, and "greater than" is 5.
 470   </para>
 471
 472   <para>
 473    The field <filename>amopreqcheck</filename> is not discussed here; it
 474    should always be false for B-tree operators.
 475   </para>
 476
 477   <para>
 478    The final step is registration of the "support routine" previously
 479    described in our discussion of <filename>pg_am</filename>.  The
 480    <filename>oid</filename> of this support routine is stored in the
 481    <filename>pg_amproc</filename> table, keyed by the operator class
 482    <filename>oid</filename> and the support routine number.
 483    First, we need to register the function in
 484    <productname>Postgres</productname> (recall that we put the
 485    <acronym>C</acronym> code that implements this routine in the bottom of
 486    the file in which we implemented the operator routines):
 487
 488    <programlisting>
 489     CREATE FUNCTION complex_abs_cmp(complex, complex)
 490      RETURNS int4
 491      AS 'PGROOT/tutorial/obj/complex.so'
 492      LANGUAGE 'c';
 493
 494     SELECT oid, proname FROM pg_proc
 495      WHERE proname = 'complex_abs_cmp';
 496
 497   oid   |     proname
 498 --------+-----------------
 499  277997 | complex_abs_cmp
 500 (1 row)
 501    </programlisting>
 502
 503    (Again, your <filename>oid</filename> number will probably be different.)
 504    We can add the new row as follows:
 505
 506    <programlisting>
 507     INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
 508         SELECT opcl.oid, 1, p.oid
 509         FROM pg_opclass opcl, pg_proc p
 510         WHERE
 511             opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
 512             opcname = 'complex_abs_ops' AND
 513             p.proname = 'complex_abs_cmp';
 514    </programlisting>
 515   </para>
 516
 517   <para>
 518    And we're done!  (Whew.)  It should now be possible to create
 519    and use B-tree indexes on <filename>complex</filename> columns.
 520   </para>
 521
 522  </chapter>
 523
 524 <!-- Keep this comment at the end of the file
 525 Local variables:
 526 mode:sgml
 527 sgml-omittag:nil
 528 sgml-shorttag:t
 529 sgml-minimize-attributes:nil
 530 sgml-always-quote-attributes:t
 531 sgml-indent-step:1
 532 sgml-indent-data:t
 533 sgml-parent-document:nil
 534 sgml-default-dtd-file:"./reference.ced"
 535 sgml-exposed-tags:nil
 536 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 537 sgml-local-ecat-files:nil
 538 End:
 539 -->