granicus.if.org Git - postgresql/blob - doc/src/sgml/xindex.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.9 2000/03/28 02:53:02 tgl Exp $
   3 Postgres documentation
   4 -->
   5
   6  <chapter id="xindex">
   7   <title>Interfacing Extensions To Indices</title>
   8
   9   <para>
  10    The procedures described thus far let you define a new type, new
  11    functions and new operators.  However, we cannot yet define a secondary
  12    index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
  13    hash access method) over a new type or its operators.
  14   </para>
  15
  16   <para>
  17    Look back at
  18    <xref endterm="EXTEND-CATALOGS" linkend="EXTEND-CATALOGS">.
  19    The right half shows the  catalogs  that we must modify in order to tell
  20    <productname>Postgres</productname> how to use a user-defined type and/or
  21    user-defined  operators with an index (i.e., <filename>pg_am, pg_amop,
  22     pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
  23    Unfortunately, there is no simple command to do this.  We will demonstrate
  24    how to modify these catalogs through a running example:  a  new  operator
  25    class for the <acronym>B-tree</acronym> access method that stores and
  26    sorts complex numbers in ascending absolute value order.
  27   </para>
  28
  29   <para>
  30    The <filename>pg_am</filename> class contains one instance for every user
  31    defined access method.  Support for the heap access method is built into
  32    <productname>Postgres</productname>, but every other access method is
  33    described here.  The schema is
  34
  35    <table tocentry="1">
  36     <title>Index Schema</title>
  37     <titleabbrev>Indices</titleabbrev>
  38     <tgroup cols="2">
  39      <thead>
  40       <row>
  41        <entry>Attribute</entry>
  42        <entry>Description</entry>
  43       </row>
  44      </thead>
  45      <tbody>
  46       <row>
  47        <entry>amname</entry>
  48        <entry>name of the access method</entry>
  49       </row>
  50       <row>
  51        <entry>amowner</entry>
  52        <entry>object id of the owner's instance in pg_user</entry>
  53       </row>
  54       <row>
  55        <entry>amstrategies</entry>
  56        <entry>number of strategies for this access method (see below)</entry>
  57       </row>
  58       <row>
  59        <entry>amsupport</entry>
  60        <entry>number of support routines for this access method (see below)</entry>
  61       </row>
  62       <row>
  63        <entry>amorderstrategy</entry>
  64        <entry>zero if the index offers no sort order, otherwise the strategy
  65         number of the strategy operator that describes the sort order</entry>
  66       </row>
  67       <row>
  68        <entry>amgettuple</entry>
  69       </row>
  70       <row>
  71        <entry>aminsert</entry>
  72       </row>
  73       <row>
  74        <entry>...</entry>
  75        <entry>procedure  identifiers  for  interface routines to the access
  76         method.  For example, regproc ids for opening,  closing,  and
  77         getting instances from the access method appear here.</entry>
  78       </row>
  79      </tbody>
  80     </tgroup>
  81    </table>
  82   </para>
  83
  84   <para>
  85    The <acronym>object ID</acronym> of the instance in
  86    <filename>pg_am</filename> is used as a foreign key in lots of other
  87    classes.  You  don't  need to  add a new instance to this class; all
  88    you're interested in is the <acronym>object ID</acronym> of the access
  89    method instance you want to extend:
  90
  91    <programlisting>
  92 SELECT oid FROM pg_am WHERE amname = 'btree';
  93
  94  oid
  95 -----
  96  403
  97 (1 row)
  98    </programlisting>
  99
 100    We will use that <command>SELECT</command> in a <command>WHERE</command>
 101    clause later.
 102   </para>
 103
 104   <para>
 105    The <filename>amstrategies</filename> attribute exists to standardize
 106    comparisons across data types.  For example, <acronym>B-tree</acronym>s
 107    impose a strict ordering on keys, lesser to greater.  Since
 108    <productname>Postgres</productname> allows the user to define operators,
 109    <productname>Postgres</productname> cannot look at the name of an operator
 110    (eg, ">" or "<") and tell what kind of comparison it is.  In fact,
 111    some  access methods don't impose any ordering at all.  For example,
 112    <acronym>R-tree</acronym>s express a rectangle-containment relationship,
 113    whereas a hashed data structure expresses only bitwise similarity based
 114    on the value of a hash function.  <productname>Postgres</productname>
 115    needs some consistent way of taking a qualification in your query,
 116    looking at the operator and then deciding if a usable index exists.  This
 117    implies that <productname>Postgres</productname> needs to know, for
 118    example, that the  "<="  and  ">" operators partition a
 119    <acronym>B-tree</acronym>.  <productname>Postgres</productname>
 120    uses strategies to express these relationships  between
 121    operators and the way they can be used to scan indices.
 122   </para>
 123
 124   <para>
 125    Defining a new set of strategies is beyond the scope of this discussion,
 126    but we'll explain how <acronym>B-tree</acronym> strategies work because
 127    you'll need to know that to add a new operator class. In the
 128    <filename>pg_am</filename> class, the amstrategies attribute is the
 129    number of strategies defined for this access method. For
 130    <acronym>B-tree</acronym>s, this number is 5.  These strategies
 131    correspond to
 132
 133    <table tocentry="1">
 134     <title>B-tree Strategies</title>
 135     <titleabbrev>B-tree</titleabbrev>
 136     <tgroup cols="2">
 137      <thead>
 138       <row>
 139        <entry>Operation</entry>
 140        <entry>Index</entry>
 141       </row>
 142      </thead>
 143      <tbody>
 144       <row>
 145        <entry>less than</entry>
 146        <entry>1</entry>
 147       </row>
 148       <row>
 149        <entry>less than or equal</entry>
 150        <entry>2</entry>
 151       </row>
 152       <row>
 153        <entry>equal</entry>
 154        <entry>3</entry>
 155       </row>
 156       <row>
 157        <entry>greater than or equal</entry>
 158        <entry>4</entry>
 159       </row>
 160       <row>
 161        <entry>greater than</entry>
 162        <entry>5</entry>
 163       </row>
 164      </tbody>
 165     </tgroup>
 166    </table>
 167   </para>
 168
 169   <para>
 170    The idea is that you'll need to add procedures corresponding to the
 171    comparisons above to the <filename>pg_amop</filename> relation (see below).
 172    The access method code can use these strategy numbers, regardless of data
 173    type, to figure out how to partition the <acronym>B-tree</acronym>,
 174    compute selectivity, and so on.  Don't worry about the details of adding
 175    procedures yet; just understand that there must be a set of these
 176    procedures for <filename>int2, int4, oid,</filename> and every other
 177    data type on which a <acronym>B-tree</acronym> can operate.
 178   </para>
 179
 180   <para>
 181    Sometimes, strategies aren't enough information for the system to figure
 182    out how to use an index.  Some access methods require other support
 183    routines in order to work. For example, the <acronym>B-tree</acronym>
 184    access method must be able to compare two keys and determine whether one
 185    is greater than, equal to, or less than the other.  Similarly, the
 186    <acronym>R-tree</acronym> access method must be able to compute
 187    intersections,  unions, and sizes of rectangles.  These
 188    operations do not correspond to user qualifications in
 189    SQL queries;  they are administrative routines used by
 190    the access methods, internally.
 191   </para>
 192
 193   <para>
 194    In order to manage diverse support routines consistently across all
 195    <productname>Postgres</productname> access methods,
 196    <filename>pg_am</filename> includes an attribute called
 197    <filename>amsupport</filename>.  This attribute records the number of
 198    support routines used by an access method.  For <acronym>B-tree</acronym>s,
 199    this number is one -- the routine to take two keys and return -1, 0, or
 200    +1, depending on whether the first key is less than, equal
 201    to, or greater than the second.
 202
 203    <note>
 204     <para>
 205      Strictly  speaking, this routine can return a negative
 206      number (< 0), 0, or a non-zero positive number (> 0).
 207     </para>
 208    </note>
 209   </para>
 210
 211   <para>
 212    The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
 213    is just the number
 214    of strategies defined for the access method in question.  The procedures
 215    for less than, less equal, and so on don't appear in
 216    <filename>pg_am</filename>.  Similarly, <filename>amsupport</filename>
 217    is just the number of support routines required by  the  access
 218    method.  The actual routines are listed elsewhere.
 219   </para>
 220
 221   <para>
 222    By the way, the <filename>amorderstrategy</filename> entry tells whether
 223    the access method supports ordered scan.  Zero means it doesn't; if it
 224    does, <filename>amorderstrategy</filename> is the number of the strategy
 225    routine that corresponds to the ordering operator.  For example, btree
 226    has <filename>amorderstrategy</filename> = 1 which is its
 227    "less than" strategy number.
 228   </para>
 229
 230   <para>
 231    The next class of interest is <filename>pg_opclass</filename>.  This class
 232    exists only to associate an operator class name and perhaps a default type
 233    with an operator class oid. Some existing opclasses are <filename>int2_ops,
 234    int4_ops,</filename> and <filename>oid_ops</filename>.  You need to add an
 235    instance with your opclass name (for example,
 236    <filename>complex_abs_ops</filename>) to
 237    <filename>pg_opclass</filename>.  The <filename>oid</filename> of
 238    this instance will be a foreign key in other classes, notably
 239    <filename>pg_amop</filename>.
 240
 241    <programlisting>
 242 INSERT INTO pg_opclass (opcname, opcdeftype)
 243     SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex';
 244
 245 SELECT oid, opcname, opcdeftype
 246     FROM pg_opclass
 247     WHERE opcname = 'complex_abs_ops';
 248
 249   oid   |     opcname     | opcdeftype
 250 --------+-----------------+------------
 251  277975 | complex_abs_ops |     277946
 252 (1 row)
 253    </programlisting>
 254
 255    Note that the oid for your <filename>pg_opclass</filename> instance will
 256    be different!  Don't worry about this though.  We'll get this number
 257    from the system later just like we got the oid of the type here.
 258   </para>
 259
 260   <para>
 261    The above example assumes that you want to make this new opclass the
 262    default index opclass for the <filename>complex</filename> datatype.
 263    If you don't, just insert zero into <filename>opcdeftype</filename>,
 264    rather than inserting the datatype's oid:
 265
 266    <programlisting>
 267 INSERT INTO pg_opclass (opcname, opcdeftype) VALUES ('complex_abs_ops', 0);
 268    </programlisting>
 269
 270   </para>
 271
 272   <para>
 273    So now we have an access method and an operator  class.
 274    We  still  need  a  set of operators.  The procedure for
 275    defining operators was discussed earlier in  this  manual.
 276    For  the  <filename>complex_abs_ops</filename>  operator  class on Btrees,
 277    the operators we require are:
 278
 279    <programlisting>
 280         absolute value less-than
 281         absolute value less-than-or-equal
 282         absolute value equal
 283         absolute value greater-than-or-equal
 284         absolute value greater-than
 285    </programlisting>
 286   </para>
 287
 288   <para>
 289    Suppose the code that implements the functions  defined
 290    is stored in the file
 291    <filename>PGROOT/src/tutorial/complex.c</filename>
 292   </para>
 293
 294   <para>
 295    Part of the C code looks like this: (note that we will only show the
 296    equality operator for the rest of the examples.  The other four
 297    operators are very similar.  Refer to <filename>complex.c</filename>
 298    or <filename>complex.source</filename> for the details.)
 299
 300    <programlisting>
 301 #define Mag(c) ((c)-&gt;x*(c)-&gt;x + (c)-&gt;y*(c)-&gt;y)
 302
 303          bool
 304          complex_abs_eq(Complex *a, Complex *b)
 305          {
 306              double amag = Mag(a), bmag = Mag(b);
 307              return (amag==bmag);
 308          }
 309    </programlisting>
 310   </para>
 311
 312   <para>
 313    We make the function known to Postgres like this:
 314    <programlisting>
 315 CREATE FUNCTION complex_abs_eq(complex, complex)
 316               RETURNS bool
 317               AS 'PGROOT/tutorial/obj/complex.so'
 318               LANGUAGE 'c';
 319    </programlisting>
 320   </para>
 321
 322   <para>
 323    There are some important things that are happening here.
 324   </para>
 325
 326   <para>
 327    First, note that operators for less-than, less-than-or-equal, equal,
 328    greater-than-or-equal, and greater-than for <filename>complex</filename>
 329    are being defined.  We can only have one operator named, say, = and
 330    taking type <filename>complex</filename> for both operands.  In this case
 331    we don't have any other operator = for <filename>complex</filename>,
 332    but if we were building a practical datatype we'd probably want = to
 333    be the ordinary equality operation for complex numbers.  In that case,
 334    we'd need to use some other operator name for complex_abs_eq.
 335   </para>
 336
 337   <para>
 338    Second, although Postgres can cope with operators having
 339    the same name as long as they have different input datatypes, C can only
 340    cope with one global routine having a given name, period.  So we shouldn't
 341    name the C function something simple like <filename>abs_eq</filename>.
 342    Usually it's a good practice to include the datatype name in the C
 343    function name, so as not to conflict with functions for other datatypes.
 344   </para>
 345
 346   <para>
 347    Third, we could have made the Postgres name of the function
 348    <filename>abs_eq</filename>, relying on Postgres to distinguish it
 349    by input datatypes from any other Postgres function of the same name.
 350    To keep the example simple, we make the function have the same names
 351    at the C level and Postgres level.
 352   </para>
 353
 354   <para>
 355    Finally, note that these operator functions return Boolean values.
 356    The access methods rely on this fact.  (On the other
 357    hand, the support function returns whatever the particular access method
 358    expects -- in this case, a signed integer.) The final routine in the
 359    file is the "support routine" mentioned when we discussed the amsupport
 360    attribute of the <filename>pg_am</filename> class.  We will use this
 361    later on.  For now, ignore it.
 362   </para>
 363
 364   <para>
 365    Now we are ready to define the operators:
 366
 367    <programlisting>
 368 CREATE OPERATOR = (
 369      leftarg = complex, rightarg = complex,
 370      procedure = complex_abs_eq,
 371      restrict = eqsel, join = eqjoinsel
 372          )
 373    </programlisting>
 374
 375    The important
 376    things here are the procedure names (which are the <acronym>C</acronym>
 377    functions defined above) and the restriction and join selectivity
 378    functions.  You should just use the selectivity functions used in
 379    the example (see <filename>complex.source</filename>).
 380    Note that there
 381    are different such functions for the less-than, equal, and greater-than
 382    cases.  These must be supplied, or the optimizer will be unable to
 383    make effective use of the index.
 384   </para>
 385
 386   <para>
 387    The next step is to add entries for these operators to
 388    the <filename>pg_amop</filename> relation.  To do this,
 389    we'll need the <filename>oid</filename>s of the operators we just
 390    defined.  We'll look up the names of all the operators that take
 391    two <filename>complex</filename>es, and pick ours out:
 392
 393    <programlisting>
 394     SELECT o.oid AS opoid, o.oprname
 395      INTO TABLE complex_ops_tmp
 396      FROM pg_operator o, pg_type t
 397      WHERE o.oprleft = t.oid and o.oprright = t.oid
 398       and t.typname = 'complex';
 399
 400  opoid  | oprname
 401 --------+---------
 402  277963 | +
 403  277970 | &lt;
 404  277971 | &lt;=
 405  277972 | =
 406  277973 | &gt;=
 407  277974 | &gt;
 408 (6 rows)
 409    </programlisting>
 410
 411    (Again, some of your <filename>oid</filename> numbers will almost
 412    certainly be different.)  The operators we are interested in are those
 413    with <filename>oid</filename>s 277970 through 277974.  The values you
 414    get will probably be different, and you should substitute them for the
 415    values below.  We will do this with a select statement.
 416   </para>
 417
 418   <para>
 419    Now we're ready to update <filename>pg_amop</filename> with our new
 420    operator class.  The most important thing in this entire discussion
 421    is that the operators are ordered, from less than through greater
 422    than, in <filename>pg_amop</filename>.  We add the instances we need:
 423
 424    <programlisting>
 425     INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy)
 426         SELECT am.oid, opcl.oid, c.opoid, 1
 427         FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
 428         WHERE amname = 'btree' AND
 429             opcname = 'complex_abs_ops' AND
 430             c.oprname = '<';
 431    </programlisting>
 432
 433    Now do this for the other operators substituting for the "1" in the
 434    third line above and the "<" in the last line.  Note the order:
 435    "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
 436    than or equal" is 4, and "greater than" is 5.
 437   </para>
 438
 439   <para>
 440    The next step is registration of the "support routine" previously
 441    described in our discussion of <filename>pg_am</filename>.  The
 442    <filename>oid</filename> of this support routine is stored in the
 443    <filename>pg_amproc</filename> class, keyed by the access method
 444    <filename>oid</filename> and the operator class <filename>oid</filename>.
 445    First, we need to register the function in
 446    <productname>Postgres</productname> (recall that we put the
 447    <acronym>C</acronym> code that implements this routine in the bottom of
 448    the file in which we implemented the operator routines):
 449
 450    <programlisting>
 451     CREATE FUNCTION complex_abs_cmp(complex, complex)
 452      RETURNS int4
 453      AS 'PGROOT/tutorial/obj/complex.so'
 454      LANGUAGE 'c';
 455
 456     SELECT oid, proname FROM pg_proc
 457      WHERE proname = 'complex_abs_cmp';
 458
 459   oid   |     proname
 460 --------+-----------------
 461  277997 | complex_abs_cmp
 462 (1 row)
 463    </programlisting>
 464
 465    (Again, your <filename>oid</filename> number will probably be different.)
 466    We can add the new instance as follows:
 467
 468    <programlisting>
 469     INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
 470         SELECT a.oid, b.oid, c.oid, 1
 471             FROM pg_am a, pg_opclass b, pg_proc c
 472             WHERE a.amname = 'btree' AND
 473                 b.opcname = 'complex_abs_ops' AND
 474                 c.proname = 'complex_abs_cmp';
 475    </programlisting>
 476   </para>
 477
 478   <para>
 479    And we're done!  (Whew.)  It should now be possible to create
 480    and use btree indexes on <filename>complex</filename> columns.
 481   </para>
 482
 483  </chapter>
 484
 485 <!-- Keep this comment at the end of the file
 486 Local variables:
 487 mode: sgml
 488 sgml-omittag:nil
 489 sgml-shorttag:t
 490 sgml-minimize-attributes:nil
 491 sgml-always-quote-attributes:t
 492 sgml-indent-step:1
 493 sgml-indent-data:t
 494 sgml-parent-document:nil
 495 sgml-default-dtd-file:"./reference.ced"
 496 sgml-exposed-tags:nil
 497 sgml-local-catalogs:"/usr/lib/sgml/catalog"
 498 sgml-local-ecat-files:nil
 499 End:
 500 -->