granicus.if.org Git - postgresql/blob - doc/src/sgml/spgist.sgml

   1 <!-- doc/src/sgml/spgist.sgml -->
   2
   3 <chapter id="SPGiST">
   4 <title>SP-GiST Indexes</title>
   5
   6    <indexterm>
   7     <primary>index</primary>
   8     <secondary>SP-GiST</secondary>
   9    </indexterm>
  10
  11 <sect1 id="spgist-intro">
  12  <title>Introduction</title>
  13
  14  <para>
  15   <acronym>SP-GiST</acronym> is an abbreviation for space-partitioned
  16   <acronym>GiST</acronym>.  <acronym>SP-GiST</acronym> supports partitioned
  17   search trees, which facilitate development of a wide range of different
  18   non-balanced data structures, such as quad-trees, k-d trees, and radix
  19   trees (tries).  The common feature of these structures is that they
  20   repeatedly divide the search space into partitions that need not be
  21   of equal size.  Searches that are well matched to the partitioning rule
  22   can be very fast.
  23  </para>
  24
  25  <para>
  26   These popular data structures were originally developed for in-memory
  27   usage.  In main memory, they are usually designed as a set of dynamically
  28   allocated nodes linked by pointers.  This is not suitable for direct
  29   storing on disk, since these chains of pointers can be rather long which
  30   would require too many disk accesses.  In contrast, disk-based data
  31   structures should have a high fanout to minimize I/O.  The challenge
  32   addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to
  33   disk pages in such a way that a search need access only a few disk pages,
  34   even if it traverses many nodes.
  35  </para>
  36
  37  <para>
  38   Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow
  39   the development of custom data types with the appropriate access methods,
  40   by an expert in the domain of the data type, rather than a database expert.
  41  </para>
  42
  43  <para>
  44   Some of the information here is derived from Purdue University's
  45   SP-GiST Indexing Project
  46   <ulink url="http://www.cs.purdue.edu/spgist/">web site</ulink>.
  47   The <acronym>SP-GiST</acronym> implementation in
  48   <productname>PostgreSQL</productname> is primarily maintained by Teodor
  49   Sigaev and Oleg Bartunov, and there is more information on their
  50   <!-- URL will be changed -->
  51   <ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>.
  52  </para>
  53
  54 </sect1>
  55
  56 <sect1 id="spgist-builtin-opclasses">
  57  <title>Built-in Operator Classes</title>
  58
  59  <para>
  60   The core <productname>PostgreSQL</> distribution
  61   includes the <acronym>SP-GiST</acronym> operator classes shown in
  62   <xref linkend="spgist-builtin-opclasses-table">.
  63  </para>
  64
  65   <table id="spgist-builtin-opclasses-table">
  66    <title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title>
  67    <tgroup cols="3">
  68     <thead>
  69      <row>
  70       <entry>Name</entry>
  71       <entry>Indexed Data Type</entry>
  72       <entry>Indexable Operators</entry>
  73      </row>
  74     </thead>
  75     <tbody>
  76      <row>
  77       <entry><literal>kd_point_ops</></entry>
  78       <entry><type>point</></entry>
  79       <entry>
  80        <literal>&lt;&lt;</>
  81        <literal>&lt;@</>
  82        <literal>&lt;^</>
  83        <literal>&gt;&gt;</>
  84        <literal>&gt;^</>
  85        <literal>~=</>
  86       </entry>
  87      </row>
  88      <row>
  89       <entry><literal>quad_point_ops</></entry>
  90       <entry><type>point</></entry>
  91       <entry>
  92        <literal>&lt;&lt;</>
  93        <literal>&lt;@</>
  94        <literal>&lt;^</>
  95        <literal>&gt;&gt;</>
  96        <literal>&gt;^</>
  97        <literal>~=</>
  98       </entry>
  99      </row>
 100      <row>
 101       <entry><literal>range_ops</></entry>
 102       <entry>any range type</entry>
 103       <entry>
 104        <literal>&amp;&amp;</>
 105        <literal>&amp;&lt;</>
 106        <literal>&amp;&gt;</>
 107        <literal>-|-</>
 108        <literal>&lt;&lt;</>
 109        <literal>&lt;@</>
 110        <literal>=</>
 111        <literal>&gt;&gt;</>
 112        <literal>@&gt;</>
 113       </entry>
 114      </row>
 115      <row>
 116       <entry><literal>box_ops</></entry>
 117       <entry><type>box</></entry>
 118       <entry>
 119        <literal>&lt;&lt;</>
 120        <literal>&amp;&lt;</>
 121        <literal>&amp;&amp;</>
 122        <literal>&amp;&gt;</>
 123        <literal>&gt;&gt;</>
 124        <literal>~=</>
 125        <literal>@&gt;</>
 126        <literal>&lt;@</>
 127        <literal>&amp;&lt;|</>
 128        <literal>&lt;&lt;|</>
 129        <literal>|&gt;&gt;</literal>
 130        <literal>|&amp;&gt;</>
 131       </entry>
 132      </row>
 133      <row>
 134       <entry><literal>text_ops</></entry>
 135       <entry><type>text</></entry>
 136       <entry>
 137        <literal>&lt;</>
 138        <literal>&lt;=</>
 139        <literal>=</>
 140        <literal>&gt;</>
 141        <literal>&gt;=</>
 142        <literal>~&lt;=~</>
 143        <literal>~&lt;~</>
 144        <literal>~&gt;=~</>
 145        <literal>~&gt;~</>
 146       </entry>
 147      </row>
 148      <row>
 149       <entry><literal>inet_ops</></entry>
 150       <entry><type>inet</>, <type>cidr</></entry>
 151       <entry>
 152        <literal>&amp;&amp;</>
 153        <literal>&gt;&gt;</>
 154        <literal>&gt;&gt;=</>
 155        <literal>&gt;</>
 156        <literal>&gt;=</>
 157        <literal>&lt;&gt;</>
 158        <literal>&lt;&lt;</>
 159        <literal>&lt;&lt;=</>
 160        <literal>&lt;</>
 161        <literal>&lt;=</>
 162        <literal>=</>
 163       </entry>
 164      </row>
 165     </tbody>
 166    </tgroup>
 167   </table>
 168
 169  <para>
 170   Of the two operator classes for type <type>point</>,
 171   <literal>quad_point_ops</> is the default.  <literal>kd_point_ops</>
 172   supports the same operators but uses a different index data structure which
 173   may offer better performance in some applications.
 174  </para>
 175
 176 </sect1>
 177
 178 <sect1 id="spgist-extensibility">
 179  <title>Extensibility</title>
 180
 181  <para>
 182   <acronym>SP-GiST</acronym> offers an interface with a high level of
 183   abstraction, requiring the access method developer to implement only
 184   methods specific to a given data type. The <acronym>SP-GiST</acronym> core
 185   is responsible for efficient disk mapping and searching the tree structure.
 186   It also takes care of concurrency and logging considerations.
 187  </para>
 188
 189  <para>
 190   Leaf tuples of an <acronym>SP-GiST</acronym> tree contain values of the
 191   same data type as the indexed column.  Leaf tuples at the root level will
 192   always contain the original indexed data value, but leaf tuples at lower
 193   levels might contain only a compressed representation, such as a suffix.
 194   In that case the operator class support functions must be able to
 195   reconstruct the original value using information accumulated from the
 196   inner tuples that are passed through to reach the leaf level.
 197  </para>
 198
 199  <para>
 200   Inner tuples are more complex, since they are branching points in the
 201   search tree.  Each inner tuple contains a set of one or more
 202   <firstterm>nodes</>, which represent groups of similar leaf values.
 203   A node contains a downlink that leads either to another, lower-level inner
 204   tuple, or to a short list of leaf tuples that all lie on the same index page.
 205   Each node normally has a <firstterm>label</> that describes it; for example,
 206   in a radix tree the node label could be the next character of the string
 207   value.  (Alternatively, an operator class can omit the node labels, if it
 208   works with a fixed set of nodes for all inner tuples;
 209   see <xref linkend="spgist-null-labels">.)
 210   Optionally, an inner tuple can have a <firstterm>prefix</> value
 211   that describes all its members.  In a radix tree this could be the common
 212   prefix of the represented strings.  The prefix value is not necessarily
 213   really a prefix, but can be any data needed by the operator class;
 214   for example, in a quad-tree it can store the central point that the four
 215   quadrants are measured with respect to.  A quad-tree inner tuple would
 216   then also contain four nodes corresponding to the quadrants around this
 217   central point.
 218  </para>
 219
 220  <para>
 221   Some tree algorithms require knowledge of level (or depth) of the current
 222   tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for
 223   operator classes to manage level counting while descending the tree.
 224   There is also support for incrementally reconstructing the represented
 225   value when that is needed, and for passing down additional data (called
 226   <firstterm>traverse values</>) during a tree descent.
 227  </para>
 228
 229  <note>
 230   <para>
 231    The <acronym>SP-GiST</acronym> core code takes care of null entries.
 232    Although <acronym>SP-GiST</acronym> indexes do store entries for nulls
 233    in indexed columns, this is hidden from the index operator class code:
 234    no null index entries or search conditions will ever be passed to the
 235    operator class methods.  (It is assumed that <acronym>SP-GiST</acronym>
 236    operators are strict and so cannot succeed for null values.)  Null values
 237    are therefore not discussed further here.
 238   </para>
 239  </note>
 240
 241  <para>
 242   There are five user-defined methods that an index operator class for
 243   <acronym>SP-GiST</acronym> must provide.  All five follow the convention
 244   of accepting two <type>internal</> arguments, the first of which is a
 245   pointer to a C struct containing input values for the support method,
 246   while the second argument is a pointer to a C struct where output values
 247   must be placed.  Four of the methods just return <type>void</>, since
 248   all their results appear in the output struct; but
 249   <function>leaf_consistent</> additionally returns a <type>boolean</> result.
 250   The methods must not modify any fields of their input structs.  In all
 251   cases, the output struct is initialized to zeroes before calling the
 252   user-defined method.
 253  </para>
 254
 255  <para>
 256   The five user-defined methods are:
 257  </para>
 258
 259  <variablelist>
 260     <varlistentry>
 261      <term><function>config</></term>
 262      <listitem>
 263       <para>
 264        Returns static information about the index implementation, including
 265        the data type OIDs of the prefix and node label data types.
 266       </para>
 267      <para>
 268       The <acronym>SQL</> declaration of the function must look like this:
 269 <programlisting>
 270 CREATE FUNCTION my_config(internal, internal) RETURNS void ...
 271 </programlisting>
 272       The first argument is a pointer to a <structname>spgConfigIn</>
 273       C struct, containing input data for the function.
 274       The second argument is a pointer to a <structname>spgConfigOut</>
 275       C struct, which the function must fill with result data.
 276 <programlisting>
 277 typedef struct spgConfigIn
 278 {
 279     Oid         attType;        /* Data type to be indexed */
 280 } spgConfigIn;
 281
 282 typedef struct spgConfigOut
 283 {
 284     Oid         prefixType;     /* Data type of inner-tuple prefixes */
 285     Oid         labelType;      /* Data type of inner-tuple node labels */
 286     bool        canReturnData;  /* Opclass can reconstruct original data */
 287     bool        longValuesOK;   /* Opclass can cope with values &gt; 1 page */
 288 } spgConfigOut;
 289 </programlisting>
 290
 291       <structfield>attType</> is passed in order to support polymorphic
 292       index operator classes; for ordinary fixed-data-type operator classes, it
 293       will always have the same value and so can be ignored.
 294      </para>
 295
 296      <para>
 297       For operator classes that do not use prefixes,
 298       <structfield>prefixType</> can be set to <literal>VOIDOID</>.
 299       Likewise, for operator classes that do not use node labels,
 300       <structfield>labelType</> can be set to <literal>VOIDOID</>.
 301       <structfield>canReturnData</> should be set true if the operator class
 302       is capable of reconstructing the originally-supplied index value.
 303       <structfield>longValuesOK</> should be set true only when the
 304       <structfield>attType</> is of variable length and the operator
 305       class is capable of segmenting long values by repeated suffixing
 306       (see <xref linkend="spgist-limits">).
 307      </para>
 308      </listitem>
 309     </varlistentry>
 310
 311     <varlistentry>
 312      <term><function>choose</></term>
 313      <listitem>
 314       <para>
 315         Chooses a method for inserting a new value into an inner tuple.
 316       </para>
 317
 318      <para>
 319       The <acronym>SQL</> declaration of the function must look like this:
 320 <programlisting>
 321 CREATE FUNCTION my_choose(internal, internal) RETURNS void ...
 322 </programlisting>
 323       The first argument is a pointer to a <structname>spgChooseIn</>
 324       C struct, containing input data for the function.
 325       The second argument is a pointer to a <structname>spgChooseOut</>
 326       C struct, which the function must fill with result data.
 327 <programlisting>
 328 typedef struct spgChooseIn
 329 {
 330     Datum       datum;          /* original datum to be indexed */
 331     Datum       leafDatum;      /* current datum to be stored at leaf */
 332     int         level;          /* current level (counting from zero) */
 333
 334     /* Data from current inner tuple */
 335     bool        allTheSame;     /* tuple is marked all-the-same? */
 336     bool        hasPrefix;      /* tuple has a prefix? */
 337     Datum       prefixDatum;    /* if so, the prefix value */
 338     int         nNodes;         /* number of nodes in the inner tuple */
 339     Datum      *nodeLabels;     /* node label values (NULL if none) */
 340 } spgChooseIn;
 341
 342 typedef enum spgChooseResultType
 343 {
 344     spgMatchNode = 1,           /* descend into existing node */
 345     spgAddNode,                 /* add a node to the inner tuple */
 346     spgSplitTuple               /* split inner tuple (change its prefix) */
 347 } spgChooseResultType;
 348
 349 typedef struct spgChooseOut
 350 {
 351     spgChooseResultType resultType;     /* action code, see above */
 352     union
 353     {
 354         struct                  /* results for spgMatchNode */
 355         {
 356             int         nodeN;      /* descend to this node (index from 0) */
 357             int         levelAdd;   /* increment level by this much */
 358             Datum       restDatum;  /* new leaf datum */
 359         }           matchNode;
 360         struct                  /* results for spgAddNode */
 361         {
 362             Datum       nodeLabel;  /* new node's label */
 363             int         nodeN;      /* where to insert it (index from 0) */
 364         }           addNode;
 365         struct                  /* results for spgSplitTuple */
 366         {
 367             /* Info to form new upper-level inner tuple with one child tuple */
 368             bool        prefixHasPrefix;    /* tuple should have a prefix? */
 369             Datum       prefixPrefixDatum;  /* if so, its value */
 370             int         prefixNNodes;       /* number of nodes */
 371             Datum      *prefixNodeLabels;   /* their labels (or NULL for
 372                                              * no labels) */
 373             int         childNodeN;         /* which node gets child tuple */
 374
 375             /* Info to form new lower-level inner tuple with all old nodes */
 376             bool        postfixHasPrefix;   /* tuple should have a prefix? */
 377             Datum       postfixPrefixDatum; /* if so, its value */
 378         }           splitTuple;
 379     }           result;
 380 } spgChooseOut;
 381 </programlisting>
 382
 383        <structfield>datum</> is the original datum that was to be inserted
 384        into the index.
 385        <structfield>leafDatum</> is initially the same as
 386        <structfield>datum</>, but can change at lower levels of the tree
 387        if the <function>choose</function> or <function>picksplit</function>
 388        methods change it.  When the insertion search reaches a leaf page,
 389        the current value of <structfield>leafDatum</> is what will be stored
 390        in the newly created leaf tuple.
 391        <structfield>level</> is the current inner tuple's level, starting at
 392        zero for the root level.
 393        <structfield>allTheSame</> is true if the current inner tuple is
 394        marked as containing multiple equivalent nodes
 395        (see <xref linkend="spgist-all-the-same">).
 396        <structfield>hasPrefix</> is true if the current inner tuple contains
 397        a prefix; if so,
 398        <structfield>prefixDatum</> is its value.
 399        <structfield>nNodes</> is the number of child nodes contained in the
 400        inner tuple, and
 401        <structfield>nodeLabels</> is an array of their label values, or
 402        NULL if there are no labels.
 403       </para>
 404
 405       <para>
 406        The <function>choose</function> function can determine either that
 407        the new value matches one of the existing child nodes, or that a new
 408        child node must be added, or that the new value is inconsistent with
 409        the tuple prefix and so the inner tuple must be split to create a
 410        less restrictive prefix.
 411       </para>
 412
 413       <para>
 414        If the new value matches one of the existing child nodes,
 415        set <structfield>resultType</> to <literal>spgMatchNode</>.
 416        Set <structfield>nodeN</> to the index (from zero) of that node in
 417        the node array.
 418        Set <structfield>levelAdd</> to the increment in
 419        <structfield>level</> caused by descending through that node,
 420        or leave it as zero if the operator class does not use levels.
 421        Set <structfield>restDatum</> to equal <structfield>datum</>
 422        if the operator class does not modify datums from one level to the
 423        next, or otherwise set it to the modified value to be used as
 424        <structfield>leafDatum</> at the next level.
 425       </para>
 426
 427       <para>
 428        If a new child node must be added,
 429        set <structfield>resultType</> to <literal>spgAddNode</>.
 430        Set <structfield>nodeLabel</> to the label to be used for the new
 431        node, and set <structfield>nodeN</> to the index (from zero) at which
 432        to insert the node in the node array.
 433        After the node has been added, the <function>choose</function>
 434        function will be called again with the modified inner tuple;
 435        that call should result in an <literal>spgMatchNode</> result.
 436       </para>
 437
 438       <para>
 439        If the new value is inconsistent with the tuple prefix,
 440        set <structfield>resultType</> to <literal>spgSplitTuple</>.
 441        This action moves all the existing nodes into a new lower-level
 442        inner tuple, and replaces the existing inner tuple with a tuple
 443        having a single downlink pointing to the new lower-level inner tuple.
 444        Set <structfield>prefixHasPrefix</> to indicate whether the new
 445        upper tuple should have a prefix, and if so set
 446        <structfield>prefixPrefixDatum</> to the prefix value.  This new
 447        prefix value must be sufficiently less restrictive than the original
 448        to accept the new value to be indexed.
 449        Set <structfield>prefixNNodes</> to the number of nodes needed in the
 450        new tuple, and set <structfield>prefixNodeLabels</> to a palloc'd array
 451        holding their labels, or to NULL if node labels are not required.
 452        Note that the total size of the new upper tuple must be no more
 453        than the total size of the tuple it is replacing; this constrains
 454        the lengths of the new prefix and new labels.
 455        Set <structfield>childNodeN</> to the index (from zero) of the node
 456        that will downlink to the new lower-level inner tuple.
 457        Set <structfield>postfixHasPrefix</> to indicate whether the new
 458        lower-level inner tuple should have a prefix, and if so set
 459        <structfield>postfixPrefixDatum</> to the prefix value.  The
 460        combination of these two prefixes and the downlink node's label
 461        (if any) must have the same meaning as the original prefix, because
 462        there is no opportunity to alter the node labels that are moved to
 463        the new lower-level tuple, nor to change any child index entries.
 464        After the node has been split, the <function>choose</function>
 465        function will be called again with the replacement inner tuple.
 466        That call may return an <literal>spgAddNode</> result, if no suitable
 467        node was created by the <literal>spgSplitTuple</> action.  Eventually
 468        <function>choose</function> must return <literal>spgMatchNode</> to
 469        allow the insertion to descend to the next level.
 470       </para>
 471      </listitem>
 472     </varlistentry>
 473
 474     <varlistentry>
 475      <term><function>picksplit</></term>
 476      <listitem>
 477       <para>
 478        Decides how to create a new inner tuple over a set of leaf tuples.
 479       </para>
 480
 481       <para>
 482         The <acronym>SQL</> declaration of the function must look like this:
 483 <programlisting>
 484 CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...
 485 </programlisting>
 486       The first argument is a pointer to a <structname>spgPickSplitIn</>
 487       C struct, containing input data for the function.
 488       The second argument is a pointer to a <structname>spgPickSplitOut</>
 489       C struct, which the function must fill with result data.
 490 <programlisting>
 491 typedef struct spgPickSplitIn
 492 {
 493     int         nTuples;        /* number of leaf tuples */
 494     Datum      *datums;         /* their datums (array of length nTuples) */
 495     int         level;          /* current level (counting from zero) */
 496 } spgPickSplitIn;
 497
 498 typedef struct spgPickSplitOut
 499 {
 500     bool        hasPrefix;      /* new inner tuple should have a prefix? */
 501     Datum       prefixDatum;    /* if so, its value */
 502
 503     int         nNodes;         /* number of nodes for new inner tuple */
 504     Datum      *nodeLabels;     /* their labels (or NULL for no labels) */
 505
 506     int        *mapTuplesToNodes;   /* node index for each leaf tuple */
 507     Datum      *leafTupleDatums;    /* datum to store in each new leaf tuple */
 508 } spgPickSplitOut;
 509 </programlisting>
 510
 511        <structfield>nTuples</> is the number of leaf tuples provided.
 512        <structfield>datums</> is an array of their datum values.
 513        <structfield>level</> is the current level that all the leaf tuples
 514        share, which will become the level of the new inner tuple.
 515       </para>
 516
 517       <para>
 518        Set <structfield>hasPrefix</> to indicate whether the new inner
 519        tuple should have a prefix, and if so set
 520        <structfield>prefixDatum</> to the prefix value.
 521        Set <structfield>nNodes</> to indicate the number of nodes that
 522        the new inner tuple will contain, and
 523        set <structfield>nodeLabels</> to an array of their label values,
 524        or to NULL if node labels are not required.
 525        Set <structfield>mapTuplesToNodes</> to an array that gives the index
 526        (from zero) of the node that each leaf tuple should be assigned to.
 527        Set <structfield>leafTupleDatums</> to an array of the values to
 528        be stored in the new leaf tuples (these will be the same as the
 529        input <structfield>datums</> if the operator class does not modify
 530        datums from one level to the next).
 531        Note that the <function>picksplit</> function is
 532        responsible for palloc'ing the
 533        <structfield>nodeLabels</>, <structfield>mapTuplesToNodes</> and
 534        <structfield>leafTupleDatums</> arrays.
 535       </para>
 536
 537       <para>
 538        If more than one leaf tuple is supplied, it is expected that the
 539        <function>picksplit</> function will classify them into more than
 540        one node; otherwise it is not possible to split the leaf tuples
 541        across multiple pages, which is the ultimate purpose of this
 542        operation.  Therefore, if the <function>picksplit</> function
 543        ends up placing all the leaf tuples in the same node, the core
 544        SP-GiST code will override that decision and generate an inner
 545        tuple in which the leaf tuples are assigned at random to several
 546        identically-labeled nodes.  Such a tuple is marked
 547        <literal>allTheSame</> to signify that this has happened.  The
 548        <function>choose</> and <function>inner_consistent</> functions
 549        must take suitable care with such inner tuples.
 550        See <xref linkend="spgist-all-the-same"> for more information.
 551       </para>
 552
 553       <para>
 554        <function>picksplit</> can be applied to a single leaf tuple only
 555        in the case that the <function>config</> function set
 556        <structfield>longValuesOK</> to true and a larger-than-a-page input
 557        value has been supplied.  In this case the point of the operation is
 558        to strip off a prefix and produce a new, shorter leaf datum value.
 559        The call will be repeated until a leaf datum short enough to fit on
 560        a page has been produced.  See <xref linkend="spgist-limits"> for
 561        more information.
 562       </para>
 563      </listitem>
 564     </varlistentry>
 565
 566     <varlistentry>
 567      <term><function>inner_consistent</></term>
 568      <listitem>
 569       <para>
 570        Returns set of nodes (branches) to follow during tree search.
 571       </para>
 572
 573       <para>
 574        The <acronym>SQL</> declaration of the function must look like this:
 575 <programlisting>
 576 CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
 577 </programlisting>
 578       The first argument is a pointer to a <structname>spgInnerConsistentIn</>
 579       C struct, containing input data for the function.
 580       The second argument is a pointer to a <structname>spgInnerConsistentOut</>
 581       C struct, which the function must fill with result data.
 582
 583 <programlisting>
 584 typedef struct spgInnerConsistentIn
 585 {
 586     ScanKey     scankeys;       /* array of operators and comparison values */
 587     int         nkeys;          /* length of array */
 588
 589     Datum       reconstructedValue;     /* value reconstructed at parent */
 590     void       *traversalValue; /* opclass-specific traverse value */
 591     MemoryContext traversalMemoryContext;   /* put new traverse values here */
 592     int         level;          /* current level (counting from zero) */
 593     bool        returnData;     /* original data must be returned? */
 594
 595     /* Data from current inner tuple */
 596     bool        allTheSame;     /* tuple is marked all-the-same? */
 597     bool        hasPrefix;      /* tuple has a prefix? */
 598     Datum       prefixDatum;    /* if so, the prefix value */
 599     int         nNodes;         /* number of nodes in the inner tuple */
 600     Datum      *nodeLabels;     /* node label values (NULL if none) */
 601 } spgInnerConsistentIn;
 602
 603 typedef struct spgInnerConsistentOut
 604 {
 605     int         nNodes;         /* number of child nodes to be visited */
 606     int        *nodeNumbers;    /* their indexes in the node array */
 607     int        *levelAdds;      /* increment level by this much for each */
 608     Datum      *reconstructedValues;    /* associated reconstructed values */
 609     void      **traversalValues;        /* opclass-specific traverse values */
 610 } spgInnerConsistentOut;
 611 </programlisting>
 612
 613        The array <structfield>scankeys</>, of length <structfield>nkeys</>,
 614        describes the index search condition(s).  These conditions are
 615        combined with AND &mdash; only index entries that satisfy all of
 616        them are interesting.  (Note that <structfield>nkeys</> = 0 implies
 617        that all index entries satisfy the query.)  Usually the consistent
 618        function only cares about the <structfield>sk_strategy</> and
 619        <structfield>sk_argument</> fields of each array entry, which
 620        respectively give the indexable operator and comparison value.
 621        In particular it is not necessary to check <structfield>sk_flags</> to
 622        see if the comparison value is NULL, because the SP-GiST core code
 623        will filter out such conditions.
 624        <structfield>reconstructedValue</> is the value reconstructed for the
 625        parent tuple; it is <literal>(Datum) 0</> at the root level or if the
 626        <function>inner_consistent</> function did not provide a value at the
 627        parent level.
 628        <structfield>traversalValue</> is a pointer to any traverse data
 629        passed down from the previous call of <function>inner_consistent</>
 630        on the parent index tuple, or NULL at the root level.
 631        <structfield>traversalMemoryContext</> is the memory context in which
 632        to store output traverse values (see below).
 633        <structfield>level</> is the current inner tuple's level, starting at
 634        zero for the root level.
 635        <structfield>returnData</> is <literal>true</> if reconstructed data is
 636        required for this query; this will only be so if the
 637        <function>config</> function asserted <structfield>canReturnData</>.
 638        <structfield>allTheSame</> is true if the current inner tuple is
 639        marked <quote>all-the-same</>; in this case all the nodes have the
 640        same label (if any) and so either all or none of them match the query
 641        (see <xref linkend="spgist-all-the-same">).
 642        <structfield>hasPrefix</> is true if the current inner tuple contains
 643        a prefix; if so,
 644        <structfield>prefixDatum</> is its value.
 645        <structfield>nNodes</> is the number of child nodes contained in the
 646        inner tuple, and
 647        <structfield>nodeLabels</> is an array of their label values, or
 648        NULL if the nodes do not have labels.
 649       </para>
 650
 651       <para>
 652        <structfield>nNodes</> must be set to the number of child nodes that
 653        need to be visited by the search, and
 654        <structfield>nodeNumbers</> must be set to an array of their indexes.
 655        If the operator class keeps track of levels, set
 656        <structfield>levelAdds</> to an array of the level increments
 657        required when descending to each node to be visited.  (Often these
 658        increments will be the same for all the nodes, but that's not
 659        necessarily so, so an array is used.)
 660        If value reconstruction is needed, set
 661        <structfield>reconstructedValues</> to an array of the values
 662        reconstructed for each child node to be visited; otherwise, leave
 663        <structfield>reconstructedValues</> as NULL.
 664        If it is desired to pass down additional out-of-band information
 665        (<quote>traverse values</>) to lower levels of the tree search,
 666        set <structfield>traversalValues</> to an array of the appropriate
 667        traverse values, one for each child node to be visited; otherwise,
 668        leave <structfield>traversalValues</> as NULL.
 669        Note that the <function>inner_consistent</> function is
 670        responsible for palloc'ing the
 671        <structfield>nodeNumbers</>, <structfield>levelAdds</>,
 672        <structfield>reconstructedValues</>, and
 673        <structfield>traversalValues</> arrays in the current memory context.
 674        However, any output traverse values pointed to by
 675        the <structfield>traversalValues</> array should be allocated
 676        in <structfield>traversalMemoryContext</>.
 677        Each traverse value must be a single palloc'd chunk.
 678       </para>
 679      </listitem>
 680     </varlistentry>
 681
 682     <varlistentry>
 683      <term><function>leaf_consistent</></term>
 684      <listitem>
 685       <para>
 686        Returns true if a leaf tuple satisfies a query.
 687       </para>
 688
 689       <para>
 690        The <acronym>SQL</> declaration of the function must look like this:
 691 <programlisting>
 692 CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
 693 </programlisting>
 694       The first argument is a pointer to a <structname>spgLeafConsistentIn</>
 695       C struct, containing input data for the function.
 696       The second argument is a pointer to a <structname>spgLeafConsistentOut</>
 697       C struct, which the function must fill with result data.
 698 <programlisting>
 699 typedef struct spgLeafConsistentIn
 700 {
 701     ScanKey     scankeys;       /* array of operators and comparison values */
 702     int         nkeys;          /* length of array */
 703
 704     Datum       reconstructedValue;     /* value reconstructed at parent */
 705     void       *traversalValue; /* opclass-specific traverse value */
 706     int         level;          /* current level (counting from zero) */
 707     bool        returnData;     /* original data must be returned? */
 708
 709     Datum       leafDatum;      /* datum in leaf tuple */
 710 } spgLeafConsistentIn;
 711
 712 typedef struct spgLeafConsistentOut
 713 {
 714     Datum       leafValue;      /* reconstructed original data, if any */
 715     bool        recheck;        /* set true if operator must be rechecked */
 716 } spgLeafConsistentOut;
 717 </programlisting>
 718
 719        The array <structfield>scankeys</>, of length <structfield>nkeys</>,
 720        describes the index search condition(s).  These conditions are
 721        combined with AND &mdash; only index entries that satisfy all of
 722        them satisfy the query.  (Note that <structfield>nkeys</> = 0 implies
 723        that all index entries satisfy the query.)  Usually the consistent
 724        function only cares about the <structfield>sk_strategy</> and
 725        <structfield>sk_argument</> fields of each array entry, which
 726        respectively give the indexable operator and comparison value.
 727        In particular it is not necessary to check <structfield>sk_flags</> to
 728        see if the comparison value is NULL, because the SP-GiST core code
 729        will filter out such conditions.
 730        <structfield>reconstructedValue</> is the value reconstructed for the
 731        parent tuple; it is <literal>(Datum) 0</> at the root level or if the
 732        <function>inner_consistent</> function did not provide a value at the
 733        parent level.
 734        <structfield>traversalValue</> is a pointer to any traverse data
 735        passed down from the previous call of <function>inner_consistent</>
 736        on the parent index tuple, or NULL at the root level.
 737        <structfield>level</> is the current leaf tuple's level, starting at
 738        zero for the root level.
 739        <structfield>returnData</> is <literal>true</> if reconstructed data is
 740        required for this query; this will only be so if the
 741        <function>config</> function asserted <structfield>canReturnData</>.
 742        <structfield>leafDatum</> is the key value stored in the current
 743        leaf tuple.
 744       </para>
 745
 746       <para>
 747        The function must return <literal>true</> if the leaf tuple matches the
 748        query, or <literal>false</> if not.  In the <literal>true</> case,
 749        if <structfield>returnData</> is <literal>true</> then
 750        <structfield>leafValue</> must be set to the value originally supplied
 751        to be indexed for this leaf tuple.  Also,
 752        <structfield>recheck</> may be set to <literal>true</> if the match
 753        is uncertain and so the operator(s) must be re-applied to the actual
 754        heap tuple to verify the match.
 755       </para>
 756      </listitem>
 757     </varlistentry>
 758    </variablelist>
 759
 760   <para>
 761    All the SP-GiST support methods are normally called in a short-lived
 762    memory context; that is, <varname>CurrentMemoryContext</> will be reset
 763    after processing of each tuple.  It is therefore not very important to
 764    worry about pfree'ing everything you palloc.  (The <function>config</>
 765    method is an exception: it should try to avoid leaking memory.  But
 766    usually the <function>config</> method need do nothing but assign
 767    constants into the passed parameter struct.)
 768   </para>
 769
 770   <para>
 771    If the indexed column is of a collatable data type, the index collation
 772    will be passed to all the support methods, using the standard
 773    <function>PG_GET_COLLATION()</> mechanism.
 774   </para>
 775
 776 </sect1>
 777
 778 <sect1 id="spgist-implementation">
 779  <title>Implementation</title>
 780
 781   <para>
 782    This section covers implementation details and other tricks that are
 783    useful for implementers of <acronym>SP-GiST</acronym> operator classes to
 784    know.
 785   </para>
 786
 787  <sect2 id="spgist-limits">
 788   <title>SP-GiST Limits</title>
 789
 790   <para>
 791    Individual leaf tuples and inner tuples must fit on a single index page
 792    (8kB by default).  Therefore, when indexing values of variable-length
 793    data types, long values can only be supported by methods such as radix
 794    trees, in which each level of the tree includes a prefix that is short
 795    enough to fit on a page, and the final leaf level includes a suffix also
 796    short enough to fit on a page.  The operator class should set
 797    <structfield>longValuesOK</> to TRUE only if it is prepared to arrange for
 798    this to happen.  Otherwise, the <acronym>SP-GiST</acronym> core will
 799    reject any request to index a value that is too large to fit
 800    on an index page.
 801   </para>
 802
 803   <para>
 804    Likewise, it is the operator class's responsibility that inner tuples
 805    do not grow too large to fit on an index page; this limits the number
 806    of child nodes that can be used in one inner tuple, as well as the
 807    maximum size of a prefix value.
 808   </para>
 809
 810   <para>
 811    Another limitation is that when an inner tuple's node points to a set
 812    of leaf tuples, those tuples must all be in the same index page.
 813    (This is a design decision to reduce seeking and save space in the
 814    links that chain such tuples together.)  If the set of leaf tuples
 815    grows too large for a page, a split is performed and an intermediate
 816    inner tuple is inserted.  For this to fix the problem, the new inner
 817    tuple <emphasis>must</> divide the set of leaf values into more than one
 818    node group.  If the operator class's <function>picksplit</> function
 819    fails to do that, the <acronym>SP-GiST</acronym> core resorts to
 820    extraordinary measures described in <xref linkend="spgist-all-the-same">.
 821   </para>
 822  </sect2>
 823
 824  <sect2 id="spgist-null-labels">
 825   <title>SP-GiST Without Node Labels</title>
 826
 827   <para>
 828    Some tree algorithms use a fixed set of nodes for each inner tuple;
 829    for example, in a quad-tree there are always exactly four nodes
 830    corresponding to the four quadrants around the inner tuple's centroid
 831    point.  In such a case the code typically works with the nodes by
 832    number, and there is no need for explicit node labels.  To suppress
 833    node labels (and thereby save some space), the <function>picksplit</>
 834    function can return NULL for the <structfield>nodeLabels</> array,
 835    and likewise the <function>choose</> function can return NULL for
 836    the <structfield>prefixNodeLabels</> array during
 837    a <literal>spgSplitTuple</> action.
 838    This will in turn result in <structfield>nodeLabels</> being NULL during
 839    subsequent calls to <function>choose</> and <function>inner_consistent</>.
 840    In principle, node labels could be used for some inner tuples and omitted
 841    for others in the same index.
 842   </para>
 843
 844   <para>
 845    When working with an inner tuple having unlabeled nodes, it is an error
 846    for <function>choose</> to return <literal>spgAddNode</>, since the set
 847    of nodes is supposed to be fixed in such cases.
 848   </para>
 849  </sect2>
 850
 851  <sect2 id="spgist-all-the-same">
 852   <title><quote>All-the-same</> Inner Tuples</title>
 853
 854   <para>
 855    The <acronym>SP-GiST</acronym> core can override the results of the
 856    operator class's <function>picksplit</> function when
 857    <function>picksplit</> fails to divide the supplied leaf values into
 858    at least two node categories.  When this happens, the new inner tuple
 859    is created with multiple nodes that each have the same label (if any)
 860    that <function>picksplit</> gave to the one node it did use, and the
 861    leaf values are divided at random among these equivalent nodes.
 862    The <literal>allTheSame</> flag is set on the inner tuple to warn the
 863    <function>choose</> and <function>inner_consistent</> functions that the
 864    tuple does not have the node set that they might otherwise expect.
 865   </para>
 866
 867   <para>
 868    When dealing with an <literal>allTheSame</> tuple, a <function>choose</>
 869    result of <literal>spgMatchNode</> is interpreted to mean that the new
 870    value can be assigned to any of the equivalent nodes; the core code will
 871    ignore the supplied  <structfield>nodeN</> value and descend into one
 872    of the nodes at random (so as to keep the tree balanced).  It is an
 873    error for <function>choose</> to return <literal>spgAddNode</>, since
 874    that would make the nodes not all equivalent; the
 875    <literal>spgSplitTuple</> action must be used if the value to be inserted
 876    doesn't match the existing nodes.
 877   </para>
 878
 879   <para>
 880    When dealing with an <literal>allTheSame</> tuple, the
 881    <function>inner_consistent</> function should return either all or none
 882    of the nodes as targets for continuing the index search, since they are
 883    all equivalent.  This may or may not require any special-case code,
 884    depending on how much the <function>inner_consistent</> function normally
 885    assumes about the meaning of the nodes.
 886   </para>
 887  </sect2>
 888
 889 </sect1>
 890
 891 <sect1 id="spgist-examples">
 892  <title>Examples</title>
 893
 894  <para>
 895   The <productname>PostgreSQL</productname> source distribution includes
 896   several examples of index operator classes for <acronym>SP-GiST</acronym>,
 897   as described in <xref linkend="spgist-builtin-opclasses-table">.  Look
 898   into <filename>src/backend/access/spgist/</>
 899   and <filename>src/backend/utils/adt/</> to see the code.
 900  </para>
 901
 902 </sect1>
 903
 904 </chapter>