1 <!-- doc/src/sgml/gist.sgml -->
4 <title>GiST Indexes</title>
7 <primary>index</primary>
8 <secondary>GiST</secondary>
11 <sect1 id="gist-intro">
12 <title>Introduction</title>
15 <acronym>GiST</acronym> stands for Generalized Search Tree. It is a
16 balanced, tree-structured access method, that acts as a base template in
17 which to implement arbitrary indexing schemes. B-trees, R-trees and many
18 other indexing schemes can be implemented in <acronym>GiST</acronym>.
22 One advantage of <acronym>GiST</acronym> is that it allows the development
23 of custom data types with the appropriate access methods, by
24 an expert in the domain of the data type, rather than a database expert.
28 Some of the information here is derived from the University of California
29 at Berkeley's GiST Indexing Project
30 <ulink url="http://gist.cs.berkeley.edu/">web site</ulink> and
31 Marcel Kornacker's thesis,
32 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz">
33 Access Methods for Next-Generation Database Systems</ulink>.
34 The <acronym>GiST</acronym>
35 implementation in <productname>PostgreSQL</productname> is primarily
36 maintained by Teodor Sigaev and Oleg Bartunov, and there is more
38 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/">web site</ulink>.
43 <sect1 id="gist-builtin-opclasses">
44 <title>Built-in Operator Classes</title>
47 The core <productname>PostgreSQL</> distribution
48 includes the <acronym>GiST</acronym> operator classes shown in
49 <xref linkend="gist-builtin-opclasses-table">.
50 (Some of the optional modules described in <xref linkend="contrib">
51 provide additional <acronym>GiST</acronym> operator classes.)
54 <table id="gist-builtin-opclasses-table">
55 <title>Built-in <acronym>GiST</acronym> Operator Classes</title>
60 <entry>Indexed Data Type</entry>
61 <entry>Indexable Operators</entry>
62 <entry>Ordering Operators</entry>
67 <entry><literal>box_ops</></entry>
68 <entry><type>box</></entry>
70 <literal>&&</>
73 <literal>&<|</>
80 <literal>|&></>
89 <entry><literal>circle_ops</></entry>
90 <entry><type>circle</></entry>
92 <literal>&&</>
95 <literal>&<|</>
102 <literal>|&></>
103 <literal>|>></>
111 <entry><literal>inet_ops</></entry>
112 <entry><type>inet</>, <type>cidr</></entry>
114 <literal>&&</>
116 <literal>>>=</>
121 <literal><<=</>
130 <entry><literal>point_ops</></entry>
131 <entry><type>point</></entry>
143 <literal><-></>
147 <entry><literal>poly_ops</></entry>
148 <entry><type>polygon</></entry>
150 <literal>&&</>
151 <literal>&></>
152 <literal>&<</>
153 <literal>&<|</>
156 <literal><<|</>
160 <literal>|&></>
161 <literal>|>></>
169 <entry><literal>range_ops</></entry>
170 <entry>any range type</entry>
172 <literal>&&</>
173 <literal>&></>
174 <literal>&<</>
187 <entry><literal>tsquery_ops</></entry>
188 <entry><type>tsquery</></entry>
197 <entry><literal>tsvector_ops</></entry>
198 <entry><type>tsvector</></entry>
210 For historical reasons, the <literal>inet_ops</> operator class is
211 not the default class for types <type>inet</> and <type>cidr</>.
212 To use it, mention the class name in <command>CREATE INDEX</>,
215 CREATE INDEX ON my_table USING gist (my_inet_column inet_ops);
221 <sect1 id="gist-extensibility">
222 <title>Extensibility</title>
225 Traditionally, implementing a new index access method meant a lot of
226 difficult work. It was necessary to understand the inner workings of the
227 database, such as the lock manager and Write-Ahead Log. The
228 <acronym>GiST</acronym> interface has a high level of abstraction,
229 requiring the access method implementer only to implement the semantics of
230 the data type being accessed. The <acronym>GiST</acronym> layer itself
231 takes care of concurrency, logging and searching the tree structure.
235 This extensibility should not be confused with the extensibility of the
236 other standard search trees in terms of the data they can handle. For
237 example, <productname>PostgreSQL</productname> supports extensible B-trees
238 and hash indexes. That means that you can use
239 <productname>PostgreSQL</productname> to build a B-tree or hash over any
240 data type you want. But B-trees only support range predicates
241 (<literal><</literal>, <literal>=</literal>, <literal>></literal>),
242 and hash indexes only support equality queries.
246 So if you index, say, an image collection with a
247 <productname>PostgreSQL</productname> B-tree, you can only issue queries
248 such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less
249 than imagey</quote> and <quote>is imagex greater than imagey</quote>.
250 Depending on how you define <quote>equals</quote>, <quote>less than</quote>
251 and <quote>greater than</quote> in this context, this could be useful.
252 However, by using a <acronym>GiST</acronym> based index, you could create
253 ways to ask domain-specific questions, perhaps <quote>find all images of
254 horses</quote> or <quote>find all over-exposed images</quote>.
258 All it takes to get a <acronym>GiST</acronym> access method up and running
259 is to implement several user-defined methods, which define the behavior of
260 keys in the tree. Of course these methods have to be pretty fancy to
261 support fancy queries, but for all the standard queries (B-trees,
262 R-trees, etc.) they're relatively straightforward. In short,
263 <acronym>GiST</acronym> combines extensibility along with generality, code
264 reuse, and a clean interface.
268 There are seven methods that an index operator class for
269 <acronym>GiST</acronym> must provide, and an eighth that is optional.
270 Correctness of the index is ensured
271 by proper implementation of the <function>same</>, <function>consistent</>
272 and <function>union</> methods, while efficiency (size and speed) of the
273 index will depend on the <function>penalty</> and <function>picksplit</>
275 The remaining two basic methods are <function>compress</> and
276 <function>decompress</>, which allow an index to have internal tree data of
277 a different type than the data it indexes. The leaves are to be of the
278 indexed data type, while the other tree nodes can be of any C struct (but
279 you still have to follow <productname>PostgreSQL</> data type rules here,
280 see about <literal>varlena</> for variable sized data). If the tree's
281 internal data type exists at the SQL level, the <literal>STORAGE</> option
282 of the <command>CREATE OPERATOR CLASS</> command can be used.
283 The optional eighth method is <function>distance</>, which is needed
284 if the operator class wishes to support ordered scans (nearest-neighbor
290 <term><function>consistent</></term>
293 Given an index entry <literal>p</> and a query value <literal>q</>,
294 this function determines whether the index entry is
295 <quote>consistent</> with the query; that is, could the predicate
296 <quote><replaceable>indexed_column</>
297 <replaceable>indexable_operator</> <literal>q</></quote> be true for
298 any row represented by the index entry? For a leaf index entry this is
299 equivalent to testing the indexable condition, while for an internal
300 tree node this determines whether it is necessary to scan the subtree
301 of the index represented by the tree node. When the result is
302 <literal>true</>, a <literal>recheck</> flag must also be returned.
303 This indicates whether the predicate is certainly true or only possibly
304 true. If <literal>recheck</> = <literal>false</> then the index has
305 tested the predicate condition exactly, whereas if <literal>recheck</>
306 = <literal>true</> the row is only a candidate match. In that case the
307 system will automatically evaluate the
308 <replaceable>indexable_operator</> against the actual row value to see
309 if it is really a match. This convention allows
310 <acronym>GiST</acronym> to support both lossless and lossy index
315 The <acronym>SQL</> declaration of the function must look like this:
318 CREATE OR REPLACE FUNCTION my_consistent(internal, data_type, smallint, oid, internal)
324 And the matching code in the C module could then follow this skeleton:
327 Datum my_consistent(PG_FUNCTION_ARGS);
328 PG_FUNCTION_INFO_V1(my_consistent);
331 my_consistent(PG_FUNCTION_ARGS)
333 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
334 data_type *query = PG_GETARG_DATA_TYPE_P(1);
335 StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
336 /* Oid subtype = PG_GETARG_OID(3); */
337 bool *recheck = (bool *) PG_GETARG_POINTER(4);
338 data_type *key = DatumGetDataType(entry->key);
342 * determine return value as a function of strategy, key and query.
344 * Use GIST_LEAF(entry) to know where you're called in the index tree,
345 * which comes handy when supporting the = operator for example (you could
346 * check for non empty union() in non-leaf nodes and equality in leaf
350 *recheck = true; /* or false if check is exact */
352 PG_RETURN_BOOL(retval);
356 Here, <varname>key</> is an element in the index and <varname>query</>
357 the value being looked up in the index. The <literal>StrategyNumber</>
358 parameter indicates which operator of your operator class is being
359 applied — it matches one of the operator numbers in the
360 <command>CREATE OPERATOR CLASS</> command. Depending on what operators
361 you have included in the class, the data type of <varname>query</> could
362 vary with the operator, but the above skeleton assumes it doesn't.
369 <term><function>union</></term>
372 This method consolidates information in the tree. Given a set of
373 entries, this function generates a new index entry that represents
374 all the given entries.
378 The <acronym>SQL</> declaration of the function must look like this:
381 CREATE OR REPLACE FUNCTION my_union(internal, internal)
387 And the matching code in the C module could then follow this skeleton:
390 Datum my_union(PG_FUNCTION_ARGS);
391 PG_FUNCTION_INFO_V1(my_union);
394 my_union(PG_FUNCTION_ARGS)
396 GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
397 GISTENTRY *ent = entryvec->vector;
404 numranges = entryvec->n;
405 tmp = DatumGetDataType(ent[0].key);
410 out = data_type_deep_copy(tmp);
412 PG_RETURN_DATA_TYPE_P(out);
415 for (i = 1; i < numranges; i++)
418 tmp = DatumGetDataType(ent[i].key);
419 out = my_union_implementation(out, tmp);
422 PG_RETURN_DATA_TYPE_P(out);
428 As you can see, in this skeleton we're dealing with a data type
429 where <literal>union(X, Y, Z) = union(union(X, Y), Z)</>. It's easy
430 enough to support data types where this is not the case, by
431 implementing the proper union algorithm in this
432 <acronym>GiST</> support method.
436 The <function>union</> implementation function should return a
437 pointer to newly <function>palloc()</>ed memory. You can't just
438 return whatever the input is.
444 <term><function>compress</></term>
447 Converts the data item into a format suitable for physical storage in
452 The <acronym>SQL</> declaration of the function must look like this:
455 CREATE OR REPLACE FUNCTION my_compress(internal)
461 And the matching code in the C module could then follow this skeleton:
464 Datum my_compress(PG_FUNCTION_ARGS);
465 PG_FUNCTION_INFO_V1(my_compress);
468 my_compress(PG_FUNCTION_ARGS)
470 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
473 if (entry->leafkey)
475 /* replace entry->key with a compressed version */
476 compressed_data_type *compressed_data = palloc(sizeof(compressed_data_type));
478 /* fill *compressed_data from entry->key ... */
480 retval = palloc(sizeof(GISTENTRY));
481 gistentryinit(*retval, PointerGetDatum(compressed_data),
482 entry->rel, entry->page, entry->offset, FALSE);
486 /* typically we needn't do anything with non-leaf entries */
490 PG_RETURN_POINTER(retval);
496 You have to adapt <replaceable>compressed_data_type</> to the specific
497 type you're converting to in order to compress your leaf nodes, of
504 <term><function>decompress</></term>
507 The reverse of the <function>compress</function> method. Converts the
508 index representation of the data item into a format that can be
509 manipulated by the database.
513 The <acronym>SQL</> declaration of the function must look like this:
516 CREATE OR REPLACE FUNCTION my_decompress(internal)
522 And the matching code in the C module could then follow this skeleton:
525 Datum my_decompress(PG_FUNCTION_ARGS);
526 PG_FUNCTION_INFO_V1(my_decompress);
529 my_decompress(PG_FUNCTION_ARGS)
531 PG_RETURN_POINTER(PG_GETARG_POINTER(0));
535 The above skeleton is suitable for the case where no decompression
542 <term><function>penalty</></term>
545 Returns a value indicating the <quote>cost</quote> of inserting the new
546 entry into a particular branch of the tree. Items will be inserted
547 down the path of least <function>penalty</function> in the tree.
548 Values returned by <function>penalty</function> should be non-negative.
549 If a negative value is returned, it will be treated as zero.
553 The <acronym>SQL</> declaration of the function must look like this:
556 CREATE OR REPLACE FUNCTION my_penalty(internal, internal, internal)
559 LANGUAGE C STRICT; -- in some cases penalty functions need not be strict
562 And the matching code in the C module could then follow this skeleton:
565 Datum my_penalty(PG_FUNCTION_ARGS);
566 PG_FUNCTION_INFO_V1(my_penalty);
569 my_penalty(PG_FUNCTION_ARGS)
571 GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
572 GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
573 float *penalty = (float *) PG_GETARG_POINTER(2);
574 data_type *orig = DatumGetDataType(origentry->key);
575 data_type *new = DatumGetDataType(newentry->key);
577 *penalty = my_penalty_implementation(orig, new);
578 PG_RETURN_POINTER(penalty);
584 The <function>penalty</> function is crucial to good performance of
585 the index. It'll get used at insertion time to determine which branch
586 to follow when choosing where to add the new entry in the tree. At
587 query time, the more balanced the index, the quicker the lookup.
593 <term><function>picksplit</></term>
596 When an index page split is necessary, this function decides which
597 entries on the page are to stay on the old page, and which are to move
602 The <acronym>SQL</> declaration of the function must look like this:
605 CREATE OR REPLACE FUNCTION my_picksplit(internal, internal)
611 And the matching code in the C module could then follow this skeleton:
614 Datum my_picksplit(PG_FUNCTION_ARGS);
615 PG_FUNCTION_INFO_V1(my_picksplit);
618 my_picksplit(PG_FUNCTION_ARGS)
620 GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
621 OffsetNumber maxoff = entryvec->n - 1;
622 GISTENTRY *ent = entryvec->vector;
623 GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1);
628 data_type *tmp_union;
631 GISTENTRY **raw_entryvec;
633 maxoff = entryvec->n - 1;
634 nbytes = (maxoff + 1) * sizeof(OffsetNumber);
636 v->spl_left = (OffsetNumber *) palloc(nbytes);
637 left = v->spl_left;
640 v->spl_right = (OffsetNumber *) palloc(nbytes);
641 right = v->spl_right;
642 v->spl_nright = 0;
647 /* Initialize the raw entry vector. */
648 raw_entryvec = (GISTENTRY **) malloc(entryvec->n * sizeof(void *));
649 for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
650 raw_entryvec[i] = &(entryvec->vector[i]);
652 for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
654 int real_index = raw_entryvec[i] - entryvec->vector;
656 tmp_union = DatumGetDataType(entryvec->vector[real_index].key);
657 Assert(tmp_union != NULL);
660 * Choose where to put the index entries and update unionL and unionR
661 * accordingly. Append the entries to either v_spl_left or
662 * v_spl_right, and care about the counters.
665 if (my_choice_is_left(unionL, curl, unionR, curr))
670 unionL = my_union_implementation(unionL, tmp_union);
684 v->spl_ldatum = DataTypeGetDatum(unionL);
685 v->spl_rdatum = DataTypeGetDatum(unionR);
686 PG_RETURN_POINTER(v);
692 Like <function>penalty</>, the <function>picksplit</> function
693 is crucial to good performance of the index. Designing suitable
694 <function>penalty</> and <function>picksplit</> implementations
695 is where the challenge of implementing well-performing
696 <acronym>GiST</> indexes lies.
702 <term><function>same</></term>
705 Returns true if two index entries are identical, false otherwise.
709 The <acronym>SQL</> declaration of the function must look like this:
712 CREATE OR REPLACE FUNCTION my_same(internal, internal, internal)
718 And the matching code in the C module could then follow this skeleton:
721 Datum my_same(PG_FUNCTION_ARGS);
722 PG_FUNCTION_INFO_V1(my_same);
725 my_same(PG_FUNCTION_ARGS)
727 prefix_range *v1 = PG_GETARG_PREFIX_RANGE_P(0);
728 prefix_range *v2 = PG_GETARG_PREFIX_RANGE_P(1);
729 bool *result = (bool *) PG_GETARG_POINTER(2);
731 *result = my_eq(v1, v2);
732 PG_RETURN_POINTER(result);
736 For historical reasons, the <function>same</> function doesn't
737 just return a Boolean result; instead it has to store the flag
738 at the location indicated by the third argument.
744 <term><function>distance</></term>
747 Given an index entry <literal>p</> and a query value <literal>q</>,
748 this function determines the index entry's
749 <quote>distance</> from the query value. This function must be
750 supplied if the operator class contains any ordering operators.
751 A query using the ordering operator will be implemented by returning
752 index entries with the smallest <quote>distance</> values first,
753 so the results must be consistent with the operator's semantics.
754 For a leaf index entry the result just represents the distance to
755 the index entry; for an internal tree node, the result must be the
756 smallest distance that any child entry could have.
760 The <acronym>SQL</> declaration of the function must look like this:
763 CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid)
769 And the matching code in the C module could then follow this skeleton:
772 Datum my_distance(PG_FUNCTION_ARGS);
773 PG_FUNCTION_INFO_V1(my_distance);
776 my_distance(PG_FUNCTION_ARGS)
778 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
779 data_type *query = PG_GETARG_DATA_TYPE_P(1);
780 StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
781 /* Oid subtype = PG_GETARG_OID(3); */
782 data_type *key = DatumGetDataType(entry->key);
786 * determine return value as a function of strategy, key and query.
789 PG_RETURN_FLOAT8(retval);
793 The arguments to the <function>distance</> function are identical to
794 the arguments of the <function>consistent</> function, except that no
795 recheck flag is used. The distance to a leaf index entry must always
796 be determined exactly, since there is no way to re-order the tuples
797 once they are returned. Some approximation is allowed when determining
798 the distance to an internal tree node, so long as the result is never
799 greater than any child's actual distance. Thus, for example, distance
800 to a bounding box is usually sufficient in geometric applications. The
801 result value can be any finite <type>float8</> value. (Infinity and
802 minus infinity are used internally to handle cases such as nulls, so it
803 is not recommended that <function>distance</> functions return these
813 All the GiST support methods are normally called in short-lived memory
814 contexts; that is, <varname>CurrentMemoryContext</> will get reset after
815 each tuple is processed. It is therefore not very important to worry about
816 pfree'ing everything you palloc. However, in some cases it's useful for a
817 support method to cache data across repeated calls. To do that, allocate
818 the longer-lived data in <literal>fcinfo->flinfo->fn_mcxt</>, and
819 keep a pointer to it in <literal>fcinfo->flinfo->fn_extra</>. Such
820 data will survive for the life of the index operation (e.g., a single GiST
821 index scan, index build, or index tuple insertion). Be careful to pfree
822 the previous value when replacing a <literal>fn_extra</> value, or the leak
823 will accumulate for the duration of the operation.
828 <sect1 id="gist-implementation">
829 <title>Implementation</title>
831 <sect2 id="gist-buffering-build">
832 <title>GiST buffering build</title>
834 Building large GiST indexes by simply inserting all the tuples tends to be
835 slow, because if the index tuples are scattered across the index and the
836 index is large enough to not fit in cache, the insertions need to perform
837 a lot of random I/O. Beginning in version 9.2, PostgreSQL supports a more
838 efficient method to build GiST indexes based on buffering, which can
839 dramatically reduce the number of random I/Os needed for non-ordered data
840 sets. For well-ordered data sets the benefit is smaller or non-existent,
841 because only a small number of pages receive new tuples at a time, and
842 those pages fit in cache even if the index as whole does not.
846 However, buffering index build needs to call the <function>penalty</>
847 function more often, which consumes some extra CPU resources. Also, the
848 buffers used in the buffering build need temporary disk space, up to
849 the size of the resulting index. Buffering can also influence the quality
850 of the resulting index, in both positive and negative directions. That
851 influence depends on various factors, like the distribution of the input
852 data and the operator class implementation.
856 By default, a GiST index build switches to the buffering method when the
857 index size reaches <xref linkend="guc-effective-cache-size">. It can
858 be manually turned on or off by the <literal>buffering</literal> parameter
859 to the CREATE INDEX command. The default behavior is good for most cases,
860 but turning buffering off might speed up the build somewhat if the input
867 <sect1 id="gist-examples">
868 <title>Examples</title>
871 The <productname>PostgreSQL</productname> source distribution includes
872 several examples of index methods implemented using
873 <acronym>GiST</acronym>. The core system currently provides text search
874 support (indexing for <type>tsvector</> and <type>tsquery</>) as well as
875 R-Tree equivalent functionality for some of the built-in geometric data types
876 (see <filename>src/backend/access/gist/gistproc.c</>). The following
877 <filename>contrib</> modules also contain <acronym>GiST</acronym>
882 <term><filename>btree_gist</></term>
884 <para>B-tree equivalent functionality for several data types</para>
889 <term><filename>cube</></term>
891 <para>Indexing for multidimensional cubes</para>
896 <term><filename>hstore</></term>
898 <para>Module for storing (key, value) pairs</para>
903 <term><filename>intarray</></term>
905 <para>RD-Tree for one-dimensional array of int4 values</para>
910 <term><filename>ltree</></term>
912 <para>Indexing for tree-like structures</para>
917 <term><filename>pg_trgm</></term>
919 <para>Text similarity using trigram matching</para>
924 <term><filename>seg</></term>
926 <para>Indexing for <quote>float ranges</quote></para>