granicus.if.org Git - postgresql/blob - doc/src/sgml/gin.sgml

   1 <!-- doc/src/sgml/gin.sgml -->
   2
   3 <chapter id="gin">
   4 <title>GIN Indexes</title>
   5
   6    <indexterm>
   7     <primary>index</primary>
   8     <secondary>GIN</secondary>
   9    </indexterm>
  10
  11 <sect1 id="gin-intro">
  12  <title>Introduction</title>
  13
  14  <para>
  15   <acronym>GIN</acronym> stands for Generalized Inverted Index.
  16   <acronym>GIN</acronym> is designed for handling cases where the items
  17   to be indexed are composite values, and the queries to be handled by
  18   the index need to search for element values that appear within
  19   the composite items.  For example, the items could be documents,
  20   and the queries could be searches for documents containing specific words.
  21  </para>
  22
  23  <para>
  24   We use the word <firstterm>item</firstterm> to refer to a composite value that
  25   is to be indexed, and the word <firstterm>key</firstterm> to refer to an element
  26   value.  <acronym>GIN</acronym> always stores and searches for keys,
  27   not item values per se.
  28  </para>
  29
  30  <para>
  31   A <acronym>GIN</acronym> index stores a set of (key, posting list) pairs,
  32   where a <firstterm>posting list</firstterm> is a set of row IDs in which the key
  33   occurs.  The same row ID can appear in multiple posting lists, since
  34   an item can contain more than one key.  Each key value is stored only
  35   once, so a <acronym>GIN</acronym> index is very compact for cases
  36   where the same key appears many times.
  37  </para>
  38
  39  <para>
  40   <acronym>GIN</acronym> is generalized in the sense that the
  41   <acronym>GIN</acronym> access method code does not need to know the
  42   specific operations that it accelerates.
  43   Instead, it uses custom strategies defined for particular data types.
  44   The strategy defines how keys are extracted from indexed items and
  45   query conditions, and how to determine whether a row that contains
  46   some of the key values in a query actually satisfies the query.
  47  </para>
  48
  49  <para>
  50   One advantage of <acronym>GIN</acronym> is that it allows the development
  51   of custom data types with the appropriate access methods, by
  52   an expert in the domain of the data type, rather than a database expert.
  53   This is much the same advantage as using <acronym>GiST</acronym>.
  54  </para>
  55
  56  <para>
  57   The <acronym>GIN</acronym>
  58   implementation in <productname>PostgreSQL</productname> is primarily
  59   maintained by Teodor Sigaev and Oleg Bartunov. There is more
  60   information about <acronym>GIN</acronym> on their
  61   <ulink url="http://www.sai.msu.su/~megera/wiki/Gin">website</ulink>.
  62  </para>
  63 </sect1>
  64
  65 <sect1 id="gin-builtin-opclasses">
  66  <title>Built-in Operator Classes</title>
  67
  68  <para>
  69   The core <productname>PostgreSQL</productname> distribution
  70   includes the <acronym>GIN</acronym> operator classes shown in
  71   <xref linkend="gin-builtin-opclasses-table"/>.
  72   (Some of the optional modules described in <xref linkend="contrib"/>
  73   provide additional <acronym>GIN</acronym> operator classes.)
  74  </para>
  75
  76   <table id="gin-builtin-opclasses-table">
  77    <title>Built-in <acronym>GIN</acronym> Operator Classes</title>
  78    <tgroup cols="3">
  79     <thead>
  80      <row>
  81       <entry>Name</entry>
  82       <entry>Indexed Data Type</entry>
  83       <entry>Indexable Operators</entry>
  84      </row>
  85     </thead>
  86     <tbody>
  87      <row>
  88       <entry><literal>array_ops</literal></entry>
  89       <entry><type>anyarray</type></entry>
  90       <entry>
  91        <literal>&amp;&amp;</literal>
  92        <literal>&lt;@</literal>
  93        <literal>=</literal>
  94        <literal>@&gt;</literal>
  95       </entry>
  96      </row>
  97      <row>
  98       <entry><literal>jsonb_ops</literal></entry>
  99       <entry><type>jsonb</type></entry>
 100       <entry>
 101        <literal>?</literal>
 102        <literal>?&amp;</literal>
 103        <literal>?|</literal>
 104        <literal>@&gt;</literal>
 105        <literal>@?</literal>
 106        <literal>@@</literal>
 107       </entry>
 108      </row>
 109      <row>
 110       <entry><literal>jsonb_path_ops</literal></entry>
 111       <entry><type>jsonb</type></entry>
 112       <entry>
 113        <literal>@&gt;</literal>
 114        <literal>@?</literal>
 115        <literal>@@</literal>
 116       </entry>
 117      </row>
 118      <row>
 119       <entry><literal>tsvector_ops</literal></entry>
 120       <entry><type>tsvector</type></entry>
 121       <entry>
 122        <literal>@@</literal>
 123        <literal>@@@</literal>
 124       </entry>
 125      </row>
 126     </tbody>
 127    </tgroup>
 128   </table>
 129
 130  <para>
 131   Of the two operator classes for type <type>jsonb</type>, <literal>jsonb_ops</literal>
 132   is the default.  <literal>jsonb_path_ops</literal> supports fewer operators but
 133   offers better performance for those operators.
 134   See <xref linkend="json-indexing"/> for details.
 135  </para>
 136
 137 </sect1>
 138
 139 <sect1 id="gin-extensibility">
 140  <title>Extensibility</title>
 141
 142  <para>
 143    The <acronym>GIN</acronym> interface has a high level of abstraction,
 144    requiring the access method implementer only to implement the semantics of
 145    the data type being accessed.  The <acronym>GIN</acronym> layer itself
 146    takes care of concurrency, logging and searching the tree structure.
 147  </para>
 148
 149  <para>
 150    All it takes to get a <acronym>GIN</acronym> access method working is to
 151    implement a few user-defined methods, which define the behavior of
 152    keys in the tree and the relationships between keys, indexed items,
 153    and indexable queries. In short, <acronym>GIN</acronym> combines
 154    extensibility with generality, code reuse, and a clean interface.
 155  </para>
 156
 157  <para>
 158    There are two methods that an operator class for
 159    <acronym>GIN</acronym> must provide:
 160
 161   <variablelist>
 162     <varlistentry>
 163      <term><function>Datum *extractValue(Datum itemValue, int32 *nkeys,
 164         bool **nullFlags)</function></term>
 165      <listitem>
 166       <para>
 167        Returns a palloc'd array of keys given an item to be indexed.  The
 168        number of returned keys must be stored into <literal>*nkeys</literal>.
 169        If any of the keys can be null, also palloc an array of
 170        <literal>*nkeys</literal> <type>bool</type> fields, store its address at
 171        <literal>*nullFlags</literal>, and set these null flags as needed.
 172        <literal>*nullFlags</literal> can be left <symbol>NULL</symbol> (its initial value)
 173        if all keys are non-null.
 174        The return value can be <symbol>NULL</symbol> if the item contains no keys.
 175       </para>
 176      </listitem>
 177     </varlistentry>
 178
 179     <varlistentry>
 180      <term><function>Datum *extractQuery(Datum query, int32 *nkeys,
 181         StrategyNumber n, bool **pmatch, Pointer **extra_data,
 182         bool **nullFlags, int32 *searchMode)</function></term>
 183      <listitem>
 184       <para>
 185        Returns a palloc'd array of keys given a value to be queried; that is,
 186        <literal>query</literal> is the value on the right-hand side of an
 187        indexable operator whose left-hand side is the indexed column.
 188        <literal>n</literal> is the strategy number of the operator within the
 189        operator class (see <xref linkend="xindex-strategies"/>).
 190        Often, <function>extractQuery</function> will need
 191        to consult <literal>n</literal> to determine the data type of
 192        <literal>query</literal> and the method it should use to extract key values.
 193        The number of returned keys must be stored into <literal>*nkeys</literal>.
 194        If any of the keys can be null, also palloc an array of
 195        <literal>*nkeys</literal> <type>bool</type> fields, store its address at
 196        <literal>*nullFlags</literal>, and set these null flags as needed.
 197        <literal>*nullFlags</literal> can be left <symbol>NULL</symbol> (its initial value)
 198        if all keys are non-null.
 199        The return value can be <symbol>NULL</symbol> if the <literal>query</literal> contains no keys.
 200       </para>
 201
 202       <para>
 203        <literal>searchMode</literal> is an output argument that allows
 204        <function>extractQuery</function> to specify details about how the search
 205        will be done.
 206        If <literal>*searchMode</literal> is set to
 207        <literal>GIN_SEARCH_MODE_DEFAULT</literal> (which is the value it is
 208        initialized to before call), only items that match at least one of
 209        the returned keys are considered candidate matches.
 210        If <literal>*searchMode</literal> is set to
 211        <literal>GIN_SEARCH_MODE_INCLUDE_EMPTY</literal>, then in addition to items
 212        containing at least one matching key, items that contain no keys at
 213        all are considered candidate matches.  (This mode is useful for
 214        implementing is-subset-of operators, for example.)
 215        If <literal>*searchMode</literal> is set to <literal>GIN_SEARCH_MODE_ALL</literal>,
 216        then all non-null items in the index are considered candidate
 217        matches, whether they match any of the returned keys or not.  (This
 218        mode is much slower than the other two choices, since it requires
 219        scanning essentially the entire index, but it may be necessary to
 220        implement corner cases correctly.  An operator that needs this mode
 221        in most cases is probably not a good candidate for a GIN operator
 222        class.)
 223        The symbols to use for setting this mode are defined in
 224        <filename>access/gin.h</filename>.
 225       </para>
 226
 227       <para>
 228        <literal>pmatch</literal> is an output argument for use when partial match
 229        is supported.  To use it, <function>extractQuery</function> must allocate
 230        an array of <literal>*nkeys</literal> <type>bool</type>s and store its address at
 231        <literal>*pmatch</literal>.  Each element of the array should be set to true
 232        if the corresponding key requires partial match, false if not.
 233        If <literal>*pmatch</literal> is set to <symbol>NULL</symbol> then GIN assumes partial match
 234        is not required.  The variable is initialized to <symbol>NULL</symbol> before call,
 235        so this argument can simply be ignored by operator classes that do
 236        not support partial match.
 237       </para>
 238
 239       <para>
 240        <literal>extra_data</literal> is an output argument that allows
 241        <function>extractQuery</function> to pass additional data to the
 242        <function>consistent</function> and <function>comparePartial</function> methods.
 243        To use it, <function>extractQuery</function> must allocate
 244        an array of <literal>*nkeys</literal> pointers and store its address at
 245        <literal>*extra_data</literal>, then store whatever it wants to into the
 246        individual pointers.  The variable is initialized to <symbol>NULL</symbol> before
 247        call, so this argument can simply be ignored by operator classes that
 248        do not require extra data.  If <literal>*extra_data</literal> is set, the
 249        whole array is passed to the <function>consistent</function> method, and
 250        the appropriate element to the <function>comparePartial</function> method.
 251       </para>
 252
 253      </listitem>
 254     </varlistentry>
 255   </variablelist>
 256
 257   An operator class must also provide a function to check if an indexed item
 258   matches the query. It comes in two flavors, a Boolean <function>consistent</function>
 259   function, and a ternary <function>triConsistent</function> function.
 260   <function>triConsistent</function> covers the functionality of both, so providing
 261   <function>triConsistent</function> alone is sufficient. However, if the Boolean
 262   variant is significantly cheaper to calculate, it can be advantageous to
 263   provide both.  If only the Boolean variant is provided, some optimizations
 264   that depend on refuting index items before fetching all the keys are
 265   disabled.
 266
 267   <variablelist>
 268     <varlistentry>
 269      <term><function>bool consistent(bool check[], StrategyNumber n, Datum query,
 270         int32 nkeys, Pointer extra_data[], bool *recheck,
 271         Datum queryKeys[], bool nullFlags[])</function></term>
 272      <listitem>
 273       <para>
 274        Returns true if an indexed item satisfies the query operator with
 275        strategy number <literal>n</literal> (or might satisfy it, if the recheck
 276        indication is returned).  This function does not have direct access
 277        to the indexed item's value, since <acronym>GIN</acronym> does not
 278        store items explicitly.  Rather, what is available is knowledge
 279        about which key values extracted from the query appear in a given
 280        indexed item.  The <literal>check</literal> array has length
 281        <literal>nkeys</literal>, which is the same as the number of keys previously
 282        returned by <function>extractQuery</function> for this <literal>query</literal> datum.
 283        Each element of the
 284        <literal>check</literal> array is true if the indexed item contains the
 285        corresponding query key, i.e., if (check[i] == true) the i-th key of the
 286        <function>extractQuery</function> result array is present in the indexed item.
 287        The original <literal>query</literal> datum is
 288        passed in case the <function>consistent</function> method needs to consult it,
 289        and so are the <literal>queryKeys[]</literal> and <literal>nullFlags[]</literal>
 290        arrays previously returned by <function>extractQuery</function>.
 291        <literal>extra_data</literal> is the extra-data array returned by
 292        <function>extractQuery</function>, or <symbol>NULL</symbol> if none.
 293       </para>
 294
 295       <para>
 296        When <function>extractQuery</function> returns a null key in
 297        <literal>queryKeys[]</literal>, the corresponding <literal>check[]</literal> element
 298        is true if the indexed item contains a null key; that is, the
 299        semantics of <literal>check[]</literal> are like <literal>IS NOT DISTINCT
 300        FROM</literal>.  The <function>consistent</function> function can examine the
 301        corresponding <literal>nullFlags[]</literal> element if it needs to tell
 302        the difference between a regular value match and a null match.
 303       </para>
 304
 305       <para>
 306        On success, <literal>*recheck</literal> should be set to true if the heap
 307        tuple needs to be rechecked against the query operator, or false if
 308        the index test is exact.  That is, a false return value guarantees
 309        that the heap tuple does not match the query; a true return value with
 310        <literal>*recheck</literal> set to false guarantees that the heap tuple does
 311        match the query; and a true return value with
 312        <literal>*recheck</literal> set to true means that the heap tuple might match
 313        the query, so it needs to be fetched and rechecked by evaluating the
 314        query operator directly against the originally indexed item.
 315       </para>
 316      </listitem>
 317     </varlistentry>
 318
 319     <varlistentry>
 320      <term><function>GinTernaryValue triConsistent(GinTernaryValue check[], StrategyNumber n, Datum query,
 321         int32 nkeys, Pointer extra_data[],
 322         Datum queryKeys[], bool nullFlags[])</function></term>
 323      <listitem>
 324       <para>
 325        <function>triConsistent</function> is similar to <function>consistent</function>,
 326        but instead of Booleans in the <literal>check</literal> vector, there are
 327        three possible values for each
 328        key: <literal>GIN_TRUE</literal>, <literal>GIN_FALSE</literal> and
 329        <literal>GIN_MAYBE</literal>. <literal>GIN_FALSE</literal> and <literal>GIN_TRUE</literal>
 330        have the same meaning as regular Boolean values, while
 331        <literal>GIN_MAYBE</literal> means that the presence of that key is not known.
 332        When <literal>GIN_MAYBE</literal> values are present, the function should only
 333        return <literal>GIN_TRUE</literal> if the item certainly matches whether or
 334        not the index item contains the corresponding query keys. Likewise, the
 335        function must return <literal>GIN_FALSE</literal> only if the item certainly
 336        does not match, whether or not it contains the <literal>GIN_MAYBE</literal>
 337        keys. If the result depends on the <literal>GIN_MAYBE</literal> entries, i.e.,
 338        the match cannot be confirmed or refuted based on the known query keys,
 339        the function must return <literal>GIN_MAYBE</literal>.
 340       </para>
 341       <para>
 342        When there are no <literal>GIN_MAYBE</literal> values in the <literal>check</literal>
 343        vector, a <literal>GIN_MAYBE</literal> return value is the equivalent of
 344        setting the <literal>recheck</literal> flag in the
 345        Boolean <function>consistent</function> function.
 346       </para>
 347      </listitem>
 348     </varlistentry>
 349   </variablelist>
 350  </para>
 351
 352  <para>
 353   In addition, GIN must have a way to sort the key values stored in the index.
 354   The operator class can define the sort ordering by specifying a comparison
 355   method:
 356
 357   <variablelist>
 358     <varlistentry>
 359      <term><function>int compare(Datum a, Datum b)</function></term>
 360      <listitem>
 361       <para>
 362        Compares two keys (not indexed items!) and returns an integer less than
 363        zero, zero, or greater than zero, indicating whether the first key is
 364        less than, equal to, or greater than the second.  Null keys are never
 365        passed to this function.
 366       </para>
 367      </listitem>
 368     </varlistentry>
 369   </variablelist>
 370
 371   Alternatively, if the operator class does not provide a <function>compare</function>
 372   method, GIN will look up the default btree operator class for the index
 373   key data type, and use its comparison function.  It is recommended to
 374   specify the comparison function in a GIN operator class that is meant for
 375   just one data type, as looking up the btree operator class costs a few
 376   cycles.  However, polymorphic GIN operator classes (such
 377   as <literal>array_ops</literal>) typically cannot specify a single comparison
 378   function.
 379  </para>
 380
 381  <para>
 382   Optionally, an operator class for <acronym>GIN</acronym> can supply the
 383   following method:
 384
 385   <variablelist>
 386     <varlistentry>
 387      <term><function>int comparePartial(Datum partial_key, Datum key, StrategyNumber n,
 388                               Pointer extra_data)</function></term>
 389      <listitem>
 390       <para>
 391        Compare a partial-match query key to an index key.  Returns an integer
 392        whose sign indicates the result: less than zero means the index key
 393        does not match the query, but the index scan should continue; zero
 394        means that the index key does match the query; greater than zero
 395        indicates that the index scan should stop because no more matches
 396        are possible.  The strategy number <literal>n</literal> of the operator
 397        that generated the partial match query is provided, in case its
 398        semantics are needed to determine when to end the scan.  Also,
 399        <literal>extra_data</literal> is the corresponding element of the extra-data
 400        array made by <function>extractQuery</function>, or <symbol>NULL</symbol> if none.
 401        Null keys are never passed to this function.
 402       </para>
 403      </listitem>
 404     </varlistentry>
 405   </variablelist>
 406  </para>
 407
 408  <para>
 409   To support <quote>partial match</quote> queries, an operator class must
 410   provide the <function>comparePartial</function> method, and its
 411   <function>extractQuery</function> method must set the <literal>pmatch</literal>
 412   parameter when a partial-match query is encountered.  See
 413   <xref linkend="gin-partial-match"/> for details.
 414  </para>
 415
 416  <para>
 417   The actual data types of the various <literal>Datum</literal> values mentioned
 418   above vary depending on the operator class.  The item values passed to
 419   <function>extractValue</function> are always of the operator class's input type, and
 420   all key values must be of the class's <literal>STORAGE</literal> type.  The type of
 421   the <literal>query</literal> argument passed to <function>extractQuery</function>,
 422   <function>consistent</function> and <function>triConsistent</function> is whatever is the
 423   right-hand input type of the class member operator identified by the
 424   strategy number.  This need not be the same as the indexed type, so long as
 425   key values of the correct type can be extracted from it.  However, it is
 426   recommended that the SQL declarations of these three support functions use
 427   the opclass's indexed data type for the <literal>query</literal> argument, even
 428   though the actual type might be something else depending on the operator.
 429  </para>
 430
 431 </sect1>
 432
 433 <sect1 id="gin-implementation">
 434  <title>Implementation</title>
 435
 436  <para>
 437   Internally, a <acronym>GIN</acronym> index contains a B-tree index
 438   constructed over keys, where each key is an element of one or more indexed
 439   items (a member of an array, for example) and where each tuple in a leaf
 440   page contains either a pointer to a B-tree of heap pointers (a
 441   <quote>posting tree</quote>), or a simple list of heap pointers (a <quote>posting
 442   list</quote>) when the list is small enough to fit into a single index tuple along
 443   with the key value.  <xref linkend="gin-internals-figure"/> illustrates
 444   these components of a GIN index.
 445  </para>
 446
 447  <para>
 448   As of <productname>PostgreSQL</productname> 9.1, null key values can be
 449   included in the index.  Also, placeholder nulls are included in the index
 450   for indexed items that are null or contain no keys according to
 451   <function>extractValue</function>.  This allows searches that should find empty
 452   items to do so.
 453  </para>
 454
 455  <para>
 456   Multicolumn <acronym>GIN</acronym> indexes are implemented by building
 457   a single B-tree over composite values (column number, key value).  The
 458   key values for different columns can be of different types.
 459  </para>
 460
 461  <figure id="gin-internals-figure">
 462   <title>GIN Internals</title>
 463   <mediaobject>
 464    <imageobject>
 465     <imagedata fileref="images/gin.svg" format="SVG" width="100%"/>
 466    </imageobject>
 467   </mediaobject>
 468  </figure>
 469
 470  <sect2 id="gin-fast-update">
 471   <title>GIN Fast Update Technique</title>
 472
 473   <para>
 474    Updating a <acronym>GIN</acronym> index tends to be slow because of the
 475    intrinsic nature of inverted indexes: inserting or updating one heap row
 476    can cause many inserts into the index (one for each key extracted
 477    from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
 478    <acronym>GIN</acronym> is capable of postponing much of this work by inserting
 479    new tuples into a temporary, unsorted list of pending entries.
 480    When the table is vacuumed or autoanalyzed, or when
 481    <function>gin_clean_pending_list</function> function is called, or if the
 482    pending list becomes larger than
 483    <xref linkend="guc-gin-pending-list-limit"/>, the entries are moved to the
 484    main <acronym>GIN</acronym> data structure using the same bulk insert
 485    techniques used during initial index creation.  This greatly improves
 486    <acronym>GIN</acronym> index update speed, even counting the additional
 487    vacuum overhead.  Moreover the overhead work can be done by a background
 488    process instead of in foreground query processing.
 489   </para>
 490
 491   <para>
 492    The main disadvantage of this approach is that searches must scan the list
 493    of pending entries in addition to searching the regular index, and so
 494    a large list of pending entries will slow searches significantly.
 495    Another disadvantage is that, while most updates are fast, an update
 496    that causes the pending list to become <quote>too large</quote> will incur an
 497    immediate cleanup cycle and thus be much slower than other updates.
 498    Proper use of autovacuum can minimize both of these problems.
 499   </para>
 500
 501   <para>
 502    If consistent response time is more important than update speed,
 503    use of pending entries can be disabled by turning off the
 504    <literal>fastupdate</literal> storage parameter for a
 505    <acronym>GIN</acronym> index.  See <xref linkend="sql-createindex"/>
 506    for details.
 507   </para>
 508  </sect2>
 509
 510  <sect2 id="gin-partial-match">
 511   <title>Partial Match Algorithm</title>
 512
 513   <para>
 514    GIN can support <quote>partial match</quote> queries, in which the query
 515    does not determine an exact match for one or more keys, but the possible
 516    matches fall within a reasonably narrow range of key values (within the
 517    key sorting order determined by the <function>compare</function> support method).
 518    The <function>extractQuery</function> method, instead of returning a key value
 519    to be matched exactly, returns a key value that is the lower bound of
 520    the range to be searched, and sets the <literal>pmatch</literal> flag true.
 521    The key range is then scanned using the <function>comparePartial</function>
 522    method.  <function>comparePartial</function> must return zero for a matching
 523    index key, less than zero for a non-match that is still within the range
 524    to be searched, or greater than zero if the index key is past the range
 525    that could match.
 526   </para>
 527  </sect2>
 528
 529 </sect1>
 530
 531 <sect1 id="gin-tips">
 532 <title>GIN Tips and Tricks</title>
 533
 534  <variablelist>
 535   <varlistentry>
 536    <term>Create vs. insert</term>
 537    <listitem>
 538     <para>
 539      Insertion into a <acronym>GIN</acronym> index can be slow
 540      due to the likelihood of many keys being inserted for each item.
 541      So, for bulk insertions into a table it is advisable to drop the GIN
 542      index and recreate it after finishing bulk insertion.
 543     </para>
 544
 545     <para>
 546      As of <productname>PostgreSQL</productname> 8.4, this advice is less
 547      necessary since delayed indexing is used (see <xref
 548      linkend="gin-fast-update"/> for details).  But for very large updates
 549      it may still be best to drop and recreate the index.
 550     </para>
 551    </listitem>
 552   </varlistentry>
 553
 554   <varlistentry>
 555    <term><xref linkend="guc-maintenance-work-mem"/></term>
 556    <listitem>
 557     <para>
 558      Build time for a <acronym>GIN</acronym> index is very sensitive to
 559      the <varname>maintenance_work_mem</varname> setting; it doesn't pay to
 560      skimp on work memory during index creation.
 561     </para>
 562    </listitem>
 563   </varlistentry>
 564
 565   <varlistentry>
 566    <term><xref linkend="guc-gin-pending-list-limit"/></term>
 567    <listitem>
 568     <para>
 569      During a series of insertions into an existing <acronym>GIN</acronym>
 570      index that has <literal>fastupdate</literal> enabled, the system will clean up
 571      the pending-entry list whenever the list grows larger than
 572      <varname>gin_pending_list_limit</varname>. To avoid fluctuations in observed
 573      response time, it's desirable to have pending-list cleanup occur in the
 574      background (i.e., via autovacuum).  Foreground cleanup operations
 575      can be avoided by increasing <varname>gin_pending_list_limit</varname>
 576      or making autovacuum more aggressive.
 577      However, enlarging the threshold of the cleanup operation means that
 578      if a foreground cleanup does occur, it will take even longer.
 579     </para>
 580     <para>
 581      <varname>gin_pending_list_limit</varname> can be overridden for individual
 582      GIN indexes by changing storage parameters, and which allows each
 583      GIN index to have its own cleanup threshold.
 584      For example, it's possible to increase the threshold only for the GIN
 585      index which can be updated heavily, and decrease it otherwise.
 586     </para>
 587    </listitem>
 588   </varlistentry>
 589
 590   <varlistentry>
 591    <term><xref linkend="guc-gin-fuzzy-search-limit"/></term>
 592    <listitem>
 593     <para>
 594      The primary goal of developing <acronym>GIN</acronym> indexes was
 595      to create support for highly scalable full-text search in
 596      <productname>PostgreSQL</productname>, and there are often situations when
 597      a full-text search returns a very large set of results.  Moreover, this
 598      often happens when the query contains very frequent words, so that the
 599      large result set is not even useful.  Since reading many
 600      tuples from the disk and sorting them could take a lot of time, this is
 601      unacceptable for production.  (Note that the index search itself is very
 602      fast.)
 603     </para>
 604     <para>
 605      To facilitate controlled execution of such queries,
 606      <acronym>GIN</acronym> has a configurable soft upper limit on the
 607      number of rows returned: the
 608      <varname>gin_fuzzy_search_limit</varname> configuration parameter.
 609      It is set to 0 (meaning no limit) by default.
 610      If a non-zero limit is set, then the returned set is a subset of
 611      the whole result set, chosen at random.
 612     </para>
 613     <para>
 614      <quote>Soft</quote> means that the actual number of returned results
 615      could differ somewhat from the specified limit, depending on the query
 616      and the quality of the system's random number generator.
 617     </para>
 618     <para>
 619      From experience, values in the thousands (e.g., 5000 &mdash; 20000)
 620      work well.
 621     </para>
 622    </listitem>
 623   </varlistentry>
 624  </variablelist>
 625
 626 </sect1>
 627
 628 <sect1 id="gin-limit">
 629  <title>Limitations</title>
 630
 631  <para>
 632   <acronym>GIN</acronym> assumes that indexable operators are strict.  This
 633   means that <function>extractValue</function> will not be called at all on a null
 634   item value (instead, a placeholder index entry is created automatically),
 635   and <function>extractQuery</function> will not be called on a null query
 636   value either (instead, the query is presumed to be unsatisfiable).  Note
 637   however that null key values contained within a non-null composite item
 638   or query value are supported.
 639  </para>
 640 </sect1>
 641
 642 <sect1 id="gin-examples">
 643  <title>Examples</title>
 644
 645  <para>
 646   The core <productname>PostgreSQL</productname> distribution
 647   includes the <acronym>GIN</acronym> operator classes previously shown in
 648   <xref linkend="gin-builtin-opclasses-table"/>.
 649   The following <filename>contrib</filename> modules also contain
 650   <acronym>GIN</acronym> operator classes:
 651
 652  <variablelist>
 653   <varlistentry>
 654    <term><filename>btree_gin</filename></term>
 655    <listitem>
 656     <para>B-tree equivalent functionality for several data types</para>
 657    </listitem>
 658   </varlistentry>
 659
 660   <varlistentry>
 661    <term><filename>hstore</filename></term>
 662    <listitem>
 663     <para>Module for storing (key, value) pairs</para>
 664    </listitem>
 665   </varlistentry>
 666
 667   <varlistentry>
 668    <term><filename>intarray</filename></term>
 669    <listitem>
 670     <para>Enhanced support for <type>int[]</type></para>
 671    </listitem>
 672   </varlistentry>
 673
 674   <varlistentry>
 675    <term><filename>pg_trgm</filename></term>
 676    <listitem>
 677     <para>Text similarity using trigram matching</para>
 678    </listitem>
 679   </varlistentry>
 680  </variablelist>
 681  </para>
 682 </sect1>
 683
 684 </chapter>