2 $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.4 2005/04/20 22:19:58 tgl Exp $
6 <title>Index Access Method Interface Definition</title>
9 This chapter defines the interface between the core
10 <productname>PostgreSQL</productname> system and <firstterm>index access
11 methods</>, which manage individual index types. The core system
12 knows nothing about indexes beyond what is specified here, so it is
13 possible to develop entirely new index types by writing add-on code.
17 All indexes in <productname>PostgreSQL</productname> are what are known
18 technically as <firstterm>secondary indexes</>; that is, the index is
19 physically separate from the table file that it describes. Each index
20 is stored as its own physical <firstterm>relation</> and so is described
21 by an entry in the <structname>pg_class</> catalog. The contents of an
22 index are entirely under the control of its index access method. In
23 practice, all index access methods divide indexes into standard-size
24 pages so that they can use the regular storage manager and buffer manager
25 to access the index contents. (All the existing index access methods
26 furthermore use the standard page layout described in <xref
27 linkend="storage-page-layout">, and they all use the same format for index
28 tuple headers; but these decisions are not forced on an access method.)
32 An index is effectively a mapping from some data key values to
33 <firstterm>tuple identifiers</>, or <acronym>TIDs</>, of row versions
34 (tuples) in the index's parent table. A TID consists of a
35 block number and an item number within that block (see <xref
36 linkend="storage-page-layout">). This is sufficient
37 information to fetch a particular row version from the table.
38 Indexes are not directly aware that under MVCC, there may be multiple
39 extant versions of the same logical row; to an index, each tuple is
40 an independent object that needs its own index entry. Thus, an
41 update of a row always creates all-new index entries for the row, even if
42 the key values did not change. Index entries for dead tuples are
43 reclaimed (by vacuuming) when the dead tuples themselves are reclaimed.
46 <sect1 id="index-catalog">
47 <title>Catalog Entries for Indexes</title>
50 Each index access method is described by a row in the
51 <structname>pg_am</structname> system catalog (see
52 <xref linkend="catalog-pg-am">). The principal contents of a
53 <structname>pg_am</structname> row are references to
54 <link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>
55 entries that identify the index access
56 functions supplied by the access method. The APIs for these functions
57 are defined later in this chapter. In addition, the
58 <structname>pg_am</structname> row specifies a few fixed properties of
59 the access method, such as whether it can support multi-column indexes.
60 There is not currently any special support
61 for creating or deleting <structname>pg_am</structname> entries;
62 anyone able to write a new access method is expected to be competent
63 to insert an appropriate row for themselves.
67 To be useful, an index access method must also have one or more
68 <firstterm>operator classes</> defined in
69 <link linkend="catalog-pg-opclass"><structname>pg_opclass</structname></link>,
70 <link linkend="catalog-pg-amop"><structname>pg_amop</structname></link>, and
71 <link linkend="catalog-pg-amproc"><structname>pg_amproc</structname></link>.
72 These entries allow the planner
73 to determine what kinds of query qualifications can be used with
74 indexes of this access method. Operator classes are described
75 in <xref linkend="xindex">, which is prerequisite material for reading
80 An individual index is defined by a
81 <link linkend="catalog-pg-class"><structname>pg_class</structname></link>
82 entry that describes it as a physical relation, plus a
83 <link linkend="catalog-pg-index"><structname>pg_index</structname></link>
84 entry that shows the logical content of the index — that is, the set
85 of index columns it has and the semantics of those columns, as captured by
86 the associated operator classes. The index columns (key values) can be
87 either simple columns of the underlying table or expressions over the table
88 rows. The index access method normally has no interest in where the index
89 key values come from (it is always handed precomputed key values) but it
90 will be very interested in the operator class information in
91 <structname>pg_index</structname>. Both of these catalog entries can be
92 accessed as part of the <structname>Relation</> data structure that is
93 passed to all operations on the index.
97 Some of the flag columns of <structname>pg_am</structname> have nonobvious
98 implications. The requirements of <structfield>amcanunique</structfield>
99 are discussed in <xref linkend="index-unique-checks">, and those of
100 <structfield>amconcurrent</structfield> in <xref linkend="index-locking">.
101 The <structfield>amcanmulticol</structfield> flag asserts that the
102 access method supports multi-column indexes, while
103 <structfield>amindexnulls</structfield> asserts that index entries are
104 created for NULL key values. Since most indexable operators are
105 strict and hence cannot return TRUE for NULL inputs,
106 it is at first sight attractive to not store index entries for NULLs:
107 they could never be returned by an index scan anyway. However, this
108 argument fails for a full-table index scan (one with no scan keys);
109 such a scan should include null rows. In practice this means that
110 indexes that support ordered scans (have <structfield>amorderstrategy</>
111 nonzero) must index nulls, since the planner might decide to use such a
112 scan as a substitute for sorting. Such indexes must also be willing to
113 run a scan with no scan keys at all. Another restriction is that an index
114 access method that supports multiple index columns <emphasis>must</>
115 support indexing null values in columns after the first, because the planner
116 will assume the index can be used for queries on just the first
117 column(s). For example, consider an index on (a,b) and a query with
118 <literal>WHERE a = 4</literal>. The system will assume the index can be
119 used to scan for rows with <literal>a = 4</literal>, which is wrong if the
120 index omits rows where <literal>b</> is null.
121 It is, however, OK to omit rows where the first indexed column is null.
122 (GiST currently does so.) Thus,
123 <structfield>amindexnulls</structfield> should be set true only if the
124 index access method indexes all rows, including arbitrary combinations of
130 <sect1 id="index-functions">
131 <title>Index Access Method Functions</title>
134 The index construction and maintenance functions that an index access
135 method must provide are:
141 ambuild (Relation heapRelation,
142 Relation indexRelation,
143 IndexInfo *indexInfo);
145 Build a new index. The index relation has been physically created,
146 but is empty. It must be filled in with whatever fixed data the
147 access method requires, plus entries for all tuples already existing
148 in the table. Ordinarily the <function>ambuild</> function will call
149 <function>IndexBuildHeapScan()</> to scan the table for existing tuples
150 and compute the keys that need to be inserted into the index.
156 aminsert (Relation indexRelation,
159 ItemPointer heap_tid,
160 Relation heapRelation,
161 bool check_uniqueness);
163 Insert a new tuple into an existing index. The <literal>values</> and
164 <literal>isnull</> arrays give the key values to be indexed, and
165 <literal>heap_tid</> is the TID to be indexed.
166 If the access method supports unique indexes (its
167 <structname>pg_am</>.<structfield>amcanunique</> flag is true) then
168 <literal>check_uniqueness</> may be true, in which case the access method
169 must verify that there is no conflicting row; this is the only situation in
170 which the access method normally needs the <literal>heapRelation</>
171 parameter. See <xref linkend="index-unique-checks"> for details.
172 The result is TRUE if an index entry was inserted, FALSE if not. (A FALSE
173 result does not denote an error condition, but is used for cases such
174 as an index AM refusing to index a NULL.)
179 IndexBulkDeleteResult *
180 ambulkdelete (Relation indexRelation,
181 IndexBulkDeleteCallback callback,
182 void *callback_state);
184 Delete tuple(s) from the index. This is a <quote>bulk delete</> operation
185 that is intended to be implemented by scanning the whole index and checking
186 each entry to see if it should be deleted.
187 The passed-in <literal>callback</> function may be called, in the style
188 <literal>callback(<replaceable>TID</>, callback_state) returns bool</literal>,
189 to determine whether any particular index entry, as identified by its
190 referenced TID, is to be deleted. Must return either NULL or a palloc'd
191 struct containing statistics about the effects of the deletion operation.
196 IndexBulkDeleteResult *
197 amvacuumcleanup (Relation indexRelation,
198 IndexVacuumCleanupInfo *info,
199 IndexBulkDeleteResult *stats);
201 Clean up after a <command>VACUUM</command> operation (one or more
202 <function>ambulkdelete</> calls). An index access method does not have
203 to provide this function (if so, the entry in <structname>pg_am</> must
204 be zero). If it is provided, it is typically used for bulk cleanup
205 such as reclaiming empty index pages. <literal>info</>
206 provides some additional arguments such as a message level for statistical
207 reports, and <literal>stats</> is whatever the last
208 <function>ambulkdelete</> call returned. <function>amvacuumcleanup</>
209 may replace or modify this struct before returning it. If the result
210 is not NULL it must be a palloc'd struct. The statistics it contains
211 will be reported by <command>VACUUM</> if <literal>VERBOSE</> is given.
215 The purpose of an index, of course, is to support scans for tuples matching
216 an indexable <literal>WHERE</> condition, often called a
217 <firstterm>qualifier</> or <firstterm>scan key</>. The semantics of
218 index scanning are described more fully in <xref linkend="index-scanning">,
219 below. The scan-related functions that an index access method must provide
226 ambeginscan (Relation indexRelation,
230 Begin a new scan. The <literal>key</> array (of length <literal>nkeys</>)
231 describes the scan key(s) for the index scan. The result must be a
232 palloc'd struct. For implementation reasons the index access method
233 <emphasis>must</> create this struct by calling
234 <function>RelationGetIndexScan()</>. In most cases
235 <function>ambeginscan</> itself does little beyond making that call;
236 the interesting parts of indexscan startup are in <function>amrescan</>.
242 amgettuple (IndexScanDesc scan,
243 ScanDirection direction);
245 Fetch the next tuple in the given scan, moving in the given
246 direction (forward or backward in the index). Returns TRUE if a tuple was
247 obtained, FALSE if no matching tuples remain. In the TRUE case the tuple
248 TID is stored into the <literal>scan</> structure. Note that
249 <quote>success</> means only that the index contains an entry that matches
250 the scan keys, not that the tuple necessarily still exists in the heap or
251 will pass the caller's snapshot test.
257 amgetmulti (IndexScanDesc scan,
260 int32 *returned_tids);
262 Fetch multiple tuples in the given scan. Returns TRUE if the scan should
263 continue, FALSE if no matching tuples remain. <literal>tids</> points to
264 a caller-supplied array of <literal>max_tids</>
265 <structname>ItemPointerData</> records, which the call fills with TIDs of
266 matching tuples. <literal>*returned_tids</> is set to the number of TIDs
267 actually returned. This can be less than <literal>max_tids</>, or even
268 zero, even when the return value is TRUE. (This provision allows the
269 access method to choose the most efficient stopping points in its scan,
270 for example index page boundaries.) <function>amgetmulti</> and
271 <function>amgettuple</> cannot be used in the same index scan; there
272 are other restrictions too when using <function>amgetmulti</>, as explained
273 in <xref linkend="index-scanning">.
279 amrescan (IndexScanDesc scan,
282 Restart the given scan, possibly with new scan keys (to continue using
283 the old keys, NULL is passed for <literal>key</>). Note that it is not
284 possible for the number of keys to be changed. In practice the restart
285 feature is used when a new outer tuple is selected by a nestloop join
286 and so a new key comparison value is needed, but the scan key structure
287 remains the same. This function is also called by
288 <function>RelationGetIndexScan()</>, so it is used for initial setup
289 of an indexscan as well as rescanning.
295 amendscan (IndexScanDesc scan);
297 End a scan and release resources. The <literal>scan</> struct itself
298 should not be freed, but any locks or pins taken internally by the
299 access method must be released.
305 ammarkpos (IndexScanDesc scan);
307 Mark current scan position. The access method need only support one
308 remembered scan position per scan.
314 amrestrpos (IndexScanDesc scan);
316 Restore the scan to the most recently marked position.
322 amcostestimate (Query *root,
325 Cost *indexStartupCost,
326 Cost *indexTotalCost,
327 Selectivity *indexSelectivity,
328 double *indexCorrelation);
330 Estimate the costs of an index scan. This function is described fully
331 in <xref linkend="index-cost-estimation">, below.
335 By convention, the <literal>pg_proc</literal> entry for any index
336 access method function should show the correct number of arguments,
337 but declare them all as type <type>internal</> (since most of the arguments
338 have types that are not known to SQL, and we don't want users calling
339 the functions directly anyway). The return type is declared as
340 <type>void</>, <type>internal</>, or <type>boolean</> as appropriate.
345 <sect1 id="index-scanning">
346 <title>Index Scanning</title>
349 In an index scan, the index access method is responsible for regurgitating
350 the TIDs of all the tuples it has been told about that match the
351 <firstterm>scan keys</>. The access method is <emphasis>not</> involved in
352 actually fetching those tuples from the index's parent table, nor in
353 determining whether they pass the scan's time qualification test or other
358 A scan key is the internal representation of a <literal>WHERE</> clause of
359 the form <replaceable>index_key</> <replaceable>operator</>
360 <replaceable>constant</>, where the index key is one of the columns of the
361 index and the operator is one of the members of the operator class
362 associated with that index column. An index scan has zero or more scan
363 keys, which are implicitly ANDed — the returned tuples are expected
364 to satisfy all the indicated conditions.
368 The operator class may indicate that the index is <firstterm>lossy</> for a
369 particular operator; this implies that the index scan will return all the
370 entries that pass the scan key, plus possibly additional entries that do
371 not. The core system's indexscan machinery will then apply that operator
372 again to the heap tuple to verify whether or not it really should be
373 selected. For non-lossy operators, the index scan must return exactly the
374 set of matching entries, as there is no recheck.
378 Note that it is entirely up to the access method to ensure that it
379 correctly finds all and only the entries passing all the given scan keys.
380 Also, the core system will simply hand off all the <literal>WHERE</>
381 clauses that match the index keys and operator classes, without any
382 semantic analysis to determine whether they are redundant or
383 contradictory. As an example, given
384 <literal>WHERE x > 4 AND x > 14</> where <literal>x</> is a b-tree
385 indexed column, it is left to the b-tree <function>amrescan</> function
386 to realize that the first scan key is redundant and can be discarded.
387 The extent of preprocessing needed during <function>amrescan</> will
388 depend on the extent to which the index access method needs to reduce
389 the scan keys to a <quote>normalized</> form.
393 The <function>amgettuple</> function has a <literal>direction</> argument,
394 which can be either <literal>ForwardScanDirection</> (the normal case)
395 or <literal>BackwardScanDirection</>. If the first call after
396 <function>amrescan</> specifies <literal>BackwardScanDirection</>, then the
397 set of matching index entries is to be scanned back-to-front rather than in
398 the normal front-to-back direction, so <function>amgettuple</> must return
399 the last matching tuple in the index, rather than the first one as it
400 normally would. (This will only occur for access
401 methods that advertise they support ordered scans by setting
402 <structname>pg_am</>.<structfield>amorderstrategy</> nonzero.) After the
403 first call, <function>amgettuple</> must be prepared to advance the scan in
404 either direction from the most recently returned entry.
408 The access method must support <quote>marking</> a position in a scan
409 and later returning to the marked position. The same position may be
410 restored multiple times. However, only one position need be remembered
411 per scan; a new <function>ammarkpos</> call overrides the previously
416 Both the scan position and the mark position (if any) must be maintained
417 consistently in the face of concurrent insertions or deletions in the
418 index. It is OK if a freshly-inserted entry is not returned by a scan that
419 would have found the entry if it had existed when the scan started, or for
420 the scan to return such an entry upon rescanning or backing
421 up even though it had not been returned the first time through. Similarly,
422 a concurrent delete may or may not be reflected in the results of a scan.
423 What is important is that insertions or deletions not cause the scan to
424 miss or multiply return entries that were not themselves being inserted or
425 deleted. (For an index type that does not set
426 <structname>pg_am</>.<structfield>amconcurrent</>, it is sufficient to
427 handle these cases for insertions or deletions performed by the same
428 backend that's doing the scan. But when <structfield>amconcurrent</> is
429 true, insertions or deletions from other backends must be handled as well.)
433 Instead of using <function>amgettuple</>, an index scan can be done with
434 <function>amgetmulti</> to fetch multiple tuples per call. This can be
435 noticeably more efficient than <function>amgettuple</> because it allows
436 avoiding lock/unlock cycles within the access method. In principle
437 <function>amgetmulti</> should have the same effects as repeated
438 <function>amgettuple</> calls, but we impose several restrictions to
439 simplify matters. In the first place, <function>amgetmulti</> does not
440 take a <literal>direction</> argument, and therefore it does not support
441 backwards scan nor intrascan reversal of direction. The access method
442 need not support marking or restoring scan positions during an
443 <function>amgetmulti</> scan, either. (These restrictions cost little
444 since it would be difficult to use these features in an
445 <function>amgetmulti</> scan anyway: adjusting the caller's buffered
446 list of TIDs would be complex.) Finally, <function>amgetmulti</> does
447 not guarantee any locking of the returned tuples, with implications
448 spelled out in <xref linkend="index-locking">.
453 <sect1 id="index-locking">
454 <title>Index Locking Considerations</title>
457 An index access method can choose whether it supports concurrent updates
458 of the index by multiple processes. If the method's
459 <structname>pg_am</>.<structfield>amconcurrent</> flag is true, then
460 the core <productname>PostgreSQL</productname> system obtains
461 <literal>AccessShareLock</> on the index during an index scan, and
462 <literal>RowExclusiveLock</> when updating the index. Since these lock
463 types do not conflict, the access method is responsible for handling any
464 fine-grained locking it may need. An exclusive lock on the index as a whole
465 will be taken only during index creation, destruction, or
466 <literal>REINDEX</>. When <structfield>amconcurrent</> is false,
467 <productname>PostgreSQL</productname> still obtains
468 <literal>AccessShareLock</> during index scans, but it obtains
469 <literal>AccessExclusiveLock</> during any update. This ensures that
470 updaters have sole use of the index. Note that this implicitly assumes
471 that index scans are read-only; an access method that might modify the
472 index during a scan will still have to do its own locking to handle the
473 case of concurrent scans.
477 Recall that a backend's own locks never conflict; therefore, even a
478 non-concurrent index type must be prepared to handle the case where
479 a backend is inserting or deleting entries in an index that it is itself
480 scanning. (This is of course necessary to support an <command>UPDATE</>
481 that uses the index to find the rows to be updated.)
485 Building an index type that supports concurrent updates usually requires
486 extensive and subtle analysis of the required behavior. For the b-tree
487 and hash index types, you can read about the design decisions involved in
488 <filename>src/backend/access/nbtree/README</> and
489 <filename>src/backend/access/hash/README</>.
493 Aside from the index's own internal consistency requirements, concurrent
494 updates create issues about consistency between the parent table (the
495 <firstterm>heap</>) and the index. Because
496 <productname>PostgreSQL</productname> separates accesses
497 and updates of the heap from those of the index, there are windows in
498 which the index may be inconsistent with the heap. We handle this problem
499 with the following rules:
504 A new heap entry is made before making its index entries. (Therefore
505 a concurrent index scan is likely to fail to see the heap entry.
506 This is okay because the index reader would be uninterested in an
507 uncommitted row anyway. But see <xref linkend="index-unique-checks">.)
512 When a heap entry is to be deleted (by <command>VACUUM</>), all its
513 index entries must be removed first.
518 For concurrent index types, an indexscan must maintain a pin
519 on the index page holding the item last returned by
520 <function>amgettuple</>, and <function>ambulkdelete</> cannot delete
521 entries from pages that are pinned by other backends. The need
522 for this rule is explained below.
527 If an index is concurrent then it is possible for an index reader to
528 see an index entry just before it is removed by <command>VACUUM</>, and
529 then to arrive at the corresponding heap entry after that was removed by
530 <command>VACUUM</>. (With a nonconcurrent index, this is not possible
531 because of the conflicting index-level locks that will be taken out.)
532 This creates no serious problems if that item
533 number is still unused when the reader reaches it, since an empty
534 item slot will be ignored by <function>heap_fetch()</>. But what if a
535 third backend has already re-used the item slot for something else?
536 When using an MVCC-compliant snapshot, there is no problem because
537 the new occupant of the slot is certain to be too new to pass the
538 snapshot test. However, with a non-MVCC-compliant snapshot (such as
539 <literal>SnapshotNow</>), it would be possible to accept and return
540 a row that does not in fact match the scan keys. We could defend
541 against this scenario by requiring the scan keys to be rechecked
542 against the heap row in all cases, but that is too expensive. Instead,
543 we use a pin on an index page as a proxy to indicate that the reader
544 may still be <quote>in flight</> from the index entry to the matching
545 heap entry. Making <function>ambulkdelete</> block on such a pin ensures
546 that <command>VACUUM</> cannot delete the heap entry before the reader
547 is done with it. This solution costs little in runtime, and adds blocking
548 overhead only in the rare cases where there actually is a conflict.
552 This solution requires that index scans be <quote>synchronous</>: we have
553 to fetch each heap tuple immediately after scanning the corresponding index
554 entry. This is expensive for a number of reasons. An
555 <quote>asynchronous</> scan in which we collect many TIDs from the index,
556 and only visit the heap tuples sometime later, requires much less index
557 locking overhead and may allow a more efficient heap access pattern.
558 Per the above analysis, we must use the synchronous approach for
559 non-MVCC-compliant snapshots, but an asynchronous scan is workable
560 for a query using an MVCC snapshot.
564 In an <function>amgetmulti</> index scan, the access method need not
565 guarantee to keep an index pin on any of the returned tuples. (It would be
566 impractical to pin more than the last one anyway.) Therefore
567 it is only safe to use such scans with MVCC-compliant snapshots.
572 <sect1 id="index-unique-checks">
573 <title>Index Uniqueness Checks</title>
576 <productname>PostgreSQL</productname> enforces SQL uniqueness constraints
577 using <firstterm>unique indexes</>, which are indexes that disallow
578 multiple entries with identical keys. An access method that supports this
579 feature sets <structname>pg_am</>.<structfield>amcanunique</> true.
580 (At present, only b-tree supports it.)
584 Because of MVCC, it is always necessary to allow duplicate entries to
585 exist physically in an index: the entries might refer to successive
586 versions of a single logical row. The behavior we actually want to
587 enforce is that no MVCC snapshot could include two rows with equal
588 index keys. This breaks down into the following cases that must be
589 checked when inserting a new row into a unique index:
594 If a conflicting valid row has been deleted by the current transaction,
595 it's okay. (In particular, since an UPDATE always deletes the old row
596 version before inserting the new version, this will allow an UPDATE on
597 a row without changing the key.)
602 If a conflicting row has been inserted by an as-yet-uncommitted
603 transaction, the would-be inserter must wait to see if that transaction
604 commits. If it rolls back then there is no conflict. If it commits
605 without deleting the conflicting row again, there is a uniqueness
606 violation. (In practice we just wait for the other transaction to
607 end and then redo the visibility check in toto.)
612 Similarly, if a conflicting valid row has been deleted by an
613 as-yet-uncommitted transaction, the would-be inserter must wait
614 for that transaction to commit or abort, and then repeat the test.
621 We require the index access method to apply these tests itself, which
622 means that it must reach into the heap to check the commit status of
623 any row that is shown to have a duplicate key according to the index
624 contents. This is without a doubt ugly and non-modular, but it saves
625 redundant work: if we did a separate probe then the index lookup for
626 a conflicting row would be essentially repeated while finding the place to
627 insert the new row's index entry. What's more, there is no obvious way
628 to avoid race conditions unless the conflict check is an integral part
629 of insertion of the new index entry.
633 The main limitation of this scheme is that it has no convenient way
634 to support deferred uniqueness checks.
639 <sect1 id="index-cost-estimation">
640 <title>Index Cost Estimation Functions</title>
643 The amcostestimate function is given a list of WHERE clauses that have
644 been determined to be usable with the index. It must return estimates
645 of the cost of accessing the index and the selectivity of the WHERE
646 clauses (that is, the fraction of parent-table rows that will be
647 retrieved during the index scan). For simple cases, nearly all the
648 work of the cost estimator can be done by calling standard routines
649 in the optimizer; the point of having an amcostestimate function is
650 to allow index access methods to provide index-type-specific knowledge,
651 in case it is possible to improve on the standard estimates.
655 Each amcostestimate function must have the signature:
659 amcostestimate (Query *root,
662 Cost *indexStartupCost,
663 Cost *indexTotalCost,
664 Selectivity *indexSelectivity,
665 double *indexCorrelation);
668 The first four parameters are inputs:
675 The query being processed.
684 The index being considered.
690 <term>indexQuals</term>
693 List of index qual clauses (implicitly ANDed);
694 a NIL list indicates no qualifiers are available.
695 Note that the list contains expression trees, not ScanKeys.
703 The last four parameters are pass-by-reference outputs:
707 <term>*indexStartupCost</term>
710 Set to cost of index start-up processing
716 <term>*indexTotalCost</term>
719 Set to total cost of index processing
725 <term>*indexSelectivity</term>
728 Set to index selectivity
734 <term>*indexCorrelation</term>
737 Set to correlation coefficient between index scan order and
738 underlying table's order
746 Note that cost estimate functions must be written in C, not in SQL or
747 any available procedural language, because they must access internal
748 data structures of the planner/optimizer.
752 The index access costs should be computed in the units used by
753 <filename>src/backend/optimizer/path/costsize.c</filename>: a sequential
754 disk block fetch has cost 1.0, a nonsequential fetch has cost
755 <varname>random_page_cost</>, and the cost of processing one index row
756 should usually be taken as <varname>cpu_index_tuple_cost</>. In addition,
757 an appropriate multiple of <varname>cpu_operator_cost</> should be charged
758 for any comparison operators invoked during index processing (especially
759 evaluation of the indexQuals themselves).
763 The access costs should include all disk and CPU costs associated with
764 scanning the index itself, but <emphasis>not</> the costs of retrieving or
765 processing the parent-table rows that are identified by the index.
769 The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
770 before we can begin to fetch the first row. For most indexes this can
771 be taken as zero, but an index type with a high start-up cost might want
776 The indexSelectivity should be set to the estimated fraction of the parent
777 table rows that will be retrieved during the index scan. In the case
778 of a lossy index, this will typically be higher than the fraction of
779 rows that actually pass the given qual conditions.
783 The indexCorrelation should be set to the correlation (ranging between
784 -1.0 and 1.0) between the index order and the table order. This is used
785 to adjust the estimate for the cost of fetching rows from the parent
790 <title>Cost Estimation</title>
792 A typical cost estimator will proceed as follows:
797 Estimate and return the fraction of parent-table rows that will be visited
798 based on the given qual conditions. In the absence of any index-type-specific
799 knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
802 *indexSelectivity = clauselist_selectivity(root, indexQuals,
803 index->rel->relid, JOIN_INNER);
810 Estimate the number of index rows that will be visited during the
811 scan. For many index types this is the same as indexSelectivity times
812 the number of rows in the index, but it might be more. (Note that the
813 index's size in pages and rows is available from the IndexOptInfo struct.)
819 Estimate the number of index pages that will be retrieved during the scan.
820 This might be just indexSelectivity times the index's size in pages.
826 Compute the index access cost. A generic estimator might do this:
830 * Our generic assumption is that the index pages will be read
831 * sequentially, so they have cost 1.0 each, not random_page_cost.
832 * Also, we charge for evaluation of the indexquals at each index row.
833 * All the costs are assumed to be paid incrementally during the scan.
835 cost_qual_eval(&index_qual_cost, indexQuals);
836 *indexStartupCost = index_qual_cost.startup;
837 *indexTotalCost = numIndexPages +
838 (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
845 Estimate the index correlation. For a simple ordered index on a single
846 field, this can be retrieved from pg_statistic. If the correlation
847 is not known, the conservative estimate is zero (no correlation).
853 Examples of cost estimator functions can be found in
854 <filename>src/backend/utils/adt/selfuncs.c</filename>.
859 <!-- Keep this comment at the end of the file
864 sgml-minimize-attributes:nil
865 sgml-always-quote-attributes:t
868 sgml-parent-document:nil
869 sgml-default-dtd-file:"./reference.ced"
870 sgml-exposed-tags:nil
871 sgml-local-catalogs:("/usr/lib/sgml/catalog")
872 sgml-local-ecat-files:nil