Fix lots of bad markup, bad English, bad explanations.
Second round of commits. pgcrypto and pgstandby still to go...
+<!-- $PostgreSQL: pgsql/doc/src/sgml/intagg.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="intagg">
<title>intagg</title>
-
+
<indexterm zone="intagg">
<primary>intagg</primary>
</indexterm>
<para>
- This section describes the <literal>intagg</literal> module which provides an integer aggregator and an enumerator.
+ The <filename>intagg</filename> module provides an integer aggregator and an
+ enumerator.
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <para>
+ The aggregator is an aggregate function
+ <function>int_array_aggregate(integer)</>
+ that produces an integer array
+ containing exactly the integers it is fed.
+ Here is a not-tremendously-useful example:
</para>
+
+ <programlisting>
+test=# select int_array_aggregate(i) from
+test-# generate_series(1,10,2) i;
+ int_array_aggregate
+---------------------
+ {1,3,5,7,9}
+(1 row)
+ </programlisting>
+
<para>
- Many database systems have the notion of a one to many table. Such a table usually sits between two indexed tables, as:
+ The enumerator is a function
+ <function>int_array_enum(integer[])</>
+ that returns <type>setof integer</>. It is essentially the reverse
+ operation of the aggregator: given an array of integers, expand it
+ into a set of rows. For example,
</para>
+
+ <programlisting>
+test=# select * from int_array_enum(array[1,3,5,7,9]);
+ int_array_enum
+----------------
+ 1
+ 3
+ 5
+ 7
+ 9
+(5 rows)
+ </programlisting>
+
+ </sect2>
+
+ <sect2>
+ <title>Sample Uses</title>
+
+ <para>
+ Many database systems have the notion of a one to many table. Such a table
+ usually sits between two indexed tables, for example:
+ </para>
+
<programlisting>
-CREATE TABLE one_to_many(left INT, right INT) ;
+CREATE TABLE left (id INT PRIMARY KEY, ...);
+CREATE TABLE right (id INT PRIMARY KEY, ...);
+CREATE TABLE one_to_many(left INT REFERENCES left, right INT REFERENCES right);
</programlisting>
<para>
- And it is used like this:
+ It is typically used like this:
</para>
<programlisting>
- SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
- WHERE one_to_many.left = item;
+ SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
+ WHERE one_to_many.left = <replaceable>item</>;
</programlisting>
<para>
- This will return all the items in the right hand table for an entry
+ This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.
</para>
<para>
Now, this methodology can be cumbersome with a very large number of
- entries in the one_to_many table. Depending on the order in which
- data was entered, a join like this could result in an index scan
+ entries in the <structname>one_to_many</> table. Often,
+ a join like this would result in an index scan
and a fetch for each right hand entry in the table for a particular
- left hand entry. If you have a very dynamic system, there is not much you
+ left hand entry. If you have a very dynamic system, there is not much you
can do. However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
</para>
<programlisting>
-CREATE TABLE summary as SELECT left, int_array_aggregate(right)
- AS right FROM one_to_many GROUP BY left;
+CREATE TABLE summary as
+ SELECT left, int_array_aggregate(right) AS right
+ FROM one_to_many
+ GROUP BY left;
</programlisting>
<para>
This will create a table with one row per left item, and an array
of right items. Now this is pretty useless without some way of using
- the array, thats why there is an array enumerator.
+ the array; that's why there is an array enumerator. You can do
</para>
+
<programlisting>
-SELECT left, int_array_enum(right) FROM summary WHERE left = item;
+SELECT left, int_array_enum(right) FROM summary WHERE left = <replaceable>item</>;
</programlisting>
<para>
- The above query using int_array_enum, produces the same results as:
+ The above query using <function>int_array_enum</> produces the same results
+ as
</para>
+
<programlisting>
-SELECT left, right FROM one_to_many WHERE left = item;
+SELECT left, right FROM one_to_many WHERE left = <replaceable>item</>;
</programlisting>
-
+
<para>
The difference is that the query against the summary table has to get
- only one row from the table, where as the query against "one_to_many"
- must index scan and fetch a row for each entry.
+ only one row from the table, whereas the direct query against
+ <structname>one_to_many</> must index scan and fetch a row for each entry.
</para>
+
<para>
- On our system, an EXPLAIN shows a query with a cost of 8488 gets reduced
- to a cost of 329. The query is a join between the one_to_many table,
+ On one system, an <command>EXPLAIN</> showed a query with a cost of 8488 was
+ reduced to a cost of 329. The original query was a join involving the
+ <structname>one_to_many</> table, which was replaced by:
</para>
+
<programlisting>
-SELECT right, count(right) FROM
-(
- SELECT left, int_array_enum(right) AS right FROM summary JOIN
- (SELECT left FROM left_table WHERE left = item) AS lefts
- ON (summary.left = lefts.left )
-) AS list GROUP BY right ORDER BY count DESC ;
+SELECT right, count(right) FROM
+ ( SELECT left, int_array_enum(right) AS right
+ FROM summary JOIN (SELECT left FROM left_table WHERE left = <replaceable>item</>) AS lefts
+ ON (summary.left = lefts.left)
+ ) AS list
+ GROUP BY right
+ ORDER BY count DESC;
</programlisting>
-</sect1>
+ </sect2>
+
+</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/intarray.sgml,v 1.5 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="intarray">
<title>intarray</title>
-
+
<indexterm zone="intarray">
<primary>intarray</primary>
</indexterm>
<para>
- This is an implementation of RD-tree data structure using GiST interface
- of PostgreSQL. It has built-in lossy compression.
- </para>
-
- <para>
- Current implementation provides index support for one-dimensional array of
- integers: gist__int_ops, suitable for small and medium size of arrays (used by
- default), and gist__intbig_ops for indexing large arrays (we use superimposed
- signature with length of 4096 bits to represent sets). There is also a
- non-default gin__int_ops for GIN indexes on integer arrays.
+ The <filename>intarray</> module provides a number of useful functions
+ and operators for manipulating one-dimensional arrays of integers.
+ There is also support for indexed searches using some of the operators.
</para>
<sect2>
- <title>Functions</title>
-
- <itemizedlist>
-
- <listitem>
- <para>
- <literal>int icount(int[])</literal> - the number of elements in intarray
- </para>
- <programlisting>
-test=# select icount('{1,2,3}'::int[]);
- icount
---------
- 3
-(1 row)
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
- </para>
- <programlisting>
-test=# select sort('{1,2,3}'::int[],'desc');
- sort
----------
- {3,2,1}
-(1 row)
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] sort(int[])</literal> - sort in ascending order
- </para>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
- </para>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] uniq(int[])</literal> - returns unique elements
- </para>
- <programlisting>
-test=# select uniq(sort('{1,2,3,2,1}'::int[]));
- uniq
----------
- {1,2,3}
-(1 row)
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>int idx(int[], int item)</literal> - returns index of first
- intarray matching element to item, or '0' if matching failed.
- </para>
- <programlisting>
-test=# select idx('{1,2,3,2,1}'::int[],2);
- idx
------
- 2
-(1 row)
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] subarray(int[],int START [, int LEN])</literal> - returns
- part of intarray starting from element number START (from 1) and length LEN.
- </para>
- <programlisting>
-test=# select subarray('{1,2,3,2,1}'::int[],2,3);
- subarray
-----------
- {2,3,2}
-(1 row)
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>int[] intset(int4)</literal> - casting int4 to int[]
- </para>
- <programlisting>
-test=# select intset(1);
- intset
---------
- {1}
-(1 row)
- </programlisting>
- </listitem>
-
- </itemizedlist>
- </sect2>
+ <title><filename>intarray</> Functions and Operators</title>
- <sect2>
- <title>Operations</title>
- <table>
- <title>Operations</title>
- <tgroup cols="2">
+ <table id="intarray-func-table">
+ <title><filename>intarray</> Functions</title>
+
+ <tgroup cols="5">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Return Type</entry>
+ <entry>Description</entry>
+ <entry>Example</entry>
+ <entry>Result</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><function>icount(int[])</function></entry>
+ <entry><type>int</type></entry>
+ <entry>number of elements in array</entry>
+ <entry><literal>icount('{1,2,3}'::int[])</literal></entry>
+ <entry><literal>3</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>sort(int[], text dir)</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>sort array — <parameter>dir</> must be <literal>asc</> or <literal>desc</></entry>
+ <entry><literal>sort('{1,2,3}'::int[], 'desc')</literal></entry>
+ <entry><literal>{3,2,1}</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>sort(int[])</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>sort in ascending order</entry>
+ <entry><literal>sort(array[11,77,44])</literal></entry>
+ <entry><literal>{11,44,77}</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>sort_asc(int[])</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>sort in ascending order</entry>
+ <entry><literal></literal></entry>
+ <entry><literal></literal></entry>
+ </row>
+
+ <row>
+ <entry><function>sort_desc(int[])</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>sort in descending order</entry>
+ <entry><literal></literal></entry>
+ <entry><literal></literal></entry>
+ </row>
+
+ <row>
+ <entry><function>uniq(int[])</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>remove adjacent duplicates</entry>
+ <entry><literal>uniq(sort('{1,2,3,2,1}'::int[]))</literal></entry>
+ <entry><literal>{1,2,3}</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>idx(int[], int item)</function></entry>
+ <entry><type>int</type></entry>
+ <entry>index of first element matching <parameter>item</> (0 if none)</entry>
+ <entry><literal>idx(array[11,22,33,22,11], 22)</literal></entry>
+ <entry><literal>2</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>subarray(int[], int start, int len)</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>portion of array starting at position <parameter>start</>, len <parameter>elements</></entry>
+ <entry><literal>subarray('{1,2,3,2,1}'::int[], 2, 3)</literal></entry>
+ <entry><literal>{2,3,2}</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>subarray(int[], int start)</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>portion of array starting at position <parameter>start</></entry>
+ <entry><literal>subarray('{1,2,3,2,1}'::int[], 2)</literal></entry>
+ <entry><literal>{2,3,2,1}</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>intset(int)</function></entry>
+ <entry><type>int[]</type></entry>
+ <entry>make single-element array</entry>
+ <entry><literal>intset(42)</literal></entry>
+ <entry><literal>{42}</literal></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="intarray-op-table">
+ <title><filename>intarray</> Operators</title>
+
+ <tgroup cols="3">
<thead>
<row>
<entry>Operator</entry>
+ <entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
<entry><literal>int[] && int[]</literal></entry>
- <entry>overlap - returns TRUE if arrays have at least one common element</entry>
+ <entry><type>boolean</type></entry>
+ <entry>overlap — <literal>true</> if arrays have at least one common element</entry>
</row>
<row>
<entry><literal>int[] @> int[]</literal></entry>
- <entry>contains - returns TRUE if left array contains right array</entry>
+ <entry><type>boolean</type></entry>
+ <entry>contains — <literal>true</> if left array contains right array</entry>
</row>
<row>
<entry><literal>int[] <@ int[]</literal></entry>
- <entry>contained - returns TRUE if left array is contained in right array</entry>
+ <entry><type>boolean</type></entry>
+ <entry>contained — <literal>true</> if left array is contained in right array</entry>
</row>
<row>
<entry><literal># int[]</literal></entry>
- <entry>returns the number of elements in array</entry>
+ <entry><type>int</type></entry>
+ <entry>number of elements in array</entry>
+ </row>
+ <row>
+ <entry><literal>int[] # int</literal></entry>
+ <entry><type>int</type></entry>
+ <entry>index (same as <function>idx</> function)</entry>
</row>
<row>
<entry><literal>int[] + int</literal></entry>
- <entry>push element to array ( add to end of array)</entry>
+ <entry><type>int[]</type></entry>
+ <entry>push element onto array (add it to end of array)</entry>
</row>
<row>
<entry><literal>int[] + int[] </literal></entry>
- <entry>merge of arrays (right array added to the end of left one)</entry>
+ <entry><type>int[]</type></entry>
+ <entry>array concatenation (right array added to the end of left one)</entry>
</row>
<row>
<entry><literal>int[] - int</literal></entry>
- <entry>remove entries matched by right argument from array</entry>
+ <entry><type>int[]</type></entry>
+ <entry>remove entries matching right argument from array</entry>
</row>
<row>
<entry><literal>int[] - int[]</literal></entry>
- <entry>remove right array from left</entry>
+ <entry><type>int[]</type></entry>
+ <entry>remove elements of right array from left</entry>
</row>
<row>
<entry><literal>int[] | int</literal></entry>
- <entry>returns intarray - union of arguments</entry>
+ <entry><type>int[]</type></entry>
+ <entry>union of arguments</entry>
</row>
<row>
<entry><literal>int[] | int[]</literal></entry>
- <entry>returns intarray as a union of two arrays</entry>
+ <entry><type>int[]</type></entry>
+ <entry>union of arrays</entry>
</row>
-
<row>
<entry><literal>int[] & int[]</literal></entry>
- <entry>returns intersection of arrays</entry>
+ <entry><type>int[]</type></entry>
+ <entry>intersection of arrays</entry>
</row>
-
<row>
<entry><literal>int[] @@ query_int</literal></entry>
- <entry>
- returns TRUE if array satisfies query (like
- <literal>'1&(2|3)'</literal>)
- </entry>
+ <entry><type>boolean</type></entry>
+ <entry><literal>true</> if array satisfies query (see below)</entry>
</row>
-
<row>
<entry><literal>query_int ~~ int[]</literal></entry>
- <entry>returns TRUE if array satisfies query (commutator of @@)</entry>
+ <entry><type>boolean</type></entry>
+ <entry><literal>true</> if array satisfies query (commutator of <literal>@@</>)</entry>
</row>
</tbody>
</tgroup>
</table>
+
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
+
+ <para>
+ The <literal>@@</> and <literal>~~</> operators test whether an array
+ satisfies a <firstterm>query</>, which is expressed as a value of a
+ specialized data type <type>query_int</>. A <firstterm>query</>
+ consists of integer values that are checked against the elements of
+ the array, possibly combined using the operators <literal>&</>
+ (AND), <literal>|</> (OR), and <literal>!</> (NOT). Parentheses
+ can be used as needed. For example,
+ the query <literal>1&(2|3)</> matches arrays that contain 1
+ and also contain either 2 or 3.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Index Support</title>
+
+ <para>
+ <filename>intarray</> provides index support for the
+ <literal>&&</>, <literal>@></>, <literal><@</>,
+ and <literal>@@</> operators, as well as regular array equality.
+ The implementation uses an RD-tree data structure with
+ built-in lossy compression.
+ </para>
+
+ <para>
+ Two GiST index operator classes are provided:
+ <literal>gist__int_ops</> (used by default) is suitable for
+ small and medium-size arrays, while
+ <literal>gist__intbig_ops</> uses a larger signature and is more
+ suitable for indexing large arrays.
+ </para>
+
+ <para>
+ There is also a non-default GIN operator class
+ <literal>gin__int_ops</>.
+ </para>
+
+ <para>
+ The choice between GiST and GIN indexing depends on the relative
+ performance characteristics of GiST and GIN, which are discussed elsewhere.
+ As a rule of thumb, a GIN index is faster to search than a GiST index, but
+ slower to build or update; so GIN is better suited for static data and GiST
+ for often-updated data.
+ </para>
</sect2>
<sect2>
<title>Example</title>
<programlisting>
-CREATE TABLE message (mid INT NOT NULL,sections INT[]);
-CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
+-- a message can be in one or more <quote>sections</>
+CREATE TABLE message (mid INT PRIMARY KEY, sections INT[], ...);
--- create indices
-CREATE unique index message_key ON message ( mid );
-CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
-CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
+-- create specialized index
+CREATE INDEX message_rdtree_idx ON message USING GIST (sections gist__int_ops);
--- select some messages with section in 1 OR 2 - OVERLAP operator
-SELECT message.mid FROM message WHERE message.sections && '{1,2}';
+-- select messages in section 1 OR 2 - OVERLAP operator
+SELECT message.mid FROM message WHERE message.sections && '{1,2}';
--- select messages contains in sections 1 AND 2 - CONTAINS operator
+-- select messages in sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
--- the same, CONTAINED operator
-SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
+
+-- the same, using QUERY operator
+SELECT message.mid FROM message WHERE message.sections @@ '1&2'::query_int;
</programlisting>
</sect2>
<sect2>
<title>Benchmark</title>
+
<para>
- subdirectory bench contains benchmark suite.
+ The source directory <filename>contrib/intarray/bench</> contains a
+ benchmark test suite. To run:
</para>
+
<programlisting>
- cd ./bench
- 1. createdb TEST
- 2. psql TEST < ../_int.sql
- 3. ./create_test.pl | psql TEST
- 4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
- with/without RD-Tree. Run script without arguments to
- see availbale options.
-
- a)test without RD-Tree (OR)
- ./bench.pl -d TEST -c -s 1,2 -v
- b)test with RD-Tree
- ./bench.pl -d TEST -c -s 1,2 -v -r
-
- BENCHMARKS:
-
- Size of table <message>: 200000
- Size of table <message_section_map>: 269133
-
- Distribution of messages by sections:
-
- section 0: 74377 messages
- section 1: 16284 messages
- section 50: 1229 messages
- section 99: 683 messages
-
- old - without RD-Tree support,
- new - with RD-Tree
-
- +----------+---------------+----------------+
- |Search set|OR, time in sec|AND, time in sec|
- | +-------+-------+--------+-------+
- | | old | new | old | new |
- +----------+-------+-------+--------+-------+
- | 1| 0.625| 0.101| -| -|
- +----------+-------+-------+--------+-------+
- | 99| 0.018| 0.017| -| -|
- +----------+-------+-------+--------+-------+
- | 1,2| 0.766| 0.133| 0.628| 0.045|
- +----------+-------+-------+--------+-------+
- | 1,2,50,65| 0.794| 0.141| 0.030| 0.006|
- +----------+-------+-------+--------+-------+
+ cd .../bench
+ createdb TEST
+ psql TEST < ../_int.sql
+ ./create_test.pl | psql TEST
+ ./bench.pl
</programlisting>
+
+ <para>
+ The <filename>bench.pl</> script has numerous options, which
+ are displayed when it is run without any arguments.
+ </para>
</sect2>
<sect2>
<title>Authors</title>
+
<para>
- All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg
- Bartunov (<email>oleg@sai.msu.su</email>). See
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
+ Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. Andrey Oktyabrski did a great work on adding new
functions and operations.
</sect2>
</sect1>
-
+<!-- $PostgreSQL: pgsql/doc/src/sgml/isn.sgml,v 1.4 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="isn">
<title>isn</title>
-
+
<indexterm zone="isn">
<primary>isn</primary>
</indexterm>
<para>
- The <literal>isn</literal> module adds data types for the following
- international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music),
- and ISSN (serials). This module is inspired by Garrett A. Wollman's
- isbn_issn code.
- </para>
- <para>
- This module validates, and automatically adds the correct
- hyphenations to the numbers. Also, it supports the new ISBN-13
- numbers to be used starting in January 2007.
+ The <filename>isn</filename> module provides data types for the following
+ international product numbering standards: EAN13, UPC, ISBN (books), ISMN
+ (music), and ISSN (serials). Numbers are validated on input, and correctly
+ hyphenated on output.
</para>
- <para>
- Premises:
- </para>
-
- <orderedlist>
- <listitem>
- <para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
- </listitem>
- <listitem>
- <para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
- </listitem>
- <listitem>
- <para>some ISBN13 numbers can be displayed as ISBN</para>
- </listitem>
- <listitem>
- <para>some ISMN13 numbers can be displayed as ISMN</para>
- </listitem>
- <listitem>
- <para>some ISSN13 numbers can be displayed as ISSN</para>
- </listitem>
- <listitem>
- <para>all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers</para>
- </listitem>
- </orderedlist>
-
- <note>
- <para>
- All types are internally represented as 64 bit integers,
- and internally all are consistently interchangeable.
- </para>
- </note>
- <note>
- <para>
- We have two operator classes (for btree and for hash) so each data type
- can be indexed for faster access.
- </para>
- </note>
-
<sect2>
<title>Data types</title>
-
- <para>
- We have the following data types:
- </para>
<table>
- <title>Data types</title>
+ <title><filename>isn</filename> data types</title>
<tgroup cols="2">
<thead>
<row>
- <entry><para>Data type</para></entry>
- <entry><para>Description</para></entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
- <entry><para><literal>EAN13</literal></para></entry>
+ <entry><type>EAN13</type></entry>
<entry>
- <para>
- European Article Numbers. This type will always show the EAN13-display
- format. Te output function for this is <literal>ean13_out()</literal>
- </para>
+ European Article Numbers, always displayed in the EAN13 display format
</entry>
</row>
<row>
- <entry><para><literal>ISBN13</literal></para></entry>
+ <entry><type>ISBN13</type></entry>
<entry>
- <para>
- For International Standard Book Numbers to be displayed in
- the new EAN13-display format.
- </para>
+ International Standard Book Numbers to be displayed in
+ the new EAN13 display format
</entry>
</row>
<row>
- <entry><para><literal>ISMN13</literal></para></entry>
+ <entry><type>ISMN13</type></entry>
<entry>
- <para>
- For International Standard Music Numbers to be displayed in
- the new EAN13-display format.
- </para>
+ International Standard Music Numbers to be displayed in
+ the new EAN13 display format
</entry>
</row>
<row>
- <entry><para><literal>ISSN13</literal></para></entry>
+ <entry><type>ISSN13</type></entry>
<entry>
- <para>
- For International Standard Serial Numbers to be displayed in the new
- EAN13-display format.
- </para>
+ International Standard Serial Numbers to be displayed in the new
+ EAN13 display format
</entry>
</row>
<row>
- <entry><para><literal>ISBN</literal></para></entry>
+ <entry><type>ISBN</type></entry>
<entry>
- <para>
- For International Standard Book Numbers to be displayed in the current
- short-display format.
- </para>
+ International Standard Book Numbers to be displayed in the old
+ short display format
</entry>
</row>
<row>
- <entry><para><literal>ISMN</literal></para></entry>
+ <entry><type>ISMN</type></entry>
<entry>
- <para>
- For International Standard Music Numbers to be displayed in the
- current short-display format.
- </para>
+ International Standard Music Numbers to be displayed in the
+ old short display format
</entry>
</row>
<row>
- <entry><para><literal>ISSN</literal></para></entry>
+ <entry><type>ISSN</type></entry>
<entry>
- <para>
- For International Standard Serial Numbers to be displayed in the
- current short-display format. These types will display the short
- version of the ISxN (ISxN 10) whenever it's possible, and it will
- show ISxN 13 when it's impossible to show the short version. The
- output function to do this is <literal>isn_out()</literal>
- </para>
+ International Standard Serial Numbers to be displayed in the
+ old short display format
</entry>
</row>
<row>
- <entry><para><literal>UPC</literal></para></entry>
+ <entry><type>UPC</type></entry>
<entry>
- <para>
- For Universal Product Codes. UPC numbers are a subset of the EAN13
- numbers (they are basically EAN13 without the first '0' digit.)
- The output function to do this is also <literal>isn_out()</literal>
- </para>
+ Universal Product Codes
</entry>
</row>
</tbody>
</tgroup>
</table>
- <note>
- <para>
- <literal>EAN13</literal>, <literal>ISBN13</literal>,
- <literal>ISMN13</literal> and <literal>ISSN13</literal> types will always
- display the long version of the ISxN (EAN13). The output function to do
- this is <literal>ean13_out()</literal>.
- </para>
- <para>
- The need for these types is just for displaying in different ways the same
- data: <literal>ISBN13</literal> is actually the same as
- <literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
- <literal>ISSN13=ISSN</literal>.
- </para>
- </note>
- </sect2>
-
- <sect2>
- <title>Input functions</title>
-
<para>
- We have the following input functions:
+ Some notes:
</para>
- <table>
- <title>Input functions</title>
- <tgroup cols="2">
- <thead>
- <row>
- <entry>Function</entry>
- <entry>Description</entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry><para><literal>ean13_in()</literal></para></entry>
- <entry>
- <para>
- To take a string and return an EAN13.
- </para>
- </entry>
- </row>
-
- <row>
- <entry><para><literal>isbn_in()</literal></para></entry>
- <entry>
- <para>
- To take a string and return valid ISBN or ISBN13 numbers.
- </para>
- </entry>
- </row>
-
- <row>
- <entry><para><literal>ismn_in()</literal></para></entry>
- <entry>
- <para>
- To take a string and return valid ISMN or ISMN13 numbers.
- </para>
- </entry>
- </row>
-
- <row>
- <entry><para><literal>issn_in()</literal></para></entry>
- <entry>
- <para>
- To take a string and return valid ISSN or ISSN13 numbers.
- </para>
- </entry>
- </row>
- <row>
- <entry><para><literal>upc_in()</literal></para></entry>
- <entry>
- <para>
- To take a string and return an UPC codes.
- </para>
- </entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- </sect2>
-
- <sect2>
- <title>Casts</title>
-
- <para>
- We are able to cast from:
- </para>
- <itemizedlist>
+ <orderedlist>
<listitem>
- <para>
- ISBN13 -> EAN13
- </para>
+ <para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers.</para>
</listitem>
<listitem>
- <para>
- ISMN13 -> EAN13
- </para>
+ <para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some
+ are).</para>
</listitem>
<listitem>
- <para>
- ISSN13 -> EAN13
- </para>
+ <para>Some ISBN13 numbers can be displayed as ISBN.</para>
</listitem>
<listitem>
- <para>
- ISBN -> EAN13
- </para>
+ <para>Some ISMN13 numbers can be displayed as ISMN.</para>
</listitem>
<listitem>
- <para>
- ISMN -> EAN13
- </para>
+ <para>Some ISSN13 numbers can be displayed as ISSN.</para>
</listitem>
<listitem>
- <para>
- ISSN -> EAN13
- </para>
+ <para>UPC numbers are a subset of the EAN13 numbers (they are basically
+ EAN13 without the first <literal>0</> digit).</para>
</listitem>
<listitem>
- <para>
- UPC -> EAN13
- </para>
+ <para>All UPC, ISBN, ISMN and ISSN numbers can be represented as EAN13
+ numbers.</para>
</listitem>
+ </orderedlist>
+
+ <para>
+ Internally, all these types use the same representation (a 64-bit
+ integer), and all are interchangeable. Multiple types are provided
+ to control display formatting and to permit tighter validity checking
+ of input that is supposed to denote one particular type of number.
+ </para>
+
+ <para>
+ The <type>ISBN</>, <type>ISMN</>, and <type>ISSN</> types will display the
+ short version of the number (ISxN 10) whenever it's possible, and will show
+ ISxN 13 format for numbers that do not fit in the short version.
+ The <type>EAN13</type>, <type>ISBN13</type>, <type>ISMN13</type> and
+ <type>ISSN13</type> types will always display the long version of the ISxN
+ (EAN13).
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Casts</title>
+
+ <para>
+ The <filename>isn</> module provides the following pairs of type casts:
+ </para>
+
+ <itemizedlist>
<listitem>
<para>
- ISBN <-> ISBN13
+ ISBN13 <=> EAN13
</para>
</listitem>
<listitem>
<para>
- ISMN <-> ISMN13
+ ISMN13 <=> EAN13
</para>
</listitem>
<listitem>
<para>
- ISSN <-> ISSN13
+ ISSN13 <=> EAN13
</para>
</listitem>
- </itemizedlist>
- </sect2>
-
- <sect2>
- <title>C API</title>
- <para>
- The C API is implemented as:
- </para>
- <programlisting>
- extern Datum isn_out(PG_FUNCTION_ARGS);
- extern Datum ean13_out(PG_FUNCTION_ARGS);
- extern Datum ean13_in(PG_FUNCTION_ARGS);
- extern Datum isbn_in(PG_FUNCTION_ARGS);
- extern Datum ismn_in(PG_FUNCTION_ARGS);
- extern Datum issn_in(PG_FUNCTION_ARGS);
- extern Datum upc_in(PG_FUNCTION_ARGS);
- </programlisting>
-
- <para>
- On success:
- </para>
- <itemizedlist>
<listitem>
<para>
- <literal>isn_out()</literal> takes any of our types and returns a string containing
- the shortes possible representation of the number.
+ ISBN <=> EAN13
</para>
</listitem>
<listitem>
<para>
- <literal>ean13_out()</literal> takes any of our types and returns the
- EAN13 (long) representation of the number.
+ ISMN <=> EAN13
</para>
</listitem>
<listitem>
<para>
- <literal>ean13_in()</literal> takes a string and return a EAN13. Which, as stated in (2)
- could or could not be any of our types, but it certainly is an EAN13
- number. Only if the string is a valid EAN13 number, otherwise it fails.
+ ISSN <=> EAN13
</para>
</listitem>
<listitem>
<para>
- <literal>isbn_in()</literal> takes a string and return an ISBN/ISBN13. Only if the string
- is really a ISBN/ISBN13, otherwise it fails.
+ UPC <=> EAN13
</para>
</listitem>
<listitem>
<para>
- <literal>ismn_in()</literal> takes a string and return an ISMN/ISMN13. Only if the string
- is really a ISMN/ISMN13, otherwise it fails.
+ ISBN <=> ISBN13
</para>
</listitem>
<listitem>
<para>
- <literal>issn_in()</literal> takes a string and return an ISSN/ISSN13. Only if the string
- is really a ISSN/ISSN13, otherwise it fails.
+ ISMN <=> ISMN13
</para>
</listitem>
<listitem>
<para>
- <literal>upc_in()</literal> takes a string and return an UPC. Only if the string is
- really a UPC, otherwise it fails.
+ ISSN <=> ISSN13
</para>
</listitem>
</itemizedlist>
<para>
- (on failure, the functions 'ereport' the error)
+ When casting from <type>EAN13</> to another type, there is a run-time
+ check that the value is within the domain of the other type, and an error
+ is thrown if not. The other casts are simply relabelings that will
+ always succeed.
</para>
</sect2>
<sect2>
- <title>Testing functions</title>
+ <title>Functions and Operators</title>
+
+ <para>
+ The <filename>isn</> module provides the standard comparison operators,
+ plus btree and hash indexing support for all these datatypes. In
+ addition there are several specialized functions. In this table,
+ <type>isn</> means any one of the module's data types:
+ </para>
+
<table>
- <title>Testing functions</title>
- <tgroup cols="2">
+ <title><filename>isn</> functions</title>
+ <tgroup cols="3">
<thead>
<row>
- <entry><para>Function</para></entry>
- <entry><para>Description</para></entry>
+ <entry>Function</entry>
+ <entry>Returns</entry>
+ <entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
- <entry><para><literal>isn_weak(boolean)</literal></para></entry>
- <entry><para>Sets the weak input mode.</para></entry>
+ <entry><function>isn_weak(boolean)</function></entry>
+ <entry><type>boolean</type></entry>
+ <entry>Sets the weak input mode (returns new setting)</entry>
</row>
<row>
- <entry><para><literal>isn_weak()</literal></para></entry>
- <entry><para>Gets the current status of the weak mode.</para></entry>
+ <entry><function>isn_weak()</function></entry>
+ <entry><type>boolean</type></entry>
+ <entry>Gets the current status of the weak mode</entry>
</row>
<row>
- <entry><para><literal>make_valid()</literal></para></entry>
- <entry><para>Validates an invalid number (deleting the invalid flag).</para></entry>
+ <entry><function>make_valid(isn)</function></entry>
+ <entry><type>isn</type></entry>
+ <entry>Validates an invalid number (clears the invalid flag)</entry>
</row>
<row>
- <entry><para><literal>is_valid()</literal></para></entry>
- <entry><para>Checks for the invalid flag prsence.</para></entry>
+ <entry><function>is_valid(isn)</function></entry>
+ <entry><type>boolean</type></entry>
+ <entry>Checks for the presence of the invalid flag</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
- <literal>Weak</literal> mode is used to be able to insert invalid data to
- a table. Invalid as in the check digit being wrong, not missing numbers.
+ <firstterm>Weak</firstterm> mode is used to be able to insert invalid data
+ into a table. Invalid means the check digit is wrong, not that there are
+ missing numbers.
</para>
+
<para>
Why would you want to use the weak mode? Well, it could be that
you have a huge collection of ISBN numbers, and that there are so many of
them that for weird reasons some have the wrong check digit (perhaps the
- numbers where scanned from a printed list and the OCR got the numbers wrong,
- perhaps the numbers were manually captured... who knows.) Anyway, the thing
- is you might want to clean the mess up, but you still want to be able to have
- all the numbers in your database and maybe use an external tool to access
- the invalid numbers in the database so you can verify the information and
- validate it more easily; as selecting all the invalid numbers in the table.
+ numbers were scanned from a printed list and the OCR got the numbers wrong,
+ perhaps the numbers were manually captured... who knows). Anyway, the point
+ is you might want to clean the mess up, but you still want to be able to
+ have all the numbers in your database and maybe use an external tool to
+ locate the invalid numbers in the database so you can verify the
+ information and validate it more easily; so for example you'd want to
+ select all the invalid numbers in the table.
</para>
+
<para>
When you insert invalid numbers in a table using the weak mode, the number
- will be inserted with the corrected check digit, but it will be flagged
- with an exclamation mark ('!') at the end (i.e. 0-11-000322-5!)
+ will be inserted with the corrected check digit, but it will be displayed
+ with an exclamation mark (<literal>!</>) at the end, for example
+ <literal>0-11-000322-5!</>. This invalid marker can be checked with
+ the <function>is_valid</> function and cleared with the
+ <function>make_valid</> function.
</para>
+
+ <para>
+ You can also force the insertion of invalid numbers even when not in the
+ weak mode, by appending the <literal>!</> character at the end of the
+ number.
+ </para>
+
<para>
- You can also force the insertion of invalid numbers even not in the weak mode,
- appending the '!' character at the end of the number.
+ Another special feature is that during input, you can write
+ <literal>?</> in place of the check digit, and the correct check digit
+ will be inserted automatically.
</para>
</sect2>
<sect2>
<title>Examples</title>
+
<programlisting>
--Using the types directly:
SELECT isbn('978-0-393-04002-9');
SELECT issn('1436-4522');
--Casting types:
--- note that you can only cast from ean13 to other type when the casted
--- number would be valid in the realm of the casted type;
--- thus, the following will NOT work: select isbn(ean13('0220356483481'));
--- but these will:
-SELECT upc(ean13('0220356483481'));
-SELECT ean13(upc('220356483481'));
+-- note that you can only cast from ean13 to another type when the
+-- number would be valid in the realm of the target type;
+-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
+-- but these will:
+SELECT upc(ean13('0220356483481'));
+SELECT ean13(upc('220356483481'));
--Create a table with a single column to hold ISBN numbers:
-CREATE TABLE test ( id isbn );
+CREATE TABLE test (id isbn);
INSERT INTO test VALUES('9780393040029');
---Automatically calculating check digits (observe the '?'):
+--Automatically calculate check digits (observe the '?'):
INSERT INTO test VALUES('220500896?');
INSERT INTO test VALUES('978055215372?');
INSERT INTO test VALUES('9780141219307');
INSERT INTO test VALUES('2-205-00876-X');
SELECT isn_weak(false);
-
+
SELECT id FROM test WHERE NOT is_valid(id);
-UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';
-
+UPDATE test SET id = make_valid(id) WHERE id = '2-205-00876-X!';
+
SELECT * FROM test;
SELECT isbn13(id) FROM test;
<sect2>
<title>Bibliography</title>
+
<para>
- The information to implement this module was collected through
+ The information to implement this module was collected from
several sites, including:
</para>
<programlisting>
http://www.ismn-international.org/
http://www.wikipedia.org/
</programlisting>
+
<para>
- the prefixes used for hyphenation where also compiled from:
+ The prefixes used for hyphenation were also compiled from:
</para>
<programlisting>
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
http://www.isbn-international.org/en/identifiers.html
http://www.ismn-international.org/ranges.html
</programlisting>
+
<para>
Care was taken during the creation of the algorithms and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN, ISSN User Manuals.
</para>
</sect2>
-
+
<sect2>
<title>Author</title>
<para>
Germán Méndez Bravo (Kronuz), 2004 - 2006
</para>
+
+ <para>
+ This module was inspired by Garrett A. Wollman's
+ isbn_issn code.
+ </para>
</sect2>
-</sect1>
+</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/ltree.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="ltree">
<title>ltree</title>
-
+
<indexterm zone="ltree">
<primary>ltree</primary>
</indexterm>
<para>
- <literal>ltree</literal> is a PostgreSQL module that contains implementation
- of data types, indexed access methods and queries for data organized as a
- tree-like structures.
+ This module implements a data type <type>ltree</> for representing
+ labels of data stored in a hierarchical tree-like structure.
+ Extensive facilities for searching through label trees are provided.
</para>
<sect2>
<title>Definitions</title>
+
<para>
- A <firstterm>label</firstterm> of a node is a sequence of one or more words
- separated by blank character '_' and containing letters and digits ( for
- example, [a-zA-Z0-9] for C locale). The length of a label is limited by 256
- bytes.
+ A <firstterm>label</firstterm> is a sequence of alphanumeric characters
+ and underscores (for example, in C locale the characters
+ <literal>A-Za-z0-9_</> are allowed). Labels must be less than 256 bytes
+ long.
</para>
+
<para>
- Example: 'Countries', 'Personal_Services'
+ Examples: <literal>42</>, <literal>Personal_Services</>
</para>
+
<para>
- A <firstterm>label path</firstterm> of a node is a sequence of one or more
- dot-separated labels l1.l2...ln, represents path from root to the node. The
- length of a label path is limited by 65Kb, but size <= 2Kb is preferrable.
- We consider it's not a strict limitation (maximal size of label path for
- DMOZ catalogue - <ulink url="http://www.dmoz.org"></ulink>, is about 240
- bytes!)
+ A <firstterm>label path</firstterm> is a sequence of zero or more
+ labels separated by dots, for example <literal>L1.L2.L3</>, representing
+ a path from the root of a hierarchical tree to a particular node. The
+ length of a label path must be less than 65Kb, but keeping it under 2Kb is
+ preferable. In practice this is not a major limitation; for example,
+ the longest label path in the DMOZ catalogue (<ulink
+ url="http://www.dmoz.org"></ulink>) is about 240 bytes.
</para>
+
<para>
- Example: <literal>'Top.Countries.Europe.Russia'</literal>
+ Example: <literal>Top.Countries.Europe.Russia</literal>
</para>
+
<para>
- We introduce several datatypes:
+ The <filename>ltree</> module provides several datatypes:
</para>
+
<itemizedlist>
<listitem>
<para>
- <literal>ltree</literal> - is a datatype for label path.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree[]</literal> - is a datatype for arrays of ltree.
+ <type>ltree</type> stores a label path.
</para>
</listitem>
+
<listitem>
<para>
- <literal>lquery</literal>
- - is a path expression that has regular expression in the label path and
- used for ltree matching. Star symbol (*) is used to specify any number of
- labels (levels) and could be used at the beginning and the end of lquery,
- for example, '*.Europe.*'.
+ <type>lquery</type> represents a regular-expression-like pattern
+ for matching <type>ltree</> values. A simple word matches that
+ label within a path. A star symbol (<literal>*</>) matches zero
+ or more labels. For example:
+ <programlisting>
+foo <lineannotation>Match the exact label path <literal>foo</></lineannotation>
+*.foo.* <lineannotation>Match any label path containing the label <literal>foo</></lineannotation>
+*.foo <lineannotation>Match any label path whose last label is <literal>foo</></lineannotation>
+ </programlisting>
</para>
+
<para>
- The following quantifiers are recognized for '*' (like in Perl):
+ Star symbols can also be quantified to restrict how many labels
+ they can match:
+ <programlisting>
+*{<replaceable>n</>} <lineannotation>Match exactly <replaceable>n</> labels</lineannotation>
+*{<replaceable>n</>,} <lineannotation>Match at least <replaceable>n</> labels</lineannotation>
+*{<replaceable>n</>,<replaceable>m</>} <lineannotation>Match at least <replaceable>n</> but not more than <replaceable>m</> labels</lineannotation>
+*{,<replaceable>m</>} <lineannotation>Match at most <replaceable>m</> labels — same as </lineannotation> *{0,<replaceable>m</>}
+ </programlisting>
</para>
- <itemizedlist>
- <listitem>
- <para>{n} Match exactly n levels</para>
- </listitem>
- <listitem>
- <para>{n,} Match at least n levels</para>
- </listitem>
- <listitem>
- <para>{n,m} Match at least n but not more than m levels</para>
- </listitem>
- <listitem>
- <para>{,m} Match at maximum m levels (eq. to {0,m})</para>
- </listitem>
- </itemizedlist>
+
<para>
- It is possible to use several modifiers at the end of a label:
+ There are several modifiers that can be put at the end of a non-star
+ label in <type>lquery</> to make it match more than just the exact match:
+ <programlisting>
+@ <lineannotation>Match case-insensitively, for example <literal>a@</> matches <literal>A</></lineannotation>
+* <lineannotation>Match any label with this prefix, for example <literal>foo*</> matches <literal>foobar</></lineannotation>
+% <lineannotation>Match initial underscore-separated words</lineannotation>
+ </programlisting>
+ The behavior of <literal>%</> is a bit complicated. It tries to match
+ words rather than the entire label. For example
+ <literal>foo_bar%</> matches <literal>foo_bar_baz</> but not
+ <literal>foo_barbaz</>. If combined with <literal>*</>, prefix
+ matching applies to each word separately, for example
+ <literal>foo_bar%*</> matches <literal>foo1_bar2_baz</> but
+ not <literal>foo1_br2_baz</>.
</para>
- <itemizedlist>
- <listitem>
- <para>@ Do case-insensitive label matching</para>
- </listitem>
- <listitem>
- <para>* Do prefix matching for a label</para>
- </listitem>
- <listitem>
- <para>% Don't account word separator '_' in label matching, that is
- 'Russian%' would match 'Russian_nations', but not 'Russian'
- </para>
- </listitem>
- </itemizedlist>
-
+
<para>
- <literal>lquery</literal> can contain logical '!' (NOT) at the beginning
- of the label and '|' (OR) to specify possible alternatives for label
- matching.
- </para>
- <para>
- Example of <literal>lquery</literal>:
+ Also, you can write several possibly-modified labels separated with
+ <literal>|</> (OR) to match any of those labels, and you can put
+ <literal>!</> (NOT) at the start to match any label that doesn't
+ match any of the alternatives.
</para>
- <programlisting>
+
+ <para>
+ Here's an annotated example of <type>lquery</type>:
+ <programlisting>
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
- a) b) c) d) e)
+ a. b. c. d. e.
</programlisting>
- <para>
- A label path should
+ This query will match any label path that:
</para>
<orderedlist numeration='loweralpha'>
<listitem>
<para>
- begin from a node with label 'Top'
+ begins with the label <literal>Top</literal>
</para>
</listitem>
<listitem>
<para>
- and following zero or 2 labels until
+ and next has zero to two labels before
</para>
</listitem>
<listitem>
<para>
- a node with label beginning from case-insensitive prefix 'sport'
+ a label beginning with the case-insensitive prefix <literal>sport</literal>
</para>
</listitem>
<listitem>
<para>
- following node with label not matched 'football' or 'tennis' and
+ then a label not matching <literal>football</literal> nor
+ <literal>tennis</literal>
</para>
</listitem>
<listitem>
<para>
- end on node with label beginning from 'Russ' or strictly matched
- 'Spain'.
+ and then ends with a label beginning with <literal>Russ</literal> or
+ exactly matching <literal>Spain</literal>.
</para>
</listitem>
</orderedlist>
-
</listitem>
<listitem>
- <para><literal>ltxtquery</literal>
- - is a datatype for label searching (like type 'query' for full text
- searching, see contrib/tsearch). It's possible to use modifiers @,%,* at
- the end of word. The meaning of modifiers are the same as for lquery.
- </para>
- <para>
- Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
+ <para><type>ltxtquery</type> represents a full-text-search-like
+ pattern for matching <type>ltree</> values. An
+ <type>ltxtquery</type> value contains words, possibly with the
+ modifiers <literal>@</>, <literal>*</>, <literal>%</> at the end;
+ the modifiers have the same meanings as in <type>lquery</>.
+ Words can be combined with <literal>&</> (AND),
+ <literal>|</> (OR), <literal>!</> (NOT), and parentheses.
+ The key difference from
+ <type>lquery</> is that <type>ltxtquery</type> matches words without
+ regard to their position in the label path.
</para>
+
<para>
- Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and
- not 'Transportation'. Notice, the order of words as they appear in label
- path is not important !
+ Here's an example <type>ltxtquery</type>:
+ <programlisting>
+ Europe & Russia*@ & !Transportation
+ </programlisting>
+ This will match paths that contain the label <literal>Europe</literal> and
+ any label beginning with <literal>Russia</literal> (case-insensitive),
+ but not paths containing the label <literal>Transportation</literal>.
+ The location of these words within the path is not important.
+ Also, when <literal>%</> is used, the word can be matched to any
+ underscore-separated word within a label, regardless of position.
</para>
</listitem>
</itemizedlist>
- </sect2>
- <sect2>
- <title>Operations</title>
<para>
- The following operations are defined for type ltree:
+ Note: <type>ltxtquery</> allows whitespace between symbols, but
+ <type>ltree</> and <type>lquery</> do not.
</para>
+ </sect2>
+
+ <sect2>
+ <title>Operators and Functions</title>
- <itemizedlist>
- <listitem>
- <para>
- <literal><,>,<=,>=,=, <></literal>
- - Have their usual meanings. Comparison is doing in the order of direct
- tree traversing, children of a node are sorted lexicographic.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree @> ltree</literal>
- - returns TRUE if left argument is an ancestor of right argument (or
- equal).
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree <@ ltree </literal>
- - returns TRUE if left argument is a descendant of right argument (or
- equal).
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree ~ lquery, lquery ~ ltree</literal>
- - return TRUE if node represented by ltree satisfies lquery.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree ? lquery[], lquery ? ltree[]</literal>
- - return TRUE if node represented by ltree satisfies at least one lquery
- from array.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
- - return TRUE if node represented by ltree satisfies ltxtquery.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree || ltree, ltree || text, text || ltree</literal>
- - return concatenated ltree.
- </para>
- </listitem>
- </itemizedlist>
-
<para>
- Operations for arrays of ltree (<literal>ltree[]</literal>):
+ Type <type>ltree</> has the usual comparison operators
+ <literal>=</>, <literal><></literal>,
+ <literal><</>, <literal>></>, <literal><=</>, <literal>>=</>.
+ Comparison sorts in the order of a tree traversal, with the children
+ of a node sorted by label text. In addition, there are the following
+ specialized operators:
</para>
- <itemizedlist>
- <listitem>
- <para>
- <literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
- - returns TRUE if array ltree[] contains an ancestor of ltree.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree @> ltree[], ltree[] <@ ltree</literal>
- - returns TRUE if array ltree[] contains a descendant of ltree.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
- - returns TRUE if array ltree[] contains label paths matched lquery.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
- - returns TRUE if array ltree[] contains label paths matched atleaset one
- lquery from array.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
- - returns TRUE if array ltree[] contains label paths matched ltxtquery
- (full text search).
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
- - returns first element of array ltree[] satisfies corresponding condition
- and NULL in vice versa.
- </para>
- </listitem>
- </itemizedlist>
- </sect2>
+ <table id="ltree-op-table">
+ <title><type>ltree</> Operators</title>
- <sect2>
- <title>Remark</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Operator</entry>
+ <entry>Returns</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><type>ltree</> <literal>@></> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>is left argument an ancestor of right (or equal)?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal><@</> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>is left argument a descendant of right (or equal)?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>~</> <type>lquery</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>lquery</> <literal>~</> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>?</> <type>lquery[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match any <type>lquery</> in array?</entry>
+ </row>
+
+ <row>
+ <entry><type>lquery[]</> <literal>?</> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match any <type>lquery</> in array?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>@</> <type>ltxtquery</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match <type>ltxtquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltxtquery</> <literal>@</> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> match <type>ltxtquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>||</> <type>ltree</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>concatenate <type>ltree</> paths</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>||</> <type>text</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>convert text to <type>ltree</> and concatenate</entry>
+ </row>
+
+ <row>
+ <entry><type>text</> <literal>||</> <type>ltree</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>convert text to <type>ltree</> and concatenate</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>@></> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain an ancestor of <type>ltree</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal><@</> <type>ltree[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain an ancestor of <type>ltree</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal><@</> <type>ltree</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain a descendant of <type>ltree</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree</> <literal>@></> <type>ltree[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain a descendant of <type>ltree</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>~</> <type>lquery</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain any path matching <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>lquery</> <literal>~</> <type>ltree[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain any path matching <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>?</> <type>lquery[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> array contain any path matching any <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>lquery[]</> <literal>?</> <type>ltree[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does <type>ltree</> array contain any path matching any <type>lquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>@</> <type>ltxtquery</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain any path matching <type>ltxtquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltxtquery</> <literal>@</> <type>ltree[]</></entry>
+ <entry><type>boolean</type></entry>
+ <entry>does array contain any path matching <type>ltxtquery</>?</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>?@></> <type>ltree</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>first array entry that is an ancestor of <type>ltree</>; NULL if none</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>?<@</> <type>ltree</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>first array entry that is a descendant of <type>ltree</>; NULL if none</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>?~</> <type>lquery</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>first array entry that matches <type>lquery</>; NULL if none</entry>
+ </row>
+
+ <row>
+ <entry><type>ltree[]</> <literal>?@</> <type>ltxtquery</></entry>
+ <entry><type>ltree</type></entry>
+ <entry>first array entry that matches <type>ltxtquery</>; NULL if none</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
<para>
- Operations <literal><@</literal>, <literal>@></literal>, <literal>@</literal> and
- <literal>~</literal> have analogues - <literal>^<@, ^@>, ^@, ^~,</literal> which don't use
- indices!
+ The operators <literal><@</literal>, <literal>@></literal>,
+ <literal>@</literal> and <literal>~</literal> have analogues
+ <literal>^<@</>, <literal>^@></>, <literal>^@</>,
+ <literal>^~</literal>, which are the same except they do not use
+ indexes. These are useful only for testing purposes.
</para>
+
+ <para>
+ The following functions are available:
+ </para>
+
+ <table id="ltree-func-table">
+ <title><type>ltree</> Functions</title>
+
+ <tgroup cols="5">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Return Type</entry>
+ <entry>Description</entry>
+ <entry>Example</entry>
+ <entry>Result</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><function>subltree(ltree, int start, int end)</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>subpath of <type>ltree</> from position <parameter>start</> to
+ position <parameter>end</>-1 (counting from 0)</entry>
+ <entry><literal>subltree('Top.Child1.Child2',1,2)</literal></entry>
+ <entry><literal>Child1</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>subpath(ltree, int offset, int len)</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>subpath of <type>ltree</> starting at position
+ <parameter>offset</>, length <parameter>len</>.
+ If <parameter>offset</> is negative, subpath starts that far from the
+ end of the path. If <parameter>len</> is negative, leaves that many
+ labels off the end of the path.</entry>
+ <entry><literal>subpath('Top.Child1.Child2',0,2)</literal></entry>
+ <entry><literal>Top.Child1</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>subpath(ltree, int offset)</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>subpath of <type>ltree</> starting at position
+ <parameter>offset</>, extending to end of path.
+ If <parameter>offset</> is negative, subpath starts that far from the
+ end of the path.</entry>
+ <entry><literal>subpath('Top.Child1.Child2',1)</literal></entry>
+ <entry><literal>Child1.Child2</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>nlevel(ltree)</function></entry>
+ <entry><type>integer</type></entry>
+ <entry>number of labels in path</entry>
+ <entry><literal>nlevel('Top.Child1.Child2')</literal></entry>
+ <entry><literal>3</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>index(ltree a, ltree b)</function></entry>
+ <entry><type>integer</type></entry>
+ <entry>position of first occurrence of <parameter>b</> in
+ <parameter>a</>; -1 if not found</entry>
+ <entry><literal>index('0.1.2.3.5.4.5.6.8.5.6.8','5.6')</literal></entry>
+ <entry><literal>6</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>index(ltree a, ltree b, int offset)</function></entry>
+ <entry><type>integer</type></entry>
+ <entry>position of first occurrence of <parameter>b</> in
+ <parameter>a</>, searching starting at <parameter>offset</>;
+ negative <parameter>offset</> means start <parameter>-offset</>
+ labels from the end of the path</entry>
+ <entry><literal>index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4)</literal></entry>
+ <entry><literal>9</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>text2ltree(text)</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>cast <type>text</> to <type>ltree</></entry>
+ <entry><literal></literal></entry>
+ <entry><literal></literal></entry>
+ </row>
+
+ <row>
+ <entry><function>ltree2text(ltree)</function></entry>
+ <entry><type>text</type></entry>
+ <entry>cast <type>ltree</> to <type>text</></entry>
+ <entry><literal></literal></entry>
+ <entry><literal></literal></entry>
+ </row>
+
+ <row>
+ <entry><function>lca(ltree, ltree, ...)</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>lowest common ancestor, i.e., longest common prefix of paths
+ (up to 8 arguments supported)</entry>
+ <entry><literal>lca('1.2.2.3','1.2.3.4.5.6')</literal></entry>
+ <entry><literal>1.2</literal></entry>
+ </row>
+
+ <row>
+ <entry><function>lca(ltree[])</function></entry>
+ <entry><type>ltree</type></entry>
+ <entry>lowest common ancestor, i.e., longest common prefix of paths</entry>
+ <entry><literal>lca(array['1.2.2.3'::ltree,'1.2.3'])</literal></entry>
+ <entry><literal>1.2</literal></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
</sect2>
<sect2>
- <title>Indices</title>
+ <title>Indexes</title>
<para>
- Various indices could be created to speed up execution of operations:
+ <filename>ltree</> supports several types of indexes that can speed
+ up the indicated operators:
</para>
<itemizedlist>
<listitem>
<para>
- B-tree index over ltree: <literal><, <=, =, >=, ></literal>
+ B-tree index over <type>ltree</>:
+ <literal><</>, <literal><=</>, <literal>=</>,
+ <literal>>=</>, <literal>></literal>
</para>
</listitem>
<listitem>
<para>
- GiST index over ltree: <literal><, <=, =, >=, >, @>, <@, @, ~, ?</literal>
+ GiST index over <type>ltree</>:
+ <literal><</>, <literal><=</>, <literal>=</>,
+ <literal>>=</>, <literal>></>,
+ <literal>@></>, <literal><@</>,
+ <literal>@</>, <literal>~</>, <literal>?</literal>
</para>
<para>
- Example:
+ Example of creating such an index:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (path);
</programlisting>
</listitem>
<listitem>
- <para>GiST index over ltree[]:
- <literal>ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.</literal>
- </para>
<para>
- Example:
+ GiST index over <type>ltree[]</>:
+ <literal>ltree[] <@ ltree</>, <literal>ltree @> ltree[]</>,
+ <literal>@</>, <literal>~</>, <literal>?</literal>
</para>
- <programlisting>
- CREATE INDEX path_gist_idx ON test USING GIST (array_path);
- </programlisting>
<para>
- Notices: This index is lossy.
- </para>
- </listitem>
- </itemizedlist>
- </sect2>
-
- <sect2>
- <title>Functions</title>
-
- <itemizedlist>
- <listitem>
- <para>
- <literal>ltree subltree(ltree, start, end)</literal>
- returns subpath of ltree from start (inclusive) until the end.
+ Example of creating such an index:
</para>
<programlisting>
- # select subltree('Top.Child1.Child2',1,2);
- subltree
- --------
- Child1
+ CREATE INDEX path_gist_idx ON test USING GIST (array_path);
</programlisting>
- </listitem>
- <listitem>
<para>
- <literal>ltree subpath(ltree, OFFSET,LEN)</literal> and
- <literal>ltree subpath(ltree, OFFSET)</literal>
- returns subpath of ltree from OFFSET (inclusive) with length LEN.
- If OFFSET is negative returns subpath starts that far from the end
- of the path. If LENGTH is omitted, returns everything to the end
- of the path. If LENGTH is negative, leaves that many labels off
- the end of the path.
+ Note: This index type is lossy.
</para>
- <programlisting>
- # select subpath('Top.Child1.Child2',1,2);
- subpath
- -------
- Child1.Child2
-
- # select subpath('Top.Child1.Child2',-2,1);
- subpath
- ---------
- Child1
- </programlisting>
- </listitem>
- <listitem>
- <para>
- <literal>int4 nlevel(ltree)</literal> - returns level of the node.
- </para>
- <programlisting>
- # select nlevel('Top.Child1.Child2');
- nlevel
- --------
- 3
- </programlisting>
- <para>
- Note, that arguments start, end, OFFSET, LEN have meaning of level of the
- node !
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>int4 index(ltree,ltree)</literal> and
- <literal>int4 index(ltree,ltree,OFFSET)</literal>
- returns number of level of the first occurence of second argument in first
- one beginning from OFFSET. if OFFSET is negative, than search begins from |
- OFFSET| levels from the end of the path.
- </para>
- <programlisting>
- SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
- index
- -------
- 6
- SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
- index
- -------
- 9
- </programlisting>
- </listitem>
- <listitem>
- <para>
- <literal>ltree text2ltree(text)</literal> and
- <literal>text ltree2text(text)</literal> cast functions for ltree and text.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
- <literal>ltree lca(ltree[])</literal> Returns Lowest Common Ancestor (lca).
- </para>
- <programlisting>
- # select lca('1.2.2.3','1.2.3.4.5.6');
- lca
- -----
- 1.2
- # select lca('{la.2.3,1.2.3.4.5.6}') is null;
- ?column?
- ----------
- f
- </programlisting>
</listitem>
</itemizedlist>
</sect2>
- <sect2>
- <title>Installation</title>
- <programlisting>
- cd contrib/ltree
- make
- make install
- make installcheck
- </programlisting>
- </sect2>
-
<sect2>
<title>Example</title>
+
+ <para>
+ This example uses the following data (also available in file
+ <filename>contrib/ltree/ltreetest.sql</> in the source distribution):
+ </para>
+
<programlisting>
- createdb ltreetest
- psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
- psql ltreetest < ltreetest.sql
+CREATE TABLE test (path ltree);
+INSERT INTO test VALUES ('Top');
+INSERT INTO test VALUES ('Top.Science');
+INSERT INTO test VALUES ('Top.Science.Astronomy');
+INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
+INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
+INSERT INTO test VALUES ('Top.Hobbies');
+INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
+INSERT INTO test VALUES ('Top.Collections');
+INSERT INTO test VALUES ('Top.Collections.Pictures');
+INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
+INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
+INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
+INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
+CREATE INDEX path_gist_idx ON test USING gist(path);
+CREATE INDEX path_idx ON test USING btree(path);
</programlisting>
<para>
-Now, we have a database ltreetest populated with a data describing hierarchy
-shown below:
+ Now, we have a table <structname>test</> populated with data describing
+ the hierarchy shown below:
</para>
<programlisting>
-
-
- TOP
- / | \
- Science Hobbies Collections
+ Top
+ / | \
+ Science Hobbies Collections
/ | \
Astronomy Amateurs_Astronomy Pictures
/ \ |
/ | \
Galaxies Stars Astronauts
</programlisting>
+
<para>
- Inheritance:
+ We can do inheritance:
</para>
<programlisting>
ltreetest=# select path from test where path <@ 'Top.Science';
- path
+ path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Cosmology
(4 rows)
</programlisting>
+
<para>
- Matching:
+ Here are some examples of path matching:
</para>
+
<programlisting>
ltreetest=# select path from test where path ~ '*.Astronomy.*';
- path
+ path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
+
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
- path
+ path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
+
<para>
- Full text search:
+ Here are some examples of full text search:
</para>
<programlisting>
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
- path
+ path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
(4 rows)
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
- path
+ path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
+
<para>
- Using Functions:
+ Path construction using functions:
</para>
<programlisting>
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
- ?column?
-------------------------------------------
- Top.Science.Space.Astronomy
- Top.Science.Space.Astronomy.Astrophysics
- Top.Science.Space.Astronomy.Cosmology
-(3 rows)
-We could create SQL-function:
-CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
-AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
-LANGUAGE SQL IMMUTABLE;
- </programlisting>
- <para>
- and previous select could be rewritten as:
- </para>
-
- <programlisting>
-ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
- ins_label
+ ?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
</programlisting>
<para>
- Or with another arguments:
+ We could simplify this by creating a SQL function that inserts a label
+ at a specified position in a path:
</para>
-
<programlisting>
-CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
-AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
+CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree
+AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
-ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
- ins_label
+ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
+ ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
</programlisting>
</sect2>
- <sect2>
- <title>Additional data</title>
- <para>
- To get more feeling from our ltree module you could download
- dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes),
- available from
- <ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>
- dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree.
- Setup your test database (dmoz), load ltree module and issue command:
- </para>
- <programlisting>
- zcat dmozltree-eng.sql.gz| psql dmoz
- </programlisting>
- <para>
- Data will be loaded into database dmoz and all indices will be created.
- </para>
- </sect2>
-
- <sect2>
- <title>Benchmarks</title>
- <para>
- All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ
- data, containing 300,274 nodes (see above for download link). We used some
- basic queries typical for walking through catalog.
- </para>
-
- <sect3>
- <title>Queries</title>
- <itemizedlist>
- <listitem>
- <para>
- Q0: Count all rows (sort of base time for comparison)
- </para>
- <programlisting>
- select count(*) from dmoz;
- count
- --------
- 300274
- (1 row)
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Q1: Get direct children (without inheritance)
- </para>
- <programlisting>
- select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
- path
- -----------------------------------
- Top.Adult.Arts.Animation.Cartoons
- Top.Adult.Arts.Animation.Anime
- (2 rows)
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Q2: The same as Q1 but with counting of successors
- </para>
- <programlisting>
- select path as parentpath , (select count(*)-1 from dmoz where path <@
- p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
- parentpath | count
- -----------------------------------+-------
- Top.Adult.Arts.Animation.Cartoons | 2
- Top.Adult.Arts.Animation.Anime | 61
- (2 rows)
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Q3: Get all parents
- </para>
- <programlisting>
- select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
- path asc;
- path
- --------------------------
- Top
- Top.Adult
- Top.Adult.Arts
- Top.Adult.Arts.Animation
- (4 rows)
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Q4: Get all parents with counting of children
- </para>
- <programlisting>
- select path, (select count(*)-1 from dmoz where path <@ p.path) as count
- from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
- path | count
- --------------------------+--------
- Top | 300273
- Top.Adult | 4913
- Top.Adult.Arts | 339
- Top.Adult.Arts.Animation | 65
- (4 rows)
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Q5: Get all children with levels
- </para>
- <programlisting>
- select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
- from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
- path | level
- ------------------------------------------------+-------
- Top.Adult.Arts.Animation.Anime | 1
- Top.Adult.Arts.Animation.Anime.Fan_Works | 2
- Top.Adult.Arts.Animation.Anime.Games | 2
- Top.Adult.Arts.Animation.Anime.Genres | 2
- Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
- Top.Adult.Arts.Animation.Anime.Multimedia | 2
- Top.Adult.Arts.Animation.Anime.Resources | 2
- Top.Adult.Arts.Animation.Anime.Titles | 2
- Top.Adult.Arts.Animation.Cartoons | 1
- Top.Adult.Arts.Animation.Cartoons.AVS | 2
- Top.Adult.Arts.Animation.Cartoons.Members | 2
- (11 rows)
- </programlisting>
- </listitem>
- </itemizedlist>
- </sect3>
-
- <sect3>
- <title>Timings</title>
- <programlisting>
-+---------------------------------------------+
-|Query|Rows|Time (ms) index|Time (ms) no index|
-|-----+----+---------------+------------------|
-| Q0| 1| NA| 1453.44|
-|-----+----+---------------+------------------|
-| Q1| 2| 0.49| 1001.54|
-|-----+----+---------------+------------------|
-| Q2| 2| 1.48| 3009.39|
-|-----+----+---------------+------------------|
-| Q3| 4| 0.55| 906.98|
-|-----+----+---------------+------------------|
-| Q4| 4| 24385.07| 4951.91|
-|-----+----+---------------+------------------|
-| Q5| 11| 0.85| 1003.23|
-+---------------------------------------------+
- </programlisting>
- <para>
- Timings without indices were obtained using operations which doesn't use
- indices (see above)
- </para>
- </sect3>
-
- <sect3>
- <title>Remarks</title>
- <para>
- We didn't run full-scale tests, also we didn't present (yet) data for
- operations with arrays of ltree (ltree[]) and full text searching. We'll
- appreciate your input. So far, below some (rather obvious) results:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- Indices does help execution of queries
- </para>
- </listitem>
- <listitem>
- <para>
- Q4 performs bad because one needs to read almost all data from the HDD
- </para>
- </listitem>
- </itemizedlist>
- </sect3>
- </sect2>
- <sect2>
- <title>Some Backgrounds</title>
- <para>
- The approach we use for ltree is much like one we used in our other GiST based
- contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
- background is available in papers referenced from our GiST development page
- (<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
- </para>
- <para>
- A hierarchical data structure (tree) is a set of nodes. Each node has a
- signature (LPS) of a fixed size, which is a hashed label path of that node.
- Traversing a tree we could *certainly* prune branches if
- </para>
- <programlisting>
- LQS (bitwise AND) LPS != LQS
- </programlisting>
- <para>
- where LQS is a signature of lquery or ltxtquery, obtained in the same way as
- LPS.
- </para>
- <programlisting>
- ltree[]:
- </programlisting>
- <para>
- For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children
- reachable from that node. Signatures are stored in RD-tree, implemented using
- GiST, which provides indexed access.
- </para>
- <programlisting>
- ltree:
- </programlisting>
- <para>
- For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
- represented by (left_bound, signature, right_bound), so that we could speedup
- operations <literal><, <=, =, >=, ></literal> using left_bound, right_bound and prune branches of
- a tree using signature.
- </para>
- </sect2>
<sect2>
<title>Authors</title>
+
<para>
- All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
- Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
+ Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
- additional information. Authors would like to thank Eugeny Rodichev for
+ additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.
</para>
</sect2>
+
</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/oid2name.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="oid2name">
<title>oid2name</title>
-
+
<indexterm zone="oid2name">
<primary>oid2name</primary>
</indexterm>
<para>
- This utility allows administrators to examine the file structure used by
- PostgreSQL. To make use of it, you need to be familiar with the file
- structure, which is described in <xref linkend="storage">.
+ <application>oid2name</> is a utility program that helps administrators to
+ examine the file structure used by PostgreSQL. To make use of it, you need
+ to be familiar with the database file structure, which is described in
+ <xref linkend="storage">.
</para>
-
- <sect2>
- <title>Overview</title>
- <para>
- <literal>oid2name</literal> connects to the database and extracts OID,
- filenode, and table name information. You can also have it show database
- OIDs and tablespace OIDs.
- </para>
+
+ <note>
<para>
- When displaying specific tables, you can select which tables to show by
- using -o, -f and -t. The first switch takes an OID, the second takes
- a filenode, and the third takes a tablename (actually, it's a LIKE
- pattern, so you can use things like "foo%"). Note that you can use as many
- of these switches as you like, and the listing will include all objects
- matched by any of the switches. Also note that these switches can only
- show objects in the database given in -d.
+ The name <quote>oid2name</> is historical, and is actually rather
+ misleading, since most of the time when you use it, you will really
+ be concerned with tables' filenode numbers (which are the file names
+ visible in the database directories). Be sure you understand the
+ difference between table OIDs and table filenodes!
</para>
+ </note>
+
+ <sect2>
+ <title>Overview</title>
+
<para>
- If you don't give any of -o, -f or -t it will dump all the tables in the
- database given in -d. If you don't give -d, it will show a database
- listing. Alternatively you can give -s to get a tablespace listing.
+ <application>oid2name</application> connects to a target database and
+ extracts OID, filenode, and/or table name information. You can also have
+ it show database OIDs or tablespace OIDs. The program is controlled by
+ a large number of command-line switches, as shown in
+ <xref linkend="oid2name-switches">.
</para>
- <table>
- <title>Additional switches</title>
+
+ <table id="oid2name-switches">
+ <title><application>oid2name</> switches</title>
<tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Switch</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
<tbody>
<row>
- <entry><literal>-i</literal></entry>
- <entry>include indexes and sequences in the database listing.</entry>
+ <entry><literal>-o</literal> <replaceable>oid</></entry>
+ <entry>show info for table with OID <replaceable>oid</></entry>
</row>
+
<row>
- <entry><literal>-x</literal></entry>
- <entry>display more information about each object shown: tablespace name,
- schema name, OID.
- </entry>
+ <entry><literal>-f</literal> <replaceable>filenode</></entry>
+ <entry>show info for table with filenode <replaceable>filenode</></entry>
+ </row>
+
+ <row>
+ <entry><literal>-t</literal> <replaceable>tablename_pattern</></entry>
+ <entry>show info for table(s) matching <replaceable>tablename_pattern</></entry>
+ </row>
+
+ <row>
+ <entry><literal>-s</literal></entry>
+ <entry>show tablespace OIDs</entry>
</row>
+
<row>
<entry><literal>-S</literal></entry>
- <entry>also show system objects (those in information_schema, pg_toast
- and pg_catalog schemas)
+ <entry>include system objects (those in
+ <literal>information_schema</literal>, <literal>pg_toast</literal>
+ and <literal>pg_catalog</literal> schemas)
</entry>
</row>
+
+ <row>
+ <entry><literal>-i</literal></entry>
+ <entry>include indexes and sequences in the listing</entry>
+ </row>
+
+ <row>
+ <entry><literal>-x</literal></entry>
+ <entry>display more information about each object shown: tablespace name,
+ schema name, and OID
+ </entry>
+ </row>
+
<row>
<entry><literal>-q</literal></entry>
- <entry>don't display headers(useful for scripting)</entry>
+ <entry>omit headers (useful for scripting)</entry>
+ </row>
+
+ <row>
+ <entry><literal>-d</literal> <replaceable>database</></entry>
+ <entry>database to connect to</entry>
+ </row>
+
+ <row>
+ <entry><literal>-H</literal> <replaceable>host</></entry>
+ <entry>database server's host</entry>
+ </row>
+
+ <row>
+ <entry><literal>-p</literal> <replaceable>port</></entry>
+ <entry>database server's port</entry>
+ </row>
+
+ <row>
+ <entry><literal>-U</literal> <replaceable>username</></entry>
+ <entry>username to connect as</entry>
+ </row>
+
+ <row>
+ <entry><literal>-P</literal> <replaceable>password</></entry>
+ <entry>password (deprecated — putting this on the command line
+ is a security hazard)</entry>
</row>
</tbody>
</tgroup>
</table>
+
+ <para>
+ To display specific tables, select which tables to show by
+ using <literal>-o</>, <literal>-f</> and/or <literal>-t</>.
+ <literal>-o</> takes an OID,
+ <literal>-f</> takes a filenode,
+ and <literal>-t</> takes a tablename (actually, it's a LIKE
+ pattern, so you can use things like <literal>foo%</>).
+ You can use as many
+ of these switches as you like, and the listing will include all objects
+ matched by any of the switches. But note that these switches can only
+ show objects in the database given by <literal>-d</>.
+ </para>
+
+ <para>
+ If you don't give any of <literal>-o</>, <literal>-f</> or <literal>-t</>,
+ but do give <literal>-d</>, it will list all tables in the database
+ named by <literal>-d</>. In this mode, the <literal>-S</> and
+ <literal>-i</> switches control what gets listed.
+ </para>
+
+ <para>
+ If you don't give <literal>-d</> either, it will show a listing of database
+ OIDs. Alternatively you can give <literal>-s</> to get a tablespace
+ listing.
+ </para>
</sect2>
-
+
<sect2>
<title>Examples</title>
-
+
<programlisting>
+$ # what's in this database server, anyway?
$ oid2name
All databases:
Oid Database Name Tablespace
155151 fastdisk
155152 bigdisk
-$ cd $PGDATA/17228
+$ # OK, let's look into database alvherre
+$ cd $PGDATA/base/17228
$ # get top 10 db objects in the default tablespace, ordered by size
$ ls -lS * | head -10
-rw------- 1 alvherre alvherre 163840 sep 14 09:50 16699
-rw------- 1 alvherre alvherre 122880 sep 6 17:51 16751
+$ # I wonder what file 155173 is ...
$ oid2name -d alvherre -f 155173
From database "alvherre":
Filenode Table Name
155173 accounts
1155291 accounts_pkey
-$ # you can also mix the options, and have more details
+$ # you can mix the options, and get more details with -x
$ oid2name -d alvherre -t accounts -f 1155291 -x
From database "alvherre":
Filenode Table Name Oid Schema Tablespace
155151/17228/ 155151/PG_VERSION
$ # Oh, what was database 17228 again?
-$ oid2name
+$ oid2name
All databases:
Oid Database Name Tablespace
----------------------------------
Filenode Table Name
----------------------
155156 foo
-
-$ # end of sample session.
</programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Limitations</title>
<para>
- You can also get approximate size data for each object using psql. For
- example,
- </para>
- <programlisting>
- SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
- </programlisting>
- <para>
- Each page is typically 8k. Relpages is updated by VACUUM.
+ <application>oid2name</> requires a running database server with
+ non-corrupt system catalogs. It is therefore of only limited use
+ for recovering from catastrophic database corruption situations.
</para>
</sect2>
-
+
<sect2>
<title>Author</title>
+
<para>
- b. palmer, <email>bpalmer@crimelabs.net</email>
+ B. Palmer <email>bpalmer@crimelabs.net</email>
</para>
</sect2>
</sect1>
-
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pageinspect.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="pageinspect">
<title>pageinspect</title>
-
+
<indexterm zone="pageinspect">
<primary>pageinspect</primary>
</indexterm>
<para>
- The functions in this module allow you to inspect the contents of data pages
- at a low level, for debugging purposes.
+ The <filename>pageinspect</> module provides functions that allow you to
+ inspect the contents of database pages at a low level, which is useful for
+ debugging purposes. All of these functions may be used only by superusers.
</para>
<sect2>
- <title>Functions included</title>
-
- <itemizedlist>
- <listitem>
- <para>
- <literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
- bytea field. This allows a single time-consistent copy of the block to be
- made. Use of this functions is restricted to superusers.
- </para>
- </listitem>
+ <title>Functions</title>
- <listitem>
- <para>
- <literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
- pages. Use of this function is restricted to superusers.
- </para>
- <para>
- A page image obtained with <literal>get_raw_page</literal> should be passed as argument:
- </para>
- <programlisting>
-regression=# SELECT * FROM page_header(get_raw_page('pg_class',0));
- lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid
- -----------+-----+-------+-------+-------+---------+----------+---------+-----------
- 0/24A1B50 | 1 | 1 | 232 | 368 | 8192 | 8192 | 4 | 0
- </programlisting>
- <para>
- The returned columns correspond to the fields in the PageHeaderData struct.
- See src/include/storage/bufpage.h for more details.
- </para>
- </listitem>
-
- <listitem>
- <para>
- <literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
- pointers that are in use, tuple headers are also shown. All tuples are
- shown, whether or not the tuples were visible to an MVCC snapshot at the
- time the raw page was copied. Use of this function is restricted to
- superusers.
- </para>
- <para>
- A heap page image obtained with <literal>get_raw_page</literal> should be passed as argument:
- </para>
- <programlisting>
- test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
- </programlisting>
- <para>
- See src/include/storage/itemid.h and src/include/access/htup.h for
- explanations of the fields returned.
- </para>
- </listitem>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>get_raw_page(text, int) returns bytea</function>
+ </term>
- <listitem>
- <para>
- <literal>bt_metap()</literal> returns information about the btree index metapage:
- </para>
- <programlisting>
- test=> SELECT * FROM bt_metap('pg_cast_oid_index');
- -[ RECORD 1 ]-----
- magic | 340322
- version | 2
- root | 1
- level | 0
- fastroot | 1
- fastlevel | 0
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>bt_page_stats()</literal> shows information about single btree pages:
- </para>
- <programlisting>
- test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
- -[ RECORD 1 ]-+-----
- blkno | 1
- type | l
- live_items | 256
- dead_items | 0
- avg_item_size | 12
- page_size | 8192
- free_size | 4056
- btpo_prev | 0
- btpo_next | 0
- btpo | 0
- btpo_flags | 3
- </programlisting>
- </listitem>
-
- <listitem>
- <para>
- <literal>bt_page_items()</literal> returns information about specific items on btree pages:
+ <listitem>
+ <para>
+ <function>get_raw_page</function> reads the specified block of the named
+ table and returns a copy as a <type>bytea</> value. This allows a
+ single time-consistent copy of the block to be obtained.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>page_header(bytea) returns record</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>page_header</function> shows fields that are common to all
+ <productname>PostgreSQL</> heap and index pages.
+ </para>
+
+ <para>
+ A page image obtained with <function>get_raw_page</function> should be
+ passed as argument. For example:
+ </para>
+ <programlisting>
+test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
+ lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid
+-----------+-----+-------+-------+-------+---------+----------+---------+-----------
+ 0/24A1B50 | 1 | 1 | 232 | 368 | 8192 | 8192 | 4 | 0
+ </programlisting>
+
+ <para>
+ The returned columns correspond to the fields in the
+ <structname>PageHeaderData</> struct.
+ See <filename>src/include/storage/bufpage.h</> for details.
</para>
- <programlisting>
- test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
- itemoffset | ctid | itemlen | nulls | vars | data
- ------------+---------+---------+-------+------+-------------
- 1 | (0,1) | 12 | f | f | 23 27 00 00
- 2 | (0,2) | 12 | f | f | 24 27 00 00
- 3 | (0,3) | 12 | f | f | 25 27 00 00
- 4 | (0,4) | 12 | f | f | 26 27 00 00
- 5 | (0,5) | 12 | f | f | 27 27 00 00
- 6 | (0,6) | 12 | f | f | 28 27 00 00
- 7 | (0,7) | 12 | f | f | 29 27 00 00
- 8 | (0,8) | 12 | f | f | 2a 27 00 00
- </programlisting>
- </listitem>
- </itemizedlist>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>heap_page_items(bytea) returns setof record</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>heap_page_items</function> shows all line pointers on a heap
+ page. For those line pointers that are in use, tuple headers are also
+ shown. All tuples are shown, whether or not the tuples were visible to
+ an MVCC snapshot at the time the raw page was copied.
+ </para>
+ <para>
+ A heap page image obtained with <function>get_raw_page</function> should
+ be passed as argument. For example:
+ </para>
+ <programlisting>
+test=# SELECT * FROM heap_page_items(get_raw_page('pg_class', 0));
+ </programlisting>
+ <para>
+ See <filename>src/include/storage/itemid.h</> and
+ <filename>src/include/access/htup.h</> for explanations of the fields
+ returned.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_metap(text) returns record</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>bt_metap</function> returns information about a btree
+ index's metapage. For example:
+ </para>
+ <programlisting>
+test=# SELECT * FROM bt_metap('pg_cast_oid_index');
+-[ RECORD 1 ]-----
+magic | 340322
+version | 2
+root | 1
+level | 0
+fastroot | 1
+fastlevel | 0
+ </programlisting>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_stats(text, int) returns record</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>bt_page_stats</function> returns summary information about
+ single pages of btree indexes. For example:
+ </para>
+ <programlisting>
+test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
+-[ RECORD 1 ]-+-----
+blkno | 1
+type | l
+live_items | 256
+dead_items | 0
+avg_item_size | 12
+page_size | 8192
+free_size | 4056
+btpo_prev | 0
+btpo_next | 0
+btpo | 0
+btpo_flags | 3
+ </programlisting>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(text, int) returns setof record</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>bt_page_items</function> returns detailed information about
+ all of the items on a btree index page. For example:
+ </para>
+ <programlisting>
+test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ </programlisting>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
-</sect1>
+</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgbench.sgml,v 1.4 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="pgbench">
<title>pgbench</title>
-
+
<indexterm zone="pgbench">
<primary>pgbench</primary>
</indexterm>
<para>
- <literal>pgbench</literal> is a simple program to run a benchmark test.
- <literal>pgbench</literal> is a client application of PostgreSQL and runs
- with PostgreSQL only. It performs lots of small and simple transactions
- including SELECT/UPDATE/INSERT operations then calculates number of
- transactions successfully completed within a second (transactions
- per second, tps). Targeting data includes a table with at least 100k
- tuples.
+ <application>pgbench</application> is a simple program for running benchmark
+ tests on <productname>PostgreSQL</>. It runs the same sequence of SQL
+ commands over and over, possibly in multiple concurrent database sessions,
+ and then calculates the average transaction rate (transactions per second).
+ By default, <application>pgbench</application> tests a scenario that is
+ loosely based on TPC-B, involving five <command>SELECT</>,
+ <command>UPDATE</>, and <command>INSERT</> commands per transaction.
+ However, it is easy to test other cases by writing your own transaction
+ script files.
</para>
+
<para>
- Example outputs from pgbench look like:
- </para>
+ Typical output from pgbench looks like:
+
<programlisting>
-number of clients: 4
-number of transactions per client: 100
-number of processed transactions: 400/400
-tps = 19.875015(including connections establishing)
-tps = 20.098827(excluding connections establishing)
+transaction type: TPC-B (sort of)
+scaling factor: 10
+number of clients: 10
+number of transactions per client: 1000
+number of transactions actually processed: 10000/10000
+tps = 85.184871 (including connections establishing)
+tps = 85.296346 (excluding connections establishing)
</programlisting>
- <para> Similar program called "JDBCBench" already exists, but it requires
- Java that may not be available on every platform. Moreover some
- people concerned about the overhead of Java that might lead
- inaccurate results. So I decided to write in pure C, and named
- it "pgbench."
- </para>
- <para>
- Features of pgbench:
+ The first four lines just report some of the most important parameter
+ settings. The next line reports the number of transactions completed
+ and intended (the latter being just the product of number of clients
+ and number of transactions); these will be equal unless the run
+ failed before completion. The last two lines report the TPS rate,
+ figured with and without counting the time to start database sessions.
</para>
- <itemizedlist>
- <listitem>
- <para>
- pgbench is written in C using libpq only. So it is very portable
- and easy to install.
- </para>
- </listitem>
- <listitem>
- <para>
- pgbench can simulate concurrent connections using asynchronous
- capability of libpq. No threading is required.
- </para>
- </listitem>
- </itemizedlist>
<sect2>
<title>Overview</title>
- <orderedlist>
- <listitem>
- <para>(optional)Initialize database by:</para>
- <programlisting>
-pgbench -i <dbname>
- </programlisting>
- <para>
- where <dbname> is the name of database. pgbench uses four tables
- accounts, branches, history and tellers. These tables will be
- destroyed. Be very careful if you have tables having same
- names. Default test data contains:
- </para>
- <programlisting>
-table # of tuples
+
+ <para>
+ The default TPC-B-like transaction test requires specific tables to be
+ set up beforehand. <application>pgbench</> should be invoked with
+ the <literal>-i</> (initialize) option to create and populate these
+ tables. (When you are testing a custom script, you don't need this
+ step, but will instead need to do whatever setup your test needs.)
+ Initialization looks like:
+
+ <programlisting>
+pgbench -i <optional> <replaceable>other-options</> </optional> <replaceable>dbname</>
+ </programlisting>
+
+ where <replaceable>dbname</> is the name of the already-created
+ database to test in. (You may also need <literal>-h</>,
+ <literal>-p</>, and/or <literal>-U</> options to specify how to
+ connect to the database server.)
+ </para>
+
+ <caution>
+ <para>
+ <literal>pgbench -i</> creates four tables <structname>accounts</>,
+ <structname>branches</>, <structname>history</>, and
+ <structname>tellers</>, destroying any existing tables of these names.
+ Be very careful to use another database if you have tables having these
+ names!
+ </para>
+ </caution>
+
+ <para>
+ At the default <quote>scale factor</> of 1, the tables initially
+ contain this many rows:
+ </para>
+ <programlisting>
+table # of rows
-------------------------
branches 1
tellers 10
accounts 100000
history 0
- </programlisting>
- <para>
- You can increase the number of tuples by using -s option. branches,
- tellers and accounts tables are created with a fillfactor which is
- set using -F option. See below.
- </para>
- </listitem>
- <listitem>
- <para>Run the benchmark test</para>
- <programlisting>
-pgbench <dbname>
- </programlisting>
- <para>
- The default configuration is:
- </para>
- <programlisting>
- number of clients: 1
- number of transactions per client: 10
- </programlisting>
- </listitem>
- </orderedlist>
+ </programlisting>
+ <para>
+ You can (and, for most purposes, probably should) increase the number
+ of rows by using the <literal>-s</> (scale factor) option. The
+ <literal>-F</> (fillfactor) option might also be used at this point.
+ </para>
+
+ <para>
+ Once you have done the necessary setup, you can run your benchmark
+ with a command that doesn't include <literal>-i</>, that is
+
+ <programlisting>
+pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
+ </programlisting>
- <table>
- <title><literal>pgbench</literal> options</title>
+ In nearly all cases, you'll need some options to make a useful test.
+ The most important options are <literal>-c</> (number of clients),
+ <literal>-t</> (number of transactions), and <literal>-f</> (specify
+ a custom script file). See below for a full list.
+ </para>
+
+ <para>
+ <xref linkend="pgbench-init-options"> shows options that are used
+ during database initialization, while
+ <xref linkend="pgbench-run-options"> shows options that are used
+ while running benchmarks, and
+ <xref linkend="pgbench-common-options"> shows options that are useful
+ in both cases.
+ </para>
+
+ <table id="pgbench-init-options">
+ <title><application>pgbench</application> initialization options</title>
<tgroup cols="2">
<thead>
<row>
- <entry>Parameter</entry>
+ <entry>Option</entry>
<entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
- <entry><literal>-h hostname</literal></entry>
+ <entry><literal>-i</literal></entry>
<entry>
- <para>
- hostname where the backend is running. If this option
- is omitted, pgbench will connect to the localhost via
- Unix domain socket.
- </para>
+ Required to invoke initialization mode.
</entry>
</row>
<row>
- <entry><literal>-p port</literal></entry>
+ <entry><literal>-s</literal> <replaceable>scale_factor</></entry>
<entry>
- <para>
- the port number that the backend is accepting. default is
- libpq's default, usually 5432.
- </para>
+ Multiply the number of rows generated by the scale factor.
+ For example, <literal>-s 100</> will imply 10,000,000 rows
+ in the <structname>accounts</> table. Default is 1.
</entry>
</row>
<row>
- <entry><literal>-c number_of_clients</literal></entry>
+ <entry><literal>-F</literal> <replaceable>fillfactor</></entry>
<entry>
- <para>
- Number of clients simulated. default is 1.
- </para>
+ Create the <structname>accounts</>, <structname>tellers</> and
+ <structname>branches</> tables with the given fillfactor.
+ Default is 100.
</entry>
</row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="pgbench-run-options">
+ <title><application>pgbench</application> benchmarking options</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Option</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
<row>
- <entry><literal>-t number_of_transactions</literal></entry>
+ <entry><literal>-c</literal> <replaceable>clients</></entry>
<entry>
- <para>
- Number of transactions each client runs. default is 10.
- </para>
+ Number of clients simulated, that is, number of concurrent database
+ sessions. Default is 1.
</entry>
</row>
<row>
- <entry><literal>-s scaling_factor</literal></entry>
+ <entry><literal>-t</literal> <replaceable>transactions</></entry>
<entry>
- <para>
- this should be used with -i (initialize) option.
- number of tuples generated will be multiple of the
- scaling factor. For example, -s 100 will imply 10M
- (10,000,000) tuples in the accounts table.
- default is 1.
- </para>
- <para>
- NOTE: scaling factor should be at least
- as large as the largest number of clients you intend
- to test; else you'll mostly be measuring update contention.
- Regular (not initializing) runs using one of the
- built-in tests will detect scale based on the number of
- branches in the database. For custom (-f) runs it can
- be manually specified with this parameter.
- </para>
+ Number of transactions each client runs. Default is 10.
</entry>
</row>
<row>
- <entry><literal>-D varname=value</literal></entry>
+ <entry><literal>-N</literal></entry>
<entry>
- <para>
- Define a variable. It can be refered to by a script
- provided by using -f option. Multiple -D options are allowed.
- </para>
+ Do not update <structname>tellers</> and <structname>branches</>.
+ This will avoid update contention on these tables, but
+ it makes the test case even less like TPC-B.
</entry>
</row>
<row>
- <entry><literal>-U login</literal></entry>
+ <entry><literal>-S</literal></entry>
<entry>
- <para>
- Specify db user's login name if it is different from
- the Unix login name.
- </para>
+ Perform select-only transactions instead of TPC-B-like test.
</entry>
</row>
<row>
- <entry><literal>-P password</literal></entry>
+ <entry><literal>-f</literal> <replaceable>filename</></entry>
<entry>
- <para>
- Specify the db password. CAUTION: using this option
- might be a security hole since ps command will
- show the password. Use this for TESTING PURPOSE ONLY.
- </para>
+ Read transaction script from <replaceable>filename</>.
+ See below for details.
+ <literal>-N</literal>, <literal>-S</literal>, and <literal>-f</literal>
+ are mutually exclusive.
</entry>
</row>
<row>
<entry><literal>-n</literal></entry>
<entry>
- <para>
- No vacuuming and cleaning the history table prior to the
- test is performed.
- </para>
+ No vacuuming is performed before running the test.
+ This option is <emphasis>necessary</>
+ if you are running a custom test scenario that does not include
+ the standard tables <structname>accounts</>,
+ <structname>branches</>, <structname>history</>, and
+ <structname>tellers</>.
</entry>
</row>
<row>
<entry><literal>-v</literal></entry>
<entry>
- <para>
- Do vacuuming before testing. This will take some time.
- With neither -n nor -v, pgbench will vacuum tellers and
- branches tables only.
- </para>
+ Vacuum all four standard tables before running the test.
+ With neither <literal>-n</> nor <literal>-v</>, pgbench will vacuum
+ <structname>tellers</> and <structname>branches</> tables, and
+ will remove all entries in <structname>history</>.
</entry>
</row>
<row>
- <entry><literal>-S</literal></entry>
+ <entry><literal>-D</literal> <replaceable>varname</><literal>=</><replaceable>value</></entry>
<entry>
- <para>
- Perform select only transactions instead of TPC-B.
- </para>
+ Define a variable for use by a custom script (see below).
+ Multiple <literal>-D</> options are allowed.
</entry>
</row>
<row>
- <entry><literal>-N</literal></entry>
+ <entry><literal>-C</literal></entry>
<entry>
- <para>
- Do not update "branches" and "tellers". This will
- avoid heavy update contention on branches and tellers,
- while it will not make pgbench supporting TPC-B like
- transactions.
- </para>
+ Establish a new connection for each transaction, rather than
+ doing it just once per client thread.
+ This is useful to measure the connection overhead.
</entry>
</row>
<row>
- <entry><literal>-f filename</literal></entry>
+ <entry><literal>-l</literal></entry>
<entry>
- <para>
- Read transaction script from file. Detailed
- explanation will appear later.
- </para>
+ Write the time taken by each transaction to a logfile.
+ See below for details.
</entry>
</row>
<row>
- <entry><literal>-C</literal></entry>
+ <entry><literal>-s</literal> <replaceable>scale_factor</></entry>
<entry>
- <para>
- Establish connection for each transaction, rather than
- doing it just once at beginning of pgbench in the normal
- mode. This is useful to measure the connection overhead.
- </para>
+ Report the specified scale factor in <application>pgbench</>'s
+ output. With the built-in tests, this is not necessary; the
+ correct scale factor will be detected by counting the number of
+ rows in the <structname>branches</> table. However, when testing
+ custom benchmarks (<literal>-f</> option), the scale factor
+ will be reported as 1 unless this option is used.
</entry>
</row>
<row>
- <entry><literal>-l</literal></entry>
+ <entry><literal>-d</literal></entry>
<entry>
- <para>
- Write the time taken by each transaction to a logfile,
- with the name "pgbench_log.xxx", where xxx is the PID
- of the pgbench process. The format of the log is:
- </para>
- <programlisting>
- client_id transaction_no time file_no time-epoch time-us
- </programlisting>
- <para>
- where time is measured in microseconds, , the file_no is
- which test file was used (useful when multiple were
- specified with -f), and time-epoch/time-us are a
- UNIX epoch format timestamp followed by an offset
- in microseconds (suitable for creating a ISO 8601
- timestamp with a fraction of a second) of when
- the transaction completed.
- </para>
- <para>
- Here are example outputs:
- </para>
- <programlisting>
- 0 199 2241 0 1175850568 995598
- 0 200 2465 0 1175850568 998079
- 0 201 2513 0 1175850569 608
- 0 202 2038 0 1175850569 2663
- </programlisting>
+ Print debugging output.
</entry>
</row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="pgbench-common-options">
+ <title><application>pgbench</application> common options</title>
+ <tgroup cols="2">
+ <thead>
<row>
- <entry><literal>-F fillfactor</literal></entry>
- <entry>
- <para>
- Create tables(accounts, tellers and branches) with the given
- fillfactor. Default is 100. This should be used with -i
- (initialize) option.
- </para>
- </entry>
+ <entry>Option</entry>
+ <entry>Description</entry>
</row>
+ </thead>
+
+ <tbody>
<row>
- <entry><literal>-d</literal></entry>
- <entry>
- <para>
- debug option.
- </para>
- </entry>
+ <entry><literal>-h</literal> <replaceable>hostname</></entry>
+ <entry>database server's host</entry>
+ </row>
+ <row>
+ <entry><literal>-p</literal> <replaceable>port</></entry>
+ <entry>database server's port</entry>
+ </row>
+ <row>
+ <entry><literal>-U</literal> <replaceable>login</></entry>
+ <entry>username to connect as</entry>
+ </row>
+ <row>
+ <entry><literal>-P</literal> <replaceable>password</></entry>
+ <entry>password (deprecated — putting this on the command line
+ is a security hazard)</entry>
</row>
</tbody>
</tgroup>
</sect2>
<sect2>
- <title>What is the "transaction" actually performed in pgbench?</title>
- <orderedlist>
- <listitem><para><literal>begin;</literal></para></listitem>
-
- <listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
+ <title>What is the <quote>transaction</> actually performed in pgbench?</title>
- <listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
-
- <listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
-
- <listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
-
- <listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
+ <para>
+ The default transaction script issues seven commands per transaction:
+ </para>
- <listitem><para><literal>end;</literal></para></listitem>
+ <orderedlist>
+ <listitem><para><literal>BEGIN;</literal></para></listitem>
+ <listitem><para><literal>UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;</literal></para></listitem>
+ <listitem><para><literal>SELECT abalance FROM accounts WHERE aid = :aid;</literal></para></listitem>
+ <listitem><para><literal>UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</literal></para></listitem>
+ <listitem><para><literal>UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid;</literal></para></listitem>
+ <listitem><para><literal>INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</literal></para></listitem>
+ <listitem><para><literal>END;</literal></para></listitem>
</orderedlist>
+
<para>
- If you specify -N, (4) and (5) aren't included in the transaction.
+ If you specify <literal>-N</>, steps 4 and 5 aren't included in the
+ transaction. If you specify <literal>-S</>, only the <command>SELECT</> is
+ issued.
</para>
</sect2>
<sect2>
- <title>Script file</title>
+ <title>Custom Scripts</title>
+
<para>
- <literal>pgbench</literal> has support for reading a transaction script
- from a specified file (<literal>-f</literal> option). This file should
- include SQL commands in each line. SQL command consists of multiple lines
- are not supported. Empty lines and lines begging with "--" will be ignored.
+ <application>pgbench</application> has support for running custom
+ benchmark scenarios by replacing the default transaction script
+ (described above) with a transaction script read from a file
+ (<literal>-f</literal> option). In this case a <quote>transaction</>
+ counts as one execution of a script file. You can even specify
+ multiple scripts (multiple <literal>-f</literal> options), in which
+ case a random one of the scripts is chosen each time a client session
+ starts a new transaction.
</para>
+
<para>
- Multiple <literal>-f</literal> options are allowed. In this case each
- transaction is assigned randomly chosen script.
+ The format of a script file is one SQL command per line; multi-line
+ SQL commands are not supported. Empty lines and lines beginning with
+ <literal>--</> are ignored. Script file lines can also be
+ <quote>meta commands</>, which are interpreted by <application>pgbench</>
+ itself, as described below.
</para>
+
<para>
- SQL commands can include "meta command" which begins with "\" (back
- slash). A meta command takes some arguments separted by white
- spaces. Currently following meta command is supported:
+ There is a simple variable-substitution facility for script files.
+ Variables can be set by the command-line <literal>-D</> option,
+ explained above, or by the meta commands explained below.
+ In addition to any variables preset by <literal>-D</> command-line options,
+ the variable <literal>scale</> is preset to the current scale factor.
+ Once set, a variable's
+ value can be inserted into a SQL command by writing
+ <literal>:</><replaceable>variablename</>. When running more than
+ one client session, each session has its own set of variables.
</para>
- <itemizedlist>
- <listitem>
- <para>
- <literal>\set name operand1 [ operator operand2 ]</literal>
- - Sets the calculated value using "operand1" "operator"
- "operand2" to variable "name". If "operator" and "operand2"
- are omitted, the value of operand1 is set to variable "name".
- </para>
- <para>
- Example:
- </para>
- <programlisting>
+ <para>
+ Script file meta commands begin with a backslash (<literal>\</>).
+ Arguments to a meta command are separated by white space.
+ These meta commands are supported:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <literal>\set <replaceable>varname</> <replaceable>operand1</> [ <replaceable>operator</> <replaceable>operand2</> ]</literal>
+ </term>
+
+ <listitem>
+ <para>
+ Sets variable <replaceable>varname</> to a calculated integer value.
+ Each <replaceable>operand</> is either an integer constant or a
+ <literal>:</><replaceable>variablename</> reference to a variable
+ having an integer value. The <replaceable>operator</> can be
+ <literal>+</>, <literal>-</>, <literal>*</>, or <literal>/</>.
+ </para>
+
+ <para>
+ Example:
+ <programlisting>
\set ntellers 10 * :scale
- </programlisting>
- </listitem>
- <listitem>
- <para>
- <literal>\setrandom name min max</literal>
- - Assigns random integer to name between min and max
- </para>
- <para>
- Example:
- </para>
- <programlisting>
-\setrandom aid 1 100000
- </programlisting>
- </listitem>
- <listitem>
- <para>
- Variables can be referred to in SQL comands by adding ":" in front
- of the varible name.
- </para>
- <para>
- Example:
- </para>
- <programlisting>
-SELECT abalance FROM accounts WHERE aid = :aid
- </programlisting>
- <para>
- Variables can also be defined by using -D option.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>\sleep num [us|ms|s]</> - Causes script execution to sleep for the
- specified duration of microseconds (us), milliseconds (ms) or the default
- seconds (s).
- </para>
- <para>
- Example:
- </para>
- <programlisting>
-\setrandom millisec 1000 2500
-\sleep : millisec ms
- </programlisting>
- </listitem>
- </itemizedlist>
- </sect2>
+ </programlisting>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</></literal>
+ </term>
+
+ <listitem>
+ <para>
+ Sets variable <replaceable>varname</> to a random integer value
+ between the limits <replaceable>min</> and <replaceable>max</>.
+ Each limit can be either an integer constant or a
+ <literal>:</><replaceable>variablename</> reference to a variable
+ having an integer value.
+ </para>
+
+ <para>
+ Example:
+ <programlisting>
+\setrandom aid 1 :naccounts
+ </programlisting>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <literal>\sleep <replaceable>number</> [ us | ms | s ]</literal>
+ </term>
+
+ <listitem>
+ <para>
+ Causes script execution to sleep for the specified duration in
+ microseconds (<literal>us</>), milliseconds (<literal>ms</>) or seconds
+ (<literal>s</>). If the unit is omitted then seconds are the default.
+ <replaceable>number</> can be either an integer constant or a
+ <literal>:</><replaceable>variablename</> reference to a variable
+ having an integer value.
+ </para>
+
+ <para>
+ Example:
+ <programlisting>
+\sleep 10 ms
+ </programlisting>
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
- <sect2>
- <title>Examples</title>
<para>
- Example, TPC-B like benchmark can be defined as follows(scaling
- factor = 1):
- </para>
- <programlisting>
+ As an example, the full definition of the built-in TPC-B-like
+ transaction is:
+
+ <programlisting>
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
-\setrandom delta 1 10000
-BEGIN
-UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
-SELECT abalance FROM accounts WHERE aid = :aid
-UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
-UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
-INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
-END
- </programlisting>
+\setrandom delta -5000 5000
+BEGIN;
+UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;
+SELECT abalance FROM accounts WHERE aid = :aid;
+UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+END;
+ </programlisting>
+
+ This script allows each iteration of the transaction to reference
+ different, randomly-chosen rows. (This example also shows why it's
+ important for each client session to have its own variables —
+ otherwise they'd not be independently touching different rows.)
+ </para>
+
+ </sect2>
+
+ <sect2>
+ <title>Per-transaction logging</title>
+
<para>
- If you want to automatically set the scaling factor from the number of
- tuples in branches table, use -s option and shell command like this:
+ With the <literal>-l</> option, <application>pgbench</> writes the time
+ taken by each transaction to a logfile. The logfile will be named
+ <filename>pgbench_log.<replaceable>nnn</></filename>, where
+ <replaceable>nnn</> is the PID of the pgbench process.
+ The format of the log is:
+
+ <programlisting>
+ <replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</> <replaceable>time_epoch</> <replaceable>time_us</>
+ </programlisting>
+
+ where <replaceable>time</> is the elapsed transaction time in microseconds,
+ <replaceable>file_no</> identifies which script file was used
+ (useful when multiple scripts were specified with <literal>-f</>),
+ and <replaceable>time_epoch</>/<replaceable>time_us</> are a
+ UNIX epoch format timestamp and an offset
+ in microseconds (suitable for creating a ISO 8601
+ timestamp with fractional seconds) showing when
+ the transaction completed.
</para>
- <programlisting>
-pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
- </programlisting>
+
<para>
- Notice that -f option does not execute vacuum and clearing history
- table before starting benchmark.
+ Here are example outputs:
+ <programlisting>
+ 0 199 2241 0 1175850568 995598
+ 0 200 2465 0 1175850568 998079
+ 0 201 2513 0 1175850569 608
+ 0 202 2038 0 1175850569 2663
+ </programlisting>
</para>
</sect2>
-</sect1>
+ <sect2>
+ <title>Good Practices</title>
+
+ <para>
+ It is very easy to use <application>pgbench</> to produce completely
+ meaningless numbers. Here are some guidelines to help you get useful
+ results.
+ </para>
+
+ <para>
+ In the first place, <emphasis>never</> believe any test that runs
+ for only a few seconds. Increase the <literal>-t</> setting enough
+ to make the run last at least a few minutes, so as to average out noise.
+ In some cases you could need hours to get numbers that are reproducible.
+ It's a good idea to try the test run a few times, to find out if your
+ numbers are reproducible or not.
+ </para>
+
+ <para>
+ For the default TPC-B-like test scenario, the initialization scale factor
+ (<literal>-s</>) should be at least as large as the largest number of
+ clients you intend to test (<literal>-c</>); else you'll mostly be
+ measuring update contention. There are only <literal>-s</> rows in
+ the <structname>branches</> table, and every transaction wants to
+ update one of them, so <literal>-c</> values in excess of <literal>-s</>
+ will undoubtedly result in lots of transactions blocked waiting for
+ other transactions.
+ </para>
+ <para>
+ The default test scenario is also quite sensitive to how long it's been
+ since the tables were initialized: accumulation of dead rows and dead space
+ in the tables changes the results. To understand the results you must keep
+ track of the total number of updates and when vacuuming happens. If
+ autovacuum is enabled it can result in unpredictable changes in measured
+ performance.
+ </para>
+
+ <para>
+ A limitation of <application>pgbench</> is that it can itself become
+ the bottleneck when trying to test a large number of client sessions.
+ This can be alleviated by running <application>pgbench</> on a different
+ machine from the database server, although low network latency will be
+ essential. It might even be useful to run several <application>pgbench</>
+ instances concurrently, on several client machines, against the same
+ database server.
+ </para>
+ </sect2>
+
+</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgbuffercache.sgml,v 2.2 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="pgbuffercache">
<title>pg_buffercache</title>
-
+
<indexterm zone="pgbuffercache">
<primary>pg_buffercache</primary>
</indexterm>
<para>
- The <literal>pg_buffercache</literal> module provides a means for examining
- what's happening to the buffercache at any given time without having to
- restart or rebuild the server with debugging code added. The intent is to
- do for the buffercache what pg_locks does for locks.
+ The <filename>pg_buffercache</filename> module provides a means for
+ examining what's happening in the shared buffer cache in real time.
</para>
+
<para>
- This module consists of a C function <literal>pg_buffercache_pages()</literal>
- that returns a set of records, plus a view <literal>pg_buffercache</literal>
- to wrapper the function.
+ The module provides a C function <function>pg_buffercache_pages</function>
+ that returns a set of records, plus a view
+ <structname>pg_buffercache</structname> that wraps the function for
+ convenient use.
</para>
+
<para>
- By default public access is REVOKED from both of these, just in case there
+ By default public access is revoked from both of these, just in case there
are security issues lurking.
</para>
<sect2>
- <title>Notes</title>
+ <title>The <structname>pg_buffercache</structname> view</title>
+
<para>
- The definition of the columns exposed in the view is:
+ The definitions of the columns exposed by the view are:
</para>
- <programlisting>
- Column | references | Description
- ----------------+----------------------+------------------------------------
- bufferid | | Id, 1..shared_buffers.
- relfilenode | pg_class.relfilenode | Refilenode of the relation.
- reltablespace | pg_tablespace.oid | Tablespace oid of the relation.
- reldatabase | pg_database.oid | Database for the relation.
- relblocknumber | | Offset of the page in the relation.
- isdirty | | Is the page dirty?
- usagecount | | Page LRU count
- </programlisting>
+
+ <table>
+ <title><structname>pg_buffercache</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>bufferid</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry></entry>
+ <entry>ID, in the range 1..<varname>shared_buffers</></entry>
+ </row>
+
+ <row>
+ <entry><structfield>relfilenode</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_class.relfilenode</literal></entry>
+ <entry>Relfilenode of the relation</entry>
+ </row>
+
+ <row>
+ <entry><structfield>reltablespace</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_tablespace.oid</literal></entry>
+ <entry>Tablespace OID of the relation</entry>
+ </row>
+
+ <row>
+ <entry><structfield>reldatabase</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_database.oid</literal></entry>
+ <entry>Database OID of the relation</entry>
+ </row>
+
+ <row>
+ <entry><structfield>relblocknumber</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Page number within the relation</entry>
+ </row>
+
+ <row>
+ <entry><structfield>isdirty</structfield></entry>
+ <entry><type>boolean</type></entry>
+ <entry></entry>
+ <entry>Is the page dirty?</entry>
+ </row>
+
+ <row>
+ <entry><structfield>usagecount</structfield></entry>
+ <entry><type>smallint</type></entry>
+ <entry></entry>
+ <entry>Page LRU count</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
<para>
There is one row for each buffer in the shared cache. Unused buffers are
- shown with all fields null except bufferid.
+ shown with all fields null except <structfield>bufferid</>. Shared system
+ catalogs are shown as belonging to database zero.
</para>
+
<para>
- Because the cache is shared by all the databases, there are pages from
- relations not belonging to the current database.
+ Because the cache is shared by all the databases, there will normally be
+ pages from relations not belonging to the current database. This means
+ that there may not be matching join rows in <structname>pg_class</> for
+ some rows, or that there could even be incorrect joins. If you are
+ trying to join against <structname>pg_class</>, it's a good idea to
+ restrict the join to rows having <structfield>reldatabase</> equal to
+ the current database's OID or zero.
</para>
+
<para>
- When the pg_buffercache view is accessed, internal buffer manager locks are
- taken, and a copy of the buffer cache data is made for the view to display.
- This ensures that the view produces a consistent set of results, while not
- blocking normal buffer activity longer than necessary. Nonetheless there
+ When the <structname>pg_buffercache</> view is accessed, internal buffer
+ manager locks are taken for long enough to copy all the buffer state
+ data that the view will display.
+ This ensures that the view produces a consistent set of results, while not
+ blocking normal buffer activity longer than necessary. Nonetheless there
could be some impact on database performance if this view is read often.
</para>
</sect2>
<sect2>
<title>Sample output</title>
- <programlisting>
- regression=# \d pg_buffercache;
- View "public.pg_buffercache"
- Column | Type | Modifiers
- ----------------+----------+-----------
- bufferid | integer |
- relfilenode | oid |
- reltablespace | oid |
- reldatabase | oid |
- relblocknumber | bigint |
- isdirty | boolean |
- usagecount | smallint |
-
- View definition:
- SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase,
- p.relblocknumber, p.isdirty, p.usagecount
- FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid,
- reltablespace oid, reldatabase oid, relblocknumber bigint,
- isdirty boolean, usagecount smallint);
+ <programlisting>
regression=# SELECT c.relname, count(*) AS buffers
- FROM pg_class c INNER JOIN pg_buffercache b
- ON b.relfilenode = c.relfilenode INNER JOIN pg_database d
- ON (b.reldatabase = d.oid AND d.datname = current_database())
+ FROM pg_buffercache b INNER JOIN pg_class c
+ ON b.relfilenode = c.relfilenode AND
+ b.reldatabase IN (0, (SELECT oid FROM pg_database
+ WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;
relname | buffers
pg_depend | 22
pg_depend_reference_index | 20
(10 rows)
-
- regression=#
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
- <itemizedlist>
- <listitem>
- <para>
- Mark Kirkwood <email>markir@paradise.net.nz</email>
- </para>
- </listitem>
- <listitem>
- <para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
- </listitem>
- <listitem>
- <para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
- </listitem>
- </itemizedlist>
+
+ <para>
+ Mark Kirkwood <email>markir@paradise.net.nz</email>
+ </para>
+
+ <para>
+ Design suggestions: Neil Conway <email>neilc@samurai.com</email>
+ </para>
+
+ <para>
+ Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email>
+ </para>
</sect2>
</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgfreespacemap.sgml,v 2.2 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="pgfreespacemap">
<title>pg_freespacemap</title>
-
+
<indexterm zone="pgfreespacemap">
<primary>pg_freespacemap</primary>
</indexterm>
<para>
- This module provides a means for examining the free space map (FSM). It
- consists of two C functions: <literal>pg_freespacemap_relations()</literal>
- and <literal>pg_freespacemap_pages()</literal> that return a set
- of records, plus two views <literal>pg_freespacemap_relations</literal> and
- <literal>pg_freespacemap_pages</literal> for more user-friendly access to
- the functions.
- </para>
- <para>
- The module provides the ability to examine the contents of the free space
- map, without having to restart or rebuild the server with additional
- debugging code.
+ The <filename>pg_freespacemap</> module provides a means for examining the
+ free space map (FSM). It provides two C functions:
+ <function>pg_freespacemap_relations</function> and
+ <function>pg_freespacemap_pages</function> that each return a set of
+ records, plus two views <structname>pg_freespacemap_relations</structname>
+ and <structname>pg_freespacemap_pages</structname> that wrap the functions
+ for convenient use.
</para>
+
<para>
- By default public access is REVOKED from the functions and views, just in
- case there are security issues present in the code.
+ By default public access is revoked from the functions and views, just in
+ case there are security issues lurking.
</para>
<sect2>
- <title>Notes</title>
+ <title>The <filename>pg_freespacemap</> views</title>
+
<para>
- The definitions for the columns exposed in the views are:
+ The definitions of the columns exposed by the views are:
</para>
<table>
- <title>pg_freespacemap_relations</title>
- <tgroup cols="3">
+ <title><structname>pg_freespacemap_relations</> Columns</title>
+
+ <tgroup cols="4">
<thead>
<row>
- <entry>Column</entry>
- <entry>references</entry>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
+
<row>
- <entry>reltablespace</entry>
- <entry>pg_tablespace.oid</entry>
- <entry>Tablespace oid of the relation.</entry>
+ <entry><structfield>reltablespace</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_tablespace.oid</literal></entry>
+ <entry>Tablespace OID of the relation</entry>
</row>
<row>
- <entry>reldatabase</entry>
- <entry>pg_database.oid</entry>
- <entry>Database oid of the relation.</entry>
+ <entry><structfield>reldatabase</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_database.oid</literal></entry>
+ <entry>Database OID of the relation</entry>
</row>
<row>
- <entry>relfilenode</entry>
- <entry>pg_class.relfilenode</entry>
- <entry>Relfilenode of the relation.</entry>
+ <entry><structfield>relfilenode</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_class.relfilenode</literal></entry>
+ <entry>Relfilenode of the relation</entry>
</row>
<row>
- <entry>avgrequest</entry>
+ <entry><structfield>avgrequest</structfield></entry>
+ <entry><type>integer</type></entry>
<entry></entry>
<entry>Moving average of free space requests (NULL for indexes)</entry>
</row>
<row>
- <entry>interestingpages</entry>
+ <entry><structfield>interestingpages</structfield></entry>
+ <entry><type>integer</type></entry>
<entry></entry>
- <entry>Count of pages last reported as containing useful free space.</entry>
+ <entry>Count of pages last reported as containing useful free space</entry>
</row>
<row>
- <entry>storedpages</entry>
+ <entry><structfield>storedpages</structfield></entry>
+ <entry><type>integer</type></entry>
<entry></entry>
- <entry>Count of pages actually stored in free space map.</entry>
+ <entry>Count of pages actually stored in free space map</entry>
</row>
<row>
- <entry>nextpage</entry>
+ <entry><structfield>nextpage</structfield></entry>
+ <entry><type>integer</type></entry>
<entry></entry>
- <entry>Page index (from 0) to start next search at.</entry>
+ <entry>Page index (from 0) to start next search at</entry>
</row>
+
</tbody>
</tgroup>
</table>
<table>
- <title>pg_freespacemap_pages</title>
- <tgroup cols="3">
+ <title><structname>pg_freespacemap_pages</> Columns</title>
+
+ <tgroup cols="4">
<thead>
<row>
- <entry>Column</entry>
- <entry> references</entry>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
+
<row>
- <entry>reltablespace</entry>
- <entry>pg_tablespace.oid</entry>
- <entry>Tablespace oid of the relation.</entry>
+ <entry><structfield>reltablespace</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_tablespace.oid</literal></entry>
+ <entry>Tablespace OID of the relation</entry>
</row>
<row>
- <entry>reldatabase</entry>
- <entry>pg_database.oid</entry>
- <entry>Database oid of the relation.</entry>
+ <entry><structfield>reldatabase</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_database.oid</literal></entry>
+ <entry>Database OID of the relation</entry>
</row>
<row>
- <entry>relfilenode</entry>
- <entry>pg_class.relfilenode</entry>
- <entry>Relfilenode of the relation.</entry>
+ <entry><structfield>relfilenode</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_class.relfilenode</literal></entry>
+ <entry>Relfilenode of the relation</entry>
</row>
<row>
- <entry>relblocknumber</entry>
+ <entry><structfield>relblocknumber</structfield></entry>
+ <entry><type>bigint</type></entry>
<entry></entry>
- <entry>Page number in the relation.</entry>
+ <entry>Page number within the relation</entry>
</row>
<row>
- <entry>bytes</entry>
+ <entry><structfield>bytes</structfield></entry>
+ <entry><type>integer</type></entry>
<entry></entry>
- <entry>Free bytes in the page, or NULL for an index page (see below).</entry>
+ <entry>Free bytes in the page, or NULL for an index page (see below)</entry>
</row>
+
</tbody>
</tgroup>
</table>
<para>
- For <literal>pg_freespacemap_relations</literal>, there is one row for each
- relation in the free space map. <literal>storedpages</literal> is the
- number of pages actually stored in the map, while
- <literal>interestingpages</literal> is the number of pages the last VACUUM
- thought had useful amounts of free space.
+ For <structname>pg_freespacemap_relations</structname>, there is one row
+ for each relation in the free space map.
+ <structfield>storedpages</structfield> is the number of pages actually
+ stored in the map, while <structfield>interestingpages</structfield> is the
+ number of pages the last <command>VACUUM</> thought had useful amounts of
+ free space.
</para>
+
<para>
- If <literal>storedpages</literal> is consistently less than interestingpages
- then it'd be a good idea to increase <literal>max_fsm_pages</literal>. Also,
- if the number of rows in <literal>pg_freespacemap_relations</literal> is
- close to <literal>max_fsm_relations</literal>, then you should consider
- increasing <literal>max_fsm_relations</literal>.
+ If <structfield>storedpages</structfield> is consistently less than
+ <structfield>interestingpages</> then it'd be a good idea to increase
+ <varname>max_fsm_pages</varname>. Also, if the number of rows in
+ <structname>pg_freespacemap_relations</structname> is close to
+ <varname>max_fsm_relations</varname>, then you should consider increasing
+ <varname>max_fsm_relations</varname>.
</para>
+
<para>
- For <literal>pg_freespacemap_pages</literal>, there is one row for each page
- in the free space map. The number of rows for a relation will match the
- <literal>storedpages</literal> column in
- <literal>pg_freespacemap_relations</literal>.
+ For <structname>pg_freespacemap_pages</structname>, there is one row for
+ each page in the free space map. The number of rows for a relation will
+ match the <structfield>storedpages</structfield> column in
+ <structname>pg_freespacemap_relations</structname>.
</para>
+
<para>
For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.
</para>
+
<para>
- Because the map is shared by all the databases, it will include relations
- not belonging to the current database.
+ Because the map is shared by all the databases, there will normally be
+ entries for relations not belonging to the current database. This means
+ that there may not be matching join rows in <structname>pg_class</> for
+ some rows, or that there could even be incorrect joins. If you are
+ trying to join against <structname>pg_class</>, it's a good idea to
+ restrict the join to rows having <structfield>reldatabase</> equal to
+ the current database's OID or zero.
</para>
+
<para>
- When either of the views are accessed, internal free space map locks are
- taken, and a copy of the map data is made for them to display.
- This ensures that the views produce a consistent set of results, while not
- blocking normal activity longer than necessary. Nonetheless there
+ When either of the views is accessed, internal free space map locks are
+ taken for long enough to copy all the state data that the view will display.
+ This ensures that the views produce a consistent set of results, while not
+ blocking normal activity longer than necessary. Nonetheless there
could be some impact on database performance if they are read often.
</para>
</sect2>
<sect2>
- <title>Sample output - pg_freespacemap_relations</title>
- <programlisting>
-regression=# \d pg_freespacemap_relations
-View "public.pg_freespacemap_relations"
- Column | Type | Modifiers
-------------------+---------+-----------
- reltablespace | oid |
- reldatabase | oid |
- relfilenode | oid |
- avgrequest | integer |
- interestingpages | integer |
- storedpages | integer |
- nextpage | integer |
-View definition:
- SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
- FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
+ <title>Sample output</title>
+ <programlisting>
regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
FROM pg_freespacemap_relations r INNER JOIN pg_class c
- ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
- ON r.reldatabase = d.oid AND (d.datname = current_database())
+ ON r.relfilenode = c.relfilenode AND
+ r.reldatabase IN (0, (SELECT oid FROM pg_database
+ WHERE datname = current_database()))
ORDER BY r.storedpages DESC LIMIT 10;
relname | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
pg_class_relname_nsp_index | | 10 | 10
pg_proc | 302 | 8 | 8
(10 rows)
- </programlisting>
- </sect2>
-
- <sect2>
- <title>Sample output - pg_freespacemap_pages</title>
- <programlisting>
-regression=# \d pg_freespacemap_pages
- View "public.pg_freespacemap_pages"
- Column | Type | Modifiers
-----------------+---------+-----------
- reltablespace | oid |
- reldatabase | oid |
- relfilenode | oid |
- relblocknumber | bigint |
- bytes | integer |
-View definition:
- SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
- FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
regression=# SELECT c.relname, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages p INNER JOIN pg_class c
- ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
- ON (p.reldatabase = d.oid AND d.datname = current_database())
+ ON p.relfilenode = c.relfilenode AND
+ p.reldatabase IN (0, (SELECT oid FROM pg_database
+ WHERE datname = current_database()))
ORDER BY c.relname LIMIT 10;
- relname | relblocknumber | bytes
+ relname | relblocknumber | bytes
--------------+----------------+-------
a_star | 0 | 8040
abstime_tbl | 0 | 7908
<sect2>
<title>Author</title>
+
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</sect2>
+
</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgrowlocks.sgml,v 1.4 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="pgrowlocks">
<title>pgrowlocks</title>
-
+
<indexterm zone="pgrowlocks">
<primary>pgrowlocks</primary>
</indexterm>
<para>
- The <literal>pgrowlocks</literal> module provides a function to show row
+ The <filename>pgrowlocks</filename> module provides a function to show row
locking information for a specified table.
</para>
<sect2>
<title>Overview</title>
- <programlisting>
-pgrowlocks(text) RETURNS pgrowlocks_type
- </programlisting>
+
+ <synopsis>
+pgrowlocks(text) returns setof record
+ </synopsis>
+
<para>
- The parameter is a name of table. And <literal>pgrowlocks_type</literal> is
- defined as:
+ The parameter is the name of a table. The result is a set of records,
+ with one row for each locked row within the table. The output columns
+ are:
</para>
- <programlisting>
-CREATE TYPE pgrowlocks_type AS (
- locked_row TID, -- row TID
- lock_type TEXT, -- lock type
- locker XID, -- locking XID
- multi bool, -- multi XID?
- xids xid[], -- multi XIDs
- pids INTEGER[] -- locker's process id
-);
- </programlisting>
<table>
- <title>pgrowlocks_type</title>
- <tgroup cols="2">
+ <title><function>pgrowlocks</> output columns</title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
<tbody>
+
<row>
- <entry>locked_row</entry>
- <entry>tuple ID(TID) of each locked rows</entry>
+ <entry><structfield>locked_row</structfield></entry>
+ <entry><type>tid</type></entry>
+ <entry>Tuple ID (TID) of locked row</entry>
</row>
<row>
- <entry>lock_type</entry>
- <entry>"Shared" for shared lock, "Exclusive" for exclusive lock</entry>
+ <entry><structfield>lock_type</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry><literal>Shared</> for shared lock, or
+ <literal>Exclusive</> for exclusive lock</entry>
</row>
<row>
- <entry>locker</entry>
- <entry>transaction ID of locker (Note 1)</entry>
+ <entry><structfield>locker</structfield></entry>
+ <entry><type>xid</type></entry>
+ <entry>Transaction ID of locker, or multixact ID if multi-transaction</entry>
</row>
<row>
- <entry>multi</entry>
- <entry>"t" if locker is a multi transaction, otherwise "f"</entry>
+ <entry><structfield>multi</structfield></entry>
+ <entry><type>boolean</type></entry>
+ <entry>True if locker is a multi-transaction</entry>
</row>
<row>
- <entry>xids</entry>
- <entry>XIDs of lockers (Note 2)</entry>
+ <entry><structfield>xids</structfield></entry>
+ <entry><type>xid[]</type></entry>
+ <entry>Transaction IDs of lockers (more than one if multi-transaction)</entry>
</row>
<row>
- <entry>pids</entry>
- <entry>process ids of locking backends</entry>
+ <entry><structfield>pids</structfield></entry>
+ <entry><type>integer[]</type></entry>
+ <entry>Process IDs of locking backends (more than one if multi-transaction)</entry>
</row>
+
</tbody>
</tgroup>
</table>
- <para>
- Note1: If the locker is multi transaction, it represents the multi ID.
- </para>
- <para>
- Note2: If the locker is multi, multiple data are shown.
- </para>
<para>
- The calling sequence for <literal>pgrowlocks</literal> is as follows:
- <literal>pgrowlocks</literal> grabs AccessShareLock for the target table and
- reads each row one by one to get the row locking information. You should
- notice that:
+ <function>pgrowlocks</function> takes <literal>AccessShareLock</> for the
+ target table and reads each row one by one to collect the row locking
+ information. This is not very speedy for a large table. Note that:
</para>
+
<orderedlist>
<listitem>
<para>
- if the table is exclusive locked by someone else,
- <literal>pgrowlocks</literal> will be blocked.
+ If the table as a whole is exclusive-locked by someone else,
+ <function>pgrowlocks</function> will be blocked.
</para>
</listitem>
<listitem>
<para>
- <literal>pgrowlocks</literal> may show incorrect information if there's a
- new lock or a lock is freeed while its execution.
+ <function>pgrowlocks</function> is not guaranteed to produce a
+ self-consistent snapshot. It is possible that a new row lock is taken,
+ or an old lock is freed, during its execution.
</para>
</listitem>
</orderedlist>
+
<para>
- <literal>pgrowlocks</literal> does not show the contents of locked rows. If
- you want to take a look at the row contents at the same time, you could do
- something like this:
- </para>
+ <function>pgrowlocks</function> does not show the contents of locked
+ rows. If you want to take a look at the row contents at the same time, you
+ could do something like this:
+
<programlisting>
-SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_ row = a.ctid;
+SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p
+ WHERE p.locked_row = a.ctid;
</programlisting>
+
+ Be aware however that (as of <productname>PostgreSQL</> 8.3) such a
+ query will be very inefficient.
+ </para>
</sect2>
<sect2>
- <title>Example</title>
- <para>
- <literal>pgrowlocks</literal> returns the following columns:
- </para>
- <para>
- Here is a sample execution of pgrowlocks:
- </para>
+ <title>Sample output</title>
+
<programlisting>
test=# SELECT * FROM pgrowlocks('t1');
- locked_row | lock_type | locker | multi | xids | pids
+ locked_row | lock_type | locker | multi | xids | pids
------------+-----------+--------+-------+-----------+---------------
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
(0,4) | Exclusive | 804 | f | {804} | {29066}
(4 rows)
</programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Tatsuo Ishii
+ </para>
</sect2>
-</sect1>
+</sect1>
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgstattuple.sgml,v 1.3 2007/12/10 05:32:51 tgl Exp $ -->
<sect1 id="pgstattuple">
<title>pgstattuple</title>
-
+
<indexterm zone="pgstattuple">
<primary>pgstattuple</primary>
</indexterm>
<para>
- <literal>pgstattuple</literal> modules provides various functions to obtain
- tuple statistics.
+ The <filename>pgstattuple</filename> module provides various functions to
+ obtain tuple-level statistics.
</para>
<sect2>
<title>Functions</title>
- <itemizedlist>
- <listitem>
- <para>
- <literal>pgstattuple()</literal> returns the relation length, percentage
- of the "dead" tuples of a relation and other info. This may help users to
- determine whether vacuum is necessary or not. Here is an example session:
- </para>
- <programlisting>
-test=> \x
-Expanded display is on.
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>pgstattuple(text) returns record</>
+ </term>
+
+ <listitem>
+ <para>
+ <function>pgstattuple</function> returns a relation's physical length,
+ percentage of <quote>dead</> tuples, and other info. This may help users
+ to determine whether vacuum is necessary or not. The argument is the
+ target relation's name (optionally schema-qualified).
+ For example:
+ </para>
+ <programlisting>
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
- </programlisting>
+ </programlisting>
+
<para>
- Here are explanations for each column:
+ The output columns are:
</para>
-
+
<table>
- <title><literal>pgstattuple()</literal> column descriptions</title>
- <tgroup cols="2">
+ <title><function>pgstattuple</function> output columns</title>
+ <tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
+ <entry>Type</entry>
<entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
- <entry>table_len</entry>
- <entry>physical relation length in bytes</entry>
+ <entry><structfield>table_len</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Physical relation length in bytes</entry>
+ </row>
+ <row>
+ <entry><structfield>tuple_count</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of live tuples</entry>
</row>
<row>
- <entry>tuple_count</entry>
- <entry>number of live tuples</entry>
+ <entry><structfield>tuple_len</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Total length of live tuples in bytes</entry>
</row>
<row>
- <entry>tuple_len</entry>
- <entry>total tuples length in bytes</entry>
+ <entry><structfield>tuple_percent</structfield></entry>
+ <entry><type>float8</type></entry>
+ <entry>Percentage of live tuples</entry>
</row>
<row>
- <entry>tuple_percent</entry>
- <entry>live tuples in %</entry>
+ <entry><structfield>dead_tuple_count</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of dead tuples</entry>
</row>
<row>
- <entry>dead_tuple_len</entry>
- <entry>total dead tuples length in bytes</entry>
+ <entry><structfield>dead_tuple_len</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Total length of dead tuples in bytes</entry>
</row>
<row>
- <entry>dead_tuple_percent</entry>
- <entry>dead tuples in %</entry>
+ <entry><structfield>dead_tuple_percent</structfield></entry>
+ <entry><type>float8</type></entry>
+ <entry>Percentage of dead tuples</entry>
</row>
<row>
- <entry>free_space</entry>
- <entry>free space in bytes</entry>
+ <entry><structfield>free_space</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Total free space in bytes</entry>
</row>
<row>
- <entry>free_percent</entry>
- <entry>free space in %</entry>
+ <entry><structfield>free_percent</structfield></entry>
+ <entry><type>float8</type></entry>
+ <entry>Percentage of free space</entry>
</row>
+
</tbody>
</tgroup>
</table>
- <para>
- <note>
- <para>
- <literal>pgstattuple</literal> acquires only a read lock on the relation. So
- concurrent update may affect the result.
- </para>
- </note>
- <note>
- <para>
- <literal>pgstattuple</literal> judges a tuple is "dead" if HeapTupleSatisfiesNow()
- returns false.
- </para>
- </note>
- </para>
- </listitem>
-
- <listitem>
<para>
- <literal>pg_relpages()</literal> returns the number of pages in the relation.
+ <function>pgstattuple</function> acquires only a read lock on the
+ relation. So the results do not reflect an instantaneous snapshot;
+ concurrent updates will affect them.
</para>
- </listitem>
- <listitem>
<para>
- <literal>pgstatindex()</literal> returns an array showing the information about an index:
+ <function>pgstattuple</function> judges a tuple is <quote>dead</> if
+ <function>HeapTupleSatisfiesNow</> returns false.
</para>
- <programlisting>
-test=> \x
-Expanded display is on.
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pgstattuple(oid) returns record</>
+ </term>
+
+ <listitem>
+ <para>
+ This is the same as <function>pgstattuple(text)</function>, except
+ that the target relation is specified by OID.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pgstatindex(text) returns record</>
+ </term>
+
+ <listitem>
+ <para>
+ <function>pgstatindex</function> returns a record showing information
+ about a btree index. For example:
+ </para>
+ <programlisting>
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
avg_leaf_density | 50.27
leaf_fragmentation | 0
</programlisting>
- </listitem>
- </itemizedlist>
+
+ <para>
+ The output columns are:
+ </para>
+
+ <table>
+ <title><function>pgstatindex</function> output columns</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>version</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Btree version number</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tree_level</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Tree level of the root page</entry>
+ </row>
+
+ <row>
+ <entry><structfield>index_size</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Total number of pages in index</entry>
+ </row>
+
+ <row>
+ <entry><structfield>root_block_no</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Location of root block</entry>
+ </row>
+
+ <row>
+ <entry><structfield>internal_pages</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Number of <quote>internal</> (upper-level) pages</entry>
+ </row>
+
+ <row>
+ <entry><structfield>leaf_pages</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Number of leaf pages</entry>
+ </row>
+
+ <row>
+ <entry><structfield>empty_pages</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Number of empty pages</entry>
+ </row>
+
+ <row>
+ <entry><structfield>deleted_pages</structfield></entry>
+ <entry><type>integer</type></entry>
+ <entry>Number of deleted pages</entry>
+ </row>
+
+ <row>
+ <entry><structfield>avg_leaf_density</structfield></entry>
+ <entry><type>float8</type></entry>
+ <entry>Average density of leaf pages</entry>
+ </row>
+
+ <row>
+ <entry><structfield>leaf_fragmentation</structfield></entry>
+ <entry><type>float8</type></entry>
+ <entry>Leaf page fragmentation</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ As with <function>pgstattuple</>, the results are accumulated
+ page-by-page, and should not be expected to represent an
+ instantaneous snapshot of the whole index.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>pg_relpages(text) returns integer</>
+ </term>
+
+ <listitem>
+ <para>
+ <function>pg_relpages</function> returns the number of pages in the
+ relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
- <title>Usage</title>
- <para>
- <literal>pgstattuple</literal> may be called as a relation function and is
- defined as follows:
- </para>
- <programlisting>
- CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
- AS 'MODULE_PATHNAME', 'pgstattuple'
- LANGUAGE C STRICT;
-
- CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
- AS 'MODULE_PATHNAME', 'pgstattuplebyid'
- LANGUAGE C STRICT;
- </programlisting>
+ <title>Author</title>
+
<para>
- The argument is the relation name (optionally it may be qualified)
- or the OID of the relation. Note that pgstattuple only returns
- one row.
+ Tatsuo Ishii
</para>
</sect2>
</sect1>
-
+<!-- $PostgreSQL: pgsql/doc/src/sgml/pgtrgm.sgml,v 2.2 2007/12/10 05:32:51 tgl Exp $ -->
+
<sect1 id="pgtrgm">
<title>pg_trgm</title>
-
+
<indexterm zone="pgtrgm">
<primary>pg_trgm</primary>
</indexterm>
<para>
- The <literal>pg_trgm</literal> module provides functions and index classes
- for determining the similarity of text based on trigram matching.
+ The <filename>pg_trgm</filename> module provides functions and operators
+ for determining the similarity of text based on trigram matching, as
+ well as index operator classes that support fast searching for similar
+ strings.
</para>
<sect2>
- <title>Trigram (or Trigraph)</title>
- <para>
- A trigram is a set of three consecutive characters taken
- from a string. A string is considered to have two spaces
- prefixed and one space suffixed when determining the set
- of trigrams that comprise the string.
- </para>
+ <title>Trigram (or Trigraph) Concepts</title>
+
<para>
- eg. The set of trigrams in the word "cat" is " c", " ca",
- "at " and "cat".
+ A trigram is a group of three consecutive characters taken
+ from a string. We can measure the similarity of two strings by
+ counting the number of trigrams they share. This simple idea
+ turns out to be very effective for measuring the similarity of
+ words in many natural languages.
</para>
+
+ <note>
+ <para>
+ A string is considered to have two spaces
+ prefixed and one space suffixed when determining the set
+ of trigrams contained in the string.
+ For example, the set of trigrams in the string
+ <quote><literal>cat</literal></quote> is
+ <quote><literal> c</literal></quote>,
+ <quote><literal> ca</literal></quote>,
+ <quote><literal>cat</literal></quote>, and
+ <quote><literal>at </literal></quote>.
+ </para>
+ </note>
</sect2>
<sect2>
- <title>Public Functions</title>
- <table>
- <title><literal>pg_trgm</literal> functions</title>
- <tgroup cols="2">
+ <title>Functions and Operators</title>
+
+ <table id="pgtrgm-func-table">
+ <title><filename>pg_trgm</filename> functions</title>
+ <tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
+ <entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
+
<tbody>
<row>
- <entry><literal>real similarity(text, text)</literal></entry>
+ <entry><function>similarity(text, text)</function></entry>
+ <entry><type>real</type></entry>
<entry>
- <para>
- Returns a number that indicates how closely matches the two
- arguments are. A zero result indicates that the two words
- are completely dissimilar, and a result of one indicates that
- the two words are identical.
- </para>
+ Returns a number that indicates how similar the two arguments are.
+ The range of the result is zero (indicating that the two strings are
+ completely dissimilar) to one (indicating that the two strings are
+ identical).
</entry>
</row>
<row>
- <entry><literal>real show_limit()</literal></entry>
+ <entry><function>show_trgm(text)</function></entry>
+ <entry><type>text[]</type></entry>
<entry>
- <para>
- Returns the current similarity threshold used by the '%'
- operator. This in effect sets the minimum similarity between
- two words in order that they be considered similar enough to
- be misspellings of each other, for example.
- </para>
+ Returns an array of all the trigrams in the given string.
+ (In practice this is seldom useful except for debugging.)
</entry>
</row>
<row>
- <entry><literal>real set_limit(real)</literal></entry>
+ <entry><function>show_limit()</function></entry>
+ <entry><type>real</type></entry>
<entry>
- <para>
- Sets the current similarity threshold that is used by the '%'
- operator, and is returned by the show_limit() function.
- </para>
+ Returns the current similarity threshold used by the <literal>%</>
+ operator. This sets the minimum similarity between
+ two words for them to be considered similar enough to
+ be misspellings of each other, for example.
</entry>
</row>
<row>
- <entry><literal>text[] show_trgm(text)</literal></entry>
+ <entry><function>set_limit(real)</function></entry>
+ <entry><type>real</type></entry>
<entry>
- <para>
- Returns an array of all the trigrams of the supplied text
- parameter.
- </para>
+ Sets the current similarity threshold that is used by the <literal>%</>
+ operator. The threshold must be between 0 and 1 (default is 0.3).
+ Returns the same value passed in.
</entry>
</row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="pgtrgm-op-table">
+ <title><filename>pg_trgm</filename> operators</title>
+ <tgroup cols="3">
+ <thead>
<row>
- <entry>Operator: <literal>text % text (returns boolean)</literal></entry>
+ <entry>Operator</entry>
+ <entry>Returns</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><type>text</> <literal>%</literal> <type>text</></entry>
+ <entry><type>boolean</type></entry>
<entry>
- <para>
- The '%' operator returns TRUE if its two arguments have a similarity
- that is greater than the similarity threshold set by set_limit(). It
- will return FALSE if the similarity is less than the current
- threshold.
- </para>
+ Returns <literal>true</> if its arguments have a similarity that is
+ greater than the current similarity threshold set by
+ <function>set_limit</>.
</entry>
</row>
</tbody>
</sect2>
<sect2>
- <title>Public Index Operator Class</title>
+ <title>Index Support</title>
+
<para>
- The <literal>pg_trgm</literal> module comes with the
- <literal>gist_trgm_ops</literal> index operator class that allows a
- developer to create an index over a text column for the purpose
- of very fast similarity searches.
+ The <filename>pg_trgm</filename> module provides GiST and GIN index
+ operator classes that allow you to create an index over a text column for
+ the purpose of very fast similarity searches. These index types support
+ the <literal>%</> similarity operator (and no other operators, so you may
+ want a regular btree index too).
</para>
+
<para>
- To use this index, the '%' operator must be used and an appropriate
- similarity threshold for the application must be set. Example:
- </para>
+ Example:
+
<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
</programlisting>
+or
+ <programlisting>
+CREATE INDEX trgm_idx ON test_trgm USING gin (t gin_trgm_ops);
+ </programlisting>
+ </para>
+
<para>
- At this point, you will have an index on the t text column that you
- can use for similarity searching. Example:
+ At this point, you will have an index on the <structfield>t</> column that
+ you can use for similarity searching. A typical query is
</para>
<programlisting>
-SELECT
- t,
- similarity(t, 'word') AS sml
-FROM
- test_trgm
-WHERE
- t % 'word'
-ORDER BY
- sml DESC, t;
+SELECT t, similarity(t, '<replaceable>word</>') AS sml
+ FROM test_trgm
+ WHERE t % '<replaceable>word</>'
+ ORDER BY sml DESC, t;
</programlisting>
<para>
This will return all values in the text column that are sufficiently
- similar to 'word', sorted from best match to worst. The index will
- be used to make this a fast operation over very large data sets.
+ similar to <replaceable>word</>, sorted from best match to worst. The
+ index will be used to make this a fast operation even over very large data
+ sets.
+ </para>
+
+ <para>
+ The choice between GiST and GIN indexing depends on the relative
+ performance characteristics of GiST and GIN, which are discussed elsewhere.
+ As a rule of thumb, a GIN index is faster to search than a GiST index, but
+ slower to build or update; so GIN is better suited for static data and GiST
+ for often-updated data.
</para>
</sect2>
<sect2>
<title>Text Search Integration</title>
+
<para>
Trigram matching is a very useful tool when used in conjunction
- with a full text index.
+ with a full text index. In particular it can help to recognize
+ misspelled input words that will not be matched directly by the
+ full text search mechanism.
</para>
+
<para>
The first step is to generate an auxiliary table containing all
the unique words in the documents:
</para>
+
<programlisting>
-CREATE TABLE words AS SELECT word FROM
- stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+CREATE TABLE words AS SELECT word FROM
+ ts_stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
</programlisting>
+
<para>
where <structname>documents</> is a table that has a text field
- <structfield>bodytext</> that we wish to search. The use of the
- <literal>simple</> configuration with the <function>to_tsvector</>
- function, instead of just using the already
- existing vector is to avoid creating a list of already stemmed
- words. This way, only the original, unstemmed words are added
- to the word list.
+ <structfield>bodytext</> that we wish to search. The reason for using
+ the <literal>simple</> configuration with the <function>to_tsvector</>
+ function, instead of using a language-specific configuration,
+ is that we want a list of the original (unstemmed) words.
</para>
+
<para>
Next, create a trigram index on the word column:
</para>
+
<programlisting>
-CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
- </programlisting>
- <para>
- or
- </para>
- <programlisting>
-CREATE INDEX words_idx ON words USING gin(word gist_trgm_ops);
+CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
</programlisting>
+
<para>
- Now, a <literal>SELECT</literal> query similar to the example above can be
- used to suggest spellings for misspelled words in user search terms. A
- useful extra clause is to ensure that the similar words are also
- of similar length to the misspelled word.
- </para>
- <para>
- <note>
- <para>
- Since the <structname>words</> table has been generated as a separate,
- static table, it will need to be periodically regenerated so that
- it remains up to date with the document collection.
- </para>
- </note>
+ Now, a <command>SELECT</command> query similar to the previous example can
+ be used to suggest spellings for misspelled words in user search terms.
+ A useful extra test is to require that the selected words are also of
+ similar length to the misspelled word.
</para>
+
+ <note>
+ <para>
+ Since the <structname>words</> table has been generated as a separate,
+ static table, it will need to be periodically regenerated so that
+ it remains reasonably up-to-date with the document collection.
+ Keeping it exactly current is usually unnecessary.
+ </para>
+ </note>
</sect2>
<sect2>
<title>References</title>
+
<para>
GiST Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
<sect2>
<title>Authors</title>
+
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
</para>
<para>
- Documentation: Christopher Kings-Lynne
+ Documentation: Christopher Kings-Lynne
</para>
<para>
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.