4 <indexterm zone="pgtrgm">
5 <primary>pg_trgm</primary>
9 The <literal>pg_trgm</literal> module provides functions and index classes
10 for determining the similarity of text based on trigram matching.
14 <title>Trigram (or Trigraph)</title>
16 A trigram is a set of three consecutive characters taken
17 from a string. A string is considered to have two spaces
18 prefixed and one space suffixed when determining the set
19 of trigrams that comprise the string.
22 eg. The set of trigrams in the word "cat" is " c", " ca",
28 <title>Public Functions</title>
30 <title><literal>pg_trgm</literal> functions</title>
34 <entry>Function</entry>
35 <entry>Description</entry>
40 <entry><literal>real similarity(text, text)</literal></entry>
43 Returns a number that indicates how closely matches the two
44 arguments are. A zero result indicates that the two words
45 are completely dissimilar, and a result of one indicates that
46 the two words are identical.
51 <entry><literal>real show_limit()</literal></entry>
54 Returns the current similarity threshold used by the '%'
55 operator. This in effect sets the minimum similarity between
56 two words in order that they be considered similar enough to
57 be misspellings of each other, for example.
62 <entry><literal>real set_limit(real)</literal></entry>
65 Sets the current similarity threshold that is used by the '%'
66 operator, and is returned by the show_limit() function.
71 <entry><literal>text[] show_trgm(text)</literal></entry>
74 Returns an array of all the trigrams of the supplied text
80 <entry>Operator: <literal>text % text (returns boolean)</literal></entry>
83 The '%' operator returns TRUE if its two arguments have a similarity
84 that is greater than the similarity threshold set by set_limit(). It
85 will return FALSE if the similarity is less than the current
96 <title>Public Index Operator Class</title>
98 The <literal>pg_trgm</literal> module comes with the
99 <literal>gist_trgm_ops</literal> index operator class that allows a
100 developer to create an index over a text column for the purpose
101 of very fast similarity searches.
104 To use this index, the '%' operator must be used and an appropriate
105 similarity threshold for the application must be set. Example:
108 CREATE TABLE test_trgm (t text);
109 CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
112 At this point, you will have an index on the t text column that you
113 can use for similarity searching. Example:
118 similarity(t, 'word') AS sml
127 This will return all values in the text column that are sufficiently
128 similar to 'word', sorted from best match to worst. The index will
129 be used to make this a fast operation over very large data sets.
134 <title>Text Search Integration</title>
136 Trigram matching is a very useful tool when used in conjunction
137 with a full text index.
140 The first step is to generate an auxiliary table containing all
141 the unique words in the documents:
144 CREATE TABLE words AS SELECT word FROM
145 stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
148 where <structname>documents</> is a table that has a text field
149 <structfield>bodytext</> that we wish to search. The use of the
150 <literal>simple</> configuration with the <function>to_tsvector</>
151 function, instead of just using the already
152 existing vector is to avoid creating a list of already stemmed
153 words. This way, only the original, unstemmed words are added
157 Next, create a trigram index on the word column:
160 CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
166 CREATE INDEX words_idx ON words USING gin(word gist_trgm_ops);
169 Now, a <literal>SELECT</literal> query similar to the example above can be
170 used to suggest spellings for misspelled words in user search terms. A
171 useful extra clause is to ensure that the similar words are also
172 of similar length to the misspelled word.
177 Since the <structname>words</> table has been generated as a separate,
178 static table, it will need to be periodically regenerated so that
179 it remains up to date with the document collection.
186 <title>References</title>
188 GiST Development Site
189 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
192 Tsearch2 Development Site
193 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
198 <title>Authors</title>
200 Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
203 Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
206 Documentation: Christopher Kings-Lynne
209 This module is sponsored by Delta-Soft Ltd., Moscow, Russia.