<table id="textsearch-operators-table">
<title>Text Search Operators</title>
- <tgroup cols="4">
+ <tgroup cols="5">
<thead>
<row>
<entry>Operator</entry>
+ <entry>Return Type</entry>
<entry>Description</entry>
<entry>Example</entry>
<entry>Result</entry>
<tbody>
<row>
<entry> <literal>@@</literal> </entry>
+ <entry><type>boolean</></entry>
<entry><type>tsvector</> matches <type>tsquery</> ?</entry>
<entry><literal>to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat')</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry> <literal>@@@</literal> </entry>
+ <entry><type>boolean</></entry>
<entry>deprecated synonym for <literal>@@</></entry>
<entry><literal>to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat')</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry> <literal>||</literal> </entry>
+ <entry><type>tsvector</></entry>
<entry>concatenate <type>tsvector</>s</entry>
<entry><literal>'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector</literal></entry>
<entry><literal>'a':1 'b':2,5 'c':3 'd':4</literal></entry>
</row>
<row>
<entry> <literal>&&</literal> </entry>
+ <entry><type>tsquery</></entry>
<entry>AND <type>tsquery</>s together</entry>
<entry><literal>'fat | rat'::tsquery && 'cat'::tsquery</literal></entry>
<entry><literal>( 'fat' | 'rat' ) & 'cat'</literal></entry>
</row>
<row>
<entry> <literal>||</literal> </entry>
+ <entry><type>tsquery</></entry>
<entry>OR <type>tsquery</>s together</entry>
<entry><literal>'fat | rat'::tsquery || 'cat'::tsquery</literal></entry>
<entry><literal>( 'fat' | 'rat' ) | 'cat'</literal></entry>
</row>
<row>
<entry> <literal>!!</literal> </entry>
+ <entry><type>tsquery</></entry>
<entry>negate a <type>tsquery</></entry>
<entry><literal>!! 'cat'::tsquery</literal></entry>
<entry><literal>!'cat'</literal></entry>
</row>
<row>
<entry> <literal><-></literal> </entry>
+ <entry><type>tsquery</></entry>
<entry><type>tsquery</> followed by <type>tsquery</></entry>
<entry><literal>to_tsquery('fat') <-> to_tsquery('rat')</literal></entry>
<entry><literal>'fat' <-> 'rat'</literal></entry>
</row>
<row>
<entry> <literal>@></literal> </entry>
+ <entry><type>boolean</></entry>
<entry><type>tsquery</> contains another ?</entry>
<entry><literal>'cat'::tsquery @> 'cat & rat'::tsquery</literal></entry>
<entry><literal>f</literal></entry>
</row>
<row>
<entry> <literal><@</literal> </entry>
+ <entry><type>boolean</></entry>
<entry><type>tsquery</> is contained in ?</entry>
<entry><literal>'cat'::tsquery <@ 'cat & rat'::tsquery</literal></entry>
<entry><literal>t</literal></entry>
<literal><function>phraseto_tsquery(<optional> <replaceable class="PARAMETER">config</> <type>regconfig</> , </optional> <replaceable class="PARAMETER">query</> <type>text</type>)</function></literal>
</entry>
<entry><type>tsquery</type></entry>
- <entry>produce <type>tsquery</> ignoring punctuation</entry>
+ <entry>produce <type>tsquery</> that searches for a phrase,
+ ignoring punctuation</entry>
<entry><literal>phraseto_tsquery('english', 'The Fat Rats')</literal></entry>
<entry><literal>'fat' <-> 'rat'</literal></entry>
</row>
<literal><function>ts_rewrite(<replaceable class="PARAMETER">query</replaceable> <type>tsquery</>, <replaceable class="PARAMETER">target</replaceable> <type>tsquery</>, <replaceable class="PARAMETER">substitute</replaceable> <type>tsquery</>)</function></literal>
</entry>
<entry><type>tsquery</type></entry>
- <entry>replace target with substitute within query</entry>
+ <entry>replace <replaceable>target</> with <replaceable>substitute</>
+ within query</entry>
<entry><literal>ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery)</literal></entry>
<entry><literal>'b' & ( 'foo' | 'bar' )</literal></entry>
</row>
<literal><function>tsquery_phrase(<replaceable class="PARAMETER">query1</replaceable> <type>tsquery</>, <replaceable class="PARAMETER">query2</replaceable> <type>tsquery</>)</function></literal>
</entry>
<entry><type>tsquery</type></entry>
- <entry>implementation of <literal><-></> (FOLLOWED BY) operator</entry>
+ <entry>make query that searches for <replaceable>query1</> followed
+ by <replaceable>query2</> (same as <literal><-></>
+ operator)</entry>
<entry><literal>tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'))</literal></entry>
<entry><literal>'fat' <-> 'cat'</literal></entry>
</row>
<literal><function>tsquery_phrase(<replaceable class="PARAMETER">query1</replaceable> <type>tsquery</>, <replaceable class="PARAMETER">query2</replaceable> <type>tsquery</>, <replaceable class="PARAMETER">distance</replaceable> <type>integer</>)</function></literal>
</entry>
<entry><type>tsquery</type></entry>
- <entry>phrase-concatenate with distance</entry>
+ <entry>make query that searches for <replaceable>query1</> followed by
+ <replaceable>query2</> at maximum distance <replaceable>distance</></entry>
<entry><literal>tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10)</literal></entry>
<entry><literal>'fat' <10> 'cat'</literal></entry>
</row>
As the above example suggests, a <type>tsquery</type> is not just raw
text, any more than a <type>tsvector</type> is. A <type>tsquery</type>
contains search terms, which must be already-normalized lexemes, and
- may combine multiple terms using AND, OR, NOT and FOLLOWED BY operators.
- (For details see <xref linkend="datatype-textsearch">.) There are
- functions <function>to_tsquery</>, <function>plainto_tsquery</>
+ may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators.
+ (For details see <xref linkend="datatype-tsquery">.) There are
+ functions <function>to_tsquery</>, <function>plainto_tsquery</>,
and <function>phraseto_tsquery</>
that are helpful in converting user-written text into a proper
- <type>tsquery</type>, for example by normalizing words appearing in
+ <type>tsquery</type>, primarily by normalizing words appearing in
the text. Similarly, <function>to_tsvector</> is used to parse and
normalize a document string. So in practice a text search match would
look more like this:
already normalized, so <literal>rats</> does not match <literal>rat</>.
</para>
- <para>
- Phrase search is made possible with the help of the <literal><-></>
- (FOLLOWED BY) operator, which enforces lexeme order. This allows you
- to discard strings not containing the desired phrase, for example:
-
-<programlisting>
-SELECT q @@ to_tsquery('fatal <-> error')
-FROM unnest(array[to_tsvector('fatal error'),
- to_tsvector('error is not fatal')]) AS q;
- ?column?
-----------
- t
- f
-</programlisting>
-
- A more generic version of the FOLLOWED BY operator takes form of
- <literal><N></>, where N stands for the greatest allowed distance
- between the specified lexemes. The <literal>phraseto_tsquery</>
- function makes use of this behavior in order to construct a
- <literal>tsquery</> capable of matching the provided phrase:
-
-<programlisting>
-SELECT phraseto_tsquery('cat ate some rats');
- phraseto_tsquery
--------------------------------
- ( 'cat' <-> 'ate' ) <2> 'rat'
-</programlisting>
- </para>
-
<para>
The <literal>@@</literal> operator also
supports <type>text</type> input, allowing explicit conversion of a text
The form <type>text</type> <literal>@@</literal> <type>text</type>
is equivalent to <literal>to_tsvector(x) @@ plainto_tsquery(y)</literal>.
</para>
+
+ <para>
+ Within a <type>tsquery</>, the <literal>&</literal> (AND) operator
+ specifies that both its arguments must appear in the document to have a
+ match. Similarly, the <literal>|</literal> (OR) operator specifies that
+ at least one of its arguments must appear, while the <literal>!</> (NOT)
+ operator specifies that its argument must <emphasis>not</> appear in
+ order to have a match. Parentheses can be used to control nesting of
+ these operators.
+ </para>
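+
+ <para>
+ As a brief illustration of these operators (a sketch only: the sample
+ strings are made up, and the built-in <literal>english</> configuration
+ is assumed):
+
+<programlisting>
+SELECT to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('english', 'cat & rat');
+ ?column?
+----------
+ t
+
+SELECT to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('english', 'dog | rat');
+ ?column?
+----------
+ t
+
+SELECT to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('english', 'cat & !rat');
+ ?column?
+----------
+ f
+</programlisting>
+ </para>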
+
+ <para>
+ Searching for phrases is possible with the help of
+ the <literal><-></> (FOLLOWED BY) <type>tsquery</> operator, which
+ matches only if its arguments have matches that are adjacent and in the
+ given order. For example:
+
+<programlisting>
+SELECT to_tsvector('fatal error') @@ to_tsquery('fatal <-> error');
+ ?column?
+----------
+ t
+
+SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal <-> error');
+ ?column?
+----------
+ f
+</programlisting>
+
+ There is a more general version of the FOLLOWED BY operator having the
+ form <literal><<replaceable>N</>></literal>,
+ where <replaceable>N</> is an integer standing for the greatest distance
+ allowed between the matching lexemes. <literal><1></literal> is
+ the same as <literal><-></>, while <literal><2></literal>
+ allows one other lexeme to optionally appear between the matches, and so
+ on. The <function>phraseto_tsquery</> function makes use of this
+ operator to construct a <type>tsquery</> that can match a multi-word
+ phrase when some of the words are stop words. For example:
+
+<programlisting>
+SELECT phraseto_tsquery('cats ate rats');
+ phraseto_tsquery
+-------------------------------
+ ( 'cat' <-> 'ate' ) <-> 'rat'
+
+SELECT phraseto_tsquery('the cats ate the rats');
+ phraseto_tsquery
+-------------------------------
+ ( 'cat' <-> 'ate' ) <2> 'rat'
+</programlisting>
+ </para>
</sect2>
<sect2 id="textsearch-intro-configurations">
<para>
<productname>PostgreSQL</productname> provides the
functions <function>to_tsquery</function>,
- <function>plainto_tsquery</function> and
+ <function>plainto_tsquery</function>, and
<function>phraseto_tsquery</function>
for converting a query to the <type>tsquery</type> data type.
<function>to_tsquery</function> offers access to more features
- than both <function>plainto_tsquery</function> and
- <function>phraseto_tsquery</function>, but is less forgiving
+ than either <function>plainto_tsquery</function> or
+ <function>phraseto_tsquery</function>, but it is less forgiving
about its input.
</para>
<para>
<function>to_tsquery</function> creates a <type>tsquery</> value from
<replaceable>querytext</replaceable>, which must consist of single tokens
- separated by the Boolean operators <literal>&</literal> (AND),
- <literal>|</literal> (OR), <literal>!</literal> (NOT), and also the
- <literal><-></literal> (FOLLOWED BY) phrase search operator. These operators
- can be grouped using parentheses. In other words, the input to
+ separated by the <type>tsquery</> operators <literal>&</literal> (AND),
+ <literal>|</literal> (OR), <literal>!</literal> (NOT), and
+ <literal><-></literal> (FOLLOWED BY), possibly grouped
+ using parentheses. In other words, the input to
<function>to_tsquery</function> must already follow the general rules for
<type>tsquery</> input, as described in <xref
- linkend="datatype-textsearch">. The difference is that while basic
+ linkend="datatype-tsquery">. The difference is that while basic
<type>tsquery</> input takes the tokens at face value,
- <function>to_tsquery</function> normalizes each token to a lexeme using
+ <function>to_tsquery</function> normalizes each token into a lexeme using
the specified or default configuration, and discards any tokens that are
stop words according to the configuration. For example:
</screen>
Without quotes, <function>to_tsquery</function> will generate a syntax
- error for tokens that are not separated by an AND or OR operator.
+ error for tokens that are not separated by an AND, OR, or FOLLOWED BY
+ operator.
</para>
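+
+ <para>
+ As a rough sketch of the rule above (the input strings are made up
+ for illustration, and the <literal>english</> configuration is assumed),
+ the first query is accepted because a FOLLOWED BY operator separates
+ the tokens, while the second is expected to be rejected with a syntax
+ error because nothing separates them:
+
+<screen>
+SELECT to_tsquery('english', 'fat <-> rats');
+   to_tsquery
+-----------------
+ 'fat' <-> 'rat'
+
+SELECT to_tsquery('english', 'fat rats');   -- no operator between the tokens
+</screen>
+ </para>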
<indexterm>
</synopsis>
<para>
- <function>plainto_tsquery</> transforms unformatted text
- <replaceable>querytext</replaceable> to <type>tsquery</type>.
+ <function>plainto_tsquery</> transforms the unformatted text
+ <replaceable>querytext</replaceable> to a <type>tsquery</type> value.
The text is parsed and normalized much as for <function>to_tsvector</>,
- then the <literal>&</literal> (AND) Boolean operator is inserted
- between surviving words.
+ then the <literal>&</literal> (AND) <type>tsquery</type> operator is
+ inserted between surviving words.
</para>
<para>
'fat' & 'rat'
</screen>
- Note that <function>plainto_tsquery</> cannot
- recognize Boolean and phrase search operators, weight labels,
+ Note that <function>plainto_tsquery</> will not
+ recognize <type>tsquery</type> operators, weight labels,
or prefix-match labels in its input:
<screen>
<para>
<function>phraseto_tsquery</> behaves much like
- <function>plainto_tsquery</>, with the exception
- that it utilizes the <literal><-></literal> (FOLLOWED BY) phrase search
- operator instead of the <literal>&</literal> (AND) Boolean operator.
- This is particularly useful when searching for exact lexeme sequences,
- since the phrase search operator helps to maintain lexeme order.
+ <function>plainto_tsquery</>, except that it inserts
+ the <literal><-></literal> (FOLLOWED BY) operator between
+ surviving words instead of the <literal>&</literal> (AND) operator.
+ Also, stop words are not simply discarded, but are accounted for by
+ inserting <literal><<replaceable>N</>></literal> operators rather
+ than <literal><-></literal> operators. This function is useful
+ when searching for exact lexeme sequences, since the FOLLOWED BY
+ operators check lexeme order, not just the presence of all the lexemes.
</para>
<para>
'fat' <-> 'rat'
</screen>
- Just like the <function>plainto_tsquery</>, the
- <function>phraseto_tsquery</> function cannot
- recognize Boolean and phrase search operators, weight labels,
+ Like <function>plainto_tsquery</>, the
+ <function>phraseto_tsquery</> function will not
+ recognize <type>tsquery</type> operators, weight labels,
or prefix-match labels in its input:
<screen>
-----------------------------
( 'fat' <-> 'rat' ) <-> 'c'
</screen>
-
- It is possible to specify the configuration to be used to parse the document,
- for example, we could create a new one using the hunspell dictionary
- (namely 'eng_hunspell') in order to match phrases with different word forms:
-
-<screen>
-SELECT phraseto_tsquery('eng_hunspell', 'developer of the building which collapsed');
- phraseto_tsquery
---------------------------------------------------------------------------------------------
- ( 'developer' <3> 'building' ) <2> 'collapse' | ( 'developer' <3> 'build' ) <2> 'collapse'
-</screen>
</para>
</sect2>
<listitem>
<para>
- Returns a vector which lists the same lexemes as the given vector, but
- which lacks any position or weight information. While the returned
- vector is much less useful than an unstripped vector for relevance
- ranking, it will usually be much smaller.
+ Returns a vector that lists the same lexemes as the given vector, but
+ lacks any position or weight information. The result is usually much
+ smaller than an unstripped vector, but it is also less useful.
+ Relevance ranking does not work as well on stripped vectors as on
+ unstripped ones. Also, when given stripped input,
+ the <literal><-></> (FOLLOWED BY) <type>tsquery</> operator
+ effectively degenerates to a simple <literal>&</> (AND) test.
</para>
</listitem>
<listitem>
<para>
- Returns the phrase-concatenation of the two given queries.
+ Returns a query that searches for a match to the first given query
+ immediately followed by a match to the second given query, using
+ the <literal><-></> (FOLLOWED BY)
+ <type>tsquery</> operator. For example:
<screen>
SELECT to_tsquery('fat') <-> to_tsquery('cat | rat');
<listitem>
<para>
- Returns the distanced phrase-concatenation of the two given queries.
- This function lies in the implementation of the <literal><-></> operator.
+ Returns a query that searches for a match to the first given query
+ followed by a match to the second given query at a distance of at
+ most <replaceable>distance</replaceable> lexemes, using
+ the <literal><<replaceable>N</>></literal>
+ <type>tsquery</> operator. For example:
<screen>
SELECT tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10);
<para>Position values in <type>tsvector</> must be greater than 0 and
no more than 16,383</para>
</listitem>
+ <listitem>
+ <para>The match distance in a <literal><<replaceable>N</>></literal>
+ (FOLLOWED BY) <type>tsquery</> operator cannot be more than
+ 16,384</para>
+ </listitem>
<listitem>
<para>No more than 256 positions per lexeme</para>
</listitem>