From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 29 Jun 2016 19:00:25 +0000 (-0400)
Subject: Adjust text search documentation for recent commits.
X-Git-Tag: REL9_6_BETA3~87
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=4242a715c3fca1a8fa31f810b7cffa88b4d4e439;p=postgresql

Adjust text search documentation for recent commits.

Fix some now-obsolete statements that were overlooked in commits
6734a1cac, 3dbbd0f02, 028350f61.  Document the behavior of <0>.
Also do a little bit of rearranging and copy-editing for clarity.
---
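
The behavior described by the revised wording can be spot-checked with
a few queries. This is only a sketch: the single-letter lexemes and
their positions are made up for illustration, plain casts are used so
that no text search configuration is involved, and the results in the
comments are what the documentation in this patch implies for a server
that includes the commits named above.

```sql
-- <N> requires the positions to differ by exactly N, not by at most N.
SELECT 'a:1 b:3'::tsvector @@ 'a <2> b'::tsquery;    -- t
SELECT 'a:1 b:2'::tsvector @@ 'a <2> b'::tsquery;    -- f

-- <0> requires both patterns to match the same word.
SELECT 'super:1'::tsvector @@ 'super <0> sup:*'::tsquery;    -- t

-- & binds more tightly than |, so this parses as (a & b) | c.
SELECT 'c:1'::tsvector @@ 'a & b | c'::tsquery;    -- t

-- <-> binds more tightly than &, so this parses as a & (b <-> c).
SELECT 'a:5 b:2 c:3'::tsvector @@ 'a & b <-> c'::tsquery;    -- t

-- FOLLOWED BY never matches stripped input, which has no positions.
SELECT strip('a:1 b:2'::tsvector) @@ 'a <-> b'::tsquery;    -- f
```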

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 9643746ca4..67d0c349e0 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -3885,12 +3885,12 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
 
     <para>
      It is important to understand that the
-     <type>tsvector</type> type itself does not perform any normalization;
-     it assumes the words it is given are normalized appropriately
-     for the application.  For example,
+     <type>tsvector</type> type itself does not perform any word
+     normalization; it assumes the words it is given are normalized
+     appropriately for the application.  For example,
 
 <programlisting>
-select 'The Fat Rats'::tsvector;
+SELECT 'The Fat Rats'::tsvector;
       tsvector      
 --------------------
  'Fat' 'Rats' 'The'
@@ -3929,12 +3929,20 @@ SELECT to_tsvector('english', 'The Fat Rats');
      <literal>&lt;-&gt;</> (FOLLOWED BY).  There is also a variant
      <literal>&lt;<replaceable>N</>&gt;</literal> of the FOLLOWED BY
      operator, where <replaceable>N</> is an integer constant that
-     specifies a maximum distance between the two lexemes being searched
+     specifies the distance between the two lexemes being searched
      for.  <literal>&lt;-&gt;</> is equivalent to <literal>&lt;1&gt;</>.
     </para>
 
     <para>
-     Parentheses can be used to enforce grouping of the operators:
+     Parentheses can be used to enforce grouping of these operators.
+     In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
+     <literal>&lt;-&gt;</literal> (FOLLOWED BY) next most tightly, then
+     <literal>&amp;</literal> (AND), with <literal>|</literal> (OR) binding
+     the least tightly.
+    </para>
+
+    <para>
+     Here are some examples:
 
 <programlisting>
 SELECT 'fat &amp; rat'::tsquery;
@@ -3951,17 +3959,21 @@ SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
         tsquery         
 ------------------------
  'fat' &amp; 'rat' &amp; !'cat'
+
+SELECT '(fat | rat) &lt;-&gt; cat'::tsquery;
+              tsquery
+-----------------------------------
+ 'fat' &lt;-&gt; 'cat' | 'rat' &lt;-&gt; 'cat'
 </programlisting>
 
-     In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
-     and <literal>&amp;</literal> (AND) and <literal>&lt;-&gt;</literal> (FOLLOWED BY)
-     both bind more tightly than <literal>|</literal> (OR).
+     The last example demonstrates that <type>tsquery</type> sometimes
+     rearranges nested operators into a logically equivalent formulation.
     </para>
 
     <para>
      Optionally, lexemes in a <type>tsquery</type> can be labeled with
      one or more weight letters, which restricts them to match only
-     <type>tsvector</> lexemes with matching weights:
+     <type>tsvector</> lexemes with one of those weights:
 
 <programlisting>
 SELECT 'fat:ab &amp; cat'::tsquery;
@@ -3981,25 +3993,7 @@ SELECT 'super:*'::tsquery;
  'super':*
 </programlisting>
      This query will match any word in a <type>tsvector</> that begins
-     with <quote>super</>.  Note that prefixes are first processed by
-     text search configurations, which means this comparison returns
-     true:
-<programlisting>
-SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
- ?column? 
-----------
- t
-(1 row)
-</programlisting>
-     because <literal>postgres</> gets stemmed to <literal>postgr</>:
-<programlisting>
-SELECT to_tsquery('postgres:*');
- to_tsquery 
-------------
- 'postgr':*
-(1 row)
-</programlisting>
-     which then matches <literal>postgraduate</>.
+     with <quote>super</>.
     </para>
 
     <para>
@@ -4015,6 +4009,24 @@ SELECT to_tsquery('Fat:ab &amp; Cats');
 ------------------
  'fat':AB &amp; 'cat'
 </programlisting>
+
+     Note that <function>to_tsquery</> will process prefixes in the same way
+     as other words, which means this comparison returns true:
+
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+ ?column?
+----------
+ t
+</programlisting>
+     because <literal>postgres</> gets stemmed to <literal>postgr</>:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+  to_tsvector  | to_tsquery
+---------------+------------
+ 'postgradu':1 | 'postgr':*
+</programlisting>
+     which will match the stemmed form of <literal>postgraduate</>.
     </para>
 
    </sect2>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index df4732e654..41151ef4bd 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -322,8 +322,7 @@ text @@ text
     match.  Similarly, the <literal>|</literal> (OR) operator specifies that
     at least one of its arguments must appear, while the <literal>!</> (NOT)
     operator specifies that its argument must <emphasis>not</> appear in
-    order to have a match.  Parentheses can be used to control nesting of
-    these operators.
+    order to have a match.
    </para>
 
    <para>
@@ -346,10 +345,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal &lt;-&gt; error');
 
     There is a more general version of the FOLLOWED BY operator having the
     form <literal>&lt;<replaceable>N</>&gt;</literal>,
-    where <replaceable>N</> is an integer standing for the exact distance
-    allowed between the matching lexemes.  <literal>&lt;1&gt;</literal> is
+    where <replaceable>N</> is an integer standing for the difference between
+    the positions of the matching lexemes.  <literal>&lt;1&gt;</literal> is
     the same as <literal>&lt;-&gt;</>, while <literal>&lt;2&gt;</literal>
-    allows one other lexeme to appear between the matches, and so
+    allows exactly one other lexeme to appear between the matches, and so
     on.  The <literal>phraseto_tsquery</> function makes use of this
     operator to construct a <literal>tsquery</> that can match a multi-word
     phrase when some of the words are stop words.  For example:
@@ -366,9 +365,17 @@ SELECT phraseto_tsquery('the cats ate the rats');
  'cat' &lt;-&gt; 'ate' &lt;2&gt; 'rat'
 </programlisting>
    </para>
+
+   <para>
+    A special case that's sometimes useful is that <literal>&lt;0&gt;</literal>
+    can be used to require that two patterns match the same word.
+   </para>
+
    <para>
-     The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&amp;</literal>, 
-     <literal>&lt;-&gt;</literal>, <literal>!</literal>.
+    Parentheses can be used to control nesting of the <type>tsquery</>
+    operators.  Without parentheses, <literal>|</literal> binds least tightly,
+    then <literal>&amp;</literal>, then <literal>&lt;-&gt;</literal>,
+    and <literal>!</literal> most tightly.
    </para>
   </sect2>
 
@@ -1423,9 +1430,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
        lacks any position or weight information.  The result is usually much
        smaller than an unstripped vector, but it is also less useful.
        Relevance ranking does not work as well on stripped vectors as
-       unstripped ones.  Also, when given stripped input,
+       unstripped ones.  Also,
        the <literal>&lt;-&gt;</> (FOLLOWED BY) <type>tsquery</> operator
-       effectively degenerates to a simple <literal>&amp;</> (AND) test.
+       will never match stripped input, since it cannot determine the
+       distance between lexeme occurrences.
       </para>
      </listitem>