<programlisting>
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));
</programlisting>
-
+
Notice that the 2-argument version of <function>to_tsvector</function> is
used. Only text search functions which specify a configuration name can
be used in expression indexes (<xref linkend="indexes-expressional">).
column current anytime <literal>title</> or <literal>body</> changes.
Keep in mind that, just like with expression indexes, it is important to
specify the configuration name when creating text search data types
- inside triggers so the column's contents are not affected by changes to
+ inside triggers so the column's contents are not affected by changes to
<varname>default_text_search_config</>.
</para>
</programlisting>
</para>
- <para>
+ <para>
In the example above we see that the resulting <type>tsvector</type> does not
contain the words <literal>a</literal>, <literal>on</literal>, or
<literal>it</literal>, the word <literal>rats</literal> became
<literal>rat</literal>, and the punctuation sign <literal>-</literal> was
- ignored.
- </para>
+ ignored.
+ </para>
<para>
The <function>to_tsvector</function> function internally calls a parser
<programlisting>
SELECT * FROM ts_debug('english','a fat cat sat on a mat - it ate a fat rats');
- Alias | Description | Token | Dictionaries | Lexized token
+ Alias | Description | Token | Dictionaries | Lexized token
-------+---------------+-------+--------------+----------------
lword | Latin word | a | {english} | english: {}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | fat | {english} | english: {fat}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | cat | {english} | english: {cat}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | sat | {english} | english: {sat}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | on | {english} | english: {}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | a | {english} | english: {}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | mat | {english} | english: {mat}
- blank | Space symbols | | |
- blank | Space symbols | - | |
+ blank | Space symbols | | |
+ blank | Space symbols | - | |
lword | Latin word | it | {english} | english: {}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | ate | {english} | english: {ate}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | a | {english} | english: {}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | fat | {english} | english: {fat}
- blank | Space symbols | | |
+ blank | Space symbols | | |
lword | Latin word | rats | {english} | english: {rat}
(24 rows)
</programlisting>
<programlisting>
{D-weight, C-weight, B-weight, A-weight}
</programlisting>
-
+
If no weights are provided,
then these defaults are used:
a b <b>c</b>
SELECT ts_headline('a b c', 'c'::tsquery, 'StartSel=<,StopSel=>');
- ts_headline
+ ts_headline
-------------
a b <c>
</programlisting>
</para>
<para>
- Some examples of normalization:
+ Some examples of normalization:
<itemizedlist spacing="compact" mark="bullet">
Linguistic - ispell dictionaries try to reduce input words to a
normalized form; stemmer dictionaries remove word endings
</para>
- </listitem>
+ </listitem>
<listitem>
<para>
Identical <acronym>URL</acronym> locations are identified and canonicalized:
<sect2 id="textsearch-stopwords">
<title>Stop Words</title>
-
+
<para>
Stop words are words which are very common, appear in almost
every document, and have no discrimination value. Therefore, they can be ignored
<programlisting>
SELECT * FROM ts_debug('english','Paris');
- Alias | Description | Token | Dictionaries | Lexized token
+ Alias | Description | Token | Dictionaries | Lexized token
-------+-------------+-------+----------------+----------------------
lword | Latin word | Paris | {english_stem} | english_stem: {pari}
(1 row)
ALTER MAPPING FOR lword WITH synonym, english_stem;
SELECT * FROM ts_debug('english','Paris');
- Alias | Description | Token | Dictionaries | Lexized token
+ Alias | Description | Token | Dictionaries | Lexized token
-------+-------------+-------+------------------------+------------------
lword | Latin word | Paris | {synonym,english_stem} | synonym: {paris}
(1 row)
<secondary>GiST</secondary>
</indexterm>
-<!--
<indexterm zone="textsearch-indexes">
<primary>GiST</primary>
+ <secondary>text search</secondary>
</indexterm>
--->
<term>
<synopsis>
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING gist(<replaceable>column</replaceable>);
<secondary>GIN</secondary>
</indexterm>
-<!--
<indexterm zone="textsearch-indexes">
<primary>GIN</primary>
+ <secondary>text search</secondary>
</indexterm>
--->
+
<term>
<synopsis>
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING gin(<replaceable>column</replaceable>);
List of fulltext configurations
Schema | Name | Description
----------+----------------------------
- fulltext | fulltext_cfg |
+ fulltext | fulltext_cfg |
public | fulltext_cfg |
</programlisting>
</para>
PG_FREE_IF_COPY(in, 0);
pcfg=cfg;
- while (pcfg->key)
+ while (pcfg->key)
{
if (strcasecmp("MAXLEN", pcfg->key) == 0)
d->maxlen=atoi(pcfg->value);
- else if ( strcasecmp("REJECTLONG", pcfg->key) == 0)
+ else if ( strcasecmp("REJECTLONG", pcfg->key) == 0)
{
if ( strcasecmp("true", pcfg->value) == 0 )
d->rejectlong=true;
if (PG_GETARG_INT32(2) > d->maxlen)
{
- if (d->rejectlong)
+ if (d->rejectlong)
{ /* stop, return void array */
pfree(txt);
res[0].lexeme = NULL;
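The two dictionary fragments above (option parsing and the length check) can be condensed into a standalone sketch. This is an illustration under stated assumptions, not the patch's code: `DemoOption`, `DemoDict`, `demo_ieq`, `demo_dict_init` and `demo_dict_keep_len` are hypothetical names, `demo_ieq` stands in for `strcasecmp` so that only standard C headers are needed, and the default values as well as the truncation of over-long tokens when `REJECTLONG` is off are assumptions that the visible hunks do not confirm.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the key/value list that pcfg walks. */
typedef struct
{
    const char *key;    /* a NULL key terminates the list */
    const char *value;
} DemoOption;

/* Hypothetical stand-in for the dictionary state that d points to. */
typedef struct
{
    int  maxlen;
    bool rejectlong;
} DemoDict;

/* Case-insensitive equality; replaces strcasecmp(a, b) == 0 portably. */
int
demo_ieq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Walk the option list the way the fragment's while (pcfg->key) loop does. */
void
demo_dict_init(DemoDict *d, const DemoOption *pcfg)
{
    assert(d != NULL && pcfg != NULL);
    d->maxlen = 6;          /* assumed default, not taken from the source */
    d->rejectlong = false;  /* assumed default, not taken from the source */

    while (pcfg->key)
    {
        if (demo_ieq("MAXLEN", pcfg->key))
            d->maxlen = atoi(pcfg->value);
        else if (demo_ieq("REJECTLONG", pcfg->key))
            /* assumption: any value other than "true" means false */
            d->rejectlong = demo_ieq("true", pcfg->value);
        pcfg++;
    }
}

/*
 * Decide what to do with a token of length toklen.
 * Returns -1 when the token is rejected (the "void array" case above),
 * otherwise the number of leading characters to keep. Truncating to
 * maxlen when rejectlong is off is an assumption of this sketch.
 */
int
demo_dict_keep_len(const DemoDict *d, int toklen)
{
    assert(d != NULL && toklen >= 0);
    if (toklen > d->maxlen)
        return d->rejectlong ? -1 : d->maxlen;
    return toklen;
}
```

A caller would fill a `DemoOption` array ending in a `{NULL, NULL}` entry, pass it to `demo_dict_init`, and then call `demo_dict_keep_len` once per token. Returning a length instead of a copied string keeps the sketch free of memory management, which the original handles with `pfree`.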
/* blank type */
type = 12;
/* go to the next non-white-space character */
- while ((pst->buffer)[pst->pos] == ' ' &&
+ while ((pst->buffer)[pst->pos] == ' ' &&
pst->pos < pst->len)
(pst->pos)++;
} else {
/* word type */
type = 3;
/* go to the next white-space character */
- while ((pst->buffer)[pst->pos] != ' ' &&
+ while ((pst->buffer)[pst->pos] != ' ' &&
pst->pos < pst->len)
(pst->pos)++;
}
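The scanning logic in the parser fragment above can be tried outside the server with a small self-contained sketch. `DemoParser` and `demo_next_token` are hypothetical names, and the struct only mirrors the three fields the fragment uses (`buffer`, `len`, `pos`). The token type numbers 12 (blank) and 3 (word) are taken from the fragment. One deliberate difference: the sketch tests `pos < len` before reading the buffer, so it never looks one character past the end of an unterminated buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state that pst points to. */
typedef struct
{
    const char *buffer; /* text being parsed */
    int         len;    /* number of valid characters in buffer */
    int         pos;    /* current scan position */
} DemoParser;

enum
{
    DEMO_WORD = 3,      /* word type, as in the fragment */
    DEMO_BLANK = 12     /* blank type, as in the fragment */
};

/*
 * Scan the next token. Returns its type, or 0 at end of input.
 * On success *token points at the token's first character inside the
 * buffer (not NUL-terminated) and *toklen holds its length.
 */
int
demo_next_token(DemoParser *pst, const char **token, int *toklen)
{
    int start;
    int type;

    assert(pst != NULL && token != NULL && toklen != NULL);

    if (pst->pos >= pst->len)
        return 0;

    start = pst->pos;
    if (pst->buffer[pst->pos] == ' ')
    {
        /* blank type: go to the next non-white-space character */
        type = DEMO_BLANK;
        while (pst->pos < pst->len && pst->buffer[pst->pos] == ' ')
            pst->pos++;
    }
    else
    {
        /* word type: go to the next white-space character */
        type = DEMO_WORD;
        while (pst->pos < pst->len && pst->buffer[pst->pos] != ' ')
            pst->pos++;
    }

    *token = pst->buffer + start;
    *toklen = pst->pos - start;
    return type;
}
```

Calling `demo_next_token` in a loop until it returns 0 yields alternating word and blank tokens. A run of consecutive spaces comes back as a single blank token.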