granicus.if.org Git - postgresql/blob - contrib/tsearch2/docs/tsearch-V2-intro.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   2 <html>
   3 <head>
   4   <title>tsearch-v2-intro</title>
   5 </head>
   6 <body class="content">
   7 <div class="content">
   8 <h2>Tsearch2 - Introduction</h2>
   9 <p><a href="
  10 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html">
  11 [Online version]</a> of this document is available.</p>
  12
  13 <p>The tsearch2 module is available to add as an extension to the
  14 PostgreSQL database to allow for Full Text Indexing. This document
  15 is an introduction to installing, configuring, using and
  16 maintaining the database with the tsearch2 module activated.</p>
  17 <p>Please, note, tsearch2 module is fully incompatible with old
  18 tsearch, which is deprecated in 7.4 and will be obsoleted in
  19 8.0.</p>
  20 <h3>USING TSEARCH2 AND POSTGRESQL FOR A WEB BASED SEARCH
  21 ENGINE</h3>
  22 <p>This documentation is provided as a short guide on how to
  23 quickly get up and running with tsearch2 and PostgreSQL, for those
  24 who want to implement a full text indexed based search engine. It
  25 is not meant to be a complete in-depth guide into the full ins and
  26 outs of the contrib/tsearch2 module, and is primarily aimed at
  27 beginners who want to speed up searching of large text fields, or
  28 those migrating from other database systems such as MS-SQL.</p>
  29 <p>The README.tsearch2 file included in the contrib/tsearch2
  30 directory contains a brief overview and history behind tsearch.
  31 This can also be found online <a href="
  32 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[right
  33 here]</a>.</p>
  34 <p>Further in depth documentation such as a full function
  35 reference, and user guide can be found online at the <a href="
  36 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/">[tsearch
  37 documentation home]</a>.</p>
  38 <h3>ACKNOWLEDGEMENTS</h3>
  39
  40 <p>Robert John Shepherd originally wrote this documentation for the
  41 previous version of tsearch module (v1) included with the postgres
  42 release. I took his documentation and updated it to comply with the
  43 tsearch2 modifications.</p>
  44 <p>Robert's original acknowledgements:</p>
  45 <p>"Thanks to Oleg Bartunov for taking the time to answer many of
  46 my questions regarding this module, and also to Teodor Sigaev for
  47 clearing up the process of making your own dictionaries. Plus of
  48 course a big thanks to the pair of them for writing this module in
  49 the first place!"</p>
  50 <p>I would also like to extend my thanks to the developers, and
  51 Oleg Bartunov for all of his direction and help with the new
  52 features of tsearch2.</p>
  53 <h3>OVERVIEW</h3>
  54 <p>MS-SQL provides a full text indexing (FTI) system which enables
  55 the fast searching of text based fields, very useful for websites
  56 (and other applications) that require a results set based on key
  57 words. PostgreSQL ships with a contributed module called tsearch2,
  58 which implements a special type of index that can also be used for
  59 full text indexing. Further more, unlike MS' offering which
  60 requires regular incremental rebuilds of the text indexes
  61 themselves, tsearch2 indexes are always up-to-date and keeping them
  62 so induces very little overhead.</p>
  63 <p>Before we get into the details, it is recommended that you have
  64 installed and tested PostgreSQL, are reasonably familiar with
  65 databases, the SQL query language and also understand the basics of
  66 connecting to PostgreSQL from the local shell. This document isn't
  67 intended for the complete PostgreSQL newbie, but anyone with a
  68 reasonable grasp of the basics should be able to follow it.</p>
  69 <h3>INSTALLATION</h3>
  70 <p>Starting with PostgreSQL version 7.4 tsearch2 is now included in
  71 the contrib directory with the PostgreSQL sources. contrib/tsearch2
  72 is where you will find everything needed to install and use
  73 tsearch2. Please note that tsearch2 will also work with PostgreSQL
  74 version 7.3.x, but it is not the module included with the source
  75 distribution. You will have to download the module separately and
  76 install it in the same fashion.</p>
  77
  78 <p>I installed the tsearch2 module to a PostgreSQL 7.3 database
  79 from the contrib directory without squashing the original (old)
  80 tsearch module. What I did was move the modules tsearch src
  81 driectory into the contrib tree under the name tsearchV2.</p>
  82 <p>Step one is to download the tsearch V2 module :</p>
  83 <p><a href="
  84 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/]</a>
  85 (check Development History for latest stable version !)</p>
  86 <pre>
  87         tar -zxvf tsearch-v2.tar.gz
  88         mv tsearch2 $PGSQL_SRC/contrib/
  89         cd $PGSQL_SRC/contrib/tsearch2
  90 </pre>
  91 <p>If you are installing from PostgreSQL version 7.4 or higher, you
  92 can skip those steps and just change to the contrib/tsearch2
  93 directory in the source tree and continue from there.</p>
  94 <p>As of May 9, 2004 there is a source patch available for
  95 tsearch2. The patch provides changes to the pg_ts_ configuration
  96 tables to allow for easy dump and restore of a database containing
  97 tsearch2. The patch is available here : <a href="
  98 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/regprocedure_7.4.patch.gz">
  99 [http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/regprocedure_7.4.patch.gz]</a></p>
 100
 101 <p>To apply this patch, download the mentioned file and place it in
 102 your postgreSQL source tree ($PGSQL_SRC). This patch is not
 103 required for tsearch2 to work. I would however, highly recommend it
 104 as it makes the backup and restore procedures very simple.</p>
 105 <pre>
 106       cd $PGSQL_SRC
 107         gunzip regprocedure_7.4.patch.gz
 108         patch -b -p1 &lt; regprocedure_7.4.patch
 109 </pre>
 110 <p>If you have a working version of tsearch2 in your database, you
 111 do not need to re-install the tsearch2 module. Just apply the patch
 112 and run make. This patch only affects the tsearch2.sql file. You
 113 can run the SQL script found : <a href="
 114 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/regprocedure_update.sql">
 115 [right here]</a> This script will make the modifications found in
 116 the patch, and update the fields from the existing data. From this
 117 point on, you can dump and restore the database in a normal
 118 fashion. Without this patch, you must follow the instructions later
 119 in this document for backup and restore.</p>
 120 <p>This patch is only needed for tsearch2 in PostgreSQL versions
 121 7.3.x and 7.4.x. The patch has been applied to the sources for
 122 8.0.x.</p>
 123 <p>When you have your source tree for tsearch2 ready, you can
 124 continue with the regular building and installation process</p>
 125
 126 <pre>
 127         gmake
 128         gmake install
 129         gmake installcheck
 130 </pre>
 131 <p>That is pretty much all you have to do, unless of course you get
 132 errors. However if you get those, you better go check with the
 133 mailing lists over at <a href="
 134 http://www.postgresql.org">http://www.postgresql.org</a> or
 135 <a href="
 136 http://openfts.sourceforge.net/">http://openfts.sourceforge.net/</a>
 137 since its never failed for me.</p>
 138 <p>If you ever need to revert this patch, and go back to the
 139 unpatched version of tsearch2, it is simple if you followed the
 140 above patch command. The -b option creates a backup of the original
 141 file, so we can just copy it back.</p>
 142 <pre>
 143      cd $PGSQL_SRC/contrib/tsearch2
 144         cp tsearch.sql.in.orig tsearch.sql.in
 145         make
 146
 147 </pre>
 148 <p>If you need the patched version again, just follow the patch
 149 instructions again.</p>
 150 <p>The directory in the contib/ and the directory from the archive
 151 is called tsearch2. Tsearch2 is completely incompatible with the
 152 previous version of tsearch. This means that both versions can be
 153 installed into a single database, and migration the new version may
 154 be much easier.</p>
 155 <p>NOTE: the previous version of tsearch found in the
 156 contrib/tsearch directory is depricated. Although it is still
 157 available and included within PostgreSQL version 7.4. It will be
 158 removed in version 8.0.</p>
 159 <h3>ADDING TSEARCH2 FUNCTIONALITY TO A DATABASE</h3>
 160 <p>We should create a database to use as an example for the
 161 remainder of this file. We can call the database "ftstest". You can
 162 create it from the command line like this:</p>
 163 <pre>
 164         #createdb ftstest
 165 </pre>
 166 <p>If you thought installation was easy, this next bit is even
 167 easier. Change to the PGSQL_SRC/contrib/tsearch2 directory and
 168 type:</p>
 169
 170 <pre>
 171         psql ftstest &lt; tsearch2.sql
 172 </pre>
 173 <p>The file "tsearch2.sql" holds all the wonderful little goodies
 174 you need to do full text indexing. It defines numerous functions
 175 and operators, and creates the needed tables in the database. There
 176 will be 4 new tables created after running the tsearch2.sql file :
 177 pg_ts_dict, pg_ts_parser, pg_ts_cfg, pg_ts_cfgmap are added.</p>
 178 <p>You can check out the tables if you like:</p>
 179 <pre>
 180         #psql ftstest
 181         ftstest=# \d
 182                     List of relations
 183          Schema |     Name     | Type  |  Owner
 184         --------+--------------+-------+----------
 185          public | pg_ts_cfg    | table | kopciuch
 186          public | pg_ts_cfgmap | table | kopciuch
 187          public | pg_ts_dict   | table | kopciuch
 188          public | pg_ts_parser | table | kopciuch
 189         (4 rows)
 190 </pre>
 191 <p>You may need to grant permissions to use on pg_ts_dict, pg_ts_parser, pg_ts_cfg, pg_ts_cfgmap tables to let non-superuser works with tsearch2. GRANT SELECT should be enough for search-only access. </p>
 192 <h3>TYPES AND FUNCTIONS PROVIDED BY TSEARCH2</h3>
 193 <p>The first thing we can do is try out some of the types that are
 194 provided for us. Lets look at the tsvector type provided for
 195 us:</p>
 196
 197 <pre>
 198         SELECT 'Our first string used today'::tsvector;
 199                         tsvector
 200         ---------------------------------------
 201          'Our' 'used' 'first' 'today' 'string'
 202         (1 row)
 203 </pre>
 204 <p>The results are the words used within our string. Notice they
 205 are not in any particular order. The tsvector type returns a string
 206 of space separated words.</p>
 207 <pre>
 208         SELECT 'Our first string used today first string'::tsvector;
 209                             tsvector
 210         -----------------------------------------------
 211          'Our' 'used' 'first' 'today' 'string'
 212         (1 row)
 213 </pre>
 214 <p>Notice the results string has each unique word ('first' and
 215 'string' only appear once in the tsvector value). Which of course
 216 makes sense if you are searching the full text ... you only need to
 217 know each unique word in the text.</p>
 218 <p>Those examples were just casting a text field to that of type
 219 tsvector. Lets check out one of the new functions created by the
 220 tsearch2 module.</p>
 221 <p>The function to_tsvector has 3 possible signatures:</p>
 222 <pre>
 223
 224         to_tsvector(oid, text);
 225         to_tsvector(text, text);
 226         to_tsvector(text);
 227 </pre>
 228 <p>We will use the second method using two text fields. The
 229 overloaded methods provide us with a way to specifiy the way the
 230 searchable text is broken up into words (Stemming process). Right
 231 now we will specify the 'default' configuration. See the section on
 232 TSEARCH2 CONFIGURATION to learn more about this.</p>
 233 <pre>
 234         SELECT to_tsvector('default',
 235                            'Our first string used today first string');
 236                         to_tsvector
 237         --------------------------------------------
 238          'use':4 'first':2,6 'today':5 'string':3,7
 239         (1 row)
 240 </pre>
 241 <p>The result returned from this function is of type tsvector. The
 242 results came about by this reasoning: All of the words in the text
 243 passed in are stemmed, or not used because they are stop words
 244 defined in our configuration. Each lower case morphed word is
 245 returned with all of the positons in the text.</p>
 246 <p>In this case the word "Our" is a stop word in the default
 247 configuration. That means it will not be included in the result.
 248 The word "first" is found at positions 2 and 6 (although "Our" is a
 249 stop word, it's position is maintained). The word(s) positioning is
 250 maintained exactly as in the original string. The word "used" is
 251 morphed to the word "use" based on the default configuration for
 252 word stemming, and is found at position 4. The rest of the results
 253 follow the same logic. Just a reminder again ... the order of the
 254 'word' position in the output is not in any kind of order. (ie
 255 'use':4 appears first)</p>
 256 <p>If you want to view the output of the tsvector fields without
 257 their positions, you can do so with the function
 258 "strip(tsvector)".</p>
 259 <pre>
 260         SELECT strip(to_tsvector('default',
 261                      'Our first string used today first string'));
 262                     strip
 263         --------------------------------
 264          'use' 'first' 'today' 'string'
 265
 266 </pre>
 267 <p>If you wish to know the number of unique words returned in the
 268 tsvector you can do so by using the function "length(tsvector)"</p>
 269 <pre>
 270         SELECT length(to_tsvector('default',
 271                       'Our first string used today first string'));
 272          length
 273         --------
 274               4
 275         (1 row)
 276 </pre>
 277 <p>Lets take a look at the function to_tsquery. It also has 3
 278 signatures which follow the same rational as the to_tsvector
 279 function:</p>
 280 <pre>
 281         to_tsquery(oid, text);
 282         to_tsquery(text, text);
 283         to_tsquery(text);
 284 </pre>
 285 <p>Lets try using the function with a single word :</p>
 286 <pre>
 287         SELECT to_tsquery('default', 'word');
 288          to_tsquery
 289         -----------
 290          'word'
 291          (1 row)
 292
 293 </pre>
 294 <p>I call the function the same way I would a to_tsvector function,
 295 specifying the 'default' configuration for morphing, and the result
 296 is the stemmed output 'word'.</p>
 297 <p>Lets attempt to use the function with a string of multiple
 298 words:</p>
 299 <pre>
 300         SELECT to_tsquery('default', 'this is many words');
 301         ERROR:  Syntax error
 302 </pre>
 303 <p>The function can not accept a space separated string. The
 304 intention of the to_tsquery function is to return a type of
 305 "tsquery" used for searching a tsvector field. What we need to do
 306 is search for one to many words with some kind of logic (for now
 307 simple boolean).</p>
 308 <pre>
 309         SELECT to_tsquery('default', 'searching|sentence');
 310               to_tsquery
 311         ----------------------
 312          'search' | 'sentenc'
 313         (1 row)
 314 </pre>
 315 <p>Notice that the words are separated by the boolean logic "OR",
 316 the text could contain boolean operators &amp;,|,!,() with their
 317 usual meaning.</p>
 318
 319 <p>You can not use words defined as being a stop word in your
 320 configuration. The function will not fail ... you will just get no
 321 result, and a NOTICE like this:</p>
 322 <pre>
 323         SELECT to_tsquery('default', 'a|is&amp;not|!the');
 324         NOTICE:  Query contains only stopword(s)
 325                  or doesn't contain lexeme(s), ignored
 326          to_tsquery
 327         -----------
 328         (1 row)
 329 </pre>
 330 <p>That is a beginning to using the types, and functions defined in
 331 the tsearch2 module. There are numerous more functions that I have
 332 not touched on. You can read through the tsearch2.sql file built
 333 when compiling to get more familiar with what is included.</p>
 334 <h3>INDEXING FIELDS IN A TABLE</h3>
 335 <p>The next stage is to add a full text index to an existing table.
 336 In this example we already have a table defined as follows:</p>
 337 <pre>
 338         CREATE TABLE tblMessages
 339         (
 340                 intIndex        int4,
 341                 strTopic        varchar(100),
 342                 strMessage      text
 343         );
 344 </pre>
 345
 346 <p>We are assuming there are several rows with some kind of data in
 347 them. Any data will do, just do several inserts with test strings
 348 for a topic, and a message. here is some test data I inserted. (yes
 349 I know it's completely useless stuff ;-) but it will serve our
 350 purpose right now).</p>
 351 <pre>
 352         INSERT INTO tblMessages
 353                VALUES ('1', 'Testing Topic', 'Testing message data input');
 354         INSERT INTO tblMessages
 355                VALUES ('2', 'Movie', 'Breakfast at Tiffany\'s');
 356         INSERT INTO tblMessages
 357                VALUES ('3', 'Famous Author', 'Stephen King');
 358         INSERT INTO tblMessages
 359                VALUES ('4', 'Political Topic',
 360                             'Nelson Mandella is released from prison');
 361         INSERT INTO tblMessages
 362                VALUES ('5', 'Nursery rhyme phrase',
 363                             'Little jack horner sat in a corner');
 364         INSERT INTO tblMessages
 365                VALUES ('6', 'Gettysburg address quotation',
 366                             'Four score and seven years ago'
 367                             ' our fathers brought forth on this'
 368                             ' continent a new nation, conceived in'
 369                             ' liberty and dedicated to the proposition'
 370                             ' that all men are created equal');
 371         INSERT INTO tblMessages
 372                VALUES ('7', 'Classic Rock Bands',
 373                             'Led Zeppelin Grateful Dead and The Sex Pistols');
 374         INSERT INTO tblMessages
 375                VALUES ('8', 'My birth address',
 376                             '18 Sommervile road, Regina, Saskatchewan');
 377         INSERT INTO tblMessages
 378                VALUES ('9', 'Joke', 'knock knock : who\'s there?'
 379                                     ' I will not finish this joke');
 380         INSERT INTO tblMessages
 381                VALUES ('10', 'Computer information',
 382                              'My computer is a pentium III 400 mHz'
 383                              ' with 192 megabytes of RAM');
 384 </pre>
 385 <p>The next stage is to create a special text index which we will
 386 use for FTI, so we can search our table of messages for words or a
 387 phrase. We do this using the SQL command:</p>
 388 <pre>
 389         ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
 390 </pre>
 391 <p>Note that unlike traditional indexes, this is actually a new
 392 field in the same table, which is then used (through the magic of
 393 the tsearch2 operators and functions) by a special index we will
 394 create in a moment.</p>
 395 <p>The general rule for the initial insertion of data will follow
 396 four steps:</p>
 397 <pre>
 398
 399     1. update table
 400     2. vacuum full analyze
 401     3. create index
 402     4. vacuum full analyze
 403 </pre>
 404 <p>The data can be updated into the table, the vacuum full analyze
 405 will reclaim unused space. The index can be created on the table
 406 after the data has been inserted. Having the index created prior to
 407 the update will slow down the process. It can be done in that
 408 manner, this way is just more efficient. After the index has been
 409 created on the table, vacuum full analyze is run again to update
 410 postgres's statistics (ie having the index take effect).</p>
 411 <pre>
 412         UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
 413         VACUUM FULL ANALYZE;
 414 </pre>
 415 <p>Note that this only inserts the field strMessage as a tsvector,
 416 so if you want to also add strTopic to the information stored, you
 417 should instead do the following, which effectively concatenates the
 418 two fields into one before being inserted into the table:</p>
 419 <pre>
 420         UPDATE tblMessages
 421             SET idxFTI=to_tsvector('default',coalesce(strTopic,'') ||' '|| coalesce(strMessage,''));
 422         VACUUM FULL ANALYZE;
 423 </pre>
 424 <p><strong>Using the coalesce function makes sure this
 425 concatenation also works with NULL fields.</strong></p>
 426
 427 <p>We need to create the index on the column idxFTI. Keep in mind
 428 that the database will update the index when some action is taken.
 429 In this case we _need_ the index (The whole point of Full Text
 430 INDEXING ;-)), so don't worry about any indexing overhead. We will
 431 create an index based on the gist or gin function. GiST is an index
 432 structure for Generalized Search Tree, GIN is a inverted index (see <a href="tsearch2-ref.html#indexes">The tsearch2 Reference: Indexes</a>).</p>
 433 <pre>
 434         CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
 435         VACUUM FULL ANALYZE;
 436 </pre>
 437 <p>After you have converted all of your data and indexed the
 438 column, you can select some rows to see what actually happened. I
 439 will not display output here but you can play around yourselves and
 440 see what happened.</p>
 441 <p>The last thing to do is set up a trigger so every time a row in
 442 this table is changed, the text index is automatically updated.
 443 This is easily done using:</p>
 444 <pre>
 445         CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
 446             FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, strMessage);
 447 </pre>
 448 <p>Or if you are indexing both strMessage and strTopic you should
 449 instead do:</p>
 450 <pre>
 451
 452         CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
 453             FOR EACH ROW EXECUTE PROCEDURE
 454                 tsearch2(idxFTI, strTopic, strMessage);
 455 </pre>
 456 <p>Before you ask, the tsearch2 function accepts multiple fields as
 457 arguments so there is no need to concatenate the two into one like
 458 we did before.</p>
 459 <p>If you want to do something specific with columns, you may write
 460 your very own trigger function using plpgsql or other procedural
 461 languages (but not SQL, unfortunately) and use it instead of
 462 <em>tsearch2</em> trigger.</p>
 463 <p>You could however call other stored procedures from within the
 464 tsearch2 function. Lets say we want to create a function to remove
 465 certain characters (like the @ symbol from all text).</p>
 466 <pre>
 467        CREATE FUNCTION dropatsymbol(text)
 468                      RETURNS text AS 'select replace($1, \'@\', \' \');' LANGUAGE SQL;
 469 </pre>
 470 <p>Now we can use this function within the tsearch2 function on the
 471 trigger.</p>
 472
 473 <pre>
 474       DROP TRIGGER tsvectorupdate ON tblmessages;
 475         CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
 476             FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, dropatsymbol, strMessage);
 477         INSERT INTO tblmessages VALUES (69, 'Attempt for dropatsymbol', 'Test@test.com');
 478 </pre>
 479 <p>If at this point you receive an error stating: ERROR: Can't find
 480 tsearch config by locale</p>
 481 <p>Do not worry. You have done nothing wrong. And tsearch2 is not
 482 broken. All that has happened here is that the configuration is
 483 setup to use a configuration based on the locale of the server. All
 484 you have to do is change your default configuration, or add a new
 485 one for your specific locale. See the section on TSEARCH2
 486 CONFIGURATION.</p>
 487 <pre class="real">
 488    SELECT * FROM tblmessages WHERE intindex = 69;
 489
 490          intindex |         strtopic         |  strmessage   |        idxfti
 491         ----------+--------------------------+---------------+-----------------------
 492                 69 | Attempt for dropatsymbol | Test@test.com | 'test':1 'test.com':2
 493         (1 row)
 494 </pre>
 495 Notice that the string content was passed throught the stored
 496 procedure dropatsymbol. The '@' character was replaced with a
 497 single space ... and the output from the procedure was then stored
 498 in the tsvector column.
 499 <p>This could be useful for removing other characters from indexed
 500 text, or any kind of preprocessing needed to be done on the text
 501 prior to insertion into the index.</p>
 502 <h3>QUERYING A TABLE</h3>
 503
 504 <p>There are some examples in the README.tsearch2 file for querying
 505 a table. One major difference between tsearch and tsearch2 is the
 506 operator ## is no longer available. Only the operator @@ is
 507 defined, using the types tsvector on one side and tsquery on the
 508 other side.</p>
 509 <p>Lets search the indexed data for the word "Test". I indexed
 510 based on the the concatenation of the strTopic, and the
 511 strMessage:</p>
 512 <pre>
 513         SELECT intindex, strtopic FROM tblmessages
 514                                   WHERE idxfti @@ 'test'::tsquery;
 515          intindex |   strtopic
 516         ----------+---------------
 517                 1 | Testing Topic
 518         (1 row)
 519 </pre>
 520 <p>The only result that matched was the row with a topic "Testing
 521 Topic". Notice that the word I search for was all lowercase. Let's
 522 see what happens when I query for uppercase "Test".</p>
 523 <pre>
 524         SELECT intindex, strtopic FROM tblmessages
 525                                   WHERE idxfti @@ 'Test'::tsquery;
 526          intindex | strtopic
 527         ----------+----------
 528         (0 rows)
 529 </pre>
 530 <p>We get zero rows returned. The reason is because when the text
 531 was inserted, it was morphed to my default configuration (because
 532 of the call to to_tsvector in the UPDATE statement). If there was
 533 no morphing done, and the tsvector field(s) contained the word
 534 'Text', a match would have been found.</p>
 535 <p>Most likely the best way to query the field is to use the
 536 to_tsquery function on the right hand side of the @@ operator like
 537 this:</p>
 538
 539 <pre>
 540         SELECT intindex, strtopic FROM tblmessages
 541                WHERE idxfti @@ to_tsquery('default', 'Test | Zeppelin');
 542          intindex |      strtopic
 543         ----------+--------------------
 544                 1 | Testing Topic
 545                 7 | Classic Rock Bands
 546         (2 rows)
 547 </pre>
 548 <p>That query searched for all instances of "Test" OR "Zeppelin".
 549 It returned two rows: the "Testing Topic" row, and the "Classic
 550 Rock Bands" row. The to_tsquery function performed the correct
 551 morphology upon the parameters, and searched the tsvector field
 552 appropriately.</p>
 553 <p>The last example here relates to searching for a phrase, for
 554 example "minority report". This poses a problem with regard to
 555 tsearch2, as it doesn't index phrases, only words. But there is a
 556 way around which doesn't appear to have a significant impact on
 557 query time, and that is to use a query such as the following:</p>
 558 <pre>
 559         SELECT intindex, strTopic FROM tblmessages
 560                 WHERE idxfti @@ to_tsquery('default', 'gettysburg &amp; address')
 561                 AND strMessage ~* '.*men are created equal.*';
 562          intindex |           strtopic
 563         ----------+------------------------------
 564                 6 | Gettysburg address quotation
 565         (1 row)
 566         SELECT intindex, strTopic FROM tblmessages
 567                 WHERE idxfti @@ to_tsquery('default', 'gettysburg &amp; address')
 568                 AND strMessage ~* '.*something that does not exist.*';
 569          intindex | strtopic
 570         ----------+----------
 571         (0 rows)
 572 </pre>
 573 <p>Of course if your indexing both strTopic and strMessage, and
 574 want to search for this phrase on both, then you will have to get
 575 out the brackets and extend this query a little more.</p>
 576
 577 <h3>TSEARCH2 CONFIGURATION</h3>
 578 <p>Some words such as "and", "the", and "who" are automatically not
 579 indexed, since they belong to a pre-existing dictionary of "Stop
 580 Words" which tsearch2 does not perform indexing on. If someone
 581 needs to search for "The Who" in your database, they are going to
 582 have a tough time coming up with any results, since both are
 583 ignored in the indexes. But there is a solution.</p>
 584 <p>Lets say we want to add a word into the stop word list for
 585 english stemming. We could edit the file
 586 :'/usr/local/pgsql/share/english.stop' and add a word to the list.
 587 I edited mine to exclude my name from indexing:</p>
 588 <pre>
 589     - Edit /usr/local/pgsql/share/english.stop
 590     - Add 'andy' to the list
 591     - Save the file.
 592 </pre>
 593 <p>When you connect to the database, the dict_init procedure is run
 594 during initialization. And in my configuration it will read the
 595 stop words from the file I just edited. If you were connected to
 596 the DB while editing the stop words, you will need to end the
 597 current session and re-connect. When you re-connect to the
 598 database, 'andy' is no longer indexed:</p>
 599 <pre>
 600         SELECT to_tsvector('default', 'Andy');
 601          to_tsvector
 602         ------------
 603         (1 row)
 604 </pre>
 605 <p>Originally I would get the result :</p>
 606
 607 <pre>
 608         SELECT to_tsvector('default', 'Andy');
 609          to_tsvector
 610         ------------
 611          'andi':1
 612         (1 row)
 613 </pre>
 614 <p>But since I added it as a stop word, it would be ingnored on the
 615 indexing. The stop word added was used in the dictionary "en_stem".
 616 If I were to use a different configuration such as 'simple', the
 617 results would be different. There are no stop words for the simple
 618 dictionary. It will just convert to lower case, and index every
 619 unique word.</p>
 620 <pre>
 621         SELECT to_tsvector('simple', 'Andy andy The the in out');
 622                      to_tsvector
 623         -------------------------------------
 624          'in':5 'out':6 'the':3,4 'andy':1,2
 625         (1 row)
 626 </pre>
 627 <p>All this talk about which configuration to use is leading us
 628 into the actual configuration of tsearch2. In the examples in this
 629 document the configuration has always been specified when using the
 630 tsearch2 functions:</p>
 631 <pre>
 632         SELECT to_tsvector('default', 'Testing the default config');
 633         SELECT to_tsvector('simple', 'Example of simple Config');
 634 </pre>
 635 <p>The pg_ts_cfg table holds each configuration you can use with
 636 the tsearch2 functions. As you can see the ts_name column contains
 637 both the 'default' configurations based on the 'C' locale. And the
 638 'simple' configuration which is not based on any locale.</p>
 639
 640 <pre>
 641         SELECT * from pg_ts_cfg;
 642              ts_name     | prs_name |    locale
 643         -----------------+----------+--------------
 644          default         | default  | C
 645          default_russian | default  | ru_RU.KOI8-R
 646          simple          | default  |
 647         (3 rows)
 648 </pre>
 649 <p>Each row in the pg_ts_cfg table contains the name of the
 650 tsearch2 configuration, the name of the parser to use, and the
 651 locale mapped to the configuration. There is only one parser to
 652 choose from the table pg_ts_parser called 'default'. More parsers
 653 could be written, but for our needs we will use the default.</p>
 654 <p>There are 3 configurations installed by tsearch2 initially. If
 655 your locale is set to 'en_US' for example (like my laptop), then as
 656 you can see there is currently no dictionary configured to use with
 657 that locale. You can either set up a new configuration or just use
 658 one that already exists. If I do not specify which configuration to
 659 use in the to_tsvector function, I receive the following error.</p>
 660 <pre>
 661         SELECT to_tsvector('learning tsearch is like going to school');
 662         ERROR:  Can't find tsearch config by locale
 663 </pre>
 664 <p>We will create a new configuration for use with the server
 665 encoding 'en_US'. The first step is to add a new configuration into
 666 the pg_ts_cfg table. We will call the configuration
 667 'default_english', with the default parser and use the locale
 668 'en_US'.</p>
 669 <pre>
 670         INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
 671                VALUES ('default_english', 'default', 'en_US');
 672
 673 </pre>
 674 <p>We have only declared that there is a configuration called
 675 'default_english'. We need to set the configuration of how
 676 'default_english' will work. The next step is creating a new
 677 dictionary to use. The configuration of the dictionary is
 678 completlely different in tsearch2. In the prior versions to make
 679 changes, you would have to re-compile your changes into the
 680 tsearch.so. All of the configuration has now been moved into the
 681 system tables created by executing the SQL code from
 682 tsearch2.sql</p>
 683 <p>Lets take a first look at the pg_ts_dict table</p>
 684 <pre>
 685   ftstest=# \d pg_ts_dict
 686          Table "public.pg_ts_dict"
 687              Column      |     Type     | Modifiers
 688         -----------------+--------------+-----------
 689          dict_name       | text         | not null
 690          dict_init       | regprocedure |
 691          dict_initoption | text         |
 692          dict_lexize     | regprocedure | not null
 693          dict_comment    | text         |
 694         Indexes: pg_ts_dict_pkey primary key btree (dict_name)
 695 </pre>
 696 <p>The dict_name column is the name of the dictionary, for example
 697 'simple', 'en_stem' or 'ru_stem'. The dict_init column is a text
 698 representation of a stored procedure to run for initialization of
 699 that dictionary, for example 'snb_en_init(text)' or
 700 'snb_ru_init(text)'. The initial configuration of tsearch2 had the
 701 dict_init and dict_lexize columns as type oid. The patch mentioned
 702 in the Installation Notes changes these types to regprocedure. The
 703 data inserted, or updated can still be the oid of the stored
 704 procedure. The representation is just different. This makes backup
 705 and restore procedures much easier for tsearch2. The dict_init
 706 option is used for options passed to the init function for the
 707 stored procedure. In the cases of 'en_stem' or 'ru_stem' it is a
 708 path to a stopword file for that dictionary, for example
 709 '/usr/local/pgsql/share/english.stop'. This is however dictated by
 710 the dictionary. ISpell dictionaries may require different options.
 711 The dict_lexize column is another OID of a stored procedure to the
 712 function used to lexize, for example 'snb_lexize(internal,
 713 internal, integer)'. The dict_comment column is just a comment.</p>
 714 <p>Next we will configure the use of a new dictionary based on
 715 ISpell. We will assume you have ISpell installed on you machine.
 716 (in /usr/local/lib)</p>
 717 <p>There has been some confusion in the past as to which files are
 718 used from ISpell. ISpell operates using a hash file. This is a
 719 binary file created by the ISpell command line utility "buildhash".
 720 This utility accepts a file containing the words from the
 721 dictionary, and the affixes file and the output is the hash file.
 722 The default installation of ISPell installs the english hash file
 723 english.hash, which is the exact same file as american.hash. ISpell
 724 uses this as the fallback dictionary to use.</p>
 725 <p>This hash file is not what tsearch2 requires as the ISpell
 726 interface. The file(s) needed are those used to create the hash.
 727 Tsearch uses the dictionary words for morphology, so the listing is
 728 needed not spellchecking. Regardless, these files are included in
 729 the ISpell sources, and you can use them to integrate into
 730 tsearch2. This is not complicated, but is not very obvious to begin
 731 with. The tsearch2 ISpell interface needs only the listing of
 732 dictionary words, it will parse and load those words, and use the
 733 ISpell dictionary for lexeme processing.</p>
 734
 735 <p>I found the ISPell make system to be very finicky. Their
 736 documentation actually states this to be the case. So I just did
 737 things the command line way. In the ISpell source tree under
 738 languages/english there are several files in this directory. For a
 739 complete description, please read the ISpell README. Basically for
 740 the english dictionary there is the option to create the small,
 741 medium, large and extra large dictionaries. The medium dictionary
 742 is recommended. If the make system is configured correctly, it
 743 would build and install the english.has file from the medium size
 744 dictionary. Since we are only concerned with the dictionary word
 745 listing ... it can be created from the /languages/english directory
 746 with the following command:</p>
 747 <pre>
 748        sort -u -t/ +0f -1 +0 -T /usr/tmp -o english.med english.0 english.1
 749 </pre>
 750 <p>This will create a file called english.med. You can copy this
 751 file to whever you like. I placed mine in /usr/local/lib so it
 752 coincides with the ISpell hash files. You can now add the tsearch2
 753 configuration entry for the ISpell english dictionary. We will also
 754 continue to use the english word stop file that was installed for
 755 the en_stem dictionary. You could use a different one if you like.
 756 The ISpell configuration is based on the "ispell_template"
 757 dictionary installed by default with tsearch2. We will use the OIDs
 758 to the stored procedures from the row where the dict_name =
 759 'ispell_template'.</p>
 760 <pre>
 761         INSERT INTO pg_ts_dict
 762                (SELECT 'en_ispell',
 763                        dict_init,
 764                        'DictFile="/usr/local/lib/english.med",'
 765                        'AffFile="/usr/local/lib/english.aff",'
 766                        'StopFile="/usr/local/pgsql/share/contrib/english.stop"',
 767                        dict_lexize
 768                 FROM pg_ts_dict
 769                 WHERE dict_name = 'ispell_template');
 770 </pre>
 771 <p>Now that we have a dictionary we can specify it's use in a query
 772 to get a lexeme. For this we will use the lexize function. The
 773 lexize function takes the name of the dictionary to use as an
 774 argument. Just as the other tsearch2 functions operate. You will
 775 need to stop your psql session and start it again in order for this
 776 modification to take place.</p>
 777 <pre>
 778   SELECT lexize('en_ispell', 'program');
 779           lexize
 780         -----------
 781          {program}
 782         (1 row)
 783
 784 </pre>
 785 <p>If you wanted to always use the ISpell english dictionary you
 786 have installed, you can configure tsearch2 to always use a specific
 787 dictionary.</p>
 788 <pre>
 789   SELECT set_curdict('en_ispell');
 790 </pre>
 791 <p>Lexize is meant to turn a word into a lexeme. It is possible to
 792 receive more than one lexeme returned for a single word.</p>
 793 <pre>
 794  SELECT lexize('en_ispell', 'conditionally');
 795            lexize
 796         -----------------------------
 797          {conditionally,conditional}
 798         (1 row)
 799 </pre>
 800 <p>The lexize function is not meant to take a full string as an
 801 argument to return lexemes for. If you passed in an entire sentence,
 802 it attempts to find that entire sentence in the dictionary. Since
 803 the dictionary contains only words, you will receive an empty
 804 result set back.</p>
 805 <pre>
 806       SELECT lexize('en_ispell', 'This is a senctece to lexize');
 807          lexize
 808         --------
 809
 810         (1 row)
 811
 812 If you parse a lexeme from a word not in the dictionary, then you will receive an empty result. This makes sense because the word "tsearch" is not in the english dictionary. You can create your own additions to the dictionary if you like. This may be useful for scientific or technical glossaries that need to be indexed. SELECT lexize('en_ispell', 'tsearch'); lexize -------- (1 row)
 813
 814 </pre>
 815 <p>This is not to say that tsearch will be ignored when adding text
 816 information to the the tsvector index column. This will be
 817 explained in greater detail with the table pg_ts_cfgmap.</p>
 818 <p>Next we need to set up the configuration for mapping the
 819 dictionay use to the lexxem parsings. This will be done by altering
 820 the pg_ts_cfgmap table. We will insert several rows, specifying to
 821 use the new dictionary we installed and configured for lexizing
 822 within tsearch2. There are several type of lexims we would be
 823 concerned with forcing the use of the ISpell dictionary.</p>
 824 <pre>
 825         INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
 826                VALUES ('default_english', 'lhword', '{en_ispell,en_stem}');
 827         INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
 828                VALUES ('default_english', 'lpart_hword', '{en_ispell,en_stem}');
 829         INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
 830                VALUES ('default_english', 'lword', '{en_ispell,en_stem}');
 831 </pre>
 832 <p>We have just inserted 3 records to the configuration mapping,
 833 specifying that the lexeme types for "lhword, lpart_hword and lword"
 834 are to be stemmed using the 'en_ispell' dictionary we added into
 835 pg_ts_dict, when using the configuration ' default_english' which
 836 we added to pg_ts_cfg.</p>
 837 <p>There are several other lexeme types used that we do not need to
 838 specify as using the ISpell dictionary. We can simply insert values
 839 using the 'simple' stemming process dictionary.</p>
 840 <pre>
 841         INSERT INTO pg_ts_cfgmap
 842                VALUES ('default_english', 'url', '{simple}');
 843         INSERT INTO pg_ts_cfgmap
 844                VALUES ('default_english', 'host', '{simple}');
 845         INSERT INTO pg_ts_cfgmap
 846                VALUES ('default_english', 'sfloat', '{simple}');
 847         INSERT INTO pg_ts_cfgmap
 848                VALUES ('default_english', 'uri', '{simple}');
 849         INSERT INTO pg_ts_cfgmap
 850                VALUES ('default_english', 'int', '{simple}');
 851         INSERT INTO pg_ts_cfgmap
 852                VALUES ('default_english', 'float', '{simple}');
 853         INSERT INTO pg_ts_cfgmap
 854                VALUES ('default_english', 'email', '{simple}');
 855         INSERT INTO pg_ts_cfgmap
 856                VALUES ('default_english', 'word', '{simple}');
 857         INSERT INTO pg_ts_cfgmap
 858                VALUES ('default_english', 'hword', '{simple}');
 859         INSERT INTO pg_ts_cfgmap
 860                VALUES ('default_english', 'nlword', '{simple}');
 861         INSERT INTO pg_ts_cfgmap
 862                VALUES ('default_english', 'nlpart_hword', '{simple}');
 863         INSERT INTO pg_ts_cfgmap
 864                VALUES ('default_english', 'part_hword', '{simple}');
 865         INSERT INTO pg_ts_cfgmap
 866                VALUES ('default_english', 'nlhword', '{simple}');
 867         INSERT INTO pg_ts_cfgmap
 868                VALUES ('default_english', 'file', '{simple}');
 869         INSERT INTO pg_ts_cfgmap
 870                VALUES ('default_english', 'uint', '{simple}');
 871         INSERT INTO pg_ts_cfgmap
 872                VALUES ('default_english', 'version', '{simple}');
 873 </pre>
 874
 875 <p>Our addition of a configuration for 'default_english' is now
 876 complete. We have successfully created a new tsearch2
 877 configuration. At the same time we have also set the new
 878 configuration to be our default for en_US locale.</p>
 879 <pre>
 880         SELECT to_tsvector('default_english',
 881                            'learning tsearch is like going to school');
 882                            to_tsvector
 883         --------------------------------------------------
 884          'go':5 'like':4 'learn':1 'school':7 'tsearch':2
 885         SELECT to_tsvector('learning tsearch is like going to school');
 886                             to_tsvector
 887         --------------------------------------------------
 888          'go':5 'like':4 'learn':1 'school':7 'tsearch':2
 889         (1 row)
 890 </pre>
 891 <p>Notice here that words like "tsearch" are still parsed and
 892 indexed in the tsvector column. There is a lexeme returned for the
 893 word becuase in the configuration mapping table, we specify words
 894 to be used from the 'en_ispell' dictionary first, but as a fallback
 895 to use the 'en_stem' dictionary. Therefore a lexeme is not returned
 896 from en_ispell, but is returned from en_stem, and added to the
 897 tsvector.</p>
 898 <pre>
 899  SELECT to_tsvector('learning tsearch is like going to computer school');
 900                                 to_tsvector
 901         ---------------------------------------------------------------------------
 902          'go':5 'like':4 'learn':1 'school':8 'compute':7 'tsearch':2 'computer':7
 903         (1 row)
 904 </pre>
 905 <p>Notice in this last example I added the word "computer" to the
 906 text to be converted into a tsvector. Because we have setup our
 907 default configuration to use the ISpell english dictionary, the
 908 words are lexized, and computer returns 2 lexemes at the same
 909 position. 'compute':7 and 'computer':7 are now both indexed for the
 910 word computer.</p>
 911 <p>You can create additional dictionary lists, or use the extra
 912 large dictionary from ISpell. You can read through the ISpell
 913 documents, and source tree to make modifications as you see
 914 fit.</p>
 915 <p>In the case that you already have a configuration set for the
 916 locale, and you are changing it to your new dictionary
 917 configuration. You will have to set the old locale to NULL. If we
 918 are using the 'C' locale then we would do this:</p>
 919
 920 <pre>
 921         UPDATE pg_ts_cfg SET locale=NULL WHERE locale = 'C';
 922 </pre>
 923 <p>That about wraps up the configuration of tsearch2. There is much
 924 more you can do with the tables provided. This was just an
 925 introduction to get things working rather quickly.</p>
 926 <h3>ADDING NEW DICTIONARIES TO TSEARCH2</h3>
 927 <p>To aid in the addition of new dictionaries to the tsearch2
 928 module you can use another additional module in combination with
 929 tsearch2. The gendict module is included into tsearch2 distribution
 930 and is available from gendict/ subdirectory.</p>
 931 <p>I will not go into detail about installation and instructions on
 932 how to use gendict to it's fullest extent right now. You can read
 933 the README.gendict ... it has all of the instructions and
 934 information you will need.</p>
 935 <h3>BACKING UP AND RESTORING DATABASES THAT FEATURE TSEARCH2</h3>
 936 <p><strong>Never rely on anyone elses instructions to backup and
 937 restore a database system, always develop and understand your own
 938 methodology, and test it numerous times before you need to do it
 939 for real.</strong></p>
 940 <p>The backup and restore procedure has changed over time. This is
 941 not meant to be the bible for tsearch2 back up and restore. Please
 942 read all sections so you have a complete understanding of some
 943 backup and restore issues. Please test your own procedures, and do
 944 not rely on these instructions solely.</p>
 945
 946 <p>If you come accross some issues in your own procedures, please
 947 feel free to bring the question up on the Open-FTS, and PostgreSQL
 948 mailing lists.</p>
 949 <h3>ORIGINAL BACKUP PROCEDURES</h3>
 950 <p>Originally, tsearch2 had problems when using the pg_dump, and or
 951 the pg_dumpall utilities. The problem lies within the original use
 952 of OIDs for column types. Since OIDs are not consistent accross
 953 pg_dumps, when you reload the data values into the pg_ts_dict
 954 table, for example, those oids no longer point to anything. You
 955 would then end up trying to use a "broken" tsearch2
 956 configuration.</p>
 957 <p>The solution was to backup and restore a database using the
 958 tsearch2 module into small unique parts, and then load them in the
 959 correct order. You would have to edit the schema and remove the
 960 tsearch stored procedure references in the sql file. You would have
 961 to load your global objects, then the tsearch2 objects. You had to
 962 re-create the tsearch module before restoring your schema so no
 963 conflicts would arise. Then you could restore your data (all
 964 schemas, and types needed for the data were now available).</p>
 965 <p><strong>The original backup instructions were as
 966 follows</strong></p>
 967 <p>1) Backup any global database objects such as users and groups
 968 (this step is usually only necessary when you will be restoring to
 969 a virgin system)</p>
 970 <pre>
 971         pg_dumpall -g &gt; GLOBALobjects.sql
 972
 973 </pre>
 974 <p>2) Backup the full database schema using pg_dump</p>
 975 <pre>
 976         pg_dump -s DATABASE &gt; DATABASEschema.sql
 977 </pre>
 978 <p>3) Backup the full database using pg_dump</p>
 979 <pre>
 980         pg_dump -Fc DATABASE &gt; DATABASEdata.tar
 981 </pre>
 982
 983 <p><strong>The original restore procedures were as
 984 follows</strong></p>
 985 <p>1) Create the blank database</p>
 986 <pre>
 987         createdb DATABASE
 988 </pre>
 989 <p>2) Restore any global database objects such as users and groups
 990 (this step is usually only necessary when you will be restoring to
 991 a virgin system)</p>
 992 <pre>
 993         psql DATABASE &lt; GLOBALobjects.sql
 994 </pre>
 995 <p>3) Create the tsearch2 objects, functions and operators</p>
 996
 997 <pre>
 998         psql DATABASE &lt; tsearch2.sql
 999 </pre>
1000 <p>4) Edit the backed up database schema and delete all SQL
1001 commands which create tsearch2 related functions, operators and
1002 data types, BUT NOT fields in table definitions that specify
1003 tsvector types. If your not sure what these are, they are the ones
1004 listed in tsearch2.sql. Then restore the edited schema to the
1005 database</p>
1006 <pre>
1007         psql DATABASE &lt; DATABASEschema.sql
1008 </pre>
1009 <p>5) Restore the data for the database</p>
1010 <pre>
1011
1012         pg_restore -N -a -d DATABASE DATABASEdata.tar
1013 </pre>
1014 <p>If you get any errors in step 4, it will most likely be because
1015 you forgot to remove an object that was created in tsearch2.sql.
1016 Any errors in step 5 will mean the database schema was probably
1017 restored wrongly.</p>
1018 <p><strong>Issues with this procedure</strong></p>
1019 <p>As I mentioned before, it is vital that you test out your own
1020 backup and restore procedures. These procedures were originally
1021 adopted from this document's orignal author. Robert John Shepherd.
1022 It makes use of the pg_dump custom archive functionality. I am not
1023 that familiar with the formatting output of pg_dump, and using
1024 pg_restore. I have always had the luxury of using text files
1025 (Everything is DATABASE.sql).</p>
1026 <p>One issue not forseen in the case of using a binary dump is the
1027 when you have added more than the default tsearch2 configurations.
1028 Upon reload of the data it will fail due to duplicate primary keys.
1029 If you load the tsearch2 module, and then delete the data loaded by
1030 tsearch2 into the configuration tables, the data will restore. The
1031 configurations are incorrect because you can not remove the data
1032 using OID references from the custom archive.</p>
1033 <p>It would be very simple to fix this problem if the data was not
1034 in an archive format. I do believe all of your data would have been
1035 restored properly and you can get things working fairly easy. All
1036 one would have to do is create the configurations as in the
1037 tsearch2.sql file. And then create your custom configurations
1038 again.</p>
1039 <p>I have read in the pg_dump man page that if the tar archive
1040 format is used, it is possible to limit which data is restored
1041 using pg_restore. If anyone has more experience with pg_dump
1042 archives, and pg_restore. Please feel free to test and contribute
1043 your procedure(s).</p>
1044 <h3>CURRENT BACKUP AND RESTORE PROCEDURES</h3>
1045
1046 <p>Currently a backup and restore of a database using the tsearch2
1047 module can be quite simple. If you have applied the patch mentioned
1048 in the installation instructions prior to tsearch2 installation.
1049 This patch removes the use of the oid column. The text
1050 representation for the stored procedures used are dumped with the
1051 data and the restoration of the data works seemlessly.</p>
1052 <p>1) to backup the database</p>
1053 <pre>
1054   pg_dump DATABASE &gt; DATABASE.sql
1055 </pre>
1056 <p>1) to restore the database</p>
1057 <pre>
1058      createdb DATABASE
1059         psql -d DATABASE -f DATABASE.sql
1060 </pre>
1061 <p>This procedure is now like any normal backup and restore
1062 procedure. I can say whether this has been proven using the pg_dump
1063 archive, and restoring with pg_restore. In theory there should be
1064 no problems with any format after the patch is applied.</p>
1065
1066 <p>This restoration procedure should never be an issue with the
1067 patch applied to version 8.0 of PostgreSQL. Only versions 7.3 and
1068 7.4 are affected. You can avoid any troubles by applying the patch
1069 prior to installation, or running the SQL script provided to live
1070 database before backup and restoring is done.</p>
1071 </div>
1072 </body>
1073 </html>