   1 <!--
   2 $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.75 2005/11/04 23:13:59 petere Exp $
   3 -->
   4 <chapter id="backup">
   5  <title>Backup and Restore</title>
   6
   7  <indexterm zone="backup"><primary>backup</></>
   8
   9  <para>
  10   As with everything that contains valuable data, <productname>PostgreSQL</>
  11   databases should be backed up regularly. While the procedure is
  12   essentially simple, it is important to have a basic understanding of
  13   the underlying techniques and assumptions.
  14  </para>
  15
  16  <para>
  17   There are three fundamentally different approaches to backing up
  18   <productname>PostgreSQL</> data:
  19   <itemizedlist>
  20    <listitem><para><acronym>SQL</> dump</para></listitem>
  21    <listitem><para>File system level backup</para></listitem>
  22    <listitem><para>On-line backup</para></listitem>
  23   </itemizedlist>
  24   Each has its own strengths and weaknesses.
  25  </para>
  26
  27  <sect1 id="backup-dump">
  28   <title><acronym>SQL</> Dump</title>
  29
  30   <para>
  31    The idea behind the SQL-dump method is to generate a text file with SQL
  32    commands that, when fed back to the server, will recreate the
  33    database in the same state as it was at the time of the dump.
  34    <productname>PostgreSQL</> provides the utility program
  35    <xref linkend="app-pgdump"> for this purpose. The basic usage of this
  36    command is:
  37 <synopsis>
  38 pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">outfile</replaceable>
  39 </synopsis>
  40    As you see, <application>pg_dump</> writes its results to the
  41    standard output. We will see below how this can be useful.
  42   </para>
  43
  44   <para>
  45    <application>pg_dump</> is a regular <productname>PostgreSQL</>
  46    client application (albeit a particularly clever one). This means
  47    that you can do this backup procedure from any remote host that has
  48    access to the database. But remember that <application>pg_dump</>
  49    does not operate with special permissions. In particular, it must
  50    have read access to all tables that you want to back up, so in
  51    practice you almost always have to run it as a database superuser.
  52   </para>
  53
  54   <para>
  55    To specify which database server <application>pg_dump</> should
  56    contact, use the command line options <option>-h
  57    <replaceable>host</></> and <option>-p <replaceable>port</></>. The
  58    default host is the local host or whatever your
  59    <envar>PGHOST</envar> environment variable specifies. Similarly,
  60    the default port is indicated by the <envar>PGPORT</envar>
  61    environment variable or, failing that, by the compiled-in default.
  62    (Conveniently, the server will normally have the same compiled-in
  63    default.)
  64   </para>
  65
  66   <para>
   67    Like any other <productname>PostgreSQL</> client application,
  68    <application>pg_dump</> will by default connect with the database
  69    user name that is equal to the current operating system user name. To override
  70    this, either specify the <option>-U</option> option or set the
  71    environment variable <envar>PGUSER</envar>. Remember that
  72    <application>pg_dump</> connections are subject to the normal
  73    client authentication mechanisms (which are described in <xref
  74    linkend="client-authentication">).
  75   </para>
  76
  77   <para>
   78    Dumps created by <application>pg_dump</> are internally consistent;
   79    that is, updates made to the database while <application>pg_dump</> is
   80    running will not appear in the dump. <application>pg_dump</> does not
  81    block other operations on the database while it is working.
  82    (Exceptions are those operations that need to operate with an
  83    exclusive lock, such as <command>VACUUM FULL</command>.)
  84   </para>
  85
  86   <important>
  87    <para>
  88     When your database schema relies on OIDs (for instance as foreign
  89     keys) you must instruct <application>pg_dump</> to dump the OIDs
  90     as well. To do this, use the <option>-o</option> command line
  91     option.
  92    </para>
  93   </important>
  94
  95   <sect2 id="backup-dump-restore">
  96    <title>Restoring the dump</title>
  97
  98    <para>
  99     The text files created by <application>pg_dump</> are intended to
 100     be read in by the <application>psql</application> program. The
 101     general command form to restore a dump is
 102 <synopsis>
 103 psql <replaceable class="parameter">dbname</replaceable> &lt; <replaceable class="parameter">infile</replaceable>
 104 </synopsis>
 105     where <replaceable class="parameter">infile</replaceable> is what
 106     you used as <replaceable class="parameter">outfile</replaceable>
 107     for the <application>pg_dump</> command. The database <replaceable
 108     class="parameter">dbname</replaceable> will not be created by this
  109     command; you must create it yourself from <literal>template0</> before executing
  110     <application>psql</> (e.g., with <literal>createdb -T template0
  111     <replaceable class="parameter">dbname</></literal>).
  112     <application>psql</> supports options similar to those of <application>pg_dump</>
  113     for controlling the database server location and the user name. See
  114     the <xref linkend="app-psql"> reference page for more information.
 115    </para>
 116
 117    <para>
 118     Not only must the target database already exist before starting to
 119     run the restore, but so must all the users who own objects in the
 120     dumped database or were granted permissions on the objects.  If they
  121     do not exist, the restore will fail to recreate the objects with the
 122     original ownership and/or permissions.  (Sometimes this is what you want,
 123     but usually it is not.)
 124    </para>
 125
 126    <para>
 127     Once restored, it is wise to run <xref linkend="sql-analyze"
 128     endterm="sql-analyze-title"> on each database so the optimizer has
  129     useful statistics. An easy way to do this is to run
  130     <command>vacuumdb -a -z</>, which performs
  131     <command>VACUUM ANALYZE</> on all databases; this is equivalent to
  132     running <command>VACUUM ANALYZE</command> manually in each database.
 133    </para>
 134
 135    <para>
 136     The ability of <application>pg_dump</> and <application>psql</> to
 137     write to or read from pipes makes it possible to dump a database
 138     directly from one server to another; for example:
 139 <programlisting>
 140 pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>host2</> <replaceable>dbname</>
 141 </programlisting>
 142    </para>
 143
 144    <important>
 145     <para>
 146      The dumps produced by <application>pg_dump</> are relative to
 147      <literal>template0</>. This means that any languages, procedures,
 148      etc. added to <literal>template1</> will also be dumped by
 149      <application>pg_dump</>. As a result, when restoring, if you are
 150      using a customized <literal>template1</>, you must create the
 151      empty database from <literal>template0</>, as in the example
 152      above.
 153     </para>
 154    </important>
 155
 156    <para>
 157     For advice on how to load large amounts of data into
 158     <productname>PostgreSQL</productname> efficiently, refer to <xref
 159     linkend="populate">.
 160    </para>
 161   </sect2>
 162
 163   <sect2 id="backup-dump-all">
 164    <title>Using <application>pg_dumpall</></title>
 165
 166    <para>
 167     The above mechanism is cumbersome and inappropriate when backing
 168     up an entire database cluster. For this reason the <xref
 169     linkend="app-pg-dumpall"> program is provided.
 170     <application>pg_dumpall</> backs up each database in a given
 171     cluster, and also preserves cluster-wide data such as users and
 172     groups. The basic usage of this command is:
 173 <synopsis>
 174 pg_dumpall &gt; <replaceable>outfile</>
 175 </synopsis>
 176     The resulting dump can be restored with <application>psql</>:
 177 <synopsis>
 178 psql -f <replaceable class="parameter">infile</replaceable> postgres
 179 </synopsis>
 180     (Actually, you can specify any existing database name to start from,
 181     but if you are reloading in an empty cluster then <literal>postgres</>
 182     should generally be used.)  It is always necessary to have
 183     database superuser access when restoring a <application>pg_dumpall</>
 184     dump, as that is required to restore the user and group information.
 185    </para>
 186   </sect2>
 187
 188   <sect2 id="backup-dump-large">
 189    <title>Handling large databases</title>
 190
 191    <para>
  192     <productname>PostgreSQL</productname> allows tables larger
  193     than the maximum file size on your system, so it can be problematic
  194     to dump such a table to a file: the resulting file will likely
  195     be larger than the maximum size allowed by your system. Because
  196     <application>pg_dump</> can write to the standard output, you can
  197     use standard Unix tools to work around this possible problem.
 198    </para>
 199
 200    <formalpara>
 201     <title>Use compressed dumps.</title>
 202     <para>
 203      You can use your favorite compression program, for example
 204      <application>gzip</application>.
 205
 206 <programlisting>
 207 pg_dump <replaceable class="parameter">dbname</replaceable> | gzip &gt; <replaceable class="parameter">filename</replaceable>.gz
 208 </programlisting>
 209
 210      Reload with
 211
 212 <programlisting>
 213 createdb <replaceable class="parameter">dbname</replaceable>
 214 gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
 215 </programlisting>
 216
 217      or
 218
 219 <programlisting>
 220 cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
 221 </programlisting>
 222     </para>
 223    </formalpara>
 224
 225    <formalpara>
 226     <title>Use <command>split</>.</title>
 227     <para>
 228      The <command>split</command> command
 229      allows you to split the output into pieces that are
 230      acceptable in size to the underlying file system. For example, to
 231      make chunks of 1 megabyte:
 232
 233 <programlisting>
 234 pg_dump <replaceable class="parameter">dbname</replaceable> | split -b 1m - <replaceable class="parameter">filename</replaceable>
 235 </programlisting>
 236
 237      Reload with
 238
 239 <programlisting>
 240 createdb <replaceable class="parameter">dbname</replaceable>
 241 cat <replaceable class="parameter">filename</replaceable>* | psql <replaceable class="parameter">dbname</replaceable>
 242 </programlisting>
 243     </para>
 244    </formalpara>
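   <para>
    The reload step works because <command>split</> names its pieces with
    suffixes (<literal>aa</>, <literal>ab</>, and so on) that sort in the
    order they were written, so concatenating them with a shell glob
    reproduces the original stream.  The following is a minimal sketch of
    that round trip.  The function names <literal>split_stream</> and
    <literal>join_pieces</> are hypothetical, and any byte stream can stand
    in for the output of <application>pg_dump</>.
   </para>

```shell
# Hypothetical helpers; in real use the input would come from pg_dump.

# split_stream PREFIX SIZE
# Cut standard input into pieces of at most SIZE bytes, named
# PREFIXaa, PREFIXab, and so on.
split_stream() {
    split -b "$2" - "$1"
}

# join_pieces PREFIX
# Write the pieces back to standard output. The shell expands the glob
# in sorted order, which matches the order of the generated suffixes.
join_pieces() {
    cat "$1"*
}
```

   <para>
    Under these assumptions, a dump could be written with
    <literal>pg_dump <replaceable>dbname</> | split_stream <replaceable>filename</>. 1m</literal>
    and reloaded with
    <literal>join_pieces <replaceable>filename</>. | psql <replaceable>dbname</></literal>.
   </para>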
 245
 246    <formalpara>
 247     <title>Use the custom dump format.</title>
 248     <para>
 249      If <productname>PostgreSQL</productname> was built on a system with the
 250      <application>zlib</> compression library installed, the custom dump
 251      format will compress data as it writes it to the output file. This will
 252      produce dump file sizes similar to using <command>gzip</command>, but it
 253      has the added advantage that tables can be restored selectively. The
 254      following command dumps a database using the custom dump format:
 255
 256 <programlisting>
 257 pg_dump -Fc <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">filename</replaceable>
 258 </programlisting>
 259
 260      A custom-format dump is not a script for <application>psql</>, but
 261      instead must be restored with <application>pg_restore</>.
 262      See the <xref linkend="app-pgdump"> and <xref
 263      linkend="app-pgrestore"> reference pages for details.
 264     </para>
 265    </formalpara>
 266
 267   </sect2>
 268  </sect1>
 269
 270  <sect1 id="backup-file">
 271   <title>File system level backup</title>
 272
 273   <para>
 274    An alternative backup strategy is to directly copy the files that
 275    <productname>PostgreSQL</> uses to store the data in the database. In
 276    <xref linkend="creating-cluster"> it is explained where these files
 277    are located, but you have probably found them already if you are
 278    interested in this method. You can use whatever method you prefer
 279    for doing usual file system backups, for example
 280
 281 <programlisting>
 282 tar -cf backup.tar /usr/local/pgsql/data
 283 </programlisting>
 284   </para>
 285
 286   <para>
 287    There are two restrictions, however, which make this method
 288    impractical, or at least inferior to the <application>pg_dump</>
 289    method:
 290
 291    <orderedlist>
 292     <listitem>
 293      <para>
 294       The database server <emphasis>must</> be shut down in order to
 295       get a usable backup. Half-way measures such as disallowing all
 296       connections will <emphasis>not</emphasis> work
 297       (mainly because <command>tar</command> and similar tools do not take an
 298       atomic snapshot of the state of the file system at a point in
 299       time). Information about stopping the server can be found in
  300       <xref linkend="postmaster-shutdown">.  Needless to say, you
 301       also need to shut down the server before restoring the data.
 302      </para>
 303     </listitem>
 304
 305     <listitem>
 306      <para>
 307       If you have dug into the details of the file system layout of the
 308       database, you may be tempted to try to back up or restore only certain
 309       individual tables or databases from their respective files or
  310       directories. This will <emphasis>not</> work because
  311       these files contain only half the
 312       truth. The other half is in the commit log files
 313       <filename>pg_clog/*</filename>, which contain the commit status of
 314       all transactions. A table file is only usable with this
 315       information. Of course it is also impossible to restore only a
 316       table and the associated <filename>pg_clog</filename> data
 317       because that would render all other tables in the database
 318       cluster useless.  So file system backups only work for complete
 319       restoration of an entire database cluster.
 320      </para>
 321     </listitem>
 322    </orderedlist>
 323   </para>
 324
 325   <para>
 326    An alternative file-system backup approach is to make a
 327    <quote>consistent snapshot</quote> of the data directory, if the
 328    file system supports that functionality (and you are willing to
 329    trust that it is implemented correctly).  The typical procedure is
 330    to make a <quote>frozen snapshot</> of the volume containing the
 331    database, then copy the whole data directory (not just parts, see
 332    above) from the snapshot to a backup device, then release the frozen
 333    snapshot.  This will work even while the database server is running.
 334    However, a backup created in this way saves
 335    the database files in a state where the database server was not
 336    properly shut down; therefore, when you start the database server
 337    on the backed-up data, it will think the server had crashed
  338    and replay the WAL.  This is not a problem; just be aware of
 339    it (and be sure to include the WAL files in your backup).
 340   </para>
 341
 342   <para>
 343    If your database is spread across multiple file systems, there may not
 344    be any way to obtain exactly-simultaneous frozen snapshots of all
  345    the volumes.  For example, if your data files and WAL files are on different
 346    disks, or if tablespaces are on different file systems, it might
 347    not be possible to use snapshot backup because the snapshots must be
 348    simultaneous.
 349    Read your file system documentation very carefully before trusting
 350    to the consistent-snapshot technique in such situations.  The safest
 351    approach is to shut down the database server for long enough to
 352    establish all the frozen snapshots.
 353   </para>
 354
 355   <para>
 356    Another option is to use <application>rsync</> to perform a file
 357    system backup.  This is done by first running <application>rsync</>
 358    while the database server is running, then shutting down the database
 359    server just long enough to do a second <application>rsync</>.  The
 360    second <application>rsync</> will be much quicker than the first,
 361    because it has relatively little data to transfer, and the end result
 362    will be consistent because the server was down.  This method
 363    allows a file system backup to be performed with minimal downtime.
 364   </para>
 365
 366   <para>
 367    Note that a file system backup will not necessarily be
 368    smaller than an SQL dump. On the contrary, it will most likely be
 369    larger. (<application>pg_dump</application> does not need to dump
 370    the contents of indexes for example, just the commands to recreate
 371    them.)
 372   </para>
 373  </sect1>
 374
 375  <sect1 id="backup-online">
 376   <title>On-line backup and point-in-time recovery (PITR)</title>
 377
 378   <indexterm zone="backup">
 379    <primary>on-line backup</primary>
 380   </indexterm>
 381
 382   <indexterm zone="backup">
 383    <primary>point-in-time recovery</primary>
 384   </indexterm>
 385
 386   <indexterm zone="backup">
 387    <primary>PITR</primary>
 388   </indexterm>
 389
 390   <para>
 391    At all times, <productname>PostgreSQL</> maintains a
 392    <firstterm>write ahead log</> (WAL) in the <filename>pg_xlog/</>
 393    subdirectory of the cluster's data directory. The log describes
 394    every change made to the database's data files.  This log exists
 395    primarily for crash-safety purposes: if the system crashes, the
 396    database can be restored to consistency by <quote>replaying</> the
 397    log entries made since the last checkpoint.  However, the existence
 398    of the log makes it possible to use a third strategy for backing up
 399    databases: we can combine a file-system-level backup with backup of
 400    the WAL files.  If recovery is needed, we restore the backup and
 401    then replay from the backed-up WAL files to bring the backup up to
  402    the current time.  This approach is more complex to administer than
 403    either of the previous approaches, but it has some significant
 404    benefits:
 405   <itemizedlist>
 406    <listitem>
 407     <para>
 408      We do not need a perfectly consistent backup as the starting point.
 409      Any internal inconsistency in the backup will be corrected by log
 410      replay (this is not significantly different from what happens during
 411      crash recovery).  So we don't need file system snapshot capability,
 412      just <application>tar</> or a similar archiving tool.
 413     </para>
 414    </listitem>
 415    <listitem>
 416     <para>
 417      Since we can string together an indefinitely long sequence of WAL files
 418      for replay, continuous backup can be achieved simply by continuing to archive
 419      the WAL files.  This is particularly valuable for large databases, where
 420      it may not be convenient to take a full backup frequently.
 421     </para>
 422    </listitem>
 423    <listitem>
 424     <para>
 425      There is nothing that says we have to replay the WAL entries all the
 426      way to the end.  We could stop the replay at any point and have a
 427      consistent snapshot of the database as it was at that time.  Thus,
 428      this technique supports <firstterm>point-in-time recovery</>: it is
 429      possible to restore the database to its state at any time since your base
 430      backup was taken.
 431     </para>
 432    </listitem>
 433    <listitem>
 434     <para>
 435      If we continuously feed the series of WAL files to another
 436      machine that has been loaded with the same base backup file, we
 437      have a <quote>hot standby</> system: at any point we can bring up
 438      the second machine and it will have a nearly-current copy of the
 439      database.
 440     </para>
 441    </listitem>
 442   </itemizedlist>
 443   </para>
 444
 445   <para>
 446    As with the plain file-system-backup technique, this method can only
 447    support restoration of an entire database cluster, not a subset.
 448    Also, it requires a lot of archival storage: the base backup may be bulky,
 449    and a busy system will generate many megabytes of WAL traffic that
 450    have to be archived.  Still, it is the preferred backup technique in
 451    many situations where high reliability is needed.
 452   </para>
 453
 454   <para>
 455    To recover successfully using an on-line backup, you need a continuous
 456    sequence of archived WAL files that extends back at least as far as the
 457    start time of your backup.  So to get started, you should set up and test
 458    your procedure for archiving WAL files <emphasis>before</> you take your
 459    first base backup.  Accordingly, we first discuss the mechanics of
 460    archiving WAL files.
 461   </para>
 462
 463   <sect2 id="backup-archiving-wal">
 464    <title>Setting up WAL archiving</title>
 465
 466    <para>
 467     In an abstract sense, a running <productname>PostgreSQL</> system
 468     produces an indefinitely long sequence of WAL records.  The system
 469     physically divides this sequence into WAL <firstterm>segment
 470     files</>, which are normally 16MB apiece (although the size can be
 471     altered when building <productname>PostgreSQL</>).  The segment
 472     files are given numeric names that reflect their position in the
 473     abstract WAL sequence.  When not using WAL archiving, the system
 474     normally creates just a few segment files and then
 475     <quote>recycles</> them by renaming no-longer-needed segment files
 476     to higher segment numbers.  It's assumed that a segment file whose
 477     contents precede the checkpoint-before-last is no longer of
 478     interest and can be recycled.
 479    </para>
 480
 481    <para>
 482     When archiving WAL data, we want to capture the contents of each segment
 483     file once it is filled, and save that data somewhere before the segment
 484     file is recycled for reuse.  Depending on the application and the
 485     available hardware, there could be many different ways of <quote>saving
 486     the data somewhere</>: we could copy the segment files to an NFS-mounted
 487     directory on another machine, write them onto a tape drive (ensuring that
 488     you have a way of restoring the file with its original file name), or batch
 489     them together and burn them onto CDs, or something else entirely.  To
 490     provide the database administrator with as much flexibility as possible,
 491     <productname>PostgreSQL</> tries not to make any assumptions about how
 492     the archiving will be done.  Instead, <productname>PostgreSQL</> lets
 493     the administrator specify a shell command to be executed to copy a
 494     completed segment file to wherever it needs to go.  The command could be
 495     as simple as a <application>cp</>, or it could invoke a complex shell
 496     script &mdash; it's all up to you.
 497    </para>
 498
 499    <para>
 500     The shell command to use is specified by the <xref
 501     linkend="guc-archive-command"> configuration parameter, which in practice
 502     will always be placed in the <filename>postgresql.conf</filename> file.
 503     In this string,
 504     any <literal>%p</> is replaced by the absolute path of the file to
 505     archive, while any <literal>%f</> is replaced by the file name only.
 506     Write <literal>%%</> if you need to embed an actual <literal>%</>
 507     character in the command.  The simplest useful command is something
 508     like
 509 <programlisting>
 510 archive_command = 'cp -i %p /mnt/server/archivedir/%f &lt;/dev/null'
 511 </programlisting>
 512     which will copy archivable WAL segments to the directory
 513     <filename>/mnt/server/archivedir</>.  (This is an example, not a
 514     recommendation, and may not work on all platforms.)
 515    </para>
 516
 517    <para>
 518     The archive command will be executed under the ownership of the same
 519     user that the <productname>PostgreSQL</> server is running as.  Since
 520     the series of WAL files being archived contains effectively everything
 521     in your database, you will want to be sure that the archived data is
 522     protected from prying eyes; for example, archive into a directory that
 523     does not have group or world read access.
 524    </para>
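   <para>
    As a small sketch of the advice above, the archive directory can be
    created so that only its owner can read it.  The function name
    <literal>make_private_dir</> is hypothetical; it is assumed to be run
    as the operating system user that the server runs as.
   </para>

```shell
# make_private_dir DIR
# Create DIR (and any missing parent directories), then restrict DIR
# itself to its owner: no group or world access.
make_private_dir() (
    mkdir -p "$1" && chmod 700 "$1"
)
```

   <para>
    For example, <literal>make_private_dir /mnt/server/archivedir</literal>
    would prepare the directory used in the example
    <varname>archive_command</> above.
   </para>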
 525
 526    <para>
 527     It is important that the archive command return zero exit status if and
 528     only if it succeeded.  Upon getting a zero result,
 529     <productname>PostgreSQL</> will assume that the WAL segment file has been
 530     successfully archived, and will remove or recycle it.
 531     However, a nonzero status tells
 532     <productname>PostgreSQL</> that the file was not archived; it will try
 533     again periodically until it succeeds.
 534    </para>
 535
 536    <para>
 537     The archive command should generally be designed to refuse to overwrite
 538     any pre-existing archive file.  This is an important safety feature to
 539     preserve the integrity of your archive in case of administrator error
 540     (such as sending the output of two different servers to the same archive
 541     directory).
 542     It is advisable to test your proposed archive command to ensure that it
 543     indeed does not overwrite an existing file, <emphasis>and that it returns
 544     nonzero status in this case</>.  We have found that <literal>cp -i</> does
 545     this correctly on some platforms but not others.  If the chosen command
 546     does not itself handle this case correctly, you should add a command
 547     to test for pre-existence of the archive file.  For example, something
 548     like
 549 <programlisting>
 550 archive_command = 'test ! -f .../%f &amp;&amp; cp %p .../%f'
 551 </programlisting>
 552     works correctly on most Unix variants.
 553    </para>
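   <para>
    If the test grows beyond a one-liner, it may be easier to move it into
    a small script.  The following is a sketch only, not a recommended
    implementation: the function name <literal>archive_segment</> and its
    argument order are hypothetical.  It refuses to overwrite an existing
    archive file and returns nonzero status in that case.  Copying under a
    temporary name before renaming is an additional design choice made
    here so that a partial copy never appears under the final name.
   </para>

```shell
# archive_segment SRC DESTDIR NAME
# Hypothetical helper: SRC corresponds to %p, NAME to %f.
# Copies SRC to DESTDIR/NAME. Returns nonzero, and leaves the archive
# untouched, if the target already exists or the copy fails.
archive_segment() (
    src=$1 destdir=$2 name=$3
    target="$destdir/$name"

    if [ -e "$target" ]; then
        echo "archive_segment: $target already exists, not overwriting" >&2
        exit 1
    fi

    # Copy to a temporary name first, then rename into place.
    tmp="$target.tmp.$$"
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        exit 1
    fi
    mv "$tmp" "$target"
)
```

   <para>
    Note that the existence test and the final rename are separate steps,
    so this sketch does not protect against two processes archiving the
    same file name at the same instant.
   </para>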
 554
 555    <para>
 556     While designing your archiving setup, consider what will happen if
 557     the archive command fails repeatedly because some aspect requires
 558     operator intervention or the archive runs out of space. For example, this
 559     could occur if you write to tape without an autochanger; when the tape
 560     fills, nothing further can be archived until the tape is swapped.
 561     You should ensure that any error condition or request to a human operator
 562     is reported appropriately so that the situation can be
 563     resolved relatively quickly. The <filename>pg_xlog/</> directory will
 564     continue to fill with WAL segment files until the situation is resolved.
 565    </para>
 566
 567    <para>
 568     The speed of the archiving command is not important, so long as it can keep up
 569     with the average rate at which your server generates WAL data.  Normal
 570     operation continues even if the archiving process falls a little behind.
 571     If archiving falls significantly behind, this will increase the amount of
 572     data that would be lost in the event of a disaster. It will also mean that
 573     the <filename>pg_xlog/</> directory will contain large numbers of
 574     not-yet-archived segment files, which could eventually exceed available
 575     disk space. You are advised to monitor the archiving process to ensure that
 576     it is working as you intend.
 577    </para>
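   <para>
    One simple way to monitor this is to count the segment files in
    <filename>pg_xlog/</> and raise an alarm when the count exceeds a
    limit.  The sketch below makes two assumptions that are not stated
    above: that segment file names consist of exactly 24 hexadecimal
    digits, and that a suitable limit is known for your installation (the
    count also includes segments that are merely waiting to be recycled).
    The function name <literal>check_wal_backlog</> is hypothetical.
   </para>

```shell
# check_wal_backlog XLOGDIR LIMIT
# Exit status: 0 if XLOGDIR holds at most LIMIT segment files,
#              1 if it holds more, 2 if XLOGDIR is not a directory.
check_wal_backlog() (
    dir=$1 limit=$2

    if [ ! -d "$dir" ]; then
        echo "check_wal_backlog: no such directory: $dir" >&2
        exit 2
    fi

    # grep -c prints the number of matching names (0 if there are none);
    # "|| true" hides grep's nonzero status in the no-match case.
    count=$(ls "$dir" | grep -c -E '^[0-9A-F]{24}$' || true)

    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count WAL segment files in $dir (limit $limit)" >&2
        exit 1
    fi
    exit 0
)
```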
 578
 579    <para>
 580     If you are concerned about being able to recover right up to the
 581     current instant, you may want to take additional steps to ensure that
 582     the current, partially-filled WAL segment is also copied someplace.
  583     This is particularly important if your server generates little WAL
 584     traffic (or has slack periods where it does so), since it could take a
 585     long time before a WAL segment file is completely filled and ready to
 586     archive.  One possible way to handle this is to set up a
 587     <application>cron</> job that periodically (once a minute, perhaps)
 588     identifies the current WAL segment file and saves it someplace safe.
 589     Then the combination of the archived WAL segments and the saved current
 590     segment will be enough to ensure you can always restore to within a
 591     minute of current time.  This behavior is not presently built into
 592     <productname>PostgreSQL</> because we did not want to complicate the
 593     definition of the <xref linkend="guc-archive-command"> by requiring it
 594     to keep track of successively archived, but different, copies of the
 595     same WAL file.  The <xref linkend="guc-archive-command"> is only
 596     invoked on completed WAL segments. Except in the case of retrying a
 597     failure, it will be called only once for any given file name.
 598    </para>
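   <para>
    The following is a rough sketch of what such a
    <application>cron</> job might run.  It rests on a heuristic that is
    an assumption, not a guarantee: that the segment currently being
    written is the most recently modified file in <filename>pg_xlog/</>
    whose name consists of exactly 24 hexadecimal digits.  The function
    names and the <literal>.partial</> suffix are hypothetical.
   </para>

```shell
# current_wal_segment XLOGDIR
# Print the name of the most recently modified file in XLOGDIR whose
# name is exactly 24 hexadecimal digits. Prints nothing if none exists.
# ASSUMPTION: this file is the segment currently being written.
current_wal_segment() (
    cd "$1" || exit 1
    # ls -t lists newest first; grep drops subdirectories and other files.
    ls -t | grep -E '^[0-9A-F]{24}$' | head -n 1
)

# save_current_segment XLOGDIR SAVEDIR
# Copy that segment to SAVEDIR/<name>.partial, replacing any earlier
# copy of the same segment.
save_current_segment() (
    seg=$(current_wal_segment "$1") || exit 1
    [ -n "$seg" ] || exit 1
    cp "$1/$seg" "$2/$seg.partial"
)
```

   <para>
    The saved copy is deliberately kept under a different name and outside
    the regular archive, so that it cannot be mistaken for a completed
    segment.
   </para>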
 599
 600    <para>
 601     In writing your archive command, you should assume that the file names to
 602     be archived may be up to 64 characters long and may contain any
 603     combination of ASCII letters, digits, and dots.  It is not necessary to
 604     remember the original full path (<literal>%p</>) but it is necessary to
 605     remember the file name (<literal>%f</>).
 606    </para>
 607
 608    <para>
 609     Note that although WAL archiving will allow you to restore any
  610     modifications made to the data in your <productname>PostgreSQL</> database,
 611     it will not restore changes made to configuration files (that is,
 612     <filename>postgresql.conf</>, <filename>pg_hba.conf</> and
 613     <filename>pg_ident.conf</>), since those are edited manually rather
 614     than through SQL operations.
 615     You may wish to keep the configuration files in a location that will
 616     be backed up by your regular file system backup procedures.  See
 617     <xref linkend="runtime-config-file-locations"> for how to relocate the
 618     configuration files.
 619    </para>
 620   </sect2>
 621
 622   <sect2 id="backup-base-backup">
 623    <title>Making a Base Backup</title>
 624
 625    <para>
 626     The procedure for making a base backup is relatively simple:
 627   <orderedlist>
 628    <listitem>
 629     <para>
 630      Ensure that WAL archiving is enabled and working.
 631     </para>
 632    </listitem>
 633    <listitem>
 634     <para>
 635      Connect to the database as a superuser, and issue the command
 636 <programlisting>
 637 SELECT pg_start_backup('label');
 638 </programlisting>
 639      where <literal>label</> is any string you want to use to uniquely
 640      identify this backup operation.  (One good practice is to use the
 641      full path where you intend to put the backup dump file.)
 642      <function>pg_start_backup</> creates a <firstterm>backup label</> file,
 643      called <filename>backup_label</>, in the cluster directory with
 644      information about your backup.
 645     </para>
 646
 647     <para>
  648      It does not matter which database within the cluster you connect to in order to
 649      issue this command.  You can ignore the result returned by the function;
 650      but if it reports an error, deal with that before proceeding.
 651     </para>
 652    </listitem>
 653    <listitem>
 654     <para>
 655      Perform the backup, using any convenient file-system-backup tool
 656      such as <application>tar</> or <application>cpio</>.  It is neither
 657      necessary nor desirable to stop normal operation of the database
 658      while you do this.
 659     </para>
 660    </listitem>
 661    <listitem>
 662     <para>
 663      Again connect to the database as a superuser, and issue the command
 664 <programlisting>
 665 SELECT pg_stop_backup();
 666 </programlisting>
 667      This should return successfully.
 668     </para>
 669    </listitem>
 670    <listitem>
 671     <para>
 672      Once the WAL segment files used during the backup are archived as part
 673      of normal database activity, you are done.
 674     </para>
 675    </listitem>
 676   </orderedlist>
 677    </para>
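
        <para>
         The steps above can be sketched as a small shell function.  This is
         only an illustration: the data directory, the backup file name
         <filename>/mnt/server/backupdir/base.tar</>, and the function name
         <literal>take_base_backup</> are assumptions, and
         <application>psql</> is assumed to connect as a superuser.  The
         sketch calls <function>pg_stop_backup</> even when the copy reports
         a problem, and hands the copy's exit status back to the caller.
     <programlisting>
     # Sketch only: the paths below are assumed example values.
     PGDATA=/usr/local/pgsql/data
     BACKUP_FILE=/mnt/server/backupdir/base.tar

     take_base_backup() {
         # Step 2: mark the start of the backup; give up if this fails.
         psql -d postgres -c "SELECT pg_start_backup('$BACKUP_FILE');" || return 1

         # Step 3: copy the cluster directory while the server keeps running.
         tar -cf "$BACKUP_FILE" -C "$PGDATA" .
         copy_status=$?

         # Step 4: always leave backup mode, even if the copy complained.
         psql -d postgres -c "SELECT pg_stop_backup();" || return 1

         # Let the caller decide what to do about the copy's exit status.
         return "$copy_status"
     }
     </programlisting>
        </para>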
 678
 679    <para>
 680     Some backup tools that you might wish to use emit warnings or errors
 681     if the files they are trying to copy change while the copy proceeds.
 682     This situation is normal, and not an error, when taking a base backup of
 683     an active database; so you need to ensure that you can distinguish
 684     complaints of this sort from real errors.  For example, some versions
 685     of <application>rsync</> return a separate exit code for <quote>vanished
 686     source files</>, and you can write a driver script to accept this exit
 687     code as a non-error case.  Also,
 688     some versions of GNU <application>tar</> consider it an error if a file
 689     is changed while <application>tar</> is copying it.  There does not seem
 690     to be any very convenient way to distinguish this error from other types
 691     of errors, other than manual inspection of <application>tar</>'s messages.
 692     GNU <application>tar</> is therefore not the best tool for making base
 693     backups.
 694    </para>
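
        <para>
         A driver of the kind mentioned above might look like the following
         sketch.  It assumes a version of <application>rsync</> that reports
         vanished source files with exit code 24; check the manual page of
         your version before relying on this.  The function name
         <literal>rsync_base_backup</> is made up for the example.
     <programlisting>
     # Sketch only: treat rsync's "vanished source files" status as success.
     rsync_base_backup() {
         src=$1     # cluster directory, e.g. /usr/local/pgsql/data
         dest=$2    # backup destination directory

         rsync -a "$src/" "$dest/"
         status=$?

         case "$status" in
             0|24) return 0 ;;          # 24: source files vanished during the copy
             *)    return "$status" ;;  # anything else is treated as a real error
         esac
     }
     </programlisting>
        </para>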
 695
 696    <para>
 697     It is not necessary to be very concerned about the amount of time elapsed
 698     between <function>pg_start_backup</> and the start of the actual backup,
 699     nor between the end of the backup and <function>pg_stop_backup</>; a
 700     few minutes' delay won't hurt anything.  You
 701     must, however, be quite sure that these operations are carried out in
 702     sequence and do not overlap.
 703    </para>
 704
 705    <para>
 706     Be certain that your backup dump includes all of the files underneath
 707     the database cluster directory (e.g., <filename>/usr/local/pgsql/data</>).
 708     If you are using tablespaces that do not reside underneath this directory,
 709     be careful to include them as well (and be sure that your backup dump
 710     archives symbolic links as links; otherwise the restore will not
 711     reproduce your tablespaces correctly).
 712    </para>
 713
 714    <para>
 715     You may, however, omit from the backup dump the files within the
 716     <filename>pg_xlog/</> subdirectory of the cluster directory.  This
 717     slight complication is worthwhile because it reduces the risk
 718     of mistakes when restoring.  This is easy to arrange if
 719     <filename>pg_xlog/</> is a symbolic link pointing to someplace outside
 720     the cluster directory, which is a common setup anyway for performance
 721     reasons.
 722    </para>
 723
 724    <para>
 725     To make use of this backup, you will need to keep around all the WAL
 726     segment files generated during and after the file system backup.
 727     To aid you in doing this, the <function>pg_stop_backup</> function
 728     creates a <firstterm>backup history file</> that is immediately
 729     stored into the WAL archive area. This file is named after the first
 730     WAL segment file that you need to have to make use of the backup.
 731     For example, if the starting WAL file is
 732     <literal>0000000100001234000055CD</> the backup history file will be
 733     named something like
 734     <literal>0000000100001234000055CD.007C9330.backup</>. (The second
 735     number in the file name stands for an exact position within the WAL
 736     file, and can ordinarily be ignored.) Once you have safely archived
 737     the file system backup and the WAL segment files used during the
 738     backup (as specified in the backup history file), all archived WAL
 739     segments with names numerically less are no longer needed to recover
 740     the file system backup and may be deleted. However, you should
 741     consider keeping several backup sets to be absolutely certain that
 742     you can recover your data. Keep in mind that only completed WAL
 743     segment files are archived, so there will be a delay between running
 744     <function>pg_stop_backup</> and the archiving of all WAL segment
 745     files needed to make the file system backup consistent.
 746    </para>
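
        <para>
         As an illustration of the naming rule above, the following sketch
         derives the starting segment from a backup history file name and
         lists the archived segments that sort before it.  The helper names
         are hypothetical, and the sketch only prints candidates; it
         deliberately deletes nothing.  It relies on segment file names being
         fixed-width hexadecimal strings, so that byte order and numeric order
         agree.
     <programlisting>
     # Sketch only: the helper names are hypothetical.

     # Print the starting WAL segment named by a backup history file,
     # e.g. 0000000100001234000055CD.007C9330.backup
     start_segment_of() {
         basename "$1" | cut -d. -f1
     }

     # List archived segments whose names sort before the starting segment.
     # Only 24-character segment names are considered, so backup history
     # files and other files in the archive are never listed.
     list_obsolete_segments() {
         archive_dir=$1
         start_segment=$2
         {
             ls "$archive_dir" | grep -E '^[0-9A-F]{24}$'
             echo "$start_segment"
         } | LC_ALL=C sort -u |
             awk -v s="$start_segment" '($0 "") == (s "") { exit } { print }'
     }
     </programlisting>
         Review the printed list, and make sure the file system backup itself
         is safely stored, before removing anything from the archive.
        </para>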
 747    <para>
 748     The backup history file is just a small text file. It contains the
 749     label string you gave to <function>pg_start_backup</>, as well as
 750     the starting and ending times of the backup. If you used the label
 751     to identify where the associated dump file is kept, then the
 752     archived history file is enough to tell you which dump file to
 753     restore, should you need to do so.
 754    </para>
 755
 756    <para>
 757     Since you have to keep around all the archived WAL files back to your
 758     last base backup, the interval between base backups should usually be
 759     chosen based on how much storage you want to expend on archived WAL
 760     files.  You should also consider how long you are prepared to spend
 761     recovering, if recovery should be necessary &mdash; the system will have to
 762     replay all those WAL segments, and that could take a while if it has
 763     been a long time since the last base backup.
 764    </para>
 765
 766    <para>
 767     It's also worth noting that the <function>pg_start_backup</> function
 768     makes a file named <filename>backup_label</> in the database cluster
 769     directory, which is then removed again by <function>pg_stop_backup</>.
 770     This file will of course be archived as a part of your backup dump file.
 771     The backup label file includes the label string you gave to
 772     <function>pg_start_backup</>, as well as the time at which
 773     <function>pg_start_backup</> was run, and the name of the starting WAL
 774     file.  In case of confusion it will
 775     therefore be possible to look inside a backup dump file and determine
 776     exactly which backup session the dump file came from.
 777    </para>
 778
 779    <para>
 780     It is also possible to make a backup dump while the postmaster is
 781     stopped.  In this case, you obviously cannot use
 782     <function>pg_start_backup</> or <function>pg_stop_backup</>, and
 783     you will therefore be left to your own devices to keep track of which
 784     backup dump is which and how far back the associated WAL files go.
 785     It is generally better to follow the on-line backup procedure above.
 786    </para>
 787   </sect2>
 788
 789   <sect2 id="backup-pitr-recovery">
 790    <title>Recovering with an On-line Backup</title>
 791
 792    <para>
 793     Okay, the worst has happened and you need to recover from your backup.
 794     Here is the procedure:
 795   <orderedlist>
 796    <listitem>
 797     <para>
 798      Stop the postmaster, if it's running.
 799     </para>
 800    </listitem>
 801    <listitem>
 802     <para>
 803      If you have the space to do so,
 804      copy the whole cluster data directory and any tablespaces to a temporary
 805      location in case you need them later. Note that this precaution will
 806      require that you have enough free space on your system to hold two
 807      copies of your existing database. If you do not have enough space,
 808      you need at the least to copy the contents of the <filename>pg_xlog</>
 809      subdirectory of the cluster data directory, as it may contain logs which
 810      were not archived before the system went down.
 811     </para>
 812    </listitem>
 813    <listitem>
 814     <para>
 815      Clean out all existing files and subdirectories under the cluster data
 816      directory and under the root directories of any tablespaces you are using.
 817     </para>
 818    </listitem>
 819    <listitem>
 820     <para>
 821      Restore the database files from your backup dump.  Be careful that they
 822      are restored with the right ownership (the database system user, not
 823      root!) and with the right permissions.  If you are using tablespaces,
 824      you may want to verify that the symbolic links in <filename>pg_tblspc/</>
 825      were correctly restored.
 826     </para>
 827    </listitem>
 828    <listitem>
 829     <para>
 830      Remove any files present in <filename>pg_xlog/</>; these came from the
 831      backup dump and are therefore probably obsolete rather than current.
 832      If you didn't archive <filename>pg_xlog/</> at all, then re-create it,
 833      and be sure to re-create the subdirectory
 834     <filename>pg_xlog/archive_status/</> as well.
 835     </para>
 836    </listitem>
 837    <listitem>
 838     <para>
 839      If you had unarchived WAL segment files that you saved in step 2,
 840      copy them into <filename>pg_xlog/</>.  (It is best to copy them,
 841      not move them, so that you still have the unmodified files if a
 842      problem occurs and you have to start over.)
 843     </para>
 844    </listitem>
 845    <listitem>
 846     <para>
 847      Create a recovery command file <filename>recovery.conf</> in the cluster
 848      data directory (see <xref linkend="recovery-config-settings">). You may
 849      also want to temporarily modify <filename>pg_hba.conf</> to prevent
 850      ordinary users from connecting until you are sure the recovery has worked.
 851     </para>
 852    </listitem>
 853    <listitem>
 854     <para>
 855      Start the postmaster.  The postmaster will go into recovery mode and
 856      proceed to read through the archived WAL files it needs.  Upon completion
 857      of the recovery process, the postmaster will rename
 858      <filename>recovery.conf</> to <filename>recovery.done</> (to prevent
 859      accidentally re-entering recovery mode in case of a crash later) and then
 860      commence normal database operations.
 861     </para>
 862    </listitem>
 863    <listitem>
 864     <para>
 865      Inspect the contents of the database to ensure you have recovered to
 866      where you want to be.  If not, return to step 1.  If all is well,
 867      let in your users by restoring <filename>pg_hba.conf</> to normal.
 868     </para>
 869    </listitem>
 870   </orderedlist>
 871    </para>
 872
 873    <para>
 874     The key part of all this is to set up a recovery command file that
 875     describes how you want to recover and how far the recovery should
 876     run.  You can use <filename>recovery.conf.sample</> (normally
 877     installed in the installation <filename>share/</> directory) as a
 878     prototype.  The one thing that you absolutely must specify in
 879     <filename>recovery.conf</> is the <varname>restore_command</>,
 880     which tells <productname>PostgreSQL</> how to get back archived
 881     WAL file segments.  Like the <varname>archive_command</>, this is
 882     a shell command string.  It may contain <literal>%f</>, which is
 883     replaced by the name of the desired log file, and <literal>%p</>,
 884     which is replaced by the absolute path to copy the log file to.
 885     Write <literal>%%</> if you need to embed an actual <literal>%</>
 886     character in the command.  The simplest useful command is
 887     something like
 888 <programlisting>
 889 restore_command = 'cp /mnt/server/archivedir/%f "%p"'
 890 </programlisting>
 891     which will copy previously archived WAL segments from the directory
 892     <filename>/mnt/server/archivedir</>.  You could of course use something
 893     much more complicated, perhaps even a shell script that requests the
 894     operator to mount an appropriate tape.
 895    </para>
 896
 897    <para>
 898     It is important that the command return nonzero exit status on failure.
 899     The command <emphasis>will</> be asked for log files that are not present
 900     in the archive; it must return nonzero when so asked.  This is not an
 901     error condition.  Be aware also that the base name of the <literal>%p</>
 902     path will be different from <literal>%f</>; do not expect them to be
 903     interchangeable.
 904    </para>
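
        <para>
         A restore script that makes this exit-status rule explicit could be
         sketched as follows.  The function name <literal>restore_wal</> and
         the script path shown in the comment are hypothetical; a plain
         <application>cp</> already fails for a missing file, so the sketch
         mainly documents the intent.
     <programlisting>
     # Sketch only.  A wrapper script could call this function as
     #     restore_wal /mnt/server/archivedir "$1" "$2"
     # and be referenced from recovery.conf like this (hypothetical path):
     #     restore_command = '/usr/local/bin/restore_wal.sh %f "%p"'
     restore_wal() {
         archive_dir=$1   # directory holding the archived segments
         wal_file=$2      # value substituted for %f
         target_path=$3   # value substituted for %p

         # A file missing from the archive is expected during recovery;
         # report it with a nonzero status.
         if [ ! -f "$archive_dir/$wal_file" ]; then
             return 1
         fi

         # Copy to the path the server asked for; cp's status is returned.
         cp "$archive_dir/$wal_file" "$target_path"
     }
     </programlisting>
        </para>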
 905
 906    <para>
 907     WAL segments that cannot be found in the archive will be sought in
 908     <filename>pg_xlog/</>; this allows use of recent un-archived segments.
 909     However, segments that are available from the archive will be used in
 910     preference to files in <filename>pg_xlog/</>.  The system will not
 911     overwrite the existing contents of <filename>pg_xlog/</> when retrieving
 912     archived files.
 913    </para>
 914
 915    <para>
 916     Normally, recovery will proceed through all available WAL segments,
 917     thereby restoring the database to the current point in time (or as
 918     close as we can get given the available WAL segments).  But if you want
 919     to recover to some previous point in time (say, right before the junior
 920     DBA dropped your main transaction table), just specify the required
 921     stopping point in <filename>recovery.conf</>.  You can specify the stop
 922     point, known as the <quote>recovery target</>, either by date/time or
 923     by completion of a specific transaction ID.  As of this writing only
 924     the date/time option is very usable, since there are no tools to help
 925     you identify with any accuracy which transaction ID to use.
 926    </para>
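
        <para>
         For instance, a <filename>recovery.conf</> for a date/time recovery
         target might look like the following sketch.  The archive directory
         and the time stamp are example values only.
     <programlisting>
     # recovery.conf sketch; the path and the time stamp are example values
     restore_command = 'cp /mnt/server/archivedir/%f "%p"'
     recovery_target_time = '2005-11-01 17:14:00 EST'
     </programlisting>
        </para>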
 927
 928    <note>
 929      <para>
 930       The stop point must be after the ending time of the base backup (the
 931       time of <function>pg_stop_backup</>).  You cannot use a base backup
 932       to recover to a time when that backup was still going on.  (To
 933       recover to such a time, you must go back to your previous base backup
 934       and roll forward from there.)
 935      </para>
 936     </note>
 937
 938     <sect3 id="recovery-config-settings" xreflabel="Recovery Settings">
 939      <title>Recovery Settings</title>
 940
 941      <para>
 942       These settings can only be made in the <filename>recovery.conf</>
 943       file, and apply only for the duration of the recovery. They must be
 944       reset for any subsequent recovery you wish to perform. They cannot be
 945       changed once recovery has begun.
 946      </para>
 947
 948      <variablelist>
 949
 950      <varlistentry id="restore-command" xreflabel="restore_command">
 951       <term><varname>restore_command</varname> (<type>string</type>)</term>
 952       <listitem>
 953        <para>
 954         The shell command to execute to retrieve an archived segment of
 955         the WAL file series. This parameter is required.
 956         Any <literal>%f</> in the string is
 957         replaced by the name of the file to retrieve from the archive,
 958         and any <literal>%p</> is replaced by the absolute path to copy
 959         it to on the server.
 960         Write <literal>%%</> to embed an actual <literal>%</> character
 961         in the command.
 962        </para>
 963        <para>
 964         It is important for the command to return a zero exit status if and
 965         only if it succeeds.  The command <emphasis>will</> be asked for file
 966         names that are not present in the archive; it must return nonzero
 967         when so asked.  Examples:
 968 <programlisting>
 969 restore_command = 'cp /mnt/server/archivedir/%f "%p"'
 970 restore_command = 'copy /mnt/server/archivedir/%f "%p"'  # Windows
 971 </programlisting>
 972        </para>
 973       </listitem>
 974      </varlistentry>
 975
 976      <varlistentry id="recovery-target-time" xreflabel="recovery_target_time">
 977       <term><varname>recovery_target_time</varname>
 978            (<type>timestamp</type>)
 979       </term>
 980       <listitem>
 981        <para>
 982         This parameter specifies the time stamp up to which recovery
 983         will proceed.
 984         At most one of <varname>recovery_target_time</> and
 985         <xref linkend="recovery-target-xid"> can be specified.
 986         The default is to recover to the end of the WAL.
 987         The precise stopping point is also influenced by
 988         <xref linkend="recovery-target-inclusive">.
 989        </para>
 990       </listitem>
 991      </varlistentry>
 992
 993      <varlistentry id="recovery-target-xid" xreflabel="recovery_target_xid">
 994       <term><varname>recovery_target_xid</varname> (<type>string</type>)</term>
 995       <listitem>
 996        <para>
 997         This parameter specifies the transaction ID up to which recovery
 998         will proceed. Keep in mind
 999         that while transaction IDs are assigned sequentially at transaction
1000         start, transactions can complete in a different numeric order.
1001         The transactions that will be recovered are those that committed
1002         before (and optionally including) the specified one.
1003         At most one of <varname>recovery_target_xid</> and
1004         <xref linkend="recovery-target-time"> can be specified.
1005         The default is to recover to the end of the WAL.
1006         The precise stopping point is also influenced by
1007         <xref linkend="recovery-target-inclusive">.
1008        </para>
1009       </listitem>
1010      </varlistentry>
1011
1012      <varlistentry id="recovery-target-inclusive"
1013                    xreflabel="recovery_target_inclusive">
1014       <term><varname>recovery_target_inclusive</varname>
1015         (<type>boolean</type>)
1016       </term>
1017       <listitem>
1018        <para>
1019         Specifies whether we stop just after the specified recovery target
1020         (<literal>true</literal>), or just before the recovery target
1021         (<literal>false</literal>).
1022         Applies to both <xref linkend="recovery-target-time">
1023         and <xref linkend="recovery-target-xid">, whichever one is
1024         specified for this recovery.  This indicates whether transactions
1025         having exactly the target commit time or ID, respectively, will
1026         be included in the recovery.  Default is <literal>true</>.
1027        </para>
1028       </listitem>
1029      </varlistentry>
1030
1031      <varlistentry id="recovery-target-timeline"
1032                    xreflabel="recovery_target_timeline">
1033       <term><varname>recovery_target_timeline</varname>
1034         (<type>string</type>)
1035       </term>
1036       <listitem>
1037        <para>
1038         Specifies recovering into a particular timeline.  The default is
1039         to recover along the same timeline that was current when the
1040         base backup was taken.  You would only need to set this parameter
1041         in complex re-recovery situations, where you need to return to
1042         a state that itself was reached after a point-in-time recovery.
1043         See <xref linkend="backup-timelines"> for discussion.
1044        </para>
1045       </listitem>
1046      </varlistentry>
1047
1048    </variablelist>
1049
1050    </sect3>
1051
1052   </sect2>
1053
1054   <sect2 id="backup-timelines">
1055    <title>Timelines</title>
1056
1057   <indexterm zone="backup">
1058    <primary>timelines</primary>
1059   </indexterm>
1060
1061    <para>
1062     The ability to restore the database to a previous point in time creates
1063     some complexities that are akin to science-fiction stories about time
1064     travel and parallel universes.  In the original history of the database,
1065     perhaps you dropped a critical table at 5:15PM on Tuesday evening.
1066     Unfazed, you get out your backup, restore to the point-in-time 5:14PM
1067     Tuesday evening, and are up and running.  In <emphasis>this</> history of
1068     the database universe, you never dropped the table at all.  But suppose
1069     you later realize this wasn't such a great idea after all, and would like
1070     to return to some later point in the original history.  You won't be able
1071     to if, while your database was up and running, it overwrote some of the
1072     sequence of WAL segment files that led up to the time you now wish you
1073     could get back to.  So you really want to distinguish the series of
1074     WAL records generated after you've done a point-in-time recovery from
1075     those that were generated in the original database history.
1076    </para>
1077
1078    <para>
1079     To deal with these problems, <productname>PostgreSQL</> has a notion
1080     of <firstterm>timelines</>.  Each time you recover to a point-in-time
1081     earlier than the end of the WAL sequence, a new timeline is created
1082     to identify the series of WAL records generated after that recovery.
1083     (If recovery proceeds all the way to the end of WAL, however, we do not
1084     start a new timeline: we just extend the existing one.)  The timeline
1085     ID number is part of WAL segment file names, and so a new timeline does
1086     not overwrite the WAL data generated by previous timelines.  It is
1087     in fact possible to archive many different timelines.  While that might
1088     seem like a useless feature, it's often a lifesaver.  Consider the
1089     situation where you aren't quite sure what point-in-time to recover to,
1090     and so have to do several point-in-time recoveries by trial and error
1091     until you find the best place to branch off from the old history.  Without
1092     timelines this process would soon generate an unmanageable mess.  With
1093     timelines, you can recover to <emphasis>any</> prior state, including
1094     states in timeline branches that you later abandoned.
1095    </para>
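
        <para>
         To see which timeline an archived segment belongs to, it is enough to
         look at its file name.  The following sketch assumes the
         24-character segment names used in this chapter, in which the first
         eight hexadecimal digits are the timeline ID; the function name is
         made up for the example.
     <programlisting>
     # Sketch only: print the timeline ID part of a WAL segment file name.
     timeline_of_segment() {
         printf '%s\n' "$1" | cut -c1-8
     }

     # Example: timeline_of_segment 0000000200001234000055CD
     # prints 00000002
     </programlisting>
        </para>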
1096
1097    <para>
1098     Each time a new timeline is created, <productname>PostgreSQL</> creates
1099     a <quote>timeline history</> file that shows which timeline it branched
1100     off from and when.  These history files are necessary to allow the system
1101     to pick the right WAL segment files when recovering from an archive that
1102     contains multiple timelines.  Therefore, they are archived into the WAL
1103     archive area just like WAL segment files.  The history files are just
1104     small text files, so it's cheap and appropriate to keep them around
1105     indefinitely (unlike the segment files which are large).  You can, if
1106     you like, add comments to a history file to make your own notes about
1107     how and why this particular timeline came to be.  Such comments will be
1108     especially valuable when you have a thicket of different timelines as
1109     a result of experimentation.
1110    </para>
1111
1112    <para>
1113     The default behavior of recovery is to recover along the same timeline
1114     that was current when the base backup was taken.  If you want to recover
1115     into some child timeline (that is, you want to return to some state that
1116     was itself generated after a recovery attempt), you need to specify the
1117     target timeline ID in <filename>recovery.conf</>.  You cannot recover into
1118     timelines that branched off earlier than the base backup.
1119    </para>
1120   </sect2>
1121
1122   <sect2 id="backup-online-caveats">
1123    <title>Caveats</title>
1124
1125    <para>
1126     At this writing, there are several limitations of the on-line backup
1127     technique.  These will probably be fixed in future releases:
1128
1129   <itemizedlist>
1130    <listitem>
1131     <para>
1132      Operations on hash and R-tree indexes are
1133      not presently WAL-logged, so replay will not update these index types.
1134      The recommended workaround is to manually <command>REINDEX</> each
1135      such index after completing a recovery operation.
1136     </para>
1137    </listitem>
1138
1139    <listitem>
1140     <para>
1141      If a <command>CREATE DATABASE</> command is executed while a base
1142      backup is being taken, and then the template database that the
1143      <command>CREATE DATABASE</> copied is modified while the base backup
1144      is still in progress, it is possible that recovery will cause those
1145      modifications to be propagated into the created database as well.
1146      This is of course undesirable.  To avoid this risk, it is best not to
1147      modify any template databases while taking a base backup.
1148     </para>
1149    </listitem>
1150
1151    <listitem>
1152     <para>
1153      <command>CREATE TABLESPACE</> commands are WAL-logged with the literal
1154      absolute path, and will therefore be replayed as tablespace creations
1155      with the same absolute path.  This might be undesirable if the log is
1156      being replayed on a different machine.  It can be dangerous even if
1157      the log is being replayed on the same machine, but into a new data
1158      directory: the replay will still overwrite the contents of the original
1159      tablespace.  To avoid potential gotchas of this sort, the best practice
1160      is to take a new base backup after creating or dropping tablespaces.
1161     </para>
1162    </listitem>
1163   </itemizedlist>
1164    </para>
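
        <para>
         For the first caveat above, the affected indexes can be found through
         the system catalogs.  The following sketch generates one
         <command>REINDEX</> command per hash or R-tree index and feeds the
         result back into <application>psql</>.  The function name is
         hypothetical, the connection is assumed to be made as a superuser,
         and the function has to be run once for each database in the cluster.
     <programlisting>
     # Sketch only: rebuild all hash and R-tree indexes in one database.
     reindex_hash_and_rtree() {
         dbname=$1
         psql -At -d "$dbname" -c "
             SELECT 'REINDEX INDEX ' || quote_ident(n.nspname) || '.' ||
                    quote_ident(c.relname) || ';'
             FROM pg_class c
                  JOIN pg_am a ON a.oid = c.relam
                  JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'i' AND a.amname IN ('hash', 'rtree');
         " | psql -d "$dbname"
     }
     </programlisting>
        </para>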
1165
1166    <para>
1167     It should also be noted that the default <acronym>WAL</acronym>
1168     format is fairly bulky since it includes many disk page snapshots.
1169     These page snapshots are designed to support crash recovery,
1170     since we may need to fix partially-written disk pages.  Depending
1171     on your system hardware and software, the risk of partial writes may
1172     be small enough to ignore, in which case you can significantly reduce
1173     the total volume of archived logs by turning off page snapshots
1174     using the <xref linkend="guc-full-page-writes"> parameter.
1175     (Read the notes and warnings in
1176     <xref linkend="wal"> before you do so.)
1177     Turning off page snapshots does not prevent use of the logs for PITR
1178     operations.
1179     An area for future development is to compress archived WAL data by
1180     removing unnecessary page copies even when <varname>full_page_writes</>
1181     is on.  In the meantime, administrators
1182     may wish to reduce the number of page snapshots included in WAL by
1183     increasing the checkpoint interval parameters as much as feasible.
1184    </para>
1185   </sect2>
1186  </sect1>
1187
1188  <sect1 id="migration">
1189   <title>Migration Between Releases</title>
1190
1191   <indexterm zone="migration">
1192    <primary>upgrading</primary>
1193   </indexterm>
1194
1195   <indexterm zone="migration">
1196    <primary>version</primary>
1197    <secondary>compatibility</secondary>
1198   </indexterm>
1199
1200   <para>
1201    This section discusses how to migrate your database data from one
1202    <productname>PostgreSQL</> release to a newer one.
1203    The software installation procedure <foreignphrase>per se</> is not the
1204    subject of this section; those details are in <xref linkend="installation">.
1205   </para>
1206
1207   <para>
1208    As a general rule, the internal data storage format is subject to
1209    change between major releases of <productname>PostgreSQL</> (where
1210    the number after the first dot changes). This does not apply to
1211    different minor releases under the same major release (where the
1212    number after the second dot changes); these always have compatible
1213    storage formats. For example, releases 7.0.1, 7.1.2, and 7.2 are
1214    not compatible, whereas 7.1.1 and 7.1.2 are. When you update
1215    between compatible versions, you can simply replace the executables
1216    and reuse the data directory on disk. Otherwise you need to back
1217    up your data and restore it on the new server.  This has to be done
1218    using <application>pg_dump</>; file system level backup methods
1219    obviously won't work. There are checks in place that prevent you
1220    from using a data directory with an incompatible version of
1221    <productname>PostgreSQL</productname>, so no great harm can be done by
1222    trying to start the wrong server version on a data directory.
1223   </para>
1224
1225   <para>
1226    It is recommended that you use the <application>pg_dump</> and
1227    <application>pg_dumpall</> programs from the newer version of
1228    <productname>PostgreSQL</>, to take advantage of any enhancements
1229    that may have been made in these programs.  Current releases of the
1230    dump programs can read data from any server version back to 7.0.
1231   </para>
1232
1233   <para>
1234    The least downtime can be achieved by installing the new server in
1235    a different directory and running both the old and the new servers
1236    in parallel, on different ports. Then you can use something like
1237
1238 <programlisting>
1239 pg_dumpall -p 5432 | psql -d postgres -p 6543
1240 </programlisting>
1241
1242    to transfer your data.  Or use an intermediate file if you want.
1243    Then you can shut down the old server and start the new server at
1244    the port the old one was running at. You should make sure that the
1245    old database is not updated after you run <application>pg_dumpall</>;
1246    otherwise you will lose those updates. See <xref
1247    linkend="client-authentication"> for information on how to prohibit
1248    access.
1249   </para>
1250
1251   <para>
1252    In practice you probably want to test your client
1253    applications on the new setup before switching over completely.
1254    This is another reason for setting up concurrent installations
1255    of old and new versions.
1256   </para>
1257
1258   <para>
1259    If you cannot or do not want to run two servers in parallel you can
1260    do the backup step before installing the new version, bring down
1261    the server, move the old version out of the way, install the new
1262    version, start the new server, and restore the data. For example:
1263
1264 <programlisting>
1265 pg_dumpall &gt; backup
1266 pg_ctl stop
1267 mv /usr/local/pgsql /usr/local/pgsql.old
1268 cd ~/postgresql-&version;
1269 gmake install
1270 initdb -D /usr/local/pgsql/data
1271 postmaster -D /usr/local/pgsql/data
1272 psql -f backup postgres
1273 </programlisting>
1274
1275    See <xref linkend="runtime"> about ways to start and stop the
1276    server and other details. The installation instructions will advise
1277    you of strategic places to perform these steps.
1278   </para>
1279
1280   <note>
1281    <para>
1282     When you <quote>move the old installation out of the way</quote>
1283     it may no longer be perfectly usable. Some of the executable programs
1284     contain absolute paths to various installed programs and data files.
1285    This is usually not a big problem, but if you plan on using two
1286     installations in parallel for a while you should assign them
1287     different installation directories at build time.  (This problem
1288     is rectified in <productname>PostgreSQL</> 8.0 and later, but you
1289     need to be wary of moving older installations.)
1290    </para>
1291   </note>
1292  </sect1>
1293 </chapter>
1294
1295 <!-- Keep this comment at the end of the file
1296 Local variables:
1297 mode:sgml
1298 sgml-omittag:nil
1299 sgml-shorttag:t
1300 sgml-minimize-attributes:nil
1301 sgml-always-quote-attributes:t
1302 sgml-indent-step:1
1303 sgml-indent-tabs-mode:nil
1304 sgml-indent-data:t
1305 sgml-parent-document:nil
1306 sgml-default-dtd-file:"./reference.ced"
1307 sgml-exposed-tags:nil
1308 sgml-local-catalogs:("/usr/share/sgml/catalog")
1309 sgml-local-ecat-files:nil
1310 End:
1311 -->