-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.140 2010/02/03 17:25:05 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.141 2010/02/09 16:50:25 heikki Exp $ -->
<chapter id="backup">
<title>Backup and Restore</title>
</sect2>
</sect1>
- <sect1 id="warm-standby">
- <title>Warm Standby Servers for High Availability</title>
-
- <indexterm zone="backup">
- <primary>warm standby</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>PITR standby</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>standby server</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>log shipping</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>witness server</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>STONITH</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>high availability</primary>
- </indexterm>
-
- <para>
- Continuous archiving can be used to create a <firstterm>high
- availability</> (HA) cluster configuration with one or more
- <firstterm>standby servers</> ready to take over operations if the
- primary server fails. This capability is widely referred to as
- <firstterm>warm standby</> or <firstterm>log shipping</>.
- </para>
-
- <para>
- The primary and standby server work together to provide this capability,
- though the servers are only loosely coupled. The primary server operates
- in continuous archiving mode, while each standby server operates in
- continuous recovery mode, reading the WAL files from the primary. No
- changes to the database tables are required to enable this capability,
- so it offers low administration overhead compared to some other
- replication approaches. This configuration also has relatively low
- performance impact on the primary server.
- </para>
-
- <para>
- Directly moving WAL records from one database server to another
- is typically described as log shipping. <productname>PostgreSQL</>
- implements file-based log shipping, which means that WAL records are
- transferred one file (WAL segment) at a time. WAL files (16MB) can be
- shipped easily and cheaply over any distance, whether it be to an
- adjacent system, another system at the same site, or another system on
- the far side of the globe. The bandwidth required for this technique
- varies according to the transaction rate of the primary server.
- Record-based log shipping is also possible with custom-developed
- procedures, as discussed in <xref linkend="warm-standby-record">.
- </para>
-
- <para>
- It should be noted that log shipping is asynchronous, i.e., the WAL
- records are shipped after transaction commit. As a result there is a
- window for data loss should the primary server suffer a catastrophic
- failure: transactions not yet shipped will be lost. The length of the
- window of data loss can be limited by use of the
- <varname>archive_timeout</varname> parameter, which can be set as low
- as a few seconds if required. However such a low setting will
- substantially increase the bandwidth required for file shipping.
- If you need a window of less than a minute or so, it's probably better
- to consider record-based log shipping.
- </para>
-
- <para>
- The standby server is not available for access, since it is continually
- performing recovery processing. Recovery performance is sufficiently
- good that the standby will typically be only moments away from full
- availability once it has been activated. As a result, we refer to this
- capability as a warm standby configuration that offers high
- availability. Restoring a server from an archived base backup and
- rollforward will take considerably longer, so that technique only
- offers a solution for disaster recovery, not high availability.
- </para>
-
- <sect2 id="warm-standby-planning">
- <title>Planning</title>
-
- <para>
- It is usually wise to create the primary and standby servers
- so that they are as similar as possible, at least from the
- perspective of the database server. In particular, the path names
- associated with tablespaces will be passed across unmodified, so both
- primary and standby servers must have the same mount paths for
- tablespaces if that feature is used. Keep in mind that if
- <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
- is executed on the primary, any new mount point needed for it must
- be created on the primary and all standby servers before the command
- is executed. Hardware need not be exactly the same, but experience shows
- that maintaining two identical systems is easier than maintaining two
- dissimilar ones over the lifetime of the application and system.
- In any case the hardware architecture must be the same — shipping
- from, say, a 32-bit to a 64-bit system will not work.
- </para>
-
- <para>
- In general, log shipping between servers running different major
- <productname>PostgreSQL</> release
- levels is not possible. It is the policy of the PostgreSQL Global
- Development Group not to make changes to disk formats during minor release
- upgrades, so it is likely that running different minor release levels
- on primary and standby servers will work successfully. However, no
- formal support for that is offered and you are advised to keep primary
- and standby servers at the same release level as much as possible.
- When updating to a new minor release, the safest policy is to update
- the standby servers first — a new minor release is more likely
- to be able to read WAL files from a previous minor release than vice
- versa.
- </para>
-
- <para>
- There is no special mode required to enable a standby server. The
- operations that occur on both primary and standby servers are
- normal continuous archiving and recovery tasks. The only point of
- contact between the two database servers is the archive of WAL files
- that both share: primary writing to the archive, standby reading from
- the archive. Care must be taken to ensure that WAL archives from separate
- primary servers do not become mixed together or confused. The archive
- need not be large if it is only required for standby operation.
- </para>
-
- <para>
- The magic that makes the two loosely coupled servers work together is
- simply a <varname>restore_command</> used on the standby that,
- when asked for the next WAL file, waits for it to become available from
- the primary. The <varname>restore_command</> is specified in the
- <filename>recovery.conf</> file on the standby server. Normal recovery
- processing would request a file from the WAL archive, reporting failure
- if the file was unavailable. For standby processing it is normal for
- the next WAL file to be unavailable, so we must be patient and wait for
- it to appear. For files ending in <literal>.backup</> or
- <literal>.history</> there is no need to wait, and a non-zero return
- code must be returned. A waiting <varname>restore_command</> can be
- written as a custom script that loops after polling for the existence of
- the next WAL file. There must also be some way to trigger failover, which
- should interrupt the <varname>restore_command</>, break the loop and
- return a file-not-found error to the standby server. This ends recovery
- and the standby will then come up as a normal server.
- </para>
-
- <para>
- Pseudocode for a suitable <varname>restore_command</> is:
-<programlisting>
-triggered = false;
-while (!NextWALFileReady() && !triggered)
-{
- usleep(100000L); /* wait for ~0.1 sec */
- if (CheckForExternalTrigger())
- triggered = true;
-}
-if (!triggered)
- CopyWALFileForRecovery();
-</programlisting>
- </para>
-
- <para>
- A working example of a waiting <varname>restore_command</> is provided
- as a <filename>contrib</> module named <application>pg_standby</>. It
- should be used as a reference on how to correctly implement the logic
- described above. It can also be extended as needed to support specific
- configurations and environments.
- </para>
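-
- <para>
-  As an illustration only, a <filename>recovery.conf</> entry that uses
-  <application>pg_standby</> as a waiting <varname>restore_command</> might
-  look like the following sketch; the archive directory, trigger file path
-  and log file are assumptions to be adapted to your environment:
-<programlisting>
-# recovery.conf on the standby (paths are examples only)
-# %f = requested WAL file name, %p = destination path, %r = oldest file to keep
-restore_command = 'pg_standby -d -s 2 -t /tmp/pgsql.trigger /path/to/archive %f %p %r 2>>standby.log'
-</programlisting>
- </para>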
-
- <para>
- <productname>PostgreSQL</productname> does not provide the system
- software required to identify a failure on the primary and notify
- the standby database server. Many such tools exist and are well
- integrated with the operating system facilities required for
- successful failover, such as IP address migration.
- </para>
-
- <para>
- The method for triggering failover is an important part of planning
- and design. One potential option is the <varname>restore_command</>
- command. It is executed once for each WAL file, but the process
- running the <varname>restore_command</> is created and dies for
- each file, so there is no daemon or server process, and we cannot
- use signals or a signal handler. Therefore, the
- <varname>restore_command</> is not suitable to trigger failover.
- It is possible to use a simple timeout facility, especially if
- used in conjunction with a known <varname>archive_timeout</>
- setting on the primary. However, this is somewhat error prone
- since a network problem or busy primary server might be sufficient
- to initiate failover. A notification mechanism such as the explicit
- creation of a trigger file is ideal, if this can be arranged.
- </para>
-
- <para>
- The size of the WAL archive can be minimized by using the <literal>%r</>
- option of the <varname>restore_command</>. This option specifies the
- last archive file name that needs to be kept to allow the recovery to
- restart correctly. This can be used to truncate the archive once
- files are no longer required, assuming the archive is writable from the
- standby server.
- </para>
- </sect2>
-
- <sect2 id="warm-standby-config">
- <title>Implementation</title>
-
- <para>
- The short procedure for configuring a standby server is as follows. For
- full details of each step, refer to previous sections as noted; a combined
- configuration sketch follows the list.
- <orderedlist>
- <listitem>
- <para>
- Set up primary and standby systems as nearly identical as
- possible, including two identical copies of
- <productname>PostgreSQL</> at the same release level.
- </para>
- </listitem>
- <listitem>
- <para>
- Set up continuous archiving from the primary to a WAL archive
- directory on the standby server. Ensure that
- <xref linkend="guc-archive-mode">,
- <xref linkend="guc-archive-command"> and
- <xref linkend="guc-archive-timeout">
- are set appropriately on the primary
- (see <xref linkend="backup-archiving-wal">).
- </para>
- </listitem>
- <listitem>
- <para>
- Make a base backup of the primary server (see <xref
- linkend="backup-base-backup">), and load this data onto the standby.
- </para>
- </listitem>
- <listitem>
- <para>
- Begin recovery on the standby server from the local WAL
- archive, using a <filename>recovery.conf</> that specifies a
- <varname>restore_command</> that waits as described
- previously (see <xref linkend="backup-pitr-recovery">).
- </para>
- </listitem>
- </orderedlist>
- </para>
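-
- <para>
-  Putting the steps above together, a minimal sketch of the relevant settings
-  might look like this. All host names, paths, and values are illustrative
-  assumptions:
-<programlisting>
-# postgresql.conf on the primary (example values)
-archive_mode    = on
-archive_command = 'scp %p standby:/path/to/archive/%f'
-archive_timeout = 60    # force a WAL file switch at least once a minute
-
-# recovery.conf on the standby
-restore_command = 'pg_standby /path/to/archive %f %p %r'
-</programlisting>
- </para>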
-
- <para>
- Recovery treats the WAL archive as read-only, so once a WAL file has
- been copied to the standby system it can be copied to tape at the same
- time as it is being read by the standby database server.
- Thus, running a standby server for high availability can be performed at
- the same time as files are stored for longer term disaster recovery
- purposes.
- </para>
-
- <para>
- For testing purposes, it is possible to run both primary and standby
- servers on the same system. This does not provide any worthwhile
- improvement in server robustness, nor would it be described as HA.
- </para>
- </sect2>
-
- <sect2 id="warm-standby-failover">
- <title>Failover</title>
-
- <para>
- If the primary server fails then the standby server should begin
- failover procedures.
- </para>
-
- <para>
- If the standby server fails then no failover need take place. If the
- standby server can be restarted, even some time later, then the recovery
- process can also be immediately restarted, taking advantage of
- restartable recovery. If the standby server cannot be restarted, then a
- full new standby server instance should be created.
- </para>
-
- <para>
- If the primary server fails and the standby server becomes the
- new primary, and then the old primary restarts, you must have
- a mechanism for informing the old primary that it is no longer the primary. This is
- sometimes known as STONITH (Shoot The Other Node In The Head), which is
- necessary to avoid situations where both systems think they are the
- primary, which will lead to confusion and ultimately data loss.
- </para>
-
- <para>
- Many failover systems use just two systems, the primary and the standby,
- connected by some kind of heartbeat mechanism to continually verify the
- connectivity between the two and the viability of the primary. It is
- also possible to use a third system (called a witness server) to prevent
- some cases of inappropriate failover, but the additional complexity
- might not be worthwhile unless it is set up with sufficient care and
- rigorous testing.
- </para>
-
- <para>
- Once failover to the standby occurs, we have only a
- single server in operation. This is known as a degenerate state.
- The former standby is now the primary, but the former primary is down
- and might stay down. To return to normal operation we must
- fully recreate a standby server,
- either on the former primary system when it comes up, or on a third,
- possibly new, system. Once complete the primary and standby can be
- considered to have switched roles. Some people choose to use a third
- server to provide backup for the new primary until the new standby
- server is recreated,
- though clearly this complicates the system configuration and
- operational processes.
- </para>
-
- <para>
- So, switching from primary to standby server can be fast but requires
- some time to re-prepare the failover cluster. Regular switching from
- primary to standby is useful, since it allows regular downtime on
- each system for maintenance. This also serves as a test of the
- failover mechanism to ensure that it will really work when you need it.
- Written administration procedures are advised.
- </para>
- </sect2>
-
- <sect2 id="warm-standby-record">
- <title>Record-based Log Shipping</title>
-
- <para>
- <productname>PostgreSQL</productname> directly supports file-based
- log shipping as described above. It is also possible to implement
- record-based log shipping, though this requires custom development.
- </para>
-
- <para>
- An external program can call the <function>pg_xlogfile_name_offset()</>
- function (see <xref linkend="functions-admin">)
- to find out the file name and the exact byte offset within it of
- the current end of WAL. It can then access the WAL file directly
- and copy the data from the last known end of WAL through the current end
- over to the standby servers. With this approach, the window for data
- loss is the polling cycle time of the copying program, which can be very
- small, and there is no wasted bandwidth from forcing partially-used
- segment files to be archived. Note that the standby servers'
- <varname>restore_command</> scripts can only deal with whole WAL files,
- so the incrementally copied data is not ordinarily made available to
- the standby servers. It is of use only when the primary dies —
- then the last partial WAL file is fed to the standby before allowing
- it to come up. The correct implementation of this process requires
- cooperation of the <varname>restore_command</> script with the data
- copying program.
- </para>
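-
- <para>
-  For example, a copying program could poll the primary with a query like
-  the following sketch to learn the file name and byte offset of the
-  current end of WAL:
-<programlisting>
-SELECT file_name, file_offset
-  FROM pg_xlogfile_name_offset(pg_current_xlog_location());
-</programlisting>
- </para>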
-
- <para>
- Starting with <productname>PostgreSQL</> version 8.5, you can use
- streaming replication (see <xref linkend="streaming-replication">) to
- achieve the same with less effort.
- </para>
- </sect2>
-
- <sect2 id="streaming-replication">
- <title>Streaming Replication</title>
-
- <para>
- <productname>PostgreSQL</> includes a simple streaming replication
- mechanism, which lets the standby server stay more up-to-date than
- file-based replication allows. The standby connects to the primary
- and the primary starts streaming WAL records from where the standby
- left off, and continues streaming them as they are generated, without
- waiting for the WAL file to be filled. So with streaming replication,
- <varname>archive_timeout</> does not need to be configured.
- </para>
-
- <para>
- Streaming replication relies on file-based continuous archiving for
- making the base backup and for allowing a standby to catch up if it's
- disconnected from the primary for long enough for the primary to
- delete old WAL files still required by the standby.
- </para>
-
- <sect3 id="streaming-replication-setup">
- <title>Setup</title>
- <para>
- The short procedure for configuring streaming replication is as follows.
- For full details of each step, refer to other sections as noted.
- <orderedlist>
- <listitem>
- <para>
- Set up primary and standby systems as near identically as possible,
- including two identical copies of <productname>PostgreSQL</> at the
- same release level.
- </para>
- </listitem>
- <listitem>
- <para>
- Set up continuous archiving from the primary to a WAL archive located
- in a directory on the standby server. Ensure that
- <xref linkend="guc-archive-mode">,
- <xref linkend="guc-archive-command"> and
- <xref linkend="guc-archive-timeout">
- are set appropriately on the primary
- (see <xref linkend="backup-archiving-wal">).
- </para>
- </listitem>
-
- <listitem>
- <para>
- Set up connections and authentication so that the standby server can
- successfully connect to the pseudo <literal>replication</> database of
- the primary server (see
- <xref linkend="streaming-replication-authentication">). Ensure that
- <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
- configured appropriately on the primary.
- </para>
- <para>
- On systems that support the keepalive socket option, setting
- <xref linkend="guc-tcp-keepalives-idle">,
- <xref linkend="guc-tcp-keepalives-interval"> and
- <xref linkend="guc-tcp-keepalives-count"> helps you to find the
- troubles with replication (e.g., the network outage or the failure of
- the standby server) as soon as possible.
- </para>
- </listitem>
- <listitem>
- <para>
- Set the maximum number of concurrent connections from the standby servers
- (see <xref linkend="guc-max-wal-senders"> for details).
- </para>
- </listitem>
- <listitem>
- <para>
- Enable WAL archiving in the primary server because we need to make a base
- backup of it later (see <xref linkend="guc-archive-mode"> and
- <xref linkend="guc-archive-command"> for details).
- </para>
- </listitem>
- <listitem>
- <para>
- Start the <productname>PostgreSQL</> server on the primary.
- </para>
- </listitem>
- <listitem>
- <para>
- Make a base backup of the primary server (see
- <xref linkend="backup-base-backup">), and load this data onto the
- standby. Note that all files present in <filename>pg_xlog</>
- and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
- server should be removed because they might be obsolete.
- </para>
- </listitem>
- <listitem>
- <para>
- Set up WAL archiving, connections and authentication like the primary
- server, because the standby server might work as a primary server after
- failover. Ensure that your settings are consistent with the
- <emphasis>future</> environment after the primary and the standby
- server are interchanged by failover. If you're setting up the standby
- server only for reporting purposes, for example, with no plans to fail
- over to it, configure the standby accordingly.
- </para>
- </listitem>
- <listitem>
- <para>
- Create a recovery command file <filename>recovery.conf</> in the data
- directory on the standby server; a combined example appears after this
- list.
- </para>
-
- <variablelist id="replication-config-settings" xreflabel="Replication Settings">
- <varlistentry id="standby-mode" xreflabel="standby_mode">
- <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
- <listitem>
- <para>
- Specifies whether to start the <productname>PostgreSQL</> server as
- a standby. If this parameter is <literal>on</>, streaming
- replication is enabled and the standby server will try to connect
- to the primary to receive and apply WAL records continuously. The
- default is <literal>off</>, which allows only archive recovery
- without replication. So, streaming replication requires this
- parameter to be explicitly set to <literal>on</>.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
- <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
- <listitem>
- <para>
- Specifies a connection string which is used for the standby server
- to connect with the primary. This string is in the same format as
- described in <xref linkend="libpq-connect">. If any option is
- unspecified in this string, then the corresponding environment
- variable (see <xref linkend="libpq-envars">) is checked. If the
- environment variable is not set either, then the indicated built-in
- defaults are used.
- </para>
- <para>
- The built-in replication requires that the host name (or host address)
- and port number that the primary server listens on be
- specified in this string. Also ensure that a role with
- the <literal>SUPERUSER</> and <literal>LOGIN</> privileges exists on the
- primary (see
- <xref linkend="streaming-replication-authentication">). Note that
- the password needs to be supplied if the primary demands password
- authentication.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry id="trigger-file" xreflabel="trigger_file">
- <term><varname>trigger_file</varname> (<type>string</type>)</term>
- <listitem>
- <para>
- Specifies a trigger file whose presence activates the standby.
- If no trigger file is specified, the standby never exits
- recovery.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
- </listitem>
- <listitem>
- <para>
- Start the <productname>PostgreSQL</> server on the standby. The standby
- server will go into recovery mode and proceed to receive WAL records
- from the primary and apply them continuously.
- </para>
- </listitem>
- </orderedlist>
- </para>
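-
- <para>
-  Putting the recovery settings above together, a minimal
-  <filename>recovery.conf</> for a streaming-replication standby might look
-  like this sketch. The host, user, password, and paths shown are
-  placeholders:
-<programlisting>
-# recovery.conf on the standby (illustrative values)
-standby_mode      = 'on'
-primary_conninfo  = 'host=192.168.1.50 port=5432 user=foo password=foopass'
-trigger_file      = '/path/to/trigger'
-# optional: also restore WAL files archived by the primary
-restore_command   = 'cp /path/to/archive/%f %p'
-</programlisting>
- </para>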
- </sect3>
- <sect3 id="streaming-replication-authentication">
- <title>Authentication</title>
- <para>
- It's very important that the access privileges for replication be set
- properly so that only trusted users can read the WAL stream, because it's
- easy to extract privileged information from it.
- </para>
- <para>
- Only a superuser is allowed to connect to the primary as a replication
- standby, so a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
- privileges needs to be created on the primary.
- </para>
- <para>
- Client authentication for replication is controlled by the
- <filename>pg_hba.conf</> record specifying <literal>replication</> in the
- <replaceable>database</> field. For example, if the standby is running on
- host IP <literal>192.168.1.100</> and the superuser's name for replication
- is <literal>foo</>, the administrator can add the following line to the
- <filename>pg_hba.conf</> file on the primary.
-
-<programlisting>
-# Allow the user "foo" from host 192.168.1.100 to connect to the primary
-# as a replication standby if the user's password is correctly supplied.
-#
-# TYPE DATABASE USER CIDR-ADDRESS METHOD
-host replication foo 192.168.1.100/32 md5
-</programlisting>
- </para>
- <para>
- The host name and port number of the primary, user name to connect as,
- and password are specified in the <filename>recovery.conf</> file or
- the corresponding environment variable on the standby.
- For example, if the primary is running on host IP <literal>192.168.1.50</>,
- port <literal>5432</literal>, the superuser's name for replication is
- <literal>foo</>, and the password is <literal>foopass</>, the administrator
- can add the following line to the <filename>recovery.conf</> file on the
- standby.
-
-<programlisting>
-# The standby connects to the primary that is running on host 192.168.1.50
-# and port 5432 as the user "foo" whose password is "foopass".
-primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
-</programlisting>
- </para>
- </sect3>
- </sect2>
-
- <sect2 id="backup-incremental-updated">
- <title>Incrementally Updated Backups</title>
-
- <indexterm zone="backup">
- <primary>incrementally updated backups</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>change accumulation</primary>
- </indexterm>
-
- <para>
- In a warm standby configuration, it is possible to offload the expense of
- taking periodic base backups from the primary server; instead base backups
- can be made by backing
- up a standby server's files. This concept is generally known as
- incrementally updated backups, log change accumulation, or more simply,
- change accumulation.
- </para>
-
- <para>
- If we take a file system backup of the standby server's data
- directory while it is processing
- logs shipped from the primary, we will be able to reload that backup and
- restart the standby's recovery process from the last restart point.
- We no longer need to keep WAL files from before the standby's restart point.
- If we need to recover, it will be faster to recover from the incrementally
- updated backup than from the original base backup.
- </para>
-
- <para>
- Since the standby server is not <quote>live</>, it is not possible to
- use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
- to manage the backup process; it will be up to you to determine how
- far back you need to keep WAL segment files to have a recoverable
- backup. You can do this by running <application>pg_controldata</>
- on the standby server to inspect the control file and determine the
- current checkpoint WAL location, or by using the
- <varname>log_checkpoints</> option to print values to the standby's
- server log.
- </para>
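-
- <para>
-  For example, a sketch of checking the current checkpoint location with
-  <application>pg_controldata</>; the data directory path and the value
-  shown are illustrative, and the exact label may vary between versions:
-<programlisting>
-$ pg_controldata /path/to/standby/data | grep 'REDO location'
-Latest checkpoint's REDO location:    0/3C000020
-</programlisting>
- </para>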
- </sect2>
- </sect1>
-
- <sect1 id="hot-standby">
- <title>Hot Standby</title>
-
- <indexterm zone="backup">
- <primary>Hot Standby</primary>
- </indexterm>
-
- <para>
- Hot Standby is the term used to describe the ability to connect to
- the server and run queries while the server is in archive recovery. This
- is useful for both log shipping replication and for restoring a backup
- to an exact state with great precision.
- The term Hot Standby also refers to the ability of the server to move
- from recovery through to normal running while users continue running
- queries and/or continue their connections.
- </para>
-
- <para>
- Running queries in recovery is in many ways the same as normal running
- though there are a large number of usage and administrative points
- to note.
- </para>
-
- <sect2 id="hot-standby-users">
- <title>User's Overview</title>
-
- <para>
- Users can connect to the database while the server is in recovery
- and perform read-only queries. Read-only access to catalogs and views
- will also occur as normal.
- </para>
-
- <para>
- The data on the standby takes some time to arrive from the primary server
- so there will be a measurable delay between primary and standby. Running the
- same query nearly simultaneously on both primary and standby might therefore
- return differing results. We say that data on the standby is eventually
- consistent with the primary.
- Queries executed on the standby will be correct with regard to the transactions
- that had been recovered at the start of the query, or at the start of the first statement,
- in the case of serializable transactions. In comparison with the primary,
- the standby returns query results that could have been obtained on the primary
- at some exact moment in the past.
- </para>
-
- <para>
- When a transaction is started in recovery, the parameter
- <varname>transaction_read_only</> will be forced to be true, regardless of the
- <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
- It can't be manually set to false either. As a result, all transactions
- started during recovery will be limited to read-only actions. In all
- other ways, connected sessions will appear identical to sessions
- initiated during normal processing mode. There are no special commands
- required to initiate a connection at this time, so all interfaces
- work normally without change. After recovery finishes, the session
- will allow normal read-write transactions at the start of the next
- transaction, if these are requested.
- </para>
-
- <para>
- Read-only here means "no writes to the permanent database tables".
- There are no problems with queries that make use of transient sort and
- work files.
- </para>
-
- <para>
- The following actions are allowed
-
- <itemizedlist>
- <listitem>
- <para>
- Query access - SELECT, COPY TO including views and SELECT RULEs
- </para>
- </listitem>
- <listitem>
- <para>
- Cursor commands - DECLARE, FETCH, CLOSE
- </para>
- </listitem>
- <listitem>
- <para>
- Parameters - SHOW, SET, RESET
- </para>
- </listitem>
- <listitem>
- <para>
- Transaction management commands
- <itemizedlist>
- <listitem>
- <para>
- BEGIN, END, ABORT, START TRANSACTION
- </para>
- </listitem>
- <listitem>
- <para>
- SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
- </para>
- </listitem>
- <listitem>
- <para>
- EXCEPTION blocks and other internal subtransactions
- </para>
- </listitem>
- </itemizedlist>
- </para>
- </listitem>
- <listitem>
- <para>
- LOCK TABLE, though only when explicitly in one of these modes:
- ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
- </para>
- </listitem>
- <listitem>
- <para>
- Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
- </para>
- </listitem>
- <listitem>
- <para>
- Plugins and extensions - LOAD
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- These actions produce error messages
-
- <itemizedlist>
- <listitem>
- <para>
- Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
- Note that there are no allowed actions that result in a trigger
- being executed during recovery.
- </para>
- </listitem>
- <listitem>
- <para>
- Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
- This currently also applies to temporary tables, because their
- definition causes writes to catalog tables.
- </para>
- </listitem>
- <listitem>
- <para>
- SELECT ... FOR SHARE | UPDATE which cause row locks to be written
- </para>
- </listitem>
- <listitem>
- <para>
- RULEs on SELECT statements that generate DML commands.
- </para>
- </listitem>
- <listitem>
- <para>
- LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
- LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
- </para>
- </listitem>
- <listitem>
- <para>
- Transaction management commands that explicitly set non-read only state
- <itemizedlist>
- <listitem>
- <para>
- BEGIN READ WRITE,
- START TRANSACTION READ WRITE
- </para>
- </listitem>
- <listitem>
- <para>
- SET TRANSACTION READ WRITE,
- SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
- </para>
- </listitem>
- <listitem>
- <para>
- SET transaction_read_only = off
- </para>
- </listitem>
- </itemizedlist>
- </para>
- </listitem>
- <listitem>
- <para>
- Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
- ROLLBACK PREPARED because even read-only transactions need to write
- WAL in the prepare phase (the first phase of two phase commit).
- </para>
- </listitem>
- <listitem>
- <para>
- sequence update - nextval()
- </para>
- </listitem>
- <listitem>
- <para>
- LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- Note that the current behaviour of read-only transactions when not in
- recovery is to allow the last two actions, so there are small and
- subtle differences in behaviour between read-only transactions
- run on the standby and those run during normal running.
- It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
- temporary tables may be lifted in a future release, if their internal
- implementation is altered to make this possible.
- </para>
-
- <para>
- If failover or switchover occurs the database will switch to normal
- processing mode. Sessions will remain connected while the server
- changes mode. Current transactions will continue, though will remain
- read-only. After recovery is complete, it will be possible to initiate
- read-write transactions.
- </para>
-
- <para>
- Users will be able to tell whether their session is read-only by
- issuing SHOW transaction_read_only. In addition a set of
- functions <xref linkend="functions-recovery-info-table"> allow users to
- access information about Hot Standby. These allow you to write
- functions that are aware of the current state of the database. These
- can be used to monitor the progress of recovery, or to allow you to
- write complex programs that restore the database to particular states.
- </para>
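-
- <para>
-  For example (a sketch; <function>pg_is_in_recovery()</> is assumed here
-  to be one of the recovery information functions available in your
-  version):
-<programlisting>
-SHOW transaction_read_only;
-SELECT pg_is_in_recovery();
-</programlisting>
- </para>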
-
- <para>
- In recovery, transactions will not be permitted to take any table lock
- higher than RowExclusiveLock. In addition, transactions may never assign
- a TransactionId and may never write WAL.
- Any <command>LOCK TABLE</> command that runs on the standby and requests
- a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
- </para>
-
- <para>
- In general queries will not experience lock conflicts with the database
- changes made by recovery. This is because recovery follows normal
- concurrency control mechanisms, known as <acronym>MVCC</>. There are
- some types of change that will cause conflicts, covered in the following
- section.
- </para>
- </sect2>
-
- <sect2 id="hot-standby-conflict">
- <title>Handling Query Conflicts</title>
-
- <para>
- The primary and standby nodes are in many ways loosely connected. Actions
- on the primary will have an effect on the standby. As a result, there is
- potential for negative interactions or conflicts between them. The easiest
- conflict to understand is performance: if a huge data load is taking place
- on the primary then this will generate a similar stream of WAL records on the
- standby, so standby queries may contend for system resources, such as I/O.
- </para>
-
- <para>
- There are also additional types of conflict that can occur with Hot Standby.
- These conflicts are <emphasis>hard conflicts</> in the sense that we may
- need to cancel queries and in some cases disconnect sessions to resolve them.
- The user is provided with a number of optional ways to handle these
- conflicts, though we must first understand the possible reasons behind a conflict.
-
- <itemizedlist>
- <listitem>
- <para>
- Access Exclusive Locks from primary node, including both explicit
- LOCK commands and various kinds of DDL action
- </para>
- </listitem>
- <listitem>
- <para>
- Dropping tablespaces on the primary while standby queries are using
- those tablespaces for temporary work files (work_mem overflow)
- </para>
- </listitem>
- <listitem>
- <para>
- Dropping databases on the primary while users are connected to that
- database on the standby.
- </para>
- </listitem>
- <listitem>
- <para>
- Waiting to acquire buffer cleanup locks
- </para>
- </listitem>
- <listitem>
- <para>
- Early cleanup of data still visible to the current query's snapshot
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- Some WAL redo actions will be for DDL actions. These DDL actions are
- repeating actions that have already committed on the primary node, so
- they must not fail on the standby node. These DDL locks take priority
- and will automatically <emphasis>cancel</> any read-only transactions that get in
- their way, after a grace period. This is similar to the possibility of
- being canceled by the deadlock detector, but in this case the standby
- process always wins, since the replayed actions must not fail. This
- also ensures that replication doesn't fall behind while we wait for a
- query to complete. Again, we assume that the standby is there for high
- availability purposes primarily.
- </para>
-
- <para>
- An example of the above would be an administrator on the primary server
- running <command>DROP TABLE</> on a table that's currently being queried
- on the standby server.
- Clearly the query cannot continue if we let the <command>DROP TABLE</>
- proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
- would wait until the query has finished. When the query is on the standby
- and the <command>DROP TABLE</> is on the primary, the primary doesn't have
- information about which queries are running on the standby and so the query
- does not wait on the primary. The WAL change records come through to the
- standby while the standby query is still running, causing a conflict.
- </para>
-
- <para>
- The most common reason for conflict between standby queries and WAL redo is
- "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
- row versions when there are no users who may need to see them to ensure correct
- visibility of data (the heart of MVCC). If there is a standby query that has
- been running for longer than any query on the primary then it is possible
- for old row versions to be removed by either a vacuum or HOT. This will
- then generate WAL records that, if applied, would remove data on the
- standby that might <emphasis>potentially</> be required by the standby query.
- In more technical language, the primary's xmin horizon is later than
- the standby's xmin horizon, allowing dead rows to be removed.
- </para>
-
- <para>
- Experienced users should note that both row version cleanup and row version
- freezing will potentially conflict with recovery queries. Running a
- manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
- with no updated or deleted rows.
- </para>
-
- <para>
- We have a number of choices for resolving query conflicts. The default
- is that we wait and hope the query completes. The server will wait
- automatically until the lag between primary and standby is at most
- <varname>max_standby_delay</> seconds. Once that grace period expires,
- we take one of the following actions:
-
- <itemizedlist>
- <listitem>
- <para>
- If the conflict is caused by a lock, we cancel the conflicting standby
- transaction immediately. If the transaction is idle-in-transaction
- then currently we abort the session instead, though this may change
- in the future.
- </para>
- </listitem>
-
- <listitem>
- <para>
- If the conflict is caused by cleanup records we tell the standby query
- that a conflict has occurred and that it must cancel itself to avoid the
- risk that it silently fails to read relevant data because
- that data has been removed. (This is regrettably very similar to the
- much feared and iconic error message "snapshot too old"). Some cleanup
- records only cause conflict with older queries, though some types of
- cleanup record affect all queries.
- </para>
-
- <para>
- If cancellation does occur, the query and/or transaction can always
- be re-executed. The error is dynamic and will not necessarily occur
- the same way if the query is executed again.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
- The parameter applies to the server as a whole, so if the delay is all used
- up by a single query then there may be little or no waiting for queries that
- follow immediately, though they will have benefited equally from the initial
- waiting period. The server may take time to catch up again before the grace
- period is available again, though if there is a heavy and constant stream
- of conflicts it may seldom catch up fully.
- </para>
-
- <para>
- Users should be clear that tables that are regularly and heavily updated on
- the primary server will quickly cause cancellation of longer running queries on
- the standby. In those cases <varname>max_standby_delay</> can be
- considered somewhat but not exactly the same as setting
- <varname>statement_timeout</>.
- </para>
-
- <para>
- Other remedial actions exist if the number of cancellations is unacceptable.
- The first option is to connect to primary server and keep a query active
- for as long as we need to run queries on the standby. This guarantees that
- a WAL cleanup record is never generated and we don't ever get query
- conflicts as described above. This could be done using contrib/dblink
- and pg_sleep(), or via other mechanisms. If you do this, you should note
- that this will delay cleanup of dead rows by vacuum or HOT and many
- people may find this undesirable. However, we should remember that
- primary and standby nodes are linked via the WAL, so this situation is no
- different from the case where we ran the query on the primary node itself,
- except that we have the benefit of off-loading the execution onto the standby.
- </para>
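-
- <para>
-  A sketch of that approach using <filename>contrib/dblink</> and
-  <function>pg_sleep()</>; the connection string and the one-hour duration
-  are arbitrary placeholders:
-<programlisting>
-SELECT * FROM dblink('host=primary dbname=mydb',
-                     'SELECT pg_sleep(3600)') AS t(result text);
-</programlisting>
- </para>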
-
- <para>
- It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
- to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
- more time for queries to execute before they are cancelled on the standby,
- without the need for setting a high <varname>max_standby_delay</>.
- </para>
-
- <para>
- Three-way deadlocks are possible between AccessExclusiveLocks arriving from
- the primary, cleanup WAL records that require buffer cleanup locks and
- user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
- are resolved by time-out when we exceed <varname>max_standby_delay</>.
- </para>
-
- <para>
- Dropping tablespaces or databases is discussed in the administrator's
- section since they are not typical user situations.
- </para>
- </sect2>
-
- <sect2 id="hot-standby-admin">
- <title>Administrator's Overview</title>
-
- <para>
- If there is a <filename>recovery.conf</> file present the server will start
- in Hot Standby mode by default, though <varname>recovery_connections</> can
- be disabled via <filename>postgresql.conf</>, if required. The server may take
- some time to enable recovery connections since the server must first complete
- sufficient recovery to provide a consistent state against which queries
- can run before enabling read only connections. Look for these messages
- in the server logs
-
-<programlisting>
-LOG: initializing recovery connections
-
-... then some time later ...
-
-LOG: consistent recovery state reached
-LOG: database system is ready to accept read only connections
-</programlisting>
-
- Consistency information is recorded once per checkpoint on the primary, as long
- as <varname>recovery_connections</> is enabled (on the primary). If this parameter
- is disabled, it will not be possible to enable recovery connections on the standby.
- The consistent state can also be delayed in the presence of both of these conditions
-
- <itemizedlist>
- <listitem>
- <para>
- a write transaction has more than 64 subtransactions
- </para>
- </listitem>
- <listitem>
- <para>
- very long-lived write transactions
- </para>
- </listitem>
- </itemizedlist>
-
- If you are running file-based log shipping ("warm standby"), you may need
- to wait until the next WAL file arrives, which could be as long as the
- <varname>archive_timeout</> setting on the primary.
- </para>
-
- <para>
- The setting of some parameters on the standby will need reconfiguration
- if they have been changed on the primary. The value on the standby must
- be equal to or greater than the value on the primary. If these parameters
- are not set high enough then the standby will not be able to correctly
- track work performed by recovering transactions. If these values are set
- too low, the server will halt. Higher values can then be supplied and the server
- restarted to begin recovery again.
-
- <itemizedlist>
- <listitem>
- <para>
- <varname>max_connections</>
- </para>
- </listitem>
- <listitem>
- <para>
- <varname>max_prepared_transactions</>
- </para>
- </listitem>
- <listitem>
- <para>
- <varname>max_locks_per_transaction</>
- </para>
- </listitem>
- </itemizedlist>
- </para>
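-
- <para>
-  For example, if the primary uses the default values, the following sketch
-  of the standby's <filename>postgresql.conf</> would be sufficient (the
-  values shown are the usual defaults, given here only as an illustration):
-<programlisting>
-max_connections           = 100
-max_prepared_transactions = 0
-max_locks_per_transaction = 64
-</programlisting>
- </para>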
-
- <para>
- It is important that the administrator consider the appropriate setting
- of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
- There is no optimal setting; it should be chosen according to business
- priorities. For example if the server is primarily tasked as a High
- Availability server, then you may wish to lower
- <varname>max_standby_delay</> or even set it to zero, though that is a
- very aggressive setting. If the standby server is tasked as an additional
- server for decision support queries then it may be acceptable to set this
- to a value of many hours (in seconds).
- </para>
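-
- <para>
-  For example, on a standby used mainly for high availability, a fairly
-  short grace period might be chosen (the value is illustrative only):
-<programlisting>
-# postgresql.conf on the standby
-max_standby_delay = 30        # seconds to wait before cancelling conflicting queries
-</programlisting>
- </para>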
-
- <para>
- Transaction status "hint bits" written on primary are not WAL-logged,
- so data on standby will likely re-write the hints again on the standby.
- Thus the main database blocks will produce write I/Os even though
- all users are read-only; no changes have occurred to the data values
- themselves. Users will be able to write large sort temp files and
- re-generate relcache info files, so there is no part of the database
- that is truly read-only during hot standby mode. There is no restriction
- on the use of set returning functions, or other users of tuplestore/tuplesort
- code. Note also that writes to remote databases will still be possible,
- even though the transaction is read-only locally.
- </para>
-
- <para>
- The following types of administrator command are not accepted
- during recovery mode
-
- <itemizedlist>
- <listitem>
- <para>
- Data Definition Language (DDL) - e.g. CREATE INDEX
- </para>
- </listitem>
- <listitem>
- <para>
- Privilege and Ownership - GRANT, REVOKE, REASSIGN
- </para>
- </listitem>
- <listitem>
- <para>
- Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- Note again that some of these commands are actually allowed during
- "read only" mode transactions on the primary.
- </para>
-
- <para>
- As a result, you cannot create additional indexes that exist solely
- on the standby, nor can you create statistics that exist solely on the standby.
- If these administrator commands are needed they should be executed
- on the primary so that the changes will propagate through to the
- standby.
- </para>
-
- <para>
- <function>pg_cancel_backend()</> will work on user backends, but not the
- Startup process, which performs recovery. pg_stat_activity does not
- show an entry for the Startup process, nor do recovering transactions
- show as active. As a result, pg_prepared_xacts is always empty during
- recovery. If you wish to resolve in-doubt prepared transactions
- then look at pg_prepared_xacts on the primary and issue commands to
- resolve those transactions there.
- </para>
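-
- <para>
-  For example, a sketch of inspecting and resolving an in-doubt prepared
-  transaction on the primary (the GID shown is a placeholder):
-<programlisting>
-SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-COMMIT PREPARED 'example_gid';    -- or ROLLBACK PREPARED 'example_gid'
-</programlisting>
- </para>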
-
- <para>
- pg_locks will show locks held by backends as normal. pg_locks also shows
- a virtual transaction managed by the Startup process that owns all
- AccessExclusiveLocks held by transactions being replayed by recovery.
- Note that the Startup process does not acquire locks to
- make database changes, and thus locks other than AccessExclusiveLocks
- do not show in pg_locks for the Startup process; they are just presumed
- to exist.
- </para>
-
- <para>
- <productname>check_pgsql</> will work, but it is very simple.
- <productname>check_postgres</> will also work, though some actions
- could give different or confusing results.
- For example, the last vacuum time will not be maintained, since no
- vacuum occurs on the standby (though vacuums running on the primary do
- send their changes to the standby).
- </para>
-
- <para>
- WAL file control commands will not work during recovery,
- e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</>, etc.
- </para>
-
- <para>
- Dynamically loadable modules work, including pg_stat_statements.
- </para>
-
- <para>
- Advisory locks work normally in recovery, including deadlock detection.
- Note that advisory locks are never WAL logged, so it is not possible for
- an advisory lock on either the primary or the standby to conflict with WAL
- replay. Nor is it possible to acquire an advisory lock on the primary
- and have it initiate a similar advisory lock on the standby. Advisory
- locks relate only to a single server on which they are acquired.
- </para>
-
- <para>
- Trigger-based replication systems such as <productname>Slony</>,
- <productname>Londiste</> and <productname>Bucardo</> won't run on the
- standby at all, though they will run happily on the primary server as
- long as the changes are not sent to standby servers to be applied.
- WAL replay is not trigger-based so you cannot relay from the
- standby to any system that requires additional database writes or
- relies on the use of triggers.
- </para>
-
- <para>
- New OIDs cannot be assigned, though some <acronym>UUID</> generators may still
- work as long as they do not rely on writing new status to the database.
- </para>
-
- <para>
- Currently, temp table creation is not allowed during read only
- transactions, so in some cases existing scripts will not run correctly.
- It is possible we may relax that restriction in a later release. This is
- both a SQL Standard compliance issue and a technical issue.
- </para>
-
- <para>
- <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
- Some standby users may be actively using the tablespace via their
- <varname>temp_tablespaces</> parameter. If there are temp files in the
- tablespace we currently cancel all active queries to ensure that temp
- files are removed, so that we can remove the tablespace and continue with
- WAL replay.
- </para>
-
- <para>
- Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
- or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
- that will cause all users connected to that database on the standby to be
- forcibly disconnected. This action occurs immediately, whatever the setting of
- <varname>max_standby_delay</>.
- </para>
-
- <para>
- In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
- for a role with login capability while that user is still connected then
- nothing happens to the connected user - they remain connected. The user cannot
- reconnect however. This behaviour applies in recovery also, so a
- <command>DROP USER</> on the primary does not disconnect that user on the standby.
- </para>
-
- <para>
- The statistics collector is active during recovery. All scans, reads, blocks,
- index usage, etc., will be recorded normally on the standby. Replayed
- actions will not duplicate their effects on the primary, so replaying an
- insert will not increment the Inserts column of pg_stat_user_tables.
- The stats file is deleted at start of recovery, so stats from primary
- and standby will differ; this is considered a feature not a bug.
- </para>
-
- <para>
- Autovacuum is not active during recovery, though it will start normally
- at the end of recovery.
- </para>
-
- <para>
- The background writer is active during recovery and will perform
- restartpoints (similar to checkpoints on the primary) and normal block
- cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
- though it performs a restartpoint rather than a new checkpoint.
- </para>
- </sect2>
-
- <sect2 id="hot-standby-parameters">
- <title>Hot Standby Parameter Reference</title>
-
- <para>
- Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
- and <xref linkend="hot-standby-conflict"> sections.
- </para>
-
- <para>
- On the primary, parameters <varname>recovery_connections</> and
- <varname>vacuum_defer_cleanup_age</> can be set to assist the successful
- configuration of Hot Standby servers.
- <varname>max_standby_delay</> has no effect if set on the primary.
- </para>
-
- <para>
- On the standby, parameters <varname>recovery_connections</> and
- <varname>max_standby_delay</> can be used to enable and control Hot Standby.
- <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
- </para>
- </sect2>
-
- <sect2 id="hot-standby-caveats">
- <title>Caveats</title>
-
- <para>
- At this writing, there are several limitations of Hot Standby.
- These can and probably will be fixed in future releases:
-
- <itemizedlist>
- <listitem>
- <para>
- Operations on hash indexes are not presently WAL-logged, so
- replay will not update these indexes. Hash indexes will not be
- used for query plans during recovery.
- </para>
- </listitem>
- <listitem>
- <para>
- Full knowledge of running transactions is required before snapshots
- may be taken. Transactions that use large numbers of subtransactions
- (currently greater than 64) will delay the start of read only
- connections until the completion of the longest running write transaction.
- If this situation occurs, explanatory messages will be sent to the server log.
- </para>
- </listitem>
- <listitem>
- <para>
- Valid starting points for recovery connections are generated at each
- checkpoint on the master. If the standby is shut down while the master
- is in a shutdown state, it may not be possible to re-enter Hot Standby
- until the primary is started up so that it generates further starting
- points in the WAL logs. This is not considered a serious issue
- because the standby is usually switched into the primary role while
- the first node is taken down.
- </para>
- </listitem>
- <listitem>
- <para>
- At the end of recovery, AccessExclusiveLocks held by prepared transactions
- will require twice the normal number of lock table entries. If you plan
- on running either a large number of concurrent prepared transactions
- that normally take AccessExclusiveLocks, or on having one
- large transaction that takes many AccessExclusiveLocks, then you are
- advised to select a larger value of <varname>max_locks_per_transaction</>,
- up to, but never more than, twice the value of the parameter setting on
- the primary server (twice being needed only in rare extremes). You need not consider this at all if
- your setting of <varname>max_prepared_transactions</> is <literal>0</>.
- </para>
- </listitem>
- </itemizedlist>
-
- </para>
- </sect2>
-
- </sect1>
-
<sect1 id="migration">
<title>Migration Between Releases</title>
-<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.39 2010/02/08 13:41:06 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.40 2010/02/09 16:50:25 heikki Exp $ -->
<chapter id="high-availability">
<title>High Availability, Load Balancing, and Replication</title>
also available.
</para>
+ <sect1 id="different-replication-solutions">
+ <title>Comparison of different solutions</title>
+
<variablelist>
<varlistentry>
</variablelist>
+ </sect1>
+
+ <sect1 id="warm-standby">
+ <title>File-based Log Shipping</title>
+
+ <indexterm zone="high-availability">
+ <primary>warm standby</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>PITR standby</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>standby server</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>log shipping</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>witness server</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>STONITH</primary>
+ </indexterm>
+
+ <para>
+ Continuous archiving can be used to create a <firstterm>high
+ availability</> (HA) cluster configuration with one or more
+ <firstterm>standby servers</> ready to take over operations if the
+ primary server fails. This capability is widely referred to as
+ <firstterm>warm standby</> or <firstterm>log shipping</>.
+ </para>
+
+ <para>
+ The primary and standby server work together to provide this capability,
+ though the servers are only loosely coupled. The primary server operates
+ in continuous archiving mode, while each standby server operates in
+ continuous recovery mode, reading the WAL files from the primary. No
+ changes to the database tables are required to enable this capability,
+ so it offers low administration overhead compared to some other
+ replication approaches. This configuration also has relatively low
+ performance impact on the primary server.
+ </para>
+
+ <para>
+ Directly moving WAL records from one database server to another
+ is typically described as log shipping. <productname>PostgreSQL</>
+ implements file-based log shipping, which means that WAL records are
+ transferred one file (WAL segment) at a time. WAL files (16MB) can be
+ shipped easily and cheaply over any distance, whether it be to an
+ adjacent system, another system at the same site, or another system on
+ the far side of the globe. The bandwidth required for this technique
+ varies according to the transaction rate of the primary server.
+ Record-based log shipping is also possible with custom-developed
+ procedures, as discussed in <xref linkend="warm-standby-record">.
+ </para>
+
+ <para>
+ It should be noted that the log shipping is asynchronous, i.e., the WAL
+ records are shipped after transaction commit. As a result there is a
+ window for data loss should the primary server suffer a catastrophic
+ failure: transactions not yet shipped will be lost. The length of the
+ window of data loss can be limited by use of the
+ <varname>archive_timeout</varname> parameter, which can be set as low
+ as a few seconds if required. However such a low setting will
+ substantially increase the bandwidth required for file shipping.
+ If you need a window of less than a minute or so, it's probably better
+ to consider record-based log shipping.
+ </para>
+
+ <para>
+ The standby server is not available for access, since it is continually
+ performing recovery processing. Recovery performance is sufficiently
+ good that the standby will typically be only moments away from full
+ availability once it has been activated. As a result, we refer to this
+ capability as a warm standby configuration that offers high
+ availability. Restoring a server from an archived base backup and
+ rollforward will take considerably longer, so that technique only
+ offers a solution for disaster recovery, not high availability.
+ </para>
+
+ <sect2 id="warm-standby-planning">
+ <title>Planning</title>
+
+ <para>
+ It is usually wise to create the primary and standby servers
+ so that they are as similar as possible, at least from the
+ perspective of the database server. In particular, the path names
+ associated with tablespaces will be passed across unmodified, so both
+ primary and standby servers must have the same mount paths for
+ tablespaces if that feature is used. Keep in mind that if
+ <xref linkend="sql-createtablespace" endterm="sql-createtablespace-title">
+ is executed on the primary, any new mount point needed for it must
+ be created on the primary and all standby servers before the command
+ is executed. Hardware need not be exactly the same, but experience shows
+ that maintaining two identical systems is easier than maintaining two
+ dissimilar ones over the lifetime of the application and system.
+ In any case the hardware architecture must be the same — shipping
+ from, say, a 32-bit to a 64-bit system will not work.
+ </para>
+
+ <para>
+ In general, log shipping between servers running different major
+ <productname>PostgreSQL</> release
+ levels is not possible. It is the policy of the PostgreSQL Global
+ Development Group not to make changes to disk formats during minor release
+ upgrades, so it is likely that running different minor release levels
+ on primary and standby servers will work successfully. However, no
+ formal support for that is offered and you are advised to keep primary
+ and standby servers at the same release level as much as possible.
+ When updating to a new minor release, the safest policy is to update
+ the standby servers first — a new minor release is more likely
+ to be able to read WAL files from a previous minor release than vice
+ versa.
+ </para>
+
+ <para>
+ There is no special mode required to enable a standby server. The
+ operations that occur on both primary and standby servers are
+ normal continuous archiving and recovery tasks. The only point of
+ contact between the two database servers is the archive of WAL files
+ that both share: primary writing to the archive, standby reading from
+ the archive. Care must be taken to ensure that WAL archives from separate
+ primary servers do not become mixed together or confused. The archive
+ need not be large if it is only required for standby operation.
+ </para>
+
+ <para>
+ The magic that makes the two loosely coupled servers work together is
+ simply a <varname>restore_command</> used on the standby that,
+ when asked for the next WAL file, waits for it to become available from
+ the primary. The <varname>restore_command</> is specified in the
+ <filename>recovery.conf</> file on the standby server. Normal recovery
+ processing would request a file from the WAL archive, reporting failure
+ if the file was unavailable. For standby processing it is normal for
+ the next WAL file to be unavailable, so we must be patient and wait for
+ it to appear. For files ending in <literal>.backup</> or
+ <literal>.history</> there is no need to wait, and a non-zero return
+ code must be returned. A waiting <varname>restore_command</> can be
+ written as a custom script that loops after polling for the existence of
+ the next WAL file. There must also be some way to trigger failover, which
+ should interrupt the <varname>restore_command</>, break the loop and
+ return a file-not-found error to the standby server. This ends recovery
+ and the standby will then come up as a normal server.
+ </para>
+
+ <para>
+ Pseudocode for a suitable <varname>restore_command</> is:
+<programlisting>
+triggered = false;
+while (!NextWALFileReady() && !triggered)
+{
+ sleep(100000L); /* wait for ~0.1 sec */
+ if (CheckForExternalTrigger())
+ triggered = true;
+}
+if (!triggered)
+ CopyWALFileForRecovery();
+</programlisting>
+ </para>
+
+ <para>
+ A working example of a waiting <varname>restore_command</> is provided
+ as a <filename>contrib</> module named <application>pg_standby</>. It
+ should be used as a reference on how to correctly implement the logic
+ described above. It can also be extended as needed to support specific
+ configurations and environments.
+ </para>
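+
+  <para>
+   For illustration, a waiting <varname>restore_command</> based on
+   <application>pg_standby</> might be configured in
+   <filename>recovery.conf</> as follows; the archive directory, trigger
+   file and log file paths are examples only and must be adapted to your
+   environment:
+<programlisting>
+# Wait for WAL files to appear in /path/to/archive, checking every 2 seconds,
+# and stop waiting (ending recovery) when /tmp/pgsql.trigger appears.
+# The %r argument lets pg_standby remove archive files that are no longer
+# needed to restart recovery.
+restore_command = 'pg_standby -d -s 2 -t /tmp/pgsql.trigger /path/to/archive %f %p %r 2>>standby.log'
+</programlisting>
+   See the <application>pg_standby</> documentation for the full set of
+   options it accepts.
+  </para>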
+
+ <para>
+ <productname>PostgreSQL</productname> does not provide the system
+ software required to identify a failure on the primary and notify
+ the standby database server. Many such tools exist and are well
+ integrated with the operating system facilities required for
+ successful failover, such as IP address migration.
+ </para>
+
+ <para>
+ The method for triggering failover is an important part of planning
+ and design. One potential option is the <varname>restore_command</>
+ command. It is executed once for each WAL file, but the process
+ running the <varname>restore_command</> is created and dies for
+ each file, so there is no daemon or server process, and we cannot
+ use signals or a signal handler. Therefore, the
+ <varname>restore_command</> is not suitable to trigger failover.
+ It is possible to use a simple timeout facility, especially if
+ used in conjunction with a known <varname>archive_timeout</>
+ setting on the primary. However, this is somewhat error prone
+ since a network problem or busy primary server might be sufficient
+ to initiate failover. A notification mechanism such as the explicit
+ creation of a trigger file is ideal, if this can be arranged.
+ </para>
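+
+  <para>
+   For example, if the waiting <varname>restore_command</> script polls for
+   a trigger file, as <application>pg_standby</> does in the example above,
+   failover can be initiated on the standby with a single command; the path
+   is arbitrary and must match whatever the script checks for:
+<programlisting>
+touch /tmp/pgsql.trigger
+</programlisting>
+  </para>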
+
+ <para>
+ The size of the WAL archive can be minimized by using the <literal>%r</>
+ option of the <varname>restore_command</>. This option specifies the
+ last archive file name that needs to be kept to allow the recovery to
+ restart correctly. This can be used to truncate the archive once
+ files are no longer required, assuming the archive is writable from the
+ standby server.
+ </para>
+ </sect2>
+
+ <sect2 id="warm-standby-config">
+ <title>Implementation</title>
+
+ <para>
+ The short procedure for configuring a standby server is as follows. For
+ full details of each step, refer to previous sections as noted.
+ <orderedlist>
+ <listitem>
+ <para>
+ Set up primary and standby systems as nearly identical as
+ possible, including two identical copies of
+ <productname>PostgreSQL</> at the same release level.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Set up continuous archiving from the primary to a WAL archive
+ directory on the standby server. Ensure that
+ <xref linkend="guc-archive-mode">,
+ <xref linkend="guc-archive-command"> and
+ <xref linkend="guc-archive-timeout">
+ are set appropriately on the primary
+     (see <xref linkend="backup-archiving-wal">); an illustrative
+     configuration snippet is shown after this procedure.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Make a base backup of the primary server (see <xref
+ linkend="backup-base-backup">), and load this data onto the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Begin recovery on the standby server from the local WAL
+ archive, using a <filename>recovery.conf</> that specifies a
+ <varname>restore_command</> that waits as described
+ previously (see <xref linkend="backup-pitr-recovery">).
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
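+
+  <para>
+   As an illustration of the archiving step above, the relevant settings in
+   <filename>postgresql.conf</> on the primary might look like this, where
+   <filename>/mnt/standby_archive</> is assumed to be a mount of the
+   standby's archive directory (an <application>scp</> or
+   <application>rsync</> command targeting the standby host would work
+   equally well):
+<programlisting>
+archive_mode    = on
+archive_command = 'cp %p /mnt/standby_archive/%f'
+archive_timeout = 60     # force a segment switch at least once a minute
+</programlisting>
+  </para>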
+
+ <para>
+ Recovery treats the WAL archive as read-only, so once a WAL file has
+ been copied to the standby system it can be copied to tape at the same
+ time as it is being read by the standby database server.
+ Thus, running a standby server for high availability can be performed at
+ the same time as files are stored for longer term disaster recovery
+ purposes.
+ </para>
+
+ <para>
+ For testing purposes, it is possible to run both primary and standby
+ servers on the same system. This does not provide any worthwhile
+ improvement in server robustness, nor would it be described as HA.
+ </para>
+ </sect2>
+
+ <sect2 id="warm-standby-record">
+ <title>Record-based Log Shipping</title>
+
+ <para>
+ <productname>PostgreSQL</productname> directly supports file-based
+ log shipping as described above. It is also possible to implement
+ record-based log shipping, though this requires custom development.
+ </para>
+
+ <para>
+ An external program can call the <function>pg_xlogfile_name_offset()</>
+ function (see <xref linkend="functions-admin">)
+ to find out the file name and the exact byte offset within it of
+ the current end of WAL. It can then access the WAL file directly
+ and copy the data from the last known end of WAL through the current end
+ over to the standby servers. With this approach, the window for data
+ loss is the polling cycle time of the copying program, which can be very
+ small, and there is no wasted bandwidth from forcing partially-used
+ segment files to be archived. Note that the standby servers'
+ <varname>restore_command</> scripts can only deal with whole WAL files,
+ so the incrementally copied data is not ordinarily made available to
+ the standby servers. It is of use only when the primary dies —
+ then the last partial WAL file is fed to the standby before allowing
+ it to come up. The correct implementation of this process requires
+ cooperation of the <varname>restore_command</> script with the data
+ copying program.
+ </para>
+
+ <para>
+ Starting with <productname>PostgreSQL</> version 8.5, you can use
+ streaming replication (see <xref linkend="streaming-replication">) to
+    achieve the same result with less effort.
+ </para>
+ </sect2>
+ </sect1>
+
+ <sect1 id="streaming-replication">
+ <title>Streaming Replication</title>
+
+ <indexterm zone="high-availability">
+ <primary>Streaming Replication</primary>
+ </indexterm>
+
+ <para>
+    <productname>PostgreSQL</> includes a simple streaming replication
+    mechanism, which allows the standby server to stay more up-to-date than
+    is possible with file-based log shipping. The standby connects to the
+    primary, which starts streaming WAL records from where the standby left
+    off and continues streaming them as they are generated, without waiting
+    for the WAL file to be filled. With streaming replication,
+    <varname>archive_timeout</> therefore does not need to be configured.
+ </para>
+
+ <para>
+    Streaming replication still relies on file-based continuous archiving
+    for making the base backup, and for allowing the standby to catch up if
+    it has been disconnected from the primary long enough that the primary
+    has removed WAL files the standby still needs.
+ </para>
+
+ <sect2 id="streaming-replication-setup">
+ <title>Setup</title>
+ <para>
+ The short procedure for configuring streaming replication is as follows.
+ For full details of each step, refer to other sections as noted.
+ <orderedlist>
+ <listitem>
+ <para>
+      Set up primary and standby systems as nearly identical as possible,
+ including two identical copies of <productname>PostgreSQL</> at the
+ same release level.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Set up continuous archiving from the primary to a WAL archive located
+ in a directory on the standby server. Ensure that
+ <xref linkend="guc-archive-mode">,
+ <xref linkend="guc-archive-command"> and
+ <xref linkend="guc-archive-timeout">
+ are set appropriately on the primary
+ (see <xref linkend="backup-archiving-wal">).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Set up connections and authentication so that the standby server can
+      successfully connect to the pseudo-database named
+      <literal>replication</> on the primary server (see
+ <xref linkend="streaming-replication-authentication">). Ensure that
+ <xref linkend="guc-listen-addresses"> and <filename>pg_hba.conf</> are
+ configured appropriately on the primary.
+ </para>
+ <para>
+ On systems that support the keepalive socket option, setting
+ <xref linkend="guc-tcp-keepalives-idle">,
+ <xref linkend="guc-tcp-keepalives-interval"> and
+ <xref linkend="guc-tcp-keepalives-count"> helps you to find the
+ troubles with replication (e.g., the network outage or the failure of
+ the standby server) as soon as possible.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Set the maximum number of concurrent connections from the standby servers
+ (see <xref linkend="guc-max-wal-senders"> for details).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Enable WAL archiving in the primary server because we need to make a base
+ backup of it later (see <xref linkend="guc-archive-mode"> and
+ <xref linkend="guc-archive-command"> for details).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Start the <productname>PostgreSQL</> server on the primary.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Make a base backup of the primary server (see
+ <xref linkend="backup-base-backup">), and load this data onto the
+ standby. Note that all files present in <filename>pg_xlog</>
+ and <filename>pg_xlog/archive_status</> on the <emphasis>standby</>
+ server should be removed because they might be obsolete.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+     Set up WAL archiving, connections and authentication on the standby in
+     the same way as on the primary, because the standby server might have
+     to work as a primary server after failover. Ensure that your settings
+     are consistent with the <emphasis>future</> environment after the
+     primary and the standby have exchanged roles. If you are setting up
+     the standby server for other purposes, such as reporting, with no
+     plans to fail over to it, configure the standby accordingly.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Create a recovery command file <filename>recovery.conf</> in the data
+     directory on the standby server. (A complete example file is shown
+     after this procedure.)
+ </para>
+
+ <variablelist id="replication-config-settings" xreflabel="Replication Settings">
+ <varlistentry id="standby-mode" xreflabel="standby_mode">
+ <term><varname>standby_mode</varname> (<type>boolean</type>)</term>
+ <listitem>
+ <para>
+ Specifies whether to start the <productname>PostgreSQL</> server as
+        a standby. If this parameter is <literal>on</>, streaming
+        replication is enabled and the standby server will try to connect
+        to the primary to receive and apply WAL records continuously. The
+        default is <literal>off</>, which allows only archive recovery
+        without replication. Streaming replication therefore requires this
+        parameter to be explicitly set to <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="primary-conninfo" xreflabel="primary_conninfo">
+ <term><varname>primary_conninfo</varname> (<type>string</type>)</term>
+ <listitem>
+ <para>
+        Specifies the connection string that the standby server uses to
+        connect to the primary. This string is in the same format as
+ described in <xref linkend="libpq-connect">. If any option is
+ unspecified in this string, then the corresponding environment
+ variable (see <xref linkend="libpq-envars">) is checked. If the
+ environment variable is not set either, then the indicated built-in
+ defaults are used.
+ </para>
+ <para>
+        The built-in replication requires that this string specify the
+        host name (or host address) and the port number that the primary
+        server listens on. Also ensure that a role with the
+        <literal>SUPERUSER</> and <literal>LOGIN</> privileges exists on
+        the primary (see
+        <xref linkend="streaming-replication-authentication">). A password
+        also needs to be supplied, if the primary demands password
+        authentication.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry id="trigger-file" xreflabel="trigger_file">
+ <term><varname>trigger_file</varname> (<type>string</type>)</term>
+ <listitem>
+ <para>
+ Specifies a trigger file whose presence activates the standby.
+ If no trigger file is specified, the standby never exits
+ recovery.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ <listitem>
+ <para>
+ Start the <productname>PostgreSQL</> server on the standby. The standby
+     server will go into recovery mode and proceed to receive WAL records
+     from the primary and apply them continuously.
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
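+
+   <para>
+    Putting the recovery settings described above together, a minimal
+    <filename>recovery.conf</> for a streaming-replication standby might
+    look like this; the connection values are illustrative and match the
+    authentication example below, and the trigger file path is arbitrary:
+<programlisting>
+standby_mode     = 'on'
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+trigger_file     = '/tmp/pgsql.trigger'
+</programlisting>
+   </para>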
+ </sect2>
+ <sect2 id="streaming-replication-authentication">
+ <title>Authentication</title>
+ <para>
+    It is very important that the access privileges for replication be set
+    up properly so that only trusted users can read the WAL stream, because
+    it is easy to extract privileged information from it.
+ </para>
+ <para>
+    Only a superuser is allowed to connect to the primary as the replication
+    standby, so a role with the <literal>SUPERUSER</> and <literal>LOGIN</>
+    privileges needs to be created on the primary.
+ </para>
+ <para>
+ Client authentication for replication is controlled by the
+ <filename>pg_hba.conf</> record specifying <literal>replication</> in the
+ <replaceable>database</> field. For example, if the standby is running on
+ host IP <literal>192.168.1.100</> and the superuser's name for replication
+ is <literal>foo</>, the administrator can add the following line to the
+ <filename>pg_hba.conf</> file on the primary.
+
+<programlisting>
+# Allow the user "foo" from host 192.168.1.100 to connect to the primary
+# as a replication standby if the user's password is correctly supplied.
+#
+# TYPE DATABASE USER CIDR-ADDRESS METHOD
+host replication foo 192.168.1.100/32 md5
+</programlisting>
+ </para>
+ <para>
+    The host name and port number of the primary, the user name to connect
+    as, and the password are specified in the <filename>recovery.conf</>
+    file or the corresponding environment variables on the standby.
+ For example, if the primary is running on host IP <literal>192.168.1.50</>,
+ port <literal>5432</literal>, the superuser's name for replication is
+ <literal>foo</>, and the password is <literal>foopass</>, the administrator
+ can add the following line to the <filename>recovery.conf</> file on the
+ standby.
+
+<programlisting>
+# The standby connects to the primary that is running on host 192.168.1.50
+# and port 5432 as the user "foo" whose password is "foopass".
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+</programlisting>
+ </para>
+ </sect2>
+ </sect1>
+
+ <sect1 id="warm-standby-failover">
+ <title>Failover</title>
+
+ <para>
+ If the primary server fails then the standby server should begin
+ failover procedures.
+ </para>
+
+ <para>
+ If the standby server fails then no failover need take place. If the
+ standby server can be restarted, even some time later, then the recovery
+ process can also be immediately restarted, taking advantage of
+ restartable recovery. If the standby server cannot be restarted, then a
+ full new standby server instance should be created.
+ </para>
+
+ <para>
+ If the primary server fails and the standby server becomes the
+ new primary, and then the old primary restarts, you must have
+    a mechanism for informing the old primary that it is no longer the primary. This is
+ sometimes known as STONITH (Shoot The Other Node In The Head), which is
+ necessary to avoid situations where both systems think they are the
+ primary, which will lead to confusion and ultimately data loss.
+ </para>
+
+ <para>
+ Many failover systems use just two systems, the primary and the standby,
+ connected by some kind of heartbeat mechanism to continually verify the
+ connectivity between the two and the viability of the primary. It is
+ also possible to use a third system (called a witness server) to prevent
+ some cases of inappropriate failover, but the additional complexity
+ might not be worthwhile unless it is set up with sufficient care and
+ rigorous testing.
+ </para>
+
+ <para>
+ Once failover to the standby occurs, we have only a
+ single server in operation. This is known as a degenerate state.
+ The former standby is now the primary, but the former primary is down
+ and might stay down. To return to normal operation we must
+ fully recreate a standby server,
+ either on the former primary system when it comes up, or on a third,
+ possibly new, system. Once complete the primary and standby can be
+ considered to have switched roles. Some people choose to use a third
+ server to provide backup for the new primary until the new standby
+ server is recreated,
+ though clearly this complicates the system configuration and
+ operational processes.
+ </para>
+
+ <para>
+ So, switching from primary to standby server can be fast but requires
+ some time to re-prepare the failover cluster. Regular switching from
+ primary to standby is useful, since it allows regular downtime on
+ each system for maintenance. This also serves as a test of the
+ failover mechanism to ensure that it will really work when you need it.
+ Written administration procedures are advised.
+ </para>
+ </sect1>
+
+ <sect1 id="hot-standby">
+ <title>Hot Standby</title>
+
+ <indexterm zone="high-availability">
+ <primary>Hot Standby</primary>
+ </indexterm>
+
+ <para>
+ Hot Standby is the term used to describe the ability to connect to
+ the server and run queries while the server is in archive recovery. This
+ is useful for both log shipping replication and for restoring a backup
+ to an exact state with great precision.
+ The term Hot Standby also refers to the ability of the server to move
+ from recovery through to normal running while users continue running
+ queries and/or continue their connections.
+ </para>
+
+ <para>
+ Running queries in recovery is in many ways the same as normal running
+ though there are a large number of usage and administrative points
+ to note.
+ </para>
+
+ <sect2 id="hot-standby-users">
+ <title>User's Overview</title>
+
+ <para>
+ Users can connect to the database while the server is in recovery
+ and perform read-only queries. Read-only access to catalogs and views
+ will also occur as normal.
+ </para>
+
+ <para>
+ The data on the standby takes some time to arrive from the primary server
+ so there will be a measurable delay between primary and standby. Running the
+ same query nearly simultaneously on both primary and standby might therefore
+ return differing results. We say that data on the standby is eventually
+ consistent with the primary.
+ Queries executed on the standby will be correct with regard to the transactions
+ that had been recovered at the start of the query, or start of first statement,
+ in the case of serializable transactions. In comparison with the primary,
+ the standby returns query results that could have been obtained on the primary
+ at some exact moment in the past.
+ </para>
+
+ <para>
+ When a transaction is started in recovery, the parameter
+ <varname>transaction_read_only</> will be forced to be true, regardless of the
+ <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
+    It cannot be manually set to false either. As a result, all transactions
+    started during recovery are limited to read-only actions. In all
+ other ways, connected sessions will appear identical to sessions
+ initiated during normal processing mode. There are no special commands
+ required to initiate a connection at this time, so all interfaces
+ work normally without change. After recovery finishes, the session
+ will allow normal read-write transactions at the start of the next
+ transaction, if these are requested.
+ </para>
+
+ <para>
+ Read-only here means "no writes to the permanent database tables".
+ There are no problems with queries that make use of transient sort and
+ work files.
+ </para>
+
+ <para>
+    The following actions are allowed:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Query access - SELECT, COPY TO including views and SELECT RULEs
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      Cursor commands - DECLARE, FETCH, CLOSE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Parameters - SHOW, SET, RESET
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Transaction management commands
+ <itemizedlist>
+ <listitem>
+ <para>
+ BEGIN, END, ABORT, START TRANSACTION
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ EXCEPTION blocks and other internal subtransactions
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ LOCK TABLE, though only when explicitly in one of these modes:
+ ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Plugins and extensions - LOAD
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+    These actions produce error messages:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+ Note that there are no allowed actions that result in a trigger
+ being executed during recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+      This currently also applies to temporary tables, because their
+      definition causes writes to catalog tables.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SELECT ... FOR SHARE | UPDATE which cause row locks to be written
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ RULEs on SELECT statements that generate DML commands.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      LOCK TABLE, in its short default form, since it requests ACCESS EXCLUSIVE MODE.
+ LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Transaction management commands that explicitly set non-read only state
+ <itemizedlist>
+ <listitem>
+ <para>
+ BEGIN READ WRITE,
+ START TRANSACTION READ WRITE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SET TRANSACTION READ WRITE,
+ SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SET transaction_read_only = off
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+ ROLLBACK PREPARED because even read-only transactions need to write
+      WAL in the prepare phase (the first phase of two-phase commit).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ sequence update - nextval()
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+    Note that the current behaviour of read-only transactions when not in
+    recovery is to allow the last two actions, so there are small and
+    subtle differences in behaviour between read-only transactions
+    run on the standby and those run during normal operation.
+ It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+ temporary tables may be lifted in a future release, if their internal
+ implementation is altered to make this possible.
+ </para>
+
+ <para>
+ If failover or switchover occurs the database will switch to normal
+ processing mode. Sessions will remain connected while the server
+ changes mode. Current transactions will continue, though will remain
+ read-only. After recovery is complete, it will be possible to initiate
+ read-write transactions.
+ </para>
+
+ <para>
+ Users will be able to tell whether their session is read-only by
+    issuing SHOW transaction_read_only. In addition, a set of
+    functions (<xref linkend="functions-recovery-info-table">) allows users to
+ access information about Hot Standby. These allow you to write
+ functions that are aware of the current state of the database. These
+ can be used to monitor the progress of recovery, or to allow you to
+ write complex programs that restore the database to particular states.
+ </para>
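+
+  <para>
+   For example, a session connected to a standby server reports:
+<programlisting>
+postgres=# SHOW transaction_read_only;
+ transaction_read_only
+-----------------------
+ on
+(1 row)
+</programlisting>
+  </para>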
+
+ <para>
+ In recovery, transactions will not be permitted to take any table lock
+ higher than RowExclusiveLock. In addition, transactions may never assign
+ a TransactionId and may never write WAL.
+ Any <command>LOCK TABLE</> command that runs on the standby and requests
+ a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+ </para>
+
+ <para>
+ In general queries will not experience lock conflicts with the database
+    changes made by recovery. This is because recovery follows normal
+ concurrency control mechanisms, known as <acronym>MVCC</>. There are
+ some types of change that will cause conflicts, covered in the following
+ section.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-conflict">
+ <title>Handling query conflicts</title>
+
+ <para>
+ The primary and standby nodes are in many ways loosely connected. Actions
+ on the primary will have an effect on the standby. As a result, there is
+ potential for negative interactions or conflicts between them. The easiest
+    conflict to understand is performance: if a huge data load is taking place
+    on the primary then this will generate a correspondingly heavy stream of
+    WAL records to be applied on the standby, so standby queries may contend
+    with recovery for system resources, such as I/O.
+ </para>
+
+ <para>
+ There are also additional types of conflict that can occur with Hot Standby.
+ These conflicts are <emphasis>hard conflicts</> in the sense that we may
+ need to cancel queries and in some cases disconnect sessions to resolve them.
+ The user is provided with a number of optional ways to handle these
+ conflicts, though we must first understand the possible reasons behind a conflict.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+      Access Exclusive Locks taken on the primary node, including both
+      explicit LOCK commands and various kinds of DDL action
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Dropping tablespaces on the primary while standby queries are using
+ those tablespaces for temporary work files (work_mem overflow)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Dropping databases on the primary while users are connected to that
+ database on the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Waiting to acquire buffer cleanup locks
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Early cleanup of data still visible to the current query's snapshot
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+    Some WAL redo actions will be for DDL actions. These DDL actions replay
+    changes that have already committed on the primary node, so they must
+    not fail on the standby node. The locks taken by such DDL take priority
+    and will automatically <emphasis>cancel</> any read-only transactions
+    that get in their way, after a grace period. This is similar to the
+    possibility of being canceled by the deadlock detector, but in this case
+    the standby process always wins, since the replayed actions must not
+    fail. This also ensures that replication does not fall behind while we
+    wait for a query to complete. Again, we assume that the standby is there
+    primarily for high availability purposes.
+ </para>
+
+ <para>
+    An example of the above would be an administrator on the primary server
+    running <command>DROP TABLE</> on a table that is currently being queried
+    on the standby server.
+ Clearly the query cannot continue if we let the <command>DROP TABLE</>
+ proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
+ would wait until the query has finished. When the query is on the standby
+ and the <command>DROP TABLE</> is on the primary, the primary doesn't have
+ information about which queries are running on the standby and so the query
+ does not wait on the primary. The WAL change records come through to the
+ standby while the standby query is still running, causing a conflict.
+ </para>
+
+ <para>
+ The most common reason for conflict between standby queries and WAL redo is
+ "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
+ row versions when there are no users who may need to see them to ensure correct
+ visibility of data (the heart of MVCC). If there is a standby query that has
+ been running for longer than any query on the primary then it is possible
+ for old row versions to be removed by either a vacuum or HOT. This will
+ then generate WAL records that, if applied, would remove data on the
+    standby that might <emphasis>potentially</> be required by the standby query.
+ In more technical language, the primary's xmin horizon is later than
+ the standby's xmin horizon, allowing dead rows to be removed.
+ </para>
+
+ <para>
+ Experienced users should note that both row version cleanup and row version
+ freezing will potentially conflict with recovery queries. Running a
+ manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
+ with no updated or deleted rows.
+ </para>
+
+ <para>
+ We have a number of choices for resolving query conflicts. The default
+ is that we wait and hope the query completes. The server will wait
+ automatically until the lag between primary and standby is at most
+ <varname>max_standby_delay</> seconds. Once that grace period expires,
+ we take one of the following actions:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ If the conflict is caused by a lock, we cancel the conflicting standby
+ transaction immediately. If the transaction is idle-in-transaction
+ then currently we abort the session instead, though this may change
+ in the future.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If the conflict is caused by cleanup records we tell the standby query
+ that a conflict has occurred and that it must cancel itself to avoid the
+ risk that it silently fails to read relevant data because
+ that data has been removed. (This is regrettably very similar to the
+ much feared and iconic error message "snapshot too old"). Some cleanup
+ records only cause conflict with older queries, though some types of
+ cleanup record affect all queries.
+ </para>
+
+ <para>
+ If cancellation does occur, the query and/or transaction can always
+ be re-executed. The error is dynamic and will not necessarily occur
+ the same way if the query is executed again.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
+ The parameter applies to the server as a whole so if the delay is all used
+ up by a single query then there may be little or no waiting for queries that
+ follow immediately, though they will have benefited equally from the initial
+    waiting period. The server may take some time to catch up before the full
+    grace period becomes available again, though if there is a heavy and constant stream
+ of conflicts it may seldom catch up fully.
+ </para>
+
+ <para>
+ Users should be clear that tables that are regularly and heavily updated on
+    the primary server will quickly cause cancellation of longer running queries on
+ the standby. In those cases <varname>max_standby_delay</> can be
+ considered somewhat but not exactly the same as setting
+ <varname>statement_timeout</>.
+ </para>
+
+ <para>
+ Other remedial actions exist if the number of cancellations is unacceptable.
+    The first option is to connect to the primary server and keep a query
+    active for as long as queries need to run on the standby. This guarantees
+    that a WAL cleanup record is never generated and we do not get the query
+    conflicts described above. This could be done using contrib/dblink
+    and pg_sleep(), or via other mechanisms. If you do this, you should note
+    that it will delay cleanup of dead rows by vacuum or HOT, which many
+    people may find undesirable. However, remember that the primary and
+    standby nodes are linked via the WAL, so this situation is no different
+    from the case where the query ran on the primary node itself, except
+    that we gain the benefit of off-loading the execution onto the standby.
+ </para>
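+
+  <para>
+   A minimal sketch of the first approach, using <filename>contrib/dblink</>
+   with a hypothetical connection string and a one-hour window, might be:
+<programlisting>
+-- Run on the standby: open a connection to the primary and keep a query
+-- active there, which holds back the primary's xmin horizon so that row
+-- versions still needed by standby queries are not cleaned up.
+SELECT dblink_connect('keep_xmin', 'host=192.168.1.50 dbname=postgres user=foo');
+SELECT * FROM dblink('keep_xmin', 'SELECT pg_sleep(3600)') AS t(unused text);
+SELECT dblink_disconnect('keep_xmin');
+</programlisting>
+  </para>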
+
+ <para>
+ It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
+ to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+ more time for queries to execute before they are cancelled on the standby,
+ without the need for setting a high <varname>max_standby_delay</>.
+ </para>
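+
+  <para>
+   An illustrative combination of these settings, with values chosen purely
+   as an example, might be:
+<programlisting>
+# postgresql.conf on the standby: cancel conflicting queries after 30 seconds
+max_standby_delay = 30
+
+# postgresql.conf on the primary: defer cleanup by roughly 10000 transactions
+vacuum_defer_cleanup_age = 10000
+</programlisting>
+  </para>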
+
+ <para>
+ Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+ the primary, cleanup WAL records that require buffer cleanup locks and
+ user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+ are resolved by time-out when we exceed <varname>max_standby_delay</>.
+ </para>
+
+ <para>
+ Dropping tablespaces or databases is discussed in the administrator's
+ section since they are not typical user situations.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-admin">
+ <title>Administrator's Overview</title>
+
+ <para>
+    If there is a <filename>recovery.conf</> file present, the server will start
+    in Hot Standby mode by default, though <varname>recovery_connections</> can
+    be disabled via <filename>postgresql.conf</>, if required. The server may take
+    some time to enable recovery connections, since it must first complete
+    sufficient recovery to provide a consistent state against which queries
+    can run before enabling read-only connections. Look for these messages
+    in the server logs:
+
+<programlisting>
+LOG: initializing recovery connections
+
+... then some time later ...
+
+LOG: consistent recovery state reached
+LOG: database system is ready to accept read only connections
+</programlisting>
+
+   Consistency information is recorded once per checkpoint on the primary,
+   as long as <varname>recovery_connections</> is enabled there. If that
+   parameter is disabled on the primary, it will not be possible to enable
+   recovery connections on the standby. The consistent state can also be
+   delayed in the presence of both of these conditions:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ a write transaction has more than 64 subtransactions
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ very long-lived write transactions
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ If you are running file-based log shipping ("warm standby"), you may need
+ to wait until the next WAL file arrives, which could be as long as the
+ <varname>archive_timeout</> setting on the primary.
+ </para>
+
+ <para>
+ The setting of some parameters on the standby will need reconfiguration
+ if they have been changed on the primary. The value on the standby must
+ be equal to or greater than the value on the primary. If these parameters
+ are not set high enough then the standby will not be able to track work
+    correctly from recovering transactions. If these values are set too low,
+ the server will halt. Higher values can then be supplied and the server
+ restarted to begin recovery again.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <varname>max_connections</>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <varname>max_prepared_transactions</>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <varname>max_locks_per_transaction</>
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ It is important that the administrator consider the appropriate setting
+ of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
+    There is no optimal setting; it should be chosen according to business
+    priorities. For example, if the server is primarily tasked as a High
+ Availability server, then you may wish to lower
+ <varname>max_standby_delay</> or even set it to zero, though that is a
+ very aggressive setting. If the standby server is tasked as an additional
+ server for decision support queries then it may be acceptable to set this
+ to a value of many hours (in seconds).
+ </para>
+
+ <para>
+ Transaction status "hint bits" written on primary are not WAL-logged,
+ so data on standby will likely re-write the hints again on the standby.
+ Thus the main database blocks will produce write I/Os even though
+ all users are read-only; no changes have occurred to the data values
+ themselves. Users will be able to write large sort temp files and
+ re-generate relcache info files, so there is no part of the database
+ that is truly read-only during hot standby mode. There is no restriction
+ on the use of set returning functions, or other users of tuplestore/tuplesort
+ code. Note also that writes to remote databases will still be possible,
+ even though the transaction is read-only locally.
+ </para>
+
+ <para>
+ The following types of administrator command are not accepted
+    during recovery mode:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Data Definition Language (DDL) - e.g. CREATE INDEX
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Privilege and Ownership - GRANT, REVOKE, REASSIGN
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Note again that some of these commands are actually allowed during
+ "read only" mode transactions on the primary.
+ </para>
+
+ <para>
+ As a result, you cannot create additional indexes that exist solely
+    on the standby, nor statistics that exist solely on the standby.
+ If these administrator commands are needed they should be executed
+ on the primary so that the changes will propagate through to the
+ standby.
+ </para>
+
+ <para>
+ <function>pg_cancel_backend()</> will work on user backends, but not the
+ Startup process, which performs recovery. pg_stat_activity does not
+ show an entry for the Startup process, nor do recovering transactions
+ show as active. As a result, pg_prepared_xacts is always empty during
+ recovery. If you wish to resolve in-doubt prepared transactions
+ then look at pg_prepared_xacts on the primary and issue commands to
+ resolve those transactions there.
+ </para>
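+
+  <para>
+   For example, in-doubt prepared transactions can be examined and resolved
+   on the primary with commands such as:
+<programlisting>
+SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
+-- then, for each transaction that should be resolved:
+-- COMMIT PREPARED 'gid';   or   ROLLBACK PREPARED 'gid';
+</programlisting>
+  </para>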
+
+ <para>
+ pg_locks will show locks held by backends as normal. pg_locks also shows
+ a virtual transaction managed by the Startup process that owns all
+ AccessExclusiveLocks held by transactions being replayed by recovery.
+    Note that the Startup process does not acquire locks to
+    make database changes, and thus locks other than AccessExclusiveLocks
+    do not show in pg_locks for the Startup process; they are just presumed
+    to exist.
+ </para>
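+
+  <para>
+   An illustrative query to view the AccessExclusiveLocks being held on
+   behalf of replayed transactions is:
+<programlisting>
+SELECT locktype, database, relation, mode, virtualtransaction
+  FROM pg_locks
+ WHERE mode = 'AccessExclusiveLock';
+</programlisting>
+  </para>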
+
+ <para>
+ <productname>check_pgsql</> will work, but it is very simple.
+    <productname>check_postgres</> will also work, though some actions
+    could give different or confusing results. For example, the last
+    vacuum time will not be maintained, since no
+    vacuum occurs on the standby (though vacuums running on the primary do
+ send their changes to the standby).
+ </para>
+
+ <para>
+    WAL file control commands will not work during recovery,
+    e.g. <function>pg_start_backup</> and <function>pg_switch_xlog</>.
+ </para>
+
+ <para>
+ Dynamically loadable modules work, including pg_stat_statements.
+ </para>
+
+ <para>
+ Advisory locks work normally in recovery, including deadlock detection.
+ Note that advisory locks are never WAL logged, so it is not possible for
+ an advisory lock on either the primary or the standby to conflict with WAL
+ replay. Nor is it possible to acquire an advisory lock on the primary
+ and have it initiate a similar advisory lock on the standby. Advisory
+ locks relate only to a single server on which they are acquired.
+ </para>
+
+ <para>
+ Trigger-based replication systems such as <productname>Slony</>,
+ <productname>Londiste</> and <productname>Bucardo</> won't run on the
+ standby at all, though they will run happily on the primary server as
+ long as the changes are not sent to standby servers to be applied.
+ WAL replay is not trigger-based so you cannot relay from the
+ standby to any system that requires additional database writes or
+ relies on the use of triggers.
+ </para>
+
+ <para>
+    New OIDs cannot be assigned, though some <acronym>UUID</> generators may still
+ work as long as they do not rely on writing new status to the database.
+ </para>
+
+ <para>
+ Currently, temp table creation is not allowed during read only
+ transactions, so in some cases existing scripts will not run correctly.
+ It is possible we may relax that restriction in a later release. This is
+ both a SQL Standard compliance issue and a technical issue.
+ </para>
+
+ <para>
+ <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
+ Some standby users may be actively using the tablespace via their
+ <varname>temp_tablespaces</> parameter. If there are temp files in the
+ tablespace we currently cancel all active queries to ensure that temp
+ files are removed, so that we can remove the tablespace and continue with
+ WAL replay.
+ </para>
+
+ <para>
+ Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
+ or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
+ that will cause all users connected to that database on the standby to be
+ forcibly disconnected. This action occurs immediately, whatever the setting of
+ <varname>max_standby_delay</>.
+ </para>
+
+ <para>
+ In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
+ for a role with login capability while that user is still connected then
+ nothing happens to the connected user - they remain connected. The user cannot
+ reconnect however. This behaviour applies in recovery also, so a
+ <command>DROP USER</> on the primary does not disconnect that user on the standby.
+ </para>
+
+ <para>
+    The statistics collector is active during recovery. All scans, reads,
+    blocks, index usage, etc., will be recorded normally on the standby.
+    Replayed actions will not duplicate their effects on the primary, so
+    replaying an insert will not increment the Inserts column of
+    pg_stat_user_tables. The stats file is deleted at the start of recovery,
+    so stats from the primary and standby will differ; this is considered a
+    feature, not a bug.
+ </para>
+
+ <para>
+    Autovacuum is not active during recovery, though it will start normally
+ at the end of recovery.
+ </para>
+
+ <para>
+    The background writer is active during recovery and will perform
+    restartpoints (similar to checkpoints on the primary) and normal block
+    cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
+    though it performs a restartpoint rather than a new checkpoint.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-parameters">
+ <title>Hot Standby Parameter Reference</title>
+
+ <para>
+ Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
+ and <xref linkend="hot-standby-conflict"> sections.
+ </para>
+
+ <para>
+ On the primary, parameters <varname>recovery_connections</> and
+    <varname>vacuum_defer_cleanup_age</> can be used to assist in the
+    successful configuration of Hot Standby servers.
+ <varname>max_standby_delay</> has no effect if set on the primary.
+ </para>
+
+ <para>
+ On the standby, parameters <varname>recovery_connections</> and
+    <varname>max_standby_delay</> can be used to enable and control Hot Standby.
+ <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-caveats">
+ <title>Caveats</title>
+
+ <para>
+ At this writing, there are several limitations of Hot Standby.
+ These can and probably will be fixed in future releases:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Operations on hash indexes are not presently WAL-logged, so
+ replay will not update these indexes. Hash indexes will not be
+ used for query plans during recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      Full knowledge of running transactions is required before snapshots
+      may be taken. Transactions that use large numbers of subtransactions
+      (currently more than 64) will delay the start of read-only
+      connections until the completion of the longest running write transaction.
+      If this situation occurs, explanatory messages will be sent to the server log.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Valid starting points for recovery connections are generated at each
+      checkpoint on the primary. If the standby is shut down while the primary
+      is in a shutdown state, it may not be possible to re-enter Hot Standby
+ until the primary is started up so that it generates further starting
+ points in the WAL logs. This is not considered a serious issue
+ because the standby is usually switched into the primary role while
+ the first node is taken down.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ At the end of recovery, AccessExclusiveLocks held by prepared transactions
+ will require twice the normal number of lock table entries. If you plan
+ on running either a large number of concurrent prepared transactions
+ that normally take AccessExclusiveLocks, or you plan on having one
+ large transaction that takes many AccessExclusiveLocks then you are
+ advised to select a larger value of <varname>max_locks_per_transaction</>,
+      up to, but never more than, twice the value of the parameter setting on
+      the primary server in rare extremes. You need not consider this at all if
+ your setting of <varname>max_prepared_transactions</> is <literal>0</>.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ </para>
+ </sect2>
+
+ </sect1>
+
+ <sect1 id="backup-incremental-updated">
+ <title>Incrementally Updated Backups</title>
+
+ <indexterm zone="high-availability">
+ <primary>incrementally updated backups</primary>
+ </indexterm>
+
+ <indexterm zone="high-availability">
+ <primary>change accumulation</primary>
+ </indexterm>
+
+ <para>
+ In a warm standby configuration, it is possible to offload the expense of
+ taking periodic base backups from the primary server; instead base backups
+ can be made by backing
+ up a standby server's files. This concept is generally known as
+ incrementally updated backups, log change accumulation, or more simply,
+ change accumulation.
+ </para>
+
+ <para>
+ If we take a file system backup of the standby server's data
+ directory while it is processing
+ logs shipped from the primary, we will be able to reload that backup and
+ restart the standby's recovery process from the last restart point.
+ We no longer need to keep WAL files from before the standby's restart point.
+ If we need to recover, it will be faster to recover from the incrementally
+ updated backup than from the original base backup.
+ </para>
+
+ <para>
+ Since the standby server is not <quote>live</>, it is not possible to
+ use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
+ to manage the backup process; it will be up to you to determine how
+ far back you need to keep WAL segment files to have a recoverable
+ backup. You can do this by running <application>pg_controldata</>
+ on the standby server to inspect the control file and determine the
+ current checkpoint WAL location, or by using the
+ <varname>log_checkpoints</> option to print values to the standby's
+ server log.
+ </para>
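+
+  <para>
+   For example, on the standby server (the data directory path and the
+   location value shown are illustrative only):
+<programlisting>
+pg_controldata /var/lib/pgsql/standby_data | grep 'Latest checkpoint location'
+Latest checkpoint location:           0/2C000020
+</programlisting>
+  </para>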
+ </sect1>
+
</chapter>