-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ -->
<chapter id="backup">
<title>Backup and Restore</title>
</sect2>
</sect1>
+ <sect1 id="warm-standby">
+ <title>Warm Standby Servers for High Availability</title>
+
+ <indexterm zone="backup">
+ <primary>Warm Standby</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>PITR Standby</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>Standby Server</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>Log Shipping</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>Witness Server</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>STONITH</primary>
+ </indexterm>
+
+ <indexterm zone="backup">
+ <primary>High Availability</primary>
+ </indexterm>
+
+ <para>
+ Continuous Archiving can be used to create a High Availability (HA)
+ cluster configuration with one or more Standby Servers ready to take
+ over operations should the Primary Server fail. This capability is
+ more widely known as Warm Standby Log Shipping.
+ </para>
+
+ <para>
+ The Primary and Standby Server work together to provide this capability,
+ though the servers are only loosely coupled. The Primary Server operates
+ in Continuous Archiving mode, while the Standby Server operates in a
+ continuous Recovery mode, reading the WAL files from the Primary. No
+ changes to the database tables are required to enable this capability,
+ so it offers a low administration overhead in comparison with other
+ replication approaches. This configuration also has a very low
+ performance impact on the Primary server.
+ </para>
+
+ <para>
+ Directly moving WAL or "log" records from one database server to another
+ is typically described as Log Shipping. PostgreSQL implements file-based
+ Log Shipping, meaning WAL records are batched one file at a time. WAL
+ files can be shipped easily and cheaply over any distance, whether it be
+ to an adjacent system, another system on the same site or another system
+ on the far side of the globe. The bandwidth required for this technique
+ varies according to the transaction rate of the Primary Server.
+ Record-based Log Shipping is also possible with custom-developed
+ procedures, discussed in a later section. Future developments are likely
+ to include options for synchronous and/or integrated record-based log
+ shipping.
+ </para>
+
+ <para>
+ It should be noted that log shipping is asynchronous, i.e. the WAL
+ records are shipped after transaction commit. As a result there is a
+ window of potential data loss should the Primary Server suffer a
+ catastrophic failure: transactions not yet shipped will be lost. The
+ window of data loss can be minimised by use of the archive_timeout
+ parameter, which can be set as low as a few seconds if required.
+ However, a very low setting will increase the bandwidth required for
+ file shipping.
+ </para>
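As a sketch, the corresponding settings in postgresql.conf on the Primary might look like this; the archive path and the timeout value are illustrative assumptions, not recommendations:

```
# postgresql.conf on the Primary -- illustrative values
archive_command = 'cp %p /mnt/standby_archive/%f'   # ship each completed WAL file
archive_timeout = 60    # force a WAL file switch at least once a minute
```

Halving archive_timeout halves the worst-case data-loss window, at the cost of shipping more, mostly empty, WAL files.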
+
+ <para>
+ The Standby server is not available for access, since it is continually
+ performing recovery processing. Recovery performance is sufficiently
+ good that the Standby will typically be only minutes away from full
+ availability once it has been activated. As a result, we refer to this
+ capability as a Warm Standby configuration that offers High
+ Availability. Restoring a server from an archived base backup and
+ rollforward can take considerably longer and so that technique only
+ really offers a solution for Disaster Recovery, not HA.
+ </para>
+
+ <para>
+ Other mechanisms for High Availability replication are available, both
+ commercially and as open-source software.
+ </para>
+
+ <para>
+ In general, log shipping between servers running different release
+ levels will not be possible. It is the policy of the PostgreSQL Global
+ Development Group not to make changes to disk formats during minor release
+ upgrades, so it is likely that running different minor release levels
+ on Primary and Standby servers will work successfully. However, no
+ formal support for that is offered and you are advised not to allow this
+ to occur over long periods.
+ </para>
+
+ <sect2 id="warm-standby-planning">
+ <title>Planning</title>
+
+ <para>
+ On the Standby server all tablespaces and paths will refer to similarly
+ named mount points, so it is important to create the Primary and Standby
+ servers so that they are as similar as possible, at least from the
+ perspective of the database server. Furthermore, any CREATE TABLESPACE
+ commands will be passed across as-is, so any new mount points must be
+ created on both servers before they are used on the Primary. Hardware
+ need not be the same, but experience shows that maintaining two
+ identical systems is easier than maintaining two dissimilar ones over
+ the whole lifetime of the application and system.
+ </para>
+
+ <para>
+ There is no special mode required to enable a Standby server. The
+ operations that occur on both Primary and Standby servers are entirely
+ normal continuous archiving and recovery tasks. The primary point of
+ contact between the two database servers is the archive of WAL files
+ that both share: Primary writing to the archive, Standby reading from
+ the archive. Care must be taken to ensure that WAL archives for separate
+ servers do not become mixed together or confused.
+ </para>
+
+ <para>
+ The magic that makes the two loosely coupled servers work together is
+ simply a restore_command that waits for the next WAL file to be archived
+ from the Primary. The restore_command is specified in the recovery.conf
+ file on the Standby Server. Normal recovery processing would request a
+ file from the WAL archive, causing an error if the file was unavailable.
+ For Standby processing it is normal for the next file to be unavailable,
+ so we must be patient and wait for it to appear. A waiting
+ restore_command can be written as a custom script that loops after
+ polling for the existence of the next WAL file. There must also be some
+ way to trigger failover, which should interrupt the restore_command,
+ break the loop and return a file not found error to the Standby Server.
+ This then ends recovery and the Standby will then come up as a normal
+ server.
+ </para>
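A minimal shell sketch of such a waiting restore_command, written as a function for clarity; the function name, argument layout, trigger-file convention and one-second polling interval are all illustrative assumptions, not part of PostgreSQL:

```shell
#!/bin/sh
# wait_restore <archive_dir> <wal_filename> <destination> <trigger_file>
# Loop until the requested WAL file appears in the archive, then copy it.
wait_restore() {
    archive_dir="$1"; wal_file="$2"; dest="$3"; trigger_file="$4"

    while [ ! -f "$archive_dir/$wal_file" ]
    do
        # A trigger file breaks the loop. Returning non-zero reports
        # "file not found" to the server, which ends recovery and brings
        # the Standby up as a normal server.
        [ -f "$trigger_file" ] && return 1
        sleep 1    # poll roughly once per second
    done
    cp "$archive_dir/$wal_file" "$dest"
}
```

Failover would then be triggered by creating the agreed trigger file, for example with touch, on the Standby system.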
+
+ <para>
+ Sample code for the C version of the restore_command would be:
+<programlisting>
+triggered = false;
+while (!NextWALFileReady() && !triggered)
+{
+ usleep(100000L); /* wait for ~0.1 sec */
+ if (CheckForExternalTrigger())
+ triggered = true;
+}
+if (!triggered)
+ CopyWALFileForRecovery();
+</programlisting>
+ </para>
+
+ <para>
+ PostgreSQL does not provide the system software required to identify a
+ failure on the Primary and notify the Standby database server that it
+ should take over. Many such tools exist and are well integrated with
+ other aspects of a system failover, such as IP address migration.
+ </para>
+
+ <para>
+ Triggering failover is an important part of planning and design. The
+ restore_command is executed in full once for each WAL file. The process
+ running the restore_command is therefore created and dies for each file,
+ so there is no daemon or server process, which means signals and signal
+ handlers cannot be used. A more permanent notification mechanism is
+ required to trigger failover. It is possible to use a simple timeout
+ facility, especially if used in conjunction with a known archive_timeout
+ setting on the Primary. However, this is somewhat error prone, since a
+ network problem or a busy Primary server might be sufficient to initiate
+ failover even though the Primary is still viable. A notification
+ mechanism such as the explicit creation of a trigger file is less error
+ prone, if it can be arranged.
+ </para>
+ </sect2>
+
+ <sect2 id="warm-standby-config">
+ <title>Implementation</title>
+
+ <para>
+ The short procedure for configuring a Standby Server is as follows. For
+ full details of each step, refer to previous sections as noted.
+ <orderedlist>
+ <listitem>
+ <para>
+ Set up Primary and Standby systems as near identically as possible,
+ including two identical copies of PostgreSQL at same release level.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Set up Continuous Archiving from the Primary to a WAL archive located
+ in a directory on the Standby Server. Ensure that both <xref
+ linkend="guc-archive-command"> and <xref linkend="guc-archive-timeout">
+ are set. (See <xref linkend="backup-archiving-wal">)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Make a Base Backup of the Primary Server. (See <xref
+ linkend="backup-base-backup">)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Begin recovery on the Standby Server from the local WAL archive,
+ using a recovery.conf that specifies a restore_command that waits as
+ described previously. (See <xref linkend="backup-pitr-recovery">)
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
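Putting the pieces together, the recovery.conf on the Standby might contain just one line; the script name and archive path are illustrative assumptions, and any waiting script with this argument convention will do:

```
# recovery.conf on the Standby -- illustrative
restore_command = 'wait_for_wal.sh /mnt/standby_archive %f %p'
```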
+
+ <para>
+ Recovery treats the WAL Archive as read-only, so once a WAL file has
+ been copied to the Standby system it can be copied to tape at the same
+ time as it is being used by the Standby database server to recover.
+ Thus, running a Standby Server for High Availability can be performed at
+ the same time as files are stored for longer term Disaster Recovery
+ purposes.
+ </para>
+
+ <para>
+ For testing purposes, it is possible to run both Primary and Standby
+ servers on the same system. This does not provide any worthwhile
+ improvement on server robustness, nor would it be described as HA.
+ </para>
+ </sect2>
+
+ <sect2 id="warm-standby-failover">
+ <title>Failover</title>
+
+ <para>
+ If the Primary Server fails then the Standby Server should begin
+ failover procedures.
+ </para>
+
+ <para>
+ If the Standby Server fails then no failover need take place. If the
+ Standby Server can be restarted, then the recovery process can also be
+ immediately restarted, taking advantage of Restartable Recovery.
+ </para>
+
+ <para>
+ If the Primary Server fails and then immediately restarts, you must have
+ a mechanism for informing it that it is no longer the Primary. This is
+ sometimes known as STONITH (Shoot The Other Node In The Head), which is
+ necessary to avoid situations where both systems think they are the
+ Primary, which can lead to confusion and ultimately data loss.
+ </para>
+
+ <para>
+ Many failover systems use just two systems, the Primary and the Standby,
+ connected by some kind of heartbeat mechanism to continually verify the
+ connectivity between the two and the viability of the Primary. It is
+ also possible to use a third system, known as a Witness Server, to avoid
+ some problems of inappropriate failover, but the additional complexity
+ may not be worthwhile unless it is set up with sufficient care and
+ rigorous testing.
+ </para>
+
+ <para>
+ At the instant that failover takes place to the Standby, we have only a
+ single server in operation. This is known as a degenerate state.
+ The former Standby is now the Primary, but the former Primary is down
+ and may stay down. We must now fully re-create a Standby server,
+ either on the former Primary system when it comes up, or on a third,
+ possibly new, system. Once complete the Primary and Standby can be
+ considered to have switched roles. Some people choose to use a third
+ server to provide additional protection across the failover interval,
+ though clearly this complicates the system configuration and
+ operational processes (and this can also act as a Witness Server).
+ </para>
+
+ <para>
+ So, switching from Primary to Standby Server can be fast, but requires
+ some time to re-prepare the failover cluster. Regular switching from
+ Primary to Standby is encouraged, since it allows the regular downtime
+ on each system required to maintain HA. It also acts as a test of the
+ failover mechanism, ensuring that it will work when you really need it.
+ Written administration procedures are advised.
+ </para>
+ </sect2>
+
+ <sect2 id="warm-standby-record">
+ <title>Implementing Record-based Log Shipping</title>
+
+ <para>
+ The main features for Log Shipping in this release are based around the
+ file-based Log Shipping described above. It is also possible to
+ implement record-based Log Shipping using the pg_xlogfile_name_offset()
+ function, though this requires custom development.
+ </para>
+
+ <para>
+ An external program can call pg_xlogfile_name_offset() to find out the
+ filename and the exact byte offset within it of the latest WAL pointer.
+ If the external program regularly polls the server it can find out how
+ far forward the pointer has moved. It can then access the WAL file
+ directly and copy those bytes across to a less up-to-date copy on a
+ Standby Server.
+ </para>
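The copying step of such a program might be sketched as follows; the function name, and the use of the file size as a stand-in for the offset reported by pg_xlogfile_name_offset(), are illustrative assumptions:

```shell
#!/bin/sh
# ship_new_bytes <wal_file> <last_offset> <partial_copy>
# Append the bytes written since the last poll to a partial copy of the
# same WAL file, as a record-based shipping step would do.
ship_new_bytes() {
    wal_file="$1"; last_offset="$2"; partial_copy="$3"
    # In a real program the current offset would come from polling
    # pg_xlogfile_name_offset(); here the file size stands in for it.
    cur_size=$(wc -c < "$wal_file")
    count=$((cur_size - last_offset))
    [ "$count" -gt 0 ] || return 0          # nothing new to ship
    dd if="$wal_file" bs=1 skip="$last_offset" count="$count" \
        >> "$partial_copy" 2>/dev/null
}
```

A poller on the Primary would call this each time the WAL pointer advances, leaving the Standby with an up-to-date partial copy of the current WAL file.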
+ </sect2>
+ </sect1>
+
<sect1 id="migration">
<title>Migration Between Releases</title>