- <para>
- There is no special mode required to enable a standby server. The
- operations that occur on both primary and standby servers are entirely
- normal continuous archiving and recovery tasks. The only point of
- contact between the two database servers is the archive of WAL files
- that both share: primary writing to the archive, standby reading from
- the archive. Care must be taken to ensure that WAL archives for separate
- primary servers do not become mixed together or confused. The archive
- need not be large, if it is only required for the standby operation.
- </para>
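-
- <para>
- For example, the primary might ship each completed WAL segment into
- the archive directory on the standby host with an
- <varname>archive_command</> along these lines (the host name,
- directory and use of <application>rsync</> are purely illustrative;
- any reliable copy method will do):
-<programlisting>
-archive_command = 'rsync -a %p standby-host:/mnt/server/archivedir/%f'
-</programlisting>
- </para>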
-
- <para>
- The magic that makes the two loosely coupled servers work together is
- simply a <varname>restore_command</> used on the standby that waits
- for the next WAL file to become available from the primary. The
- <varname>restore_command</> is specified in the
- <filename>recovery.conf</> file on the standby server. Normal recovery
- processing would request a file from the WAL archive, reporting failure
- if the file was unavailable. For standby processing it is normal for
- the next file to be unavailable, so we must be patient and wait for
- it to appear. A waiting <varname>restore_command</> can be written as
- a custom script that loops after polling for the existence of the next
- WAL file. There must also be some way to trigger failover, which should
- interrupt the <varname>restore_command</>, break the loop and return
- a file-not-found error to the standby server. This ends recovery and
- the standby will then come up as a normal server.
- </para>
-
- <para>
- Pseudocode for a suitable <varname>restore_command</> is:
-<programlisting>
-triggered = false;
-while (!NextWALFileReady() && !triggered)
-{
-    sleep(100000L);     /* wait for ~0.1 sec */
-    if (CheckForExternalTrigger())
-        triggered = true;
-}
-if (!triggered)
-    CopyWALFileForRecovery();
-</programlisting>
- </para>
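-
- <para>
- As an illustration only, a minimal shell implementation of this loop
- might look like the following. The script name, archive directory and
- trigger file path are assumptions, and a production script would need
- considerably more error handling:
-<programlisting>
-#!/bin/sh
-# Hypothetical waiting restore script, configured for example as
-#   restore_command = 'wait_for_wal.sh /mnt/server/archivedir %f %p'
-ARCHIVEDIR="$1"     # directory the primary archives into
-WALFILE="$2"        # %f: file name requested by recovery
-DESTPATH="$3"       # %p: path to copy the file to
-TRIGGER=/tmp/pgsql.trigger
-
-while [ ! -f "$ARCHIVEDIR/$WALFILE" ]
-do
-    if [ -f "$TRIGGER" ]; then
-        exit 1          # trigger found: report file-not-found, ending recovery
-    fi
-    sleep 1             # wait for the next WAL file to appear
-done
-cp "$ARCHIVEDIR/$WALFILE" "$DESTPATH"
-</programlisting>
- </para>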
-
- <para>
- A working example of a waiting <varname>restore_command</> is provided
- as a contrib module, named <application>pg_standby</>. This can be
- extended as needed to support specific configurations or environments.
- </para>
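-
- <para>
- For instance, the standby's <filename>recovery.conf</> might invoke
- it along these lines (the archive directory and trigger file location
- are only examples; see the <application>pg_standby</> documentation
- for the full list of options):
-<programlisting>
-restore_command = 'pg_standby -t /tmp/pgsql.trigger /mnt/server/archivedir %f %p %r'
-</programlisting>
- </para>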
-
- <para>
- <productname>PostgreSQL</productname> does not provide the system
- software required to identify a failure on the primary and notify
- the standby system, which must then tell the standby database server
- to take over. Many such tools exist and are well integrated with
- other aspects required for successful failover, such as IP address
- migration.
- </para>
-
- <para>
- The means for triggering failover is an important part of planning and
- design. The <varname>restore_command</> is executed in full once
- for each WAL file, so a fresh process is created and exits for
- every file. There is no long-lived daemon or server process, and
- therefore signals and a signal handler cannot be used; a more
- permanent form of notification is required to trigger failover.
- It is possible to use a simple timeout facility,
- especially if used in conjunction with a known
- <varname>archive_timeout</> setting on the primary. This is
- somewhat error prone since a network problem or busy primary server might
- be sufficient to initiate failover. A notification mechanism such
- as the explicit creation of a trigger file is less error prone, if
- this can be arranged.
- </para>
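-
- <para>
- With a trigger file scheme, failover is initiated simply by creating
- the agreed file on the standby system, for example (the path is
- whatever the waiting script has been told to watch for):
-<programlisting>
-touch /tmp/pgsql.trigger
-</programlisting>
- </para>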
-
- <para>
- The size of the WAL archive can be minimized by using the <literal>%r</>
- option of the <varname>restore_command</>. This option specifies the
- last archive filename that needs to be kept to allow the recovery to
- restart correctly. This can be used to truncate the archive once
- files are no longer required, if the archive is writable from the
- standby server.
- </para>
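-
- <para>
- A sketch of such cleanup, assuming the archive directory is local to
- the standby and that the restore script receives the <literal>%r</>
- value as an argument, might look like this (<application>pg_standby</>
- can perform equivalent cleanup itself when passed <literal>%r</>):
-<programlisting>
-#!/bin/bash
-# Hypothetical cleanup fragment, called with the %r value as $1
-ARCHIVEDIR=/mnt/server/archivedir   # assumption: archive is writable from here
-RESTARTFILE="$1"                    # %r: earliest file that must be kept
-
-for f in "$ARCHIVEDIR"/*
-do
-    # WAL segment names sort in the order the files are needed, so anything
-    # sorting before the restart file is no longer required for recovery.
-    # (A production script might want to preserve .backup and .history files.)
-    if [[ "$(basename "$f")" < "$RESTARTFILE" ]]; then
-        rm -f "$f"
-    fi
-done
-</programlisting>
- </para>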
- </sect2>
-
- <sect2 id="warm-standby-config">
- <title>Implementation</title>
-
- <para>
- The short procedure for configuring a standby server is as follows. For
- full details of each step, refer to previous sections as noted.
- <orderedlist>
- <listitem>
- <para>
- Set up the primary and standby systems so that they are as nearly
- identical as possible, including two identical copies of
- <productname>PostgreSQL</> at the same release level.
- </para>
- </listitem>
- <listitem>
- <para>
- Set up continuous archiving from the primary to a WAL archive located
- in a directory on the standby server. Ensure that
- <xref linkend="guc-archive-mode">,
- <xref linkend="guc-archive-command"> and
- <xref linkend="guc-archive-timeout">
- are set appropriately on the primary
- (see <xref linkend="backup-archiving-wal">); illustrative settings
- are shown after this list.
- </para>
- </listitem>
- <listitem>
- <para>
- Make a base backup of the primary server (see <xref
- linkend="backup-base-backup">), and load this data onto the standby.
- </para>
- </listitem>
- <listitem>
- <para>
- Begin recovery on the standby server from the local WAL
- archive, using a <filename>recovery.conf</> that specifies a
- <varname>restore_command</> that waits as described
- previously (see <xref linkend="backup-pitr-recovery">).
- </para>
- </listitem>
- </orderedlist>
- </para>
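-
- <para>
- Putting steps 2 and 4 together, the primary's
- <filename>postgresql.conf</> might contain settings along these
- lines (the copy command repeats the illustrative example given
- earlier and the timeout value is arbitrary):
-<programlisting>
-archive_mode = on
-archive_command = 'rsync -a %p standby-host:/mnt/server/archivedir/%f'
-archive_timeout = 60    # bound the delay before changes reach the archive
-</programlisting>
- while the standby's <filename>recovery.conf</> needs only a waiting
- <varname>restore_command</>, such as the <application>pg_standby</>
- example shown earlier.
- </para>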
-
- <para>
- Recovery treats the WAL archive as read-only, so once a WAL file has
- been copied to the standby system it can be copied to tape at the same
- time as it is being read by the standby database server.
- Thus, a standby server can be run for high availability at the same
- time as files are stored for longer-term disaster recovery purposes.
- </para>
-
- <para>
- For testing purposes, it is possible to run both primary and standby
- servers on the same system. This does not provide any worthwhile
- improvement in server robustness, nor would it be described as HA.
- </para>
- </sect2>
-
- <sect2 id="warm-standby-failover">
- <title>Failover</title>
-
- <para>
- If the primary server fails then the standby server should begin
- failover procedures.
- </para>
-
- <para>
- If the standby server fails then no failover need take place. If the
- standby server can be restarted, even some time later, then the recovery
- process can also be immediately restarted, taking advantage of
- restartable recovery. If the standby server cannot be restarted, then a
- full new standby server should be created.
- </para>
-
- <para>
- If the primary server fails and then immediately restarts, you must have
- a mechanism for informing it that it is no longer the primary. This is
- sometimes known as STONITH (Shoot the Other Node In The Head), which is
- necessary to avoid situations where both systems think they are the
- primary, which can lead to confusion and ultimately data loss.
- </para>
-
- <para>
- Many failover systems use just two systems, the primary and the standby,
- connected by some kind of heartbeat mechanism to continually verify the
- connectivity between the two and the viability of the primary. It is
- also possible to use a third system (called a witness server) to avoid
- some problems of inappropriate failover, but the additional complexity
- might not be worthwhile unless it is set up with sufficient care and
- rigorous testing.
- </para>
-
- <para>
- Once failover to the standby occurs, we have only a
- single server in operation. This is known as a degenerate state.
- The former standby is now the primary, but the former primary is down
- and might stay down. To return to normal operation we must
- fully recreate a standby server,
- either on the former primary system when it comes up, or on a third,
- possibly new, system. Once complete, the primary and standby can be
- considered to have switched roles. Some people choose to use a third
- server to provide backup to the new primary until the new standby
- server is recreated,
- though clearly this complicates the system configuration and
- operational processes.
- </para>
-
- <para>
- So, switching from primary to standby server can be fast but requires
- some time to re-prepare the failover cluster. Regular switching from
- primary to standby is encouraged, since it allows regular downtime on
- each system for maintenance. This also acts as a test of the
- failover mechanism to ensure that it will really work when you need it.
- Written administration procedures are advised.
- </para>
- </sect2>
-
- <sect2 id="warm-standby-record">
- <title>Record-based Log Shipping</title>
-
- <para>
- <productname>PostgreSQL</productname> directly supports file-based
- log shipping as described above. It is also possible to implement
- record-based log shipping, though this requires custom development.
- </para>
-
- <para>
- An external program can call the <function>pg_xlogfile_name_offset()</>
- function (see <xref linkend="functions-admin">)
- to find out the file name and the exact byte offset within it of
- the current end of WAL. It can then access the WAL file directly
- and copy the data from the last known end of WAL through the current end
- over to the standby server(s). With this approach, the window for data
- loss is the polling cycle time of the copying program, which can be very
- small, but there is no wasted bandwidth from forcing partially-used
- segment files to be archived. Note that the standby servers'
- <varname>restore_command</> scripts still deal in whole WAL files,
- so the incrementally copied data is not ordinarily made available to
- the standby servers. It is of use only when the primary dies —
- then the last partial WAL file is fed to the standby before allowing
- it to come up. So correct implementation of this process requires
- cooperation of the <varname>restore_command</> script with the data
- copying program.
- </para>
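-
- <para>
- For instance, the copying program might poll the primary with a
- query along these lines (these functions cannot be run on the
- standby, which is still in recovery) to learn how far the WAL
- currently extends:
-<programlisting>
-SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location());
-</programlisting>
- The returned file name and byte offset tell the program how much of
- the current segment file it can safely copy to the standby.
- </para>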
- </sect2>
-
- <sect2 id="backup-incremental-updated">
- <title>Incrementally Updated Backups</title>
-
- <indexterm zone="backup">
- <primary>incrementally updated backups</primary>
- </indexterm>
-
- <indexterm zone="backup">
- <primary>change accumulation</primary>
- </indexterm>
-
- <para>
- In a warm standby configuration, it is possible to offload the expense of
- taking periodic base backups from the primary server; instead base backups
- can be made by backing
- up a standby server's files. This concept is generally known as
- incrementally updated backups, log change accumulation or more simply,
- change accumulation.
- </para>