From: Heikki Linnakangas
Date: Wed, 31 Mar 2010 20:35:09 +0000 (+0000)
Subject: Enhance standby documentation.
X-Git-Tag: REL9_0_ALPHA5~9
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=ec9ee9381fab6e2cf8f14722a2e5dfe0beedbe15;p=postgresql

Enhance standby documentation.

Original patch by Fujii Masao, with heavy editing and bitrot-fixing after
my other commit.
---

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 9b24ebbb2a..8b923a84fc 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1,4 +1,4 @@
-
+
 High Availability, Load Balancing, and Replication
@@ -622,7 +622,8 @@ protocol to make nodes agree on a serializable transactional order.
    Preparing Master for Standby Servers

-   Set up continuous archiving to a WAL archive on the master, as described
+   Set up continuous archiving on the primary to an archive directory
+   accessible from the standby, as described
    in .  The archive location should be accessible from the standby
    even when the master is down, i.e. it should reside on the standby
    server itself or another trusted server, not on
@@ -646,11 +647,11 @@ protocol to make nodes agree on a serializable transactional order.

    To set up the standby server, restore the base backup taken from the primary
-   server (see ). In the recovery command file
-   recovery.conf in the standby's cluster data directory,
-   turn on standby_mode. Set restore_command to
-   a simple command to copy files from the WAL archive. If you want to
-   use streaming replication, set primary_conninfo.
+   server (see ). Create a recovery
+   command file recovery.conf in the standby's cluster data
+   directory, and turn on standby_mode. Set
+   restore_command to a simple command to copy files from
+   the WAL archive.
@@ -664,17 +665,38 @@ protocol to make nodes agree on a serializable transactional order.

-   You can use restartpoint_command to prune the archive of files no longer
-   needed by the standby.
+   If you want to use streaming replication, fill in
+   primary_conninfo with a libpq connection string, including
+   the host name (or IP address) and any additional details needed to
+   connect to the primary server. If the primary needs a password for
+   authentication, the password needs to be specified in
+   primary_conninfo as well.
+
+   You can use restartpoint_command to prune the archive of
+   files no longer needed by the standby.

    If you're setting up the standby server for high availability purposes,
    set up WAL archiving, connections and authentication like the primary
    server, because the standby server will work as a primary server after
-   failover. If you're setting up the standby server for reporting
-   purposes, with no plans to fail over to it, configure the standby
-   accordingly.
+   failover. You will also need to set trigger_file to make
+   it possible to fail over.
+   If you're setting up the standby server for reporting
+   purposes, with no plans to fail over to it, trigger_file
+   is not required.
+
+   A simple example of a recovery.conf is:
+
+standby_mode = 'on'
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
+restore_command = 'cp /path/to/archive/%f %p'
+trigger_file = '/path/to/trigger_file'
+
@@ -731,7 +753,7 @@ protocol to make nodes agree on a serializable transactional order.
    On systems that support the keepalive socket option, setting
    ,  and
-    helps the master promptly
+    helps the primary promptly
    notice a broken connection.
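As a sketch of the keepalive tuning mentioned in the hunk above: the relevant
server settings are tcp_keepalives_idle, tcp_keepalives_interval and
tcp_keepalives_count, set in the primary's postgresql.conf (assuming these are
the settings the stripped cross-references pointed at; the values below are
illustrative, not recommendations):

    tcp_keepalives_idle = 60        # seconds of inactivity before the first keepalive probe
    tcp_keepalives_interval = 10    # seconds between unanswered probes
    tcp_keepalives_count = 5        # unanswered probes before the connection is considered broken

With settings like these, a walsender connection to a vanished standby is
detected in roughly idle + interval * count seconds instead of waiting for
the operating system's much longer defaults.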
@@ -798,6 +820,29 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
    primary_conninfo then a FATAL error will be raised.
+
+   Monitoring
+
+   The WAL files required for the standby's recovery are not deleted from
+   the pg_xlog directory on the primary while the standby is
+   connected. If the standby lags far behind the primary, many WAL files
+   will accumulate in there, and can fill up the disk. It is therefore
+   important to monitor the lag to ensure the health of the standby and
+   to avoid disk full situations on the primary.
+   You can calculate the lag by comparing the current WAL write
+   location on the primary with the last WAL location received by the
+   standby. These can be retrieved using
+   pg_current_xlog_location on the primary and
+   pg_last_xlog_receive_location on the standby,
+   respectively (see  and
+    for details).
+   The last WAL receive location on the standby is also displayed in the
+   process status of the WAL receiver process, shown by the
+   ps command (see  for details).
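As a concrete sketch of the lag check described in the Monitoring hunk above
(the host names are illustrative):

    psql -h primary -c "SELECT pg_current_xlog_location();"
    psql -h standby -c "SELECT pg_last_xlog_receive_location();"

The lag is the difference between the two reported WAL locations; if the
standby's receive location stops advancing while the primary's write location
keeps growing, WAL files will pile up in the primary's pg_xlog directory.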
@@ -1898,16 +1943,64 @@ LOG:  database system is ready to accept read only connections
    updated backup than from the original base backup.
+
+   The procedure for taking a file system backup of the standby server's
+   data directory while it's processing logs shipped from the primary is:
+
+   Perform the backup, without using pg_start_backup and
+   pg_stop_backup. Note that the pg_control
+   file must be backed up first, as in:
+
+cp /var/lib/pgsql/data/global/pg_control /tmp
+cp -r /var/lib/pgsql/data /path/to/backup
+mv /tmp/pg_control /path/to/backup/data/global
+
+   pg_control contains the location where WAL replay will
+   begin after restoring from the backup; backing it up first ensures
+   that it points to the last restartpoint when the backup started, not
+   some later restartpoint that happened while files were copied to the
+   backup.
+
+   Make a note of the backup's ending WAL location by calling the
+   pg_last_xlog_replay_location function at the end of the backup,
+   and keep it with the backup.
+
+psql -c "select pg_last_xlog_replay_location();" > /path/to/backup/end_location
+
+   When recovering from the incrementally updated backup, the server
+   can begin accepting connections and complete the recovery successfully
+   before the database has become consistent. To avoid that, you must
+   ensure the database is consistent before users try to connect to the
+   server and before the recovery ends. You can do that by comparing the
+   progress of the recovery with the stored backup ending WAL location:
+   the server is not consistent until recovery has reached the backup end
+   location. The progress of the recovery can also be observed with the
+   pg_last_xlog_replay_location function, but that requires
+   connecting to the server while it might not be consistent yet, so
+   care should be taken with that method.

    Since the standby server is not live, it is not possible
    to use pg_start_backup() and pg_stop_backup()
    to manage the backup process; it will be up to you to determine how
    far back you need to keep WAL segment files to have a recoverable
-   backup. You can do this by running pg_controldata
-   on the standby server to inspect the control file and determine the
-   current checkpoint WAL location, or by using the
-   log_checkpoints option to print values to the standby's
-   server log.
+   backup. That is determined by the last restartpoint when the backup
+   was taken; any WAL older than that can be deleted from the archive
+   once the backup is complete. You can determine the last restartpoint
+   by running pg_controldata on the standby server before
+   taking the backup, or by using the log_checkpoints option
+   to print values to the standby's server log.
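For example, a sketch of reading the last restartpoint with pg_controldata,
using the data directory path from the earlier examples (the exact output
labels can vary between versions; on a standby, the checkpoint fields in the
control file reflect the last restartpoint):

    pg_controldata /var/lib/pgsql/data | grep "REDO location"

The "Latest checkpoint's REDO location" line reports the WAL location where
replay would begin; archived WAL segments older than that location are no
longer needed to restore the backup.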