-<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.18 2007/11/08 19:16:30 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.19 2007/11/08 19:18:23 momjian Exp $ -->
<chapter id="high-availability">
<title>High Availability, Load Balancing, and Replication</title>
<variablelist>
- <varlistentry>
- <term>Shared Disk Failover</term>
- <listitem>
-
- <para>
- Shared disk failover avoids synchronization overhead by having only one
- copy of the database. It uses a single disk array that is shared by
- multiple servers. If the main database server fails, the standby server
- is able to mount and start the database as though it was recovering from
- a database crash. This allows rapid failover with no data loss.
- </para>
-
- <para>
- Shared hardware functionality is common in network storage devices.
- Using a network file system is also possible, though care must be
- taken that the file system has full POSIX behavior (see <xref
- linkend="creating-cluster-nfs">). One significant limitation of this
- method is that if the shared disk array fails or becomes corrupt, the
- primary and standby servers are both nonfunctional. Another issue is
- that the standby server should never access the shared storage while
- the primary server is running.
- </para>
-
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>File System Replication</term>
- <listitem>
-
- <para>
- A modified version of shared hardware functionality is file system
- replication, where all changes to a file system are mirrored to a file
- system residing on another computer. The only restriction is that
- the mirroring must be done in a way that ensures the standby server
- has a consistent copy of the file system — specifically, writes
- to the standby must be done in the same order as those on the master.
- DRBD is a popular file system replication solution for Linux.
- </para>
+ <varlistentry>
+ <term>Shared Disk Failover</term>
+ <listitem>
+
+ <para>
+ Shared disk failover avoids synchronization overhead by having only one
+ copy of the database. It uses a single disk array that is shared by
+ multiple servers. If the main database server fails, the standby server
+ can mount and start the database as though it were recovering from
+ a database crash. This allows rapid failover with no data loss.
+ </para>
+
+ <para>
+ Shared hardware functionality is common in network storage devices.
+ Using a network file system is also possible, though care must be
+ taken that the file system has full POSIX behavior (see <xref
+ linkend="creating-cluster-nfs">). One significant limitation of this
+ method is that if the shared disk array fails or becomes corrupt, the
+ primary and standby servers are both nonfunctional. Another issue is
+ that the standby server should never access the shared storage while
+ the primary server is running.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>File System Replication</term>
+ <listitem>
+
+ <para>
+ A modified version of shared hardware functionality is file system
+ replication, where all changes to a file system are mirrored to a file
+ system residing on another computer. The only restriction is that
+ the mirroring must be done in a way that ensures the standby server
+ has a consistent copy of the file system; specifically, writes
+ to the standby must be done in the same order as those on the master.
+ DRBD is a popular file system replication solution for Linux.
+ </para>
<!--
https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
protocol to make nodes agree on a serializable transactional order.
-->
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Warm Standby Using Point-In-Time Recovery (<acronym>PITR</>)</term>
- <listitem>
-
- <para>
- A warm standby server (see <xref linkend="warm-standby">) can
- be kept current by reading a stream of write-ahead log (WAL)
- records. If the main server fails, the warm standby contains
- almost all of the data of the main server, and can be quickly
- made the new master database server. This is asynchronous and
- can only be done for the entire database server.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Master-Slave Replication</term>
- <listitem>
-
- <para>
- A master-slave replication setup sends all data modification
- queries to the master server. The master server asynchronously
- sends data changes to the slave server. The slave can answer
- read-only queries while the master server is running. The
- slave server is ideal for data warehouse queries.
- </para>
-
- <para>
- Slony-I is an example of this type of replication, with per-table
- granularity, and support for multiple slaves. Because it
- updates the slave server asynchronously (in batches), there is
- possible data loss during fail over.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Statement-Based Replication Middleware</term>
- <listitem>
-
- <para>
- With statement-based replication middleware, a program intercepts
- every SQL query and sends it to one or all servers. Each server
- operates independently. Read-write queries are sent to all servers,
- while read-only queries can be sent to just one server, allowing
- the read workload to be distributed.
- </para>
-
- <para>
- If queries are simply broadcast unmodified, functions like
- <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
- sequences would have different values on different servers.
- This is because each server operates independently, and because
- SQL queries are broadcast (and not actual modified rows). If
- this is unacceptable, either the middleware or the application
- must query such values from a single server and then use those
- values in write queries. Also, care must be taken that all
- transactions either commit or abort on all servers, perhaps
- using two-phase commit (<xref linkend="sql-prepare-transaction"
- endterm="sql-prepare-transaction-title"> and <xref
- linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">.
- Pgpool and Sequoia are an example of this type of replication.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Asynchronous Multi-Master Replication</term>
- <listitem>
-
- <para>
- For servers that are not regularly connected, like laptops or
- remote servers, keeping data consistent among servers is a
- challenge. Using asynchronous multi-master replication, each
- server works independently, and periodically communicates with
- the other servers to identify conflicting transactions. The
- conflicts can be resolved by users or conflict resolution rules.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Synchronous Multi-Master Replication</term>
- <listitem>
-
- <para>
- In synchronous multi-master replication, each server can accept
- write requests, and modified data is transmitted from the
- original server to every other server before each transaction
- commits. Heavy write activity can cause excessive locking,
- leading to poor performance. In fact, write performance is
- often worse than that of a single server. Read requests can
- be sent to any server. Some implementations use shared disk
- to reduce the communication overhead. Synchronous multi-master
- replication is best for mostly read workloads, though its big
- advantage is that any server can accept write requests —
- there is no need to partition workloads between master and
- slave servers, and because the data changes are sent from one
- server to another, there is no problem with non-deterministic
- functions like <function>random()</>.
- </para>
-
- <para>
- <productname>PostgreSQL</> does not offer this type of replication,
- though <productname>PostgreSQL</> two-phase commit (<xref
- linkend="sql-prepare-transaction"
- endterm="sql-prepare-transaction-title"> and <xref
- linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
- can be used to implement this in application code or middleware.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Data Partitioning</term>
- <listitem>
-
- <para>
- Data partitioning splits tables into data sets. Each set can
- be modified by only one server. For example, data can be
- partitioned by offices, e.g. London and Paris, with a server
- in each office. If queries combining London and Paris data
- are necessary, an application can query both servers, or
- master/slave replication can be used to keep a read-only copy
- of the other office's data on each server.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Commercial Solutions</term>
- <listitem>
-
- <para>
- Because <productname>PostgreSQL</> is open source and easily
- extended, a number of companies have taken <productname>PostgreSQL</>
- and created commercial closed-source solutions with unique
- failover, replication, and load balancing capabilities.
- </para>
- </listitem>
- </varlistentry>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Warm Standby Using Point-In-Time Recovery (<acronym>PITR</>)</term>
+ <listitem>
+
+ <para>
+ A warm standby server (see <xref linkend="warm-standby">) can
+ be kept current by reading a stream of write-ahead log (WAL)
+ records. If the main server fails, the warm standby contains
+ almost all of the data of the main server, and can be quickly
+ made the new master database server. This is asynchronous and
+ can only be done for the entire database server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Master-Slave Replication</term>
+ <listitem>
+
+ <para>
+ A master-slave replication setup sends all data modification
+ queries to the master server. The master server asynchronously
+ sends data changes to the slave server. The slave can answer
+ read-only queries while the master server is running. The
+ slave server is ideal for data warehouse queries.
+ </para>
+
+ <para>
+ Slony-I is an example of this type of replication, with per-table
+ granularity and support for multiple slaves. Because it
+ updates the slave server asynchronously (in batches), data loss
+ is possible during failover.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Statement-Based Replication Middleware</term>
+ <listitem>
+
+ <para>
+ With statement-based replication middleware, a program intercepts
+ every SQL query and sends it to one or all servers. Each server
+ operates independently. Read-write queries are sent to all servers,
+ while read-only queries can be sent to just one server, allowing
+ the read workload to be distributed.
+ </para>
+
+ <para>
+ If queries are simply broadcast unmodified, functions like
+ <function>random()</> and <function>CURRENT_TIMESTAMP</>, as well as
+ sequences, would produce different values on different servers.
+ This is because each server operates independently, and because
+ SQL queries, rather than the actual modified rows, are broadcast. If
+ this is unacceptable, either the middleware or the application
+ must query such values from a single server and then use those
+ values in write queries. Also, care must be taken that all
+ transactions either commit or abort on all servers, perhaps
+ using two-phase commit (<xref linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
+ Pgpool and Sequoia are examples of this type of replication.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Asynchronous Multi-Master Replication</term>
+ <listitem>
+
+ <para>
+ For servers that are not regularly connected, like laptops or
+ remote servers, keeping data consistent among servers is a
+ challenge. Using asynchronous multi-master replication, each
+ server works independently and periodically communicates with
+ the other servers to identify conflicting transactions. The
+ conflicts can be resolved by users or by conflict resolution rules.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Synchronous Multi-Master Replication</term>
+ <listitem>
+
+ <para>
+ In synchronous multi-master replication, each server can accept
+ write requests, and modified data is transmitted from the
+ original server to every other server before each transaction
+ commits. Heavy write activity can cause excessive locking,
+ leading to poor performance. In fact, write performance is
+ often worse than that of a single server. Read requests can
+ be sent to any server. Some implementations use shared disk
+ to reduce the communication overhead. Synchronous multi-master
+ replication is best for mostly read workloads, though its big
+ advantage is that any server can accept write requests. There
+ is no need to partition workloads between master and slave
+ servers, and because the data changes are sent from one
+ server to another, there is no problem with non-deterministic
+ functions like <function>random()</>.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> does not offer this type of replication,
+ though <productname>PostgreSQL</> two-phase commit (<xref
+ linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
+ can be used to implement this in application code or middleware.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Data Partitioning</term>
+ <listitem>
+
+ <para>
+ Data partitioning splits tables into data sets. Each set can
+ be modified by only one server. For example, data can be
+ partitioned by office, e.g., London and Paris, with a server
+ in each office. If queries combining London and Paris data
+ are necessary, an application can query both servers, or
+ master-slave replication can be used to keep a read-only copy
+ of the other office's data on each server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Commercial Solutions</term>
+ <listitem>
+
+ <para>
+ Because <productname>PostgreSQL</> is open source and easily
+ extended, a number of companies have taken <productname>PostgreSQL</>
+ and created commercial closed-source solutions with unique
+ failover, replication, and load balancing capabilities.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>