Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id LAA11295
for <>; Fri, 24 Dec 1999 11:01:17 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id KAA20310 for <>; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id KAA20310 for <>; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id KAA61760;
Fri, 24 Dec 1999 10:31:13 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id TAA26244
for <>; Fri, 24 Dec 1999 19:31:02 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id TAA12730 for <>; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id TAA12730 for <>; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id TAA57851;
Fri, 24 Dec 1999 19:23:31 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id WAA02578
for <>; Fri, 24 Dec 1999 22:31:09 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id WAA16641 for <>; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id WAA16641 for <>; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id WAA89135;
Fri, 24 Dec 1999 22:11:12 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id JAA17976
for <>; Sun, 26 Dec 1999 09:31:07 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id JAA23337 for <>; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id JAA23337 for <>; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id JAA90738;
Sun, 26 Dec 1999 09:21:58 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id JAA10317
for <>; Thu, 30 Dec 1999 09:01:08 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id IAA02365 for <>; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id IAA02365 for <>; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id IAA87902;
Thu, 30 Dec 1999 08:34:22 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id AAA16274
for <>; Mon, 3 Jan 2000 00:01:28 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id XAA02655 for <>; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id XAA02655 for <>; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
Received: from ( [])
by (8.9.3/8.9.3) with ESMTP id XAA13828;
Sun, 2 Jan 2000 23:40:47 -0500 (EST)
Received: from ( [])
by (8.9.0/8.9.0) with ESMTP id LAA17522
for <>; Tue, 4 Jan 2000 11:31:00 -0500 (EST)
-Received: from ( []) by (o1/$Revision: 1.9 $) with ESMTP id LAA01541 for <>; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
+Received: from ( []) by (o1/$Revision: 1.10 $) with ESMTP id LAA01541 for <>; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
Received: from localhost (majordom@localhost)
by (8.9.3/8.9.3) with SMTP id LAA09992;
Tue, 4 Jan 2000 11:18:07 -0500 (EST)
subscribe-nomail command to so that your
message can get through to the mailing list cleanly
+From Mon Feb 4 19:16:17 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g150GGP03822
+ for <>; Mon, 4 Feb 2002 19:16:16 -0500 (EST)
+Received: (qmail 77444 invoked by alias); 5 Feb 2002 00:16:11 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 00:16:11 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g150Esl77040
+ for <>; Mon, 4 Feb 2002 19:14:54 -0500 (EST)
+ (envelope-from
+Received: from (localhost [])
+ by (8.11.6/8.11.6) with ESMTP id g150AWh08676
+ for <>; Mon, 4 Feb 2002 19:10:33 -0500
+Message-ID: <>
+Date: Mon, 04 Feb 2002 19:10:32 -0500
+From: mlw <>
+X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: PostgreSQL-development <>
+Subject: [HACKERS] Replication
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+I re-wrote to C, and wrote a replication daemon. It works, but it
+works like the whole rserv project. I don't like it.
+OK, what the hell do we need to do to get PostgreSQL replicating?
+---------------------------(end of broadcast)---------------------------
+TIP 4: Don't 'kill -9' the postmaster
+From Mon Feb 4 19:57:01 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g150v0P06518
+ for <>; Mon, 4 Feb 2002 19:57:00 -0500 (EST)
+Received: (qmail 90440 invoked by alias); 5 Feb 2002 00:56:59 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 00:56:59 -0000
+Received: from ([])
+ by (8.11.3/8.11.4) with ESMTP id g150rMl89885
+ for <>; Mon, 4 Feb 2002 19:53:22 -0500 (EST)
+ (envelope-from
+Received: from (wall [])
+ by (8.9.3/8.9.3) with ESMTP id AAA06047;
+ Tue, 5 Feb 2002 00:53:22 GMT
+Received: from localhost (ssinger@localhost)
+ by (8.9.3/8.9.3) with ESMTP id AAA10675;
+ Tue, 5 Feb 2002 00:52:43 GMT
+Date: Tue, 5 Feb 2002 00:52:43 +0000 (GMT)
+From: Steven <>
+X-X-Sender: <>
+To: mlw <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+On Mon, 4 Feb 2002, mlw wrote:
+I've developed a replacement for Rserv and we are planning on releasing
+it as open source (i.e. as a contrib module).
+Like Rserv it's trigger-based, but it's much more flexible.
+The key advantages it has over Rserv are:
+-Support for multiple slaves
+-It preserves transactions while doing the mirroring, i.e. if rows A and B are
+originally added in the same transaction, they will be mirrored in the same
+transaction.
+We plan to add data-based filtering (selective mirroring) as
+well, i.e. only rows with COUNTRY='Canada' go to
+slave A, and rows with COUNTRY='China' go to slave B.
+But I'm not sure when I'll get to that.
+Support for conflict resolution (if edits are allowed on the slaves)
+would also be nice.
+I hope to be able to send a tarball with the source to the pgpatches list
+within the next few days.
+We've been using the system operationally for a number of months and have
+been happy with it.
+> I re-wrote to C, and wrote a replication daemon. It works, but it
+> works like the whole rserv project. I don't like it.
+> OK, what the hell do we need to do to get PostgreSQL replicating?
+Steven Singer
+Aircraft Performance Systems Phone: 519-747-1170 ext 282
+Navtech Systems Support Inc. AFTN: CYYZXNSX SITA: YYZNSCR
+Waterloo, Ontario ARINC: YKFNSCR
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+ (send "unregister YourEmailAddressHere" to
+From Mon Feb 4 20:06:57 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g1516vP07508
+ for <>; Mon, 4 Feb 2002 20:06:57 -0500 (EST)
+Received: (qmail 92753 invoked by alias); 5 Feb 2002 01:06:55 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 01:06:55 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g150vhl91978
+ for <>; Mon, 4 Feb 2002 19:57:44 -0500 (EST)
+ (envelope-from
+Received: from ( [])
+ by (Postfix) with ESMTP
+ id 9D6EE8779; Mon, 4 Feb 2002 19:57:46 -0500 (EST)
+Date: Mon, 4 Feb 2002 19:57:34 -0500 (EST)
+From: bpalmer <>
+To: mlw <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+> OK, what the hell do we need to do to get PostgreSQL replicating?
+I hope you understand that replication, done right, is a massive
+project. I know that Darren and I (and the rest of the pg-repl
+folks) have been waiting until 7.2 went gold before doing any more work. I
+think we hope to have master/slave replication working for 7.3 and then
+target multimaster for 7.4. At least that's the hope.
+- Brandon
+ c: 646-456-5455 h: 201-798-4983
+ b. palmer,
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+ (send "unregister YourEmailAddressHere" to
+From Mon Feb 4 21:16:56 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g152GtP10503
+ for <>; Mon, 4 Feb 2002 21:16:55 -0500 (EST)
+Received: (qmail 6711 invoked by alias); 5 Feb 2002 02:16:53 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 02:16:53 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g151qSl99469
+ for <>; Mon, 4 Feb 2002 20:52:28 -0500 (EST)
+ (envelope-from
+Received: from (localhost [])
+ by (8.11.6/8.11.6) with ESMTP id g151lph09147;
+ Mon, 4 Feb 2002 20:47:51 -0500
+Message-ID: <>
+Date: Mon, 04 Feb 2002 20:47:51 -0500
+From: mlw <>
+X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Steven <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+References: <>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+Steven wrote:
+> On Mon, 4 Feb 2002, mlw wrote:
+> I've developed a replacement for Rserv and we are planning on releasing
+> it as open source (i.e. as a contrib module).
+> Like Rserv it's trigger-based, but it's much more flexible.
+> The key advantages it has over Rserv are:
+> -Support for multiple slaves
+> -It preserves transactions while doing the mirroring, i.e. if rows A and B are
+> originally added in the same transaction, they will be mirrored in the same
+> transaction.
+I did a similar thing. I took the rserv trigger "as is," but rewrote the
+replication support code. What I eventually did was write a "snapshot daemon"
+that created snapshot files, and then a "slave daemon" that would check the last
+snapshot applied and apply all later snapshots, in order, as needed. One would
+run one of these daemons per slave server.
+---------------------------(end of broadcast)---------------------------
+TIP 5: Have you checked our extensive FAQ?
+From Mon Feb 4 20:57:25 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g151vOP09239
+ for <>; Mon, 4 Feb 2002 20:57:24 -0500 (EST)
+Received: (qmail 99828 invoked by alias); 5 Feb 2002 01:57:19 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 01:57:19 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g151s0l99529
+ for <>; Mon, 4 Feb 2002 20:54:00 -0500 (EST)
+ (envelope-from
+Received: from (localhost [])
+ by (8.11.6/8.11.6) with ESMTP id g151nah09156;
+ Mon, 4 Feb 2002 20:49:37 -0500
+Message-ID: <>
+Date: Mon, 04 Feb 2002 20:49:36 -0500
+From: mlw <>
+X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: bpalmer <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+References: <>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+bpalmer wrote:
+> >
+> > OK, what the hell do we need to do to get PostgreSQL replicating?
+> I hope you understand that replication, done right, is a massive
+> project. I know that Darren and I (and the rest of the pg-repl
+> folks) have been waiting until 7.2 went gold before doing any more work. I
+> think we hope to have master/slave replication working for 7.3 and then
+> target multimaster for 7.4. At least that's the hope.
+I do know how hard replication is. I also understand how important it is.
+If you guys have a project going, and need developers, I am more than willing.
+---------------------------(end of broadcast)---------------------------
+TIP 5: Have you checked our extensive FAQ?
+From Mon Feb 4 21:42:13 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g152gCP11957
+ for <>; Mon, 4 Feb 2002 21:42:13 -0500 (EST)
+Received: (qmail 14229 invoked by alias); 5 Feb 2002 02:42:09 -0000
+Received: from unknown (HELO (
+ by with SMTP; 5 Feb 2002 02:42:09 -0000
+Received: from ([])
+ by (8.11.3/8.11.4) with ESMTP id g152SBl10682
+ for <>; Mon, 4 Feb 2002 21:28:11 -0500 (EST)
+ (envelope-from
+Received: from (wall [])
+ by (8.9.3/8.9.3) with ESMTP id CAA06384;
+ Tue, 5 Feb 2002 02:28:13 GMT
+Received: from localhost (ssinger@localhost)
+ by (8.9.3/8.9.3) with ESMTP id CAA10682;
+ Tue, 5 Feb 2002 02:27:35 GMT
+Date: Tue, 5 Feb 2002 02:27:35 +0000 (GMT)
+From: Steven <>
+X-X-Sender: <>
+To: mlw <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+DBMirror doesn't use snapshots; instead it records a log of the transactions
+committed to the database in a pair of tables.
+For an INSERT, this is the row being added; for a DELETE, the primary key
+of the row being deleted; and for an UPDATE, the primary key before the
+update along with all of the data the row should have after the update.
+Then, for each slave database, a Perl script walks through the transactions
+that are pending for that host and reconstructs SQL to send the row edits
+to that host. A record of the fact that transaction Y has been sent to
+host X is also kept.
+When a transaction has been sent to all of the hosts in the
+system, it is deleted from the Pending tables.
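A minimal sketch of the pending-log scheme described above, assuming a simple in-memory model: PendingEdit, PendingLog, to_sql and replicate_to are hypothetical names, not DBMirror's actual tables or Perl script. Each transaction's row edits are kept until every slave has received them, and SQL is rebuilt per edit as parameterized statements.

```python
from dataclasses import dataclass, field


@dataclass
class PendingEdit:
    """One logged row edit (hypothetical model of a Pending-table row)."""
    op: str      # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: dict    # primary key (the pre-update key for UPDATE); empty for INSERT
    data: dict   # full new row for INSERT/UPDATE; empty for DELETE


def to_sql(e):
    """Rebuild a parameterized SQL statement from one logged edit."""
    if e.op == "INSERT":
        cols = ", ".join(e.data)
        marks = ", ".join("%s" for _ in e.data)
        return f"INSERT INTO {e.table} ({cols}) VALUES ({marks})", tuple(e.data.values())
    where = " AND ".join(f"{k} = %s" for k in e.key)
    if e.op == "DELETE":
        return f"DELETE FROM {e.table} WHERE {where}", tuple(e.key.values())
    if e.op == "UPDATE":
        sets = ", ".join(f"{k} = %s" for k in e.data)
        return (f"UPDATE {e.table} SET {sets} WHERE {where}",
                tuple(e.data.values()) + tuple(e.key.values()))
    raise ValueError(f"unknown operation {e.op!r}")


@dataclass
class PendingLog:
    slaves: set
    pending: dict = field(default_factory=dict)  # xid -> edits, in order
    sent: dict = field(default_factory=dict)     # xid -> slaves already sent to

    def record(self, xid, edits):
        self.pending[xid] = list(edits)
        self.sent[xid] = set()

    def replicate_to(self, slave, send):
        """Send every transaction this slave has not yet seen, oldest first."""
        for xid in sorted(self.pending):  # sorted() copies, so deleting below is safe
            if slave in self.sent[xid]:
                continue
            send(slave, xid, [to_sql(e) for e in self.pending[xid]])
            self.sent[xid].add(slave)
            if self.sent[xid] >= self.slaves:  # every slave has it: purge
                del self.pending[xid]
                del self.sent[xid]
```

Building statements with placeholders, rather than pasting values into the SQL text, leaves quoting to whichever client library finally executes them on the slave.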
+I suspect that all of the information I'm storing in the Pending tables is
+also being stored by Postgres in its log, but I haven't investigated how
+the information could be extracted (or how long it is kept). That
+would reduce the extra storage overhead that the replication system
+As I remember (it's been a while since I've looked at it), RServ uses OIDs
+in its tables to point to the data that needs to be replicated. We tried
+a similar approach but ran into difficulties with partial updates.
+On Mon, 4 Feb 2002, mlw wrote:
+> I did a similar thing. I took the rserv trigger "as is," but rewrote the
+> replication support code. What I eventually did was write a "snapshot daemon"
+> that created snapshot files, and then a "slave daemon" that would check the last
+> snapshot applied and apply all later snapshots, in order, as needed. One would
+> run one of these daemons per slave server.
+Steven Singer
+Aircraft Performance Systems Phone: 519-747-1170 ext 282
+Navtech Systems Support Inc. AFTN: CYYZXNSX SITA: YYZNSCR
+Waterloo, Ontario ARINC: YKFNSCR
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+ (send "unregister YourEmailAddressHere" to
+From Thu Feb 7 02:49:48 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g177nlP04347
+ for <>; Thu, 7 Feb 2002 02:49:47 -0500 (EST)
+Received: (qmail 22556 invoked by alias); 7 Feb 2002 07:49:49 -0000
+Received: from unknown (HELO (
+ by with SMTP; 7 Feb 2002 07:49:49 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g177QfE19572
+ for <>; Thu, 7 Feb 2002 02:26:42 -0500 (EST)
+ (envelope-from
+Received: from localhost (swm@localhost)
+ by (8.11.4/8.11.4) with ESMTP id g177RiU06086;
+ Thu, 7 Feb 2002 18:27:45 +1100
+Date: Thu, 7 Feb 2002 18:27:44 +1100 (EST)
+From: Gavin Sherry <>
+To: mlw <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+On Mon, 4 Feb 2002, mlw wrote:
+> I re-wrote to C, and wrote a replication daemon. It works, but it
+> works like the whole rserv project. I don't like it.
+> OK, what the hell do we need to do to get PostgreSQL replicating?
+The trigger model is not a very sophisticated one. I think I have a better
+-- though more complicated -- one. This model would be able to handle
+multiple masters and master->slave.
+First of all, all machines in the cluster would have to be aware of all the
+machines in the cluster. This information would have to be stored in a new
+system table.
+The FE/BE protocol would need to be modified to accept parsed node trees
+generated by pg_analyze_and_rewrite(). These could then be dispatched by
+the executing server, inside of pg_exec_query_string(), to all other servers
+in the cluster (excluding itself). Naturally, this dispatch would need to
+be non-blocking.
+pg_exec_query_string() would need to check the node tags to make sure
+SELECTs and perhaps some other commands are not dispatched.
+Before the executing server runs finish_xact_command(), it would check
+that the query was successfully executed on all machines otherwise
+abort. Such a system would need a few configuration options: whether or
+not you abort on failed replication to slaves, the ability to replicate
+only certain tables, etc.
+Naturally, this would slow down writes to the system (possibly a lot
+depending on the performance difference between the executing machine and
+the least powerful machine in the cluster), but most usages of postgresql
+are read intensive, not write.
+Any reason this model would not work?
+---------------------------(end of broadcast)---------------------------
+TIP 4: Don't 'kill -9' the postmaster
+From Thu Feb 7 08:31:00 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g17DUxP13923
+ for <>; Thu, 7 Feb 2002 08:30:59 -0500 (EST)
+Received: (qmail 91796 invoked by alias); 7 Feb 2002 13:30:55 -0000
+Received: from unknown (HELO (
+ by with SMTP; 7 Feb 2002 13:30:55 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g17Cw0E87782
+ for <>; Thu, 7 Feb 2002 07:58:01 -0500 (EST)
+ (envelope-from
+Received: from (localhost [])
+ by (8.11.6/8.11.6) with ESMTP id g17CqNt16887;
+ Thu, 7 Feb 2002 07:52:24 -0500
+Message-ID: <>
+Date: Thu, 07 Feb 2002 07:52:23 -0500
+From: mlw <>
+X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Gavin Sherry <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+References: <>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+Gavin Sherry wrote:
+> Naturally, this would slow down writes to the system (possibly a lot
+> depending on the performance difference between the executing machine and
+> the least powerful machine in the cluster), but most usages of postgresql
+> are read intensive, not write.
+> Any reason this model would not work?
+What, then, is the purpose of replication to multiple masters?
+I can think of only two reasons why you want replication. (1) Redundancy: make
+sure that if one server dies, another server has the same data and is used
+seamlessly. (2) Increased performance over a single system.
+For reason (1), I submit that a load balancer which sits on top of
+PostgreSQL, executing writes on both servers while distributing reads, would
+be best. This is a HUGE project. The load balancer must know EXACTLY how the
+system is configured, which includes all functions and everything.
+For reason (2), your system would fail to provide the scalability that would be
+needed. If writes take a long time but reads are fine, what is the difference
+between it and the trigger-based replicator?
+I have in the back of my mind, an idea of patching into the WAL stuff, and
+using that mechanism to push changes out to the slaves.
+Where one machine is still the master, but no trigger stuff, just a WAL patch.
+Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
+exactly, the idea hasn't completely formed yet.
+---------------------------(end of broadcast)---------------------------
+TIP 5: Have you checked our extensive FAQ?
+From Thu Feb 7 12:51:42 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g17HpfP16661
+ for <>; Thu, 7 Feb 2002 12:51:41 -0500 (EST)
+Received: (qmail 62955 invoked by alias); 7 Feb 2002 17:50:42 -0000
+Received: from unknown (HELO (
+ by with SMTP; 7 Feb 2002 17:50:42 -0000
+Received: from ([])
+ by (8.11.3/8.11.4) with ESMTP id g17HnTE62256
+ for <>; Thu, 7 Feb 2002 12:49:29 -0500 (EST)
+ (envelope-from
+Received: from (wall [])
+ by (8.9.3/8.9.3) with ESMTP id RAA07908;
+ Thu, 7 Feb 2002 17:49:31 GMT
+Received: from localhost (ssinger@localhost)
+ by (8.9.3/8.9.3) with ESMTP id RAA05687;
+ Thu, 7 Feb 2002 17:48:52 GMT
+Date: Thu, 7 Feb 2002 17:48:51 +0000 (GMT)
+From: Steven Singer <>
+X-X-Sender: <>
+To: Gavin Sherry <>
+cc: mlw <>,
+ PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+What you describe sounds like a form of a two-stage commit protocol.
+If the command worked on two of the replicated databases but failed on a
+third, then the executing server would have to be able to undo the command
+on the replicated databases as well as on itself.
+The problems with two-stage-commit approaches to replication are:
+1) Speed, as you mentioned. Write speed isn't a concern for some
+applications, but it is very important in others.
+2) All of the databases must be able to communicate with each other at
+all times in order for any edits to work. If the servers are
+connected over some sort of WAN that periodically has short outages, this
+is a problem. Also, if you're using replication because you want to be able
+to take down one of the databases for short periods of time without
+bringing down the others, you're in trouble.
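To make the undo requirement concrete, here is a toy two-phase commit coordinator (two-phase commit being the usual name for the two-stage protocol). Participant and two_phase_commit are hypothetical names; a real implementation would also need durable prepare records, timeouts and coordinator recovery, all omitted here. A single node that votes "no" (or cannot be reached) forces every node to abort, which is point 2) above.

```python
class Participant:
    """Hypothetical replica that can tentatively apply one change."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a node that votes "no"
        self.state = "idle"
        self.staged = None
        self.data = []

    def prepare(self, change):
        # Phase 1: stage the change and vote.
        if self.fail_prepare:
            return False
        self.staged = change
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2 (success): make the staged change permanent.
        self.data.append(self.staged)
        self.staged = None
        self.state = "committed"

    def rollback(self):
        # Phase 2 (failure): throw the staged change away.
        self.staged = None
        self.state = "aborted"


def two_phase_commit(participants, change):
    """Commit everywhere or nowhere; returns True if the change committed."""
    prepared = []
    for p in participants:
        if not p.prepare(change):
            # One "no" vote: undo on the nodes that already said yes.
            for q in prepared:
                q.rollback()
            p.rollback()
            return False
        prepared.append(p)
    for p in participants:
        p.commit()
    return True
```

Every write waits for the slowest participant's vote, which is where the write-speed cost in point 1) comes from.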
+btw: I posted the alternative to Rserv that I mentioned the other day to
+the pg-patches mailing list. If anyone is interested, you should be able
+to grab it from the archives.
+On Thu, 7 Feb 2002, Gavin Sherry wrote:
+> First of all, all machines in the cluster would have to be aware of all the
+> machines in the cluster. This information would have to be stored in a new
+> system table.
+> The FE/BE protocol would need to be modified to accept parsed node trees
+> generated by pg_analyze_and_rewrite(). These could then be dispatched by
+> the executing server, inside of pg_exec_query_string(), to all other servers
+> in the cluster (excluding itself). Naturally, this dispatch would need to
+> be non-blocking.
+> pg_exec_query_string() would need to check the node tags to make sure
+> SELECTs and perhaps some other commands are not dispatched.
+> Before the executing server runs finish_xact_command(), it would check
+> that the query was successfully executed on all machines otherwise
+> abort. Such a system would need a few configuration options: whether or
+> not you abort on failed replication to slaves, the ability to replicate
+> only certain tables, etc.
+> Naturally, this would slow down writes to the system (possibly a lot
+> depending on the performance difference between the executing machine and
+> the least powerful machine in the cluster), but most usages of postgresql
+> are read intensive, not write.
+> Any reason this model would not work?
+> Gavin
+Steven Singer
+Aircraft Performance Systems Phone: 519-747-1170 ext 282
+Navtech Systems Support Inc. AFTN: CYYZXNSX SITA: YYZNSCR
+Waterloo, Ontario ARINC: YKFNSCR
+---------------------------(end of broadcast)---------------------------
+TIP 1: subscribe and unsubscribe commands go to
+From Thu Feb 7 17:50:42 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g17MoeP27121
+ for <>; Thu, 7 Feb 2002 17:50:40 -0500 (EST)
+Received: (qmail 39930 invoked by alias); 7 Feb 2002 22:50:17 -0000
+Received: from unknown (HELO (
+ by with SMTP; 7 Feb 2002 22:50:17 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g17Ma4E38041
+ for <>; Thu, 7 Feb 2002 17:36:04 -0500 (EST)
+ (envelope-from
+Received: from (fharvell@localhost)
+ by (8.11.6/8.11.6) with ESMTP id g17MZhR17707;
+ Thu, 7 Feb 2002 17:35:43 -0500
+Message-ID: <>
+X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
+From: F Harvell <>
+To: mlw <>
+cc: Gavin Sherry <>,
+ PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: Message from mlw
+ of "Thu, 07 Feb 2002 07:52:23 EST."
+ <>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Date: Thu, 07 Feb 2002 17:35:43 -0500
+Precedence: bulk
+Status: OR
+I'm not that familiar with the replication issues in PostgreSQL;
+however, I would be partial to replication that was based upon the
+playback of the (a?) journal file. (I believe that the WAL is a
+journal file.)
+By being based upon a journal file, it would be possible to accomplish
+two significant items. First, it would be possible to "restore" a
+database to an exact state just before a failure. Most commercial
+databases provide the ability to do this. Banks, etc. log the journal
+files directly to tape to provide a complete transaction history such
+that they can rebuild their database from any given snapshot. (Note
+that the journal file needs to be "editable", as a failure may be a
+"delete from x" with a missing WHERE clause.)
+This leads directly into the second advantage, the ability to have a
+replicated database operating anywhere, over any connection on any
+server. Speed of writes would not be a factor. In essence, as long
+as the replicated database had a snapshot of the database and then was
+provided with all journal files since the snapshot, it would be
+possible to build a current database. If the replicant got behind in
+the processing, it would catch up when things slowed down.
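A toy model of replication by journal playback: a replica starts from a snapshot taken at some journal position and applies every later record in order, so a replica that falls behind simply catches up on its next pass, and stopping before a given record gives a point-in-time restore (for example, just before an accidental "delete from x"). JournalRecord, Replica and the lsn field are hypothetical names; PostgreSQL's WAL describes low-level changes to data pages rather than logical rows, so this is only an analogy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalRecord:
    lsn: int       # hypothetical journal position; strictly increasing
    key: str
    value: object  # None means "delete this key"


class Replica:
    """A snapshot plus every journal record written after it."""

    def __init__(self, snapshot, snapshot_lsn):
        self.rows = dict(snapshot)
        self.applied_lsn = snapshot_lsn  # last position reflected in self.rows

    def apply(self, records, stop_before=None):
        for rec in sorted(records, key=lambda r: r.lsn):
            if stop_before is not None and rec.lsn >= stop_before:
                break  # point-in-time restore: skip this record and everything after
            if rec.lsn <= self.applied_lsn:
                continue  # already in the snapshot or applied earlier
            if rec.value is None:
                self.rows.pop(rec.key, None)
            else:
                self.rows[rec.key] = rec.value
            self.applied_lsn = rec.lsn
```

Because already-applied positions are skipped, the same journal can be shipped to a replica more than once without harm, which is what lets a lagging replica catch up from whatever it last applied.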
+In my opinion, the first advantage is in many ways the most important.
+Replication becomes simply the restoration of the database in realtime
+on a second server. The "replication" task becomes the definition of
+a protocol for distributing the journal file. At least one major
+database vendor does replication (shadowing) in exactly this manner.
+Maybe I'm all wet and the journal file and journal playback already
+exists. If so, IMHO, basing replication off of this would be the
+right direction.
+On Thu, 07 Feb 2002 07:52:23 EST, mlw wrote:
+> I have in the back of my mind, an idea of patching into the WAL stuff, and
+> using that mechanism to push changes out to the slaves.
+> Where one machine is still the master, but no trigger stuff, just a WAL patch.
+> Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
+> exactly, the idea hasn't completely formed yet.
+---------------------------(end of broadcast)---------------------------
+TIP 4: Don't 'kill -9' the postmaster
+From Fri Feb 8 00:50:08 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g185o7P27878
+ for <>; Fri, 8 Feb 2002 00:50:07 -0500 (EST)
+Received: (qmail 17348 invoked by alias); 8 Feb 2002 05:50:03 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 05:50:03 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g185cTE15241
+ for <>; Fri, 8 Feb 2002 00:38:29 -0500 (EST)
+ (envelope-from
+Received: from ([]) by
+ (InterMail vM. 201-253-122-122-105-20011231) with ESMTP
+ id <>
+ for <>;
+ Fri, 8 Feb 2002 00:38:33 -0500
+Message-ID: <>
+Date: Fri, 08 Feb 2002 00:29:22 -0500
+From: Darren Johnson <>
+User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0
+X-Accept-Language: en
+MIME-Version: 1.0
+To: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+References: <>
+Content-Type: text/plain; charset=us-ascii; format=flowed
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+> The problems with two-stage-commit approaches to replication are:
+IMHO the biggest problem with two-phase commit is that it doesn't scale.
+The more servers you add to the replica, the slower it goes. There's also
+the potential for deadlocks across server boundaries.
+> 2) All of the databases must be able to communicate with each other at
+> all times in order for any edits to work. If the servers are
+> connected over some sort of WAN that periodically has short outages, this
+> is a problem. Also, if you're using replication because you want to be able
+> to take down one of the databases for short periods of time without
+> bringing down the others, you're in trouble.
+All true for the two-phase commit protocol. To have multi-master
+replication, you must have all systems communicating, but you can use a
+multicast group communication system instead of 2PC. Using total-order
+messaging, you can ensure that all changes are delivered to all servers
+in the replica in the same order. This group communication system also
+allows failures to be detected while other servers in the replica
+continue processing.
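Total ordering can be illustrated with its simplest implementation: a single sequencer that stamps every write with a global sequence number, while each replica buffers anything that arrives early and applies messages strictly in stamp order, so all replicas apply the same writes in the same order. This is one technique among several and not necessarily what the pg-repl group's group communication system uses; Sequencer and Replica are hypothetical names, and membership changes and failure detection are left out.

```python
import heapq


class Sequencer:
    """Assigns one global sequence number per message (single point of ordering)."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


class Replica:
    def __init__(self):
        self.expected = 0  # next sequence number this replica may apply
        self.held = []     # min-heap of (seq, msg) that arrived early
        self.log = []      # messages applied, in global order

    def receive(self, seq, msg):
        heapq.heappush(self.held, (seq, msg))
        # Deliver every message whose turn has come; keep the rest buffered.
        while self.held and self.held[0][0] == self.expected:
            _, m = heapq.heappop(self.held)
            self.log.append(m)
            self.expected += 1
```

Network reordering then only delays delivery on a replica; it never changes the order in which writes are applied there.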
+A few of us are working with this theory and trying to integrate it with
+7.2. There is a working model for 6.4, but it's very limited (inserts,
+updates, and deletes). We are currently hosted at
+But the site has been down the last 2 days. I've contacted the
+webmaster but haven't seen any results yet. If anyone knows what's going
+on with gborg, I'd appreciate a status.
+---------------------------(end of broadcast)---------------------------
+TIP 2: you can get off all lists at once with the unregister command
+ (send "unregister YourEmailAddressHere" to
+From Fri Feb 8 06:20:44 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g18BKhP06132
+ for <>; Fri, 8 Feb 2002 06:20:43 -0500 (EST)
+Received: (qmail 90815 invoked by alias); 8 Feb 2002 11:20:40 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 11:20:40 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g18B9ZE89589
+ for <>; Fri, 8 Feb 2002 06:09:36 -0500 (EST)
+ (envelope-from
+Received: from (localhost.localdomain [])
+ by (Postfix) with SMTP
+ id 598393A132; Fri, 8 Feb 2002 11:09:36 +0000 (GMT)
+From: Bradley Kieser <>
+Date: Fri, 08 Feb 2002 11:09:36 GMT
+Message-ID: <>
+Subject: Re: [HACKERS] Replication
+To: Darren Johnson <>
+cc: PostgreSQL-development <>
+In-Reply-To: <>
+References: <> <>
+X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
+X-Priority: 3 (Normal)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=ISO-8859-1
+Content-Transfer-Encoding: 8bit
+X-MIME-Autoconverted: from quoted-printable to 8bit by id g18BJoF90352
+Precedence: bulk
+Status: OR
+Given that different replication strategies will probably be developed
+for PG, do you envisage DBAs being able to select the type of replication
+for their installation? I.e. replication being selectable, rather like
+storage structures?
+That would be a killer bit of flexibility, given how enormous the impact of
+replication will be on corporate adoption of PG.
+>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
+On 2/8/02, 5:29:22 AM, Darren Johnson <> wrote
+regarding Re: [HACKERS] Replication:
+> >
+> > The problems with two stage commit type approches to replication are
+> IMHO the biggest problem with two phased commit is it doesn't scale.
+> The more servers
+> you add to the replica the slower it goes. Also there's the potential
+> for dead locks across
+> server boundaries.
+> >
+> > 2) All of the databases must be able to communicate with each other at
+> > all times in order for any edits to work. If the servers are
+> > connected over some sort of WAN that periodically has short outages this
+> > is a problem. Also if your using replication because you want to be
+> able
+> > to take down one of the databases for short periods of time without
+> > bringing down the others your in trouble.
+> All true for the two-phase commit protocol. To have multi-master
+> replication, you must have all systems communicating, but you can use a
+> multicast group communication system instead of 2PC. Using total order
+> messaging, you can ensure all changes are delivered to all servers in the
+> replica in the same order. This group communication system also allows
+> failures to be detected while other servers in the replica continue
+> processing.
+> A few of us are working with this theory and trying to integrate it with
+> 7.2. There is a working model for 6.4, but it's very limited (inserts,
+> updates, and deletes only). We are currently hosted at
+> But the site has been down for the last 2 days. I've contacted the
+> webmaster, but haven't seen any results yet. If anyone knows what's going
+> on with gborg, I'd appreciate a status update.
+> Darren
+> ---------------------------(end of broadcast)---------------------------
+> TIP 2: you can get off all lists at once with the unregister command
+> (send "unregister YourEmailAddressHere" to
+From Fri Feb 8 12:40:36 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g18HeZP08450
+ for <>; Fri, 8 Feb 2002 12:40:35 -0500 (EST)
+Received: (qmail 74089 invoked by alias); 8 Feb 2002 17:40:30 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 17:40:30 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g18HbwE73437
+ for <>; Fri, 8 Feb 2002 12:37:58 -0500 (EST)
+ (envelope-from
+Received: from ([]) by
+ (InterMail vM. 201-253-122-122-105-20011231) with ESMTP
+ id <>;
+ Fri, 8 Feb 2002 12:38:04 -0500
+Message-ID: <>
+Date: Fri, 08 Feb 2002 11:23:13 -0500
+From: Darren Johnson <>
+User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Bradley Kieser <>
+Subject: Re: [HACKERS] Replication
+References: <> <> <>
+Content-Type: text/plain; charset=us-ascii; format=flowed
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+> Given that different replication strategies will probably be developed
+> for PG, do you envisage DBAs being able to select the type of replication
+> for their installation? I.e., replication being selectable rather like
+> storage structures?
+I can't speak for other replication solutions, but we are using the
+--with-replication or -r parameter when starting the postmaster. Some day I
+hope there will be parameters for master/slave, partial/full, and
+sync/async, but it will be some time before we cross those bridges.
+From Fri Feb 8 14:42:40 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g18JgdP28166
+ for <>; Fri, 8 Feb 2002 14:42:39 -0500 (EST)
+Received: (qmail 18650 invoked by alias); 8 Feb 2002 19:42:39 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 19:42:39 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g18JYBE17341
+ for <>; Fri, 8 Feb 2002 14:34:11 -0500 (EST)
+ (envelope-from
+Received: from (unknown [])
+ by (Postfix) with ESMTP id A785066B04
+ for <>; Fri, 8 Feb 2002 14:33:28 -0500 (EST)
+Date: Fri, 8 Feb 2002 14:34:34 -0500 (EST)
+From: Randall Jonasz <>
+X-X-Sender: <>
+To: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+I've been looking into database replication theory lately and have found
+some interesting papers discussing various approaches. (Here's
+one paper that struck me as being very helpful,
+ ) So far I favour an
+eager replication system based on read-local/write-all-available. The
+system should not depend on two-phase commit or primary-copy algorithms.
+The former makes the whole system only as fast as its slowest machine. In
+addition, two-phase commit involves 2n messages for each transaction,
+which does not scale well at all. This idea will also have to take into
+account a crashed node that did not acknowledge a transaction.
+The primary-copy algorithms I've seen suffer from a single point of
+failure and potential bottlenecks at the primary node.
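A rough illustration of the two 2PC costs mentioned above, as a minimal Python sketch under simplifying assumptions (every participant votes yes, no disk forces, no lost messages); `two_phase_commit_cost` and `TwoPCCost` are hypothetical names. Counting requests and replies in both phases gives 4n messages; counting only the coordinator's PREPARE and COMMIT requests gives the 2n figure cited above. Either way the count grows linearly with n, and each phase lasts as long as the slowest participant's round trip.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TwoPCCost:
    messages: int       # messages exchanged for one transaction
    latency_ms: float   # time until the coordinator has every ack

def two_phase_commit_cost(rtts_ms: List[float]) -> TwoPCCost:
    """Cost of one 2PC round, assuming every participant votes yes.

    rtts_ms holds the coordinator's round-trip time to each participant.
    Phase 1 sends PREPARE to all and waits for every vote; phase 2 sends
    COMMIT to all and waits for every ack.
    """
    n = len(rtts_ms)
    # Each phase is n requests plus n replies ...
    messages = 4 * n
    # ... and ends only when the slowest participant has answered.
    latency = 2 * max(rtts_ms)
    return TwoPCCost(messages, latency)

for rtts in ([5, 5, 5], [5, 5, 5, 5, 5, 80]):
    cost = two_phase_commit_cost(rtts)
    print(len(rtts), cost.messages, cost.latency_ms)
# 3 12 10
# 6 24 160
```

Adding three fast nodes and one slow one doubles the message count and multiplies the commit latency by 16, because the slow node paces both phases.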
+Instead, I like the master-to-master or peer-to-peer algorithm discussed
+in the above paper. This approach accounts for network partitions, nodes
+leaving and joining a cluster, and the ability to commit a transaction once
+the communication module has determined its total order, i.e., without
+waiting for acks. This scales well, and research has shown it to increase
+the number of transactions per second a database cluster can handle over a
+single node.
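The property this relies on is that every replica applies writes in one agreed order, so no per-transaction acks are needed for replicas to converge. Below is a minimal sketch assuming a fixed sequencer, which is only one simple way to obtain total order; real group communication systems use their own protocols and also handle node failures and membership changes, which this omits. `Sequencer` and `Replica` are hypothetical classes.

```python
import random

class Sequencer:
    """Hypothetical fixed sequencer: stamps each write with a global number."""
    def __init__(self):
        self.next_seq = 0

    def order(self, write):
        stamped = (self.next_seq, write)
        self.next_seq += 1
        return stamped

class Replica:
    """Applies writes strictly in sequence order, buffering early arrivals."""
    def __init__(self):
        self.data = {}
        self.expected = 0   # next sequence number this replica may apply
        self.pending = {}   # seq -> write, received but not yet applicable

    def receive(self, seq, write):
        self.pending[seq] = write
        # Apply every buffered write whose turn has come; no acks are needed.
        while self.expected in self.pending:
            key, value = self.pending.pop(self.expected)
            self.data[key] = value
            self.expected += 1

sequencer = Sequencer()
# Two conflicting updates to the same row, e.g. issued by different masters.
stream = [sequencer.order(("acct:1", 100)),
          sequencer.order(("acct:1", 250)),
          sequencer.order(("acct:2", 7))]

replicas = [Replica() for _ in range(3)]
rng = random.Random(42)
for replica in replicas:
    arrivals = list(stream)
    rng.shuffle(arrivals)          # the network may deliver in any order
    for seq, write in arrivals:
        replica.receive(seq, write)

print([replica.data for replica in replicas])
# [{'acct:1': 250, 'acct:2': 7}, {'acct:1': 250, 'acct:2': 7}, {'acct:1': 250, 'acct:2': 7}]
```

Whatever order the network delivers in, each replica ends with the same state, because the conflicting updates to `acct:1` are always applied as sequence 0 then 1.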
+Postgres-R is another interesting approach which I think should be taken
+seriously. Anyone interested can read a paper on this at
+Anyways, my two cents
+Randall Jonasz
+Software Engineer
+Click2net Inc.
+On Thu, 7 Feb 2002, mlw wrote:
+> Gavin Sherry wrote:
+> > Naturally, this would slow down writes to the system (possibly a lot
+> > depending on the performance difference between the executing machine and
+> > the least powerful machine in the cluster), but most usages of postgresql
+> > are read intensive, not write.
+> >
+> > Any reason this model would not work?
+> What, then, is the purpose of replication to multiple masters?
+> I can think of only two reasons why you want replication. (1) Redundancy: make
+> sure that if one server dies, then another server has the same data and is used
+> seamlessly. (2) Increase performance over one system.
+> For reason (1), I submit that a server load balancer which sits on top of
+> PostgreSQL, and executes writes on both servers while distributing reads, would
+> be best. This is a HUGE project. The load balancer must know EXACTLY how the
+> system is configured, which includes all functions and everything.
+> For reason (2), your system would fail to provide the scalability that would be
+> needed. If writes take a long time, but reads are fine, what is the difference
+> from the trigger-based replicator?
+> I have in the back of my mind an idea of patching into the WAL stuff, and
+> using that mechanism to push changes out to the slaves.
+> Where one machine is still the master, but no trigger stuff, just a WAL patch.
+> Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
+> exactly, the idea hasn't completely formed yet.
+From Fri Feb 8 15:20:32 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g18KKSP03731
+ for <>; Fri, 8 Feb 2002 15:20:29 -0500 (EST)
+Received: (qmail 28961 invoked by alias); 8 Feb 2002 20:20:27 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 20:20:27 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g18KC7E27667
+ for <>; Fri, 8 Feb 2002 15:12:07 -0500 (EST)
+ (envelope-from
+Received: from ( [])
+ by (Postfix) with ESMTP
+ id 1066F8787; Fri, 8 Feb 2002 15:12:08 -0500 (EST)
+Date: Fri, 8 Feb 2002 15:12:00 -0500 (EST)
+From: bpalmer <>
+To: Randall Jonasz <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+I've not looked at the first paper, but I will.
+> Postgres-R is another interesting approach which I think should be taken
+> seriously. Anyone interested can read a paper on this at
+I would point you to the info on gborg, but it seems to be down at the moment.
+- Brandon
+ c: 646-456-5455 h: 201-798-4983
+ b. palmer,
+From Fri Feb 8 17:41:03 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g18Mf2P18046
+ for <>; Fri, 8 Feb 2002 17:41:03 -0500 (EST)
+Received: (qmail 63057 invoked by alias); 8 Feb 2002 22:41:02 -0000
+Received: from unknown (HELO (
+ by with SMTP; 8 Feb 2002 22:41:02 -0000
+Received: from ( [])
+ by (8.11.3/8.11.4) with ESMTP id g18MR9E60361
+ for <>; Fri, 8 Feb 2002 17:27:11 -0500 (EST)
+ (envelope-from
+Received: from ([]) by
+ (InterMail vM. 201-253-122-122-105-20011231) with ESMTP
+ id <>;
+ Fri, 8 Feb 2002 17:26:34 -0500
+Message-ID: <>
+Date: Fri, 08 Feb 2002 16:11:43 -0500
+From: Darren Johnson <>
+User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Randall Jonasz <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+References: <>
+Content-Type: text/plain; charset=us-ascii; format=flowed
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+> I've been looking into database replication theory lately and have found
+> some interesting papers discussing various approaches. (Here's
+> one paper that struck me as being very helpful,
+> )
+Here is another one from the same group that addresses the WAN issues.
+From Fri Feb 8 19:20:30 2002
+Return-path: <>
+Received: from ( [])
+ by (8.11.6/8.10.1) with SMTP id g190KTP26980
+ for <>; Fri, 8 Feb 2002 19:20:29 -0500 (EST)
+Received: (qmail 88124 invoked by alias); 9 Feb 2002 00:20:27 -0000
+Received: from unknown (HELO (
+ by with SMTP; 9 Feb 2002 00:20:27 -0000
+Received: from localhost.localdomain ( [])
+ by (8.11.3/8.11.4) with ESMTP id g190H3E87489
+ for <>; Fri, 8 Feb 2002 19:17:03 -0500 (EST)
+ (envelope-from
+Received: from localhost (camber@localhost)
+ by localhost.localdomain (8.11.6/8.11.6) with ESMTP id g190H0P18427;
+ Fri, 8 Feb 2002 19:17:00 -0500
+X-Authentication-Warning: localhost.localdomain: camber owned process doing -bs
+Date: Fri, 8 Feb 2002 19:17:00 -0500 (EST)
+From: Brian Bruns <>
+X-X-Sender: <camber@localhost.localdomain>
+To: Randall Jonasz <>
+cc: PostgreSQL-development <>
+Subject: Re: [HACKERS] Replication
+In-Reply-To: <>
+Message-ID: <Pine.LNX.4.33.0202081904190.18420-100000@localhost.localdomain>
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Precedence: bulk
+Status: OR
+> > I have in the back of my mind an idea of patching into the WAL stuff, and
+> > using that mechanism to push changes out to the slaves.
+> >
+> > Where one machine is still the master, but no trigger stuff, just a WAL patch.
+> > Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
+> > exactly, the idea hasn't completely formed yet.
+> >
+FWIW, Sybase Replication Server does just such a thing.
+They have a secondary log marker that prevents the log from being truncated
+past the oldest unreplicated transaction. A thread within the system called
+the "rep agent" (it used to be a separate process called the LTM) reads
+the log and forwards it to the rep server. Once the rep server has the
+whole transaction and has written it to a stable device (i.e., synced it to
+disk), the rep server tells the LTM it's OK to move the log marker forward.
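The log-marker handshake described above can be modelled roughly as follows. This is a hypothetical Python sketch, not Sybase's implementation or API: `ReplicatedLog`, `checkpoint`, and `ack_from_rep_server` are illustrative names, and log sequence numbers (LSNs) are simply consecutive integers.

```python
from collections import deque

class ReplicatedLog:
    """Hypothetical log with two truncation markers.

    checkpointed_lsn: local recovery no longer needs records <= this LSN.
    replicated_lsn:   the rep server has durably stored records <= this LSN.
    """
    def __init__(self):
        self.records = deque()     # (lsn, txn) pairs still retained
        self.next_lsn = 0
        self.checkpointed_lsn = -1
        self.replicated_lsn = -1

    def append(self, txn):
        lsn = self.next_lsn
        self.records.append((lsn, txn))
        self.next_lsn += 1
        return lsn

    def checkpoint(self, lsn):
        self.checkpointed_lsn = max(self.checkpointed_lsn, lsn)

    def ack_from_rep_server(self, lsn):
        # Sent only once the rep server holds the whole transaction on disk.
        self.replicated_lsn = max(self.replicated_lsn, lsn)

    def truncate(self):
        # The secondary marker: never discard past the oldest unreplicated record.
        limit = min(self.checkpointed_lsn, self.replicated_lsn)
        while self.records and self.records[0][0] <= limit:
            self.records.popleft()
        return [txn for _, txn in self.records]

log = ReplicatedLog()
for txn in ["T1", "T2", "T3", "T4"]:
    log.append(txn)
log.checkpoint(3)            # all four are safe for local recovery
log.ack_from_rep_server(1)   # but only T1 and T2 have reached the rep server
print(log.truncate())        # ['T3', 'T4']
log.ack_from_rep_server(3)
print(log.truncate())        # []
```

Taking the minimum of the two markers is what lets replication lag hold back truncation without ever blocking local recovery.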
+Anyway, once the replication server proper has the transaction, it uses a
+publish/subscribe methodology to see who wants to get the update.
+Bidirectional replication is done by setting up two one-way replications.
+The whole thing is table-based; it marks tables as replicated or not in
+the database to save the trip to the rep server for unreplicated tables.
+Plus, you can replicate parts of a database (replicate all rows where the
+country is "us" to this server and all rows with "uk" to that server).
+Or, conversely, you can roll smaller regional databases up into bigger
+ones; it's very flexible.
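A minimal sketch of table-level marking plus row-filtered subscriptions, assuming a subscription is just a target server, a table name, and a row predicate. `RepServer`, `Subscription`, and `route` are hypothetical names, not part of Sybase Replication Server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

@dataclass
class Subscription:
    server: str                                      # target server name
    table: str                                       # replicated table
    where: Callable[[Row], bool] = lambda row: True  # row filter; default takes every row

@dataclass
class RepServer:
    replicated_tables: Set[str]                      # tables marked "replicated" at the primary
    subscriptions: List[Subscription] = field(default_factory=list)

    def route(self, table: str, row: Row) -> List[str]:
        """Return the servers that should receive this row change."""
        if table not in self.replicated_tables:
            # Mirrors the per-table flag: unmarked tables never make the trip.
            return []
        return [s.server for s in self.subscriptions
                if s.table == table and s.where(row)]

rs = RepServer(replicated_tables={"customers"})
rs.subscriptions += [
    Subscription("us-server", "customers", lambda r: r["country"] == "us"),
    Subscription("uk-server", "customers", lambda r: r["country"] == "uk"),
    Subscription("hq-rollup", "customers"),  # regional roll-up: takes every row
]

print(rs.route("customers", {"id": 1, "country": "uk"}))  # ['uk-server', 'hq-rollup']
print(rs.route("audit_log", {"id": 9}))                   # []
```

In this model, bidirectional replication would be two such one-way routes, one in each direction, and the roll-up case is simply a subscription with no row filter.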