From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 1 Jul 2017 16:15:51 +0000 (-0400)
Subject: Reduce delay for last logicalrep feedback message when master goes idle.
X-Git-Tag: REL_10_BETA2~37
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=f32678c0163d7d966560bdaf41bfbc2cf179a260;p=postgresql

The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction.  Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master.  In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT.  That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.

In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms).  Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE.  Also, move one of the
send_feedback() calls into the loop main line, so that we'll check for the
need to send feedback even if we were woken by a latch event rather than
by socket data or a timeout.
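
For illustration only, the timeout choice described above can be modeled
as a standalone sketch.  The helper name choose_wait_time_ms and the
NAPTIME_PER_CYCLE_MS macro are hypothetical and do not appear in worker.c;
the actual change tests the lsn_mapping list and reads WalWriterDelay
directly.

```c
#include <stdbool.h>

/* Assumed value: the "one full second" quoted above for NAPTIME_PER_CYCLE. */
#define NAPTIME_PER_CYCLE_MS 1000L

/*
 * Hypothetical helper: pick the wait timeout in milliseconds.
 *
 * With unacked (unflushed) transactions pending, wake up after roughly one
 * walwriter cycle so the flush can be reported promptly.  Otherwise fall
 * back to the full nap time.
 */
long
choose_wait_time_ms(bool have_unflushed_xacts, int wal_writer_delay_ms)
{
	if (have_unflushed_xacts)
		return (long) wal_writer_delay_ms;
	return NAPTIME_PER_CYCLE_MS;
}
```

With the typical 200ms WalWriterDelay, a worker that still has unflushed
transactions would wait about 200ms instead of a full second before
rechecking whether feedback can be sent.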

It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
---

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 898c497d12..0d48dfa494 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -52,6 +52,7 @@
 
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
+#include "postmaster/walwriter.h"
 
 #include "replication/decode.h"
 #include "replication/logical.h"
@@ -1027,6 +1028,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		bool		endofstream = false;
 		TimestampTz last_recv_timestamp = GetCurrentTimestamp();
 		bool		ping_sent = false;
+		long		wait_time;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -1114,11 +1116,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 
 				len = walrcv_receive(wrconn, &buf, &fd);
 			}
-
-			/* confirm all writes at once */
-			send_feedback(last_received, false, false);
 		}
 
+		/* confirm all writes so far */
+		send_feedback(last_received, false, false);
+
 		if (!in_remote_transaction)
 		{
 			/*
@@ -1147,12 +1149,21 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		}
 
 		/*
-		 * Wait for more data or latch.
+		 * Wait for more data or latch.  If we have unflushed transactions,
+		 * wake up after WalWriterDelay to see if they've been flushed yet (in
+		 * which case we should send a feedback message).  Otherwise, there's
+		 * no particular urgency about waking up unless we get data or a
+		 * signal.
 		 */
+		if (!dlist_is_empty(&lsn_mapping))
+			wait_time = WalWriterDelay;
+		else
+			wait_time = NAPTIME_PER_CYCLE;
+
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
-							   fd, NAPTIME_PER_CYCLE,
+							   fd, wait_time,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
 
 		/* Emergency bailout if postmaster has died */