From: Tom Lane
Date: Sat, 1 Jul 2017 16:15:51 +0000 (-0400)
Subject: Reduce delay for last logicalrep feedback message when master goes idle.
X-Git-Tag: REL_10_BETA2~37
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=f32678c0163d7d966560bdaf41bfbc2cf179a260;p=postgresql

Reduce delay for last logicalrep feedback message when master goes idle.

The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction. Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master. In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT. That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.

In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms). Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE.

Also, move one of the send_feedback() calls into the loop main line,
so that we'll check for the need to send feedback even if we were woken
by a latch event and not either socket data or timeout.

It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
---

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 898c497d12..0d48dfa494 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -52,6 +52,7 @@

 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
+#include "postmaster/walwriter.h"

 #include "replication/decode.h"
 #include "replication/logical.h"
@@ -1027,6 +1028,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	bool		endofstream = false;
 	TimestampTz last_recv_timestamp = GetCurrentTimestamp();
 	bool		ping_sent = false;
+	long		wait_time;

 	CHECK_FOR_INTERRUPTS();

@@ -1114,11 +1116,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received)

 					len = walrcv_receive(wrconn, &buf, &fd);
 				}
-
-				/* confirm all writes at once */
-				send_feedback(last_received, false, false);
 			}

+		/* confirm all writes so far */
+		send_feedback(last_received, false, false);
+
 		if (!in_remote_transaction)
 		{
 			/*
@@ -1147,12 +1149,21 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		}

 		/*
-		 * Wait for more data or latch.
+		 * Wait for more data or latch. If we have unflushed transactions,
+		 * wake up after WalWriterDelay to see if they've been flushed yet (in
+		 * which case we should send a feedback message). Otherwise, there's
+		 * no particular urgency about waking up unless we get data or a
+		 * signal.
 		 */
+		if (!dlist_is_empty(&lsn_mapping))
+			wait_time = WalWriterDelay;
+		else
+			wait_time = NAPTIME_PER_CYCLE;
+
 		rc = WaitLatchOrSocket(MyLatch,
 							   WL_SOCKET_READABLE | WL_LATCH_SET |
 							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
-							   fd, NAPTIME_PER_CYCLE,
+							   fd, wait_time,
 							   WAIT_EVENT_LOGICAL_APPLY_MAIN);

 		/* Emergency bailout if postmaster has died */
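
Illustrative note (not part of the patch): the timeout choice described in the
commit message can be sketched in isolation as below. This is a minimal sketch
under stated assumptions, not the worker.c code. The names
sketch_choose_wait_time, SKETCH_NAPTIME_PER_CYCLE_MS and
SKETCH_WAL_WRITER_DELAY_MS are hypothetical stand-ins, and the values 1000 and
200 are taken from the "one full second" and "typically 200ms" figures in the
message. The boolean have_unflushed stands in for the patch's
!dlist_is_empty(&lsn_mapping) test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the values named in the commit message. */
#define SKETCH_NAPTIME_PER_CYCLE_MS 1000L	/* "one full second" */
#define SKETCH_WAL_WRITER_DELAY_MS   200L	/* "typically 200ms" */

/*
 * Pick the timeout for the next wait, in milliseconds.
 *
 * While transactions are still waiting for a local flush, wake up after
 * roughly one walwriter cycle so the feedback message can go out soon after
 * the flush. With nothing pending, fall back to the full nap time.
 */
long
sketch_choose_wait_time(bool have_unflushed, long wal_writer_delay_ms)
{
	assert(wal_writer_delay_ms > 0);

	if (have_unflushed)
		return wal_writer_delay_ms;
	return SKETCH_NAPTIME_PER_CYCLE_MS;
}
```

With these assumed values, a pending flush shortens the timeout from 1000 ms to
200 ms, which is where the reduced feedback delay comes from. An idle worker
with nothing unflushed still sleeps for the full cycle.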