From bc7fcab5e36b9597857fa7e3fa6d9ba54aaea167 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 23 Dec 2015 14:06:52 -0500
Subject: [PATCH] Read from the same worker repeatedly until it returns no
 tuple.

The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With the new approach, by the time the worker gets
scheduled, it has a decent chance of being able to fill the entire
buffer in one go.

Patch by me.  Dilip Kumar helped with performance testing.
---
 src/backend/executor/nodeGather.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
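
Illustration only, not part of the patch: a minimal standalone sketch of
the read order this change produces.  ToyQueue, ToyGather, toy_queue_read
and toy_gather_readnext are hypothetical stand-ins, not PostgreSQL APIs.
The sketch assumes nreaders >= 1 and returns false where the real
gather_readnext would wait for more data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one worker's tuple queue (not shm_mq). */
typedef struct ToyQueue
{
	const int  *items;			/* "tuples" the worker has produced */
	size_t		nitems;
	size_t		next;			/* index of next unread item */
} ToyQueue;

/* Hypothetical stand-in for the reader-side state in GatherState. */
typedef struct ToyGather
{
	ToyQueue   *queues;
	int			nreaders;		/* assumed to be >= 1 */
	int			nextreader;
} ToyGather;

/* Non-blocking read: true and *out set if an item is available. */
static bool
toy_queue_read(ToyQueue *q, int *out)
{
	if (q->next >= q->nitems)
		return false;
	*out = q->items[q->next++];
	return true;
}

/*
 * Return the next item, or false once every queue has been visited
 * without finding one.
 */
static bool
toy_gather_readnext(ToyGather *g, int *out)
{
	int			waitpos = g->nextreader;

	for (;;)
	{
		/* If we got an item, return it without advancing nextreader. */
		if (toy_queue_read(&g->queues[g->nextreader], out))
			return true;

		/* Only move on once the current queue had nothing to offer. */
		g->nextreader = (g->nextreader + 1) % g->nreaders;

		/* Have we visited every queue? */
		if (g->nextreader == waitpos)
			return false;
	}
}
```

With two queues holding {1, 2, 3} and {10, 20}, repeated calls drain the
first queue completely before touching the second.  Under the old
ordering, where nextreader advanced after every read, the calls would
instead alternate between the two queues.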

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index f32da1e235..db5883d28e 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -359,14 +359,20 @@ gather_readnext(GatherState *gatherstate)
 			continue;
 		}
 
-		/* Advance nextreader pointer in round-robin fashion. */
-		gatherstate->nextreader =
-			(gatherstate->nextreader + 1) % gatherstate->nreaders;
-
 		/* If we got a tuple, return it. */
 		if (tup)
 			return tup;
 
+		/*
+		 * Advance nextreader pointer in round-robin fashion.  Note that we
+		 * only reach this code if we weren't able to get a tuple from the
+		 * current worker.  We used to advance the nextreader pointer after
+		 * every tuple, but it turns out to be much more efficient to keep
+		 * reading from the same queue until that would require blocking.
+		 */
+		gatherstate->nextreader =
+			(gatherstate->nextreader + 1) % gatherstate->nreaders;
+
 		/* Have we visited every TupleQueueReader? */
 		if (gatherstate->nextreader == waitpos)
 		{
-- 
2.50.1