Test results from buildfarm members mastodon/narwhal (Windows Server 2003)
make it look like that platform just plain loses FD_READ events
occasionally, and the only reason our previous coding seemed to work was
that it timed out every couple of seconds and retried the whole operation.
Try to verify this by reinserting a finite timeout into the pgstat loop.
This isn't meant to be a permanent patch either, just to confirm or
disprove a theory.
elog(LOG, "pgstat: waiting");
/* Sleep until there's something to do */
+ /* XXX should not need a timeout here */
wr = WaitLatchOrSocket(&pgStatLatch,
- WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE,
+ WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | WL_TIMEOUT,
pgStatSock,
- -1L);
+ 2000L);
elog(LOG, "pgstat: wait result 0x%x", wr);