]> granicus.if.org Git - postgresql/commitdiff
Block signals while computing the sleep time in postmaster's main loop.
authorAndres Freund <andres@anarazel.de>
Wed, 1 Oct 2014 12:23:43 +0000 (14:23 +0200)
committerAndres Freund <andres@anarazel.de>
Wed, 1 Oct 2014 12:28:20 +0000 (14:28 +0200)
DetermineSleepTime() was previously called without blocked
signals. That's not good, because it allows signal handlers to
interrupt its workings.

DetermineSleepTime() was added in 9.3 with the addition of background
workers (da07a1e856511), where it only read from
BackgroundWorkerList.

Since 9.4, where dynamic background workers were added (7f7485a0cde),
the list is also manipulated in DetermineSleepTime(). That's bad
because the list now can be persistently corrupted if modified by both
a signal handler and DetermineSleepTime().

This was discovered during the investigation of hangs on buildfarm
member anole. It's unclear whether this bug is the source of these
hangs or not, but it's worth fixing either way. I have confirmed that
it can cause crashes.

It luckily looks like this only can cause problems when bgworkers are
actively used.

Discussion: 20140929193733.GB14400@awork2.anarazel.de

Backpatch to 9.3 where background workers were introduced.

src/backend/postmaster/postmaster.c

index 39fa5641914d128c0e11f6d5ee79ce3026923e7b..15b0fe6d0bd519c93d9f17bef951be2297d469c2 100644 (file)
@@ -1486,6 +1486,8 @@ DetermineSleepTime(struct timeval * timeout)
 
 /*
  * Main idle loop of postmaster
+ *
+ * NB: Needs to be called with signals blocked
  */
 static int
 ServerLoop(void)
@@ -1507,34 +1509,38 @@ ServerLoop(void)
                /*
                 * Wait for a connection request to arrive.
                 *
+                * We block all signals except while sleeping. That makes it safe for
+                * signal handlers, which again block all signals while executing, to
+                * do nontrivial work.
+                *
                 * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-                * any new connections, so we don't call select() at all; just sleep
-                * for a little bit with signals unblocked.
+                * any new connections, so we don't call select(), and just sleep.
                 */
                memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
 
-               PG_SETMASK(&UnBlockSig);
-
                if (pmState == PM_WAIT_DEAD_END)
                {
+                       PG_SETMASK(&UnBlockSig);
+
                        pg_usleep(100000L); /* 100 msec seems reasonable */
                        selres = 0;
+
+                       PG_SETMASK(&BlockSig);
                }
                else
                {
                        /* must set timeout each time; some OSes change it! */
                        struct timeval timeout;
 
+                       /* Needs to run with blocked signals! */
                        DetermineSleepTime(&timeout);
 
+                       PG_SETMASK(&UnBlockSig);
+
                        selres = select(nSockets, &rmask, NULL, NULL, &timeout);
-               }
 
-               /*
-                * Block all signals until we wait again.  (This makes it safe for our
-                * signal handlers to do nontrivial work.)
-                */
-               PG_SETMASK(&BlockSig);
+                       PG_SETMASK(&BlockSig);
+               }
 
                /* Now check the select() result */
                if (selres < 0)