granicus.if.org Git - postgresql/commit
Prevent panic during shutdown checkpoint
author Peter Eisentraut <peter_e@gmx.net>
Mon, 1 May 2017 19:09:06 +0000 (15:09 -0400)
committer Peter Eisentraut <peter_e@gmx.net>
Fri, 5 May 2017 14:31:42 +0000 (10:31 -0400)
commit 086221cf6b1727c2baed4703c582f657b7c5350e
tree f8f720fbef69d0ecaaad9dbbf3db25efe01fa4ec
parent 499ae5f5db99c84035e9951fd30e428adf0f40d2

When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so.  At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT.  So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
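
The failure mode described above can be illustrated with a toy model. This is a minimal sketch in Python, not the PostgreSQL C code: `ToyWal`, `shutdown_checkpoint`, and `ShutdownCheckpointPanic` are hypothetical names invented for illustration, and the WAL is reduced to a single insert position. The only behavior taken from the commit message is that the check runs after the checkpoint is written and fails if any WAL was added in the meantime.

```python
class ShutdownCheckpointPanic(Exception):
    """Stands in for PostgreSQL's PANIC in this toy model (hypothetical name)."""


class ToyWal:
    """Hypothetical WAL model that only tracks the current insert position."""

    def __init__(self):
        self.insert_pos = 0

    def append(self, nbytes):
        # Any WAL-generating command advances the insert position.
        self.insert_pos += nbytes
        return self.insert_pos


def shutdown_checkpoint(wal, concurrent_activity=None):
    """Write a toy shutdown checkpoint, then verify no WAL was added meanwhile."""
    start_pos = wal.insert_pos
    if concurrent_activity is not None:
        # Models a walsender command (for instance BASE_BACKUP) that runs
        # while the checkpoint is being written.
        concurrent_activity(wal)
    if wal.insert_pos != start_pos:
        raise ShutdownCheckpointPanic(
            "WAL was written during the shutdown checkpoint"
        )
    return start_pos
```

In this model, a quiet system passes the check, while a callback that appends to the WAL during the checkpoint raises the exception, which corresponds to the panic the commit prevents.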

To fix this, divide the walsender shutdown into two phases.  First, the
postmaster sends a SIGUSR2 signal to all walsenders.  The walsenders
then put themselves into the "stopping" state.  In this state, they
reject any new commands.  (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.)  The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders.  This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
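
The two-phase sequence above can be sketched as a small state machine. This is an illustrative Python model under stated assumptions, not the actual walsender implementation: `Sig`, `ToyWalSender`, and `shut_down` are hypothetical names, signals are delivered synchronously as method calls, and the state names other than "stopping" are invented. The ordering (SIGUSR2, wait for all walsenders to be stopping, write the checkpoint, SIGINT) follows the commit message.

```python
from enum import Enum


class Sig(Enum):
    """Symbolic signal names; no real OS signals are sent in this model."""
    SIGUSR2 = "SIGUSR2"
    SIGINT = "SIGINT"


class ToyWalSender:
    """Hypothetical walsender with states: active, stopping, terminated."""

    def __init__(self):
        self.state = "active"
        self.sent = []  # records streamed to the receiver in this model

    def on_signal(self, sig):
        if sig is Sig.SIGUSR2:
            # Phase 1: stop accepting commands, but stay alive.
            self.state = "stopping"
        elif sig is Sig.SIGINT:
            # Phase 2: send out the shutdown checkpoint record, then exit.
            self.sent.append("shutdown checkpoint record")
            self.state = "terminated"

    def run_command(self, command):
        # All new commands are rejected once the walsender is stopping,
        # regardless of whether they would generate WAL.
        if self.state != "active":
            raise RuntimeError(f"cannot execute {command}: walsender is {self.state}")
        return f"executed {command}"


def shut_down(walsenders, write_checkpoint):
    """Drive the two-phase shutdown in the order given in the commit message."""
    for ws in walsenders:
        ws.on_signal(Sig.SIGUSR2)
    # The checkpointer waits for every walsender to reach "stopping";
    # in this synchronous model that is already true, so just verify it.
    if not all(ws.state == "stopping" for ws in walsenders):
        raise RuntimeError("a walsender has not reached the stopping state")
    write_checkpoint()
    for ws in walsenders:
        ws.on_signal(Sig.SIGINT)
```

Rejecting every command in the stopping state, rather than only the WAL-generating ones, mirrors the simplification the commit message describes: no list of WAL-generating commands has to be maintained.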

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
doc/src/sgml/monitoring.sgml
src/backend/access/transam/xlog.c
src/backend/postmaster/postmaster.c
src/backend/replication/walsender.c
src/include/replication/walsender.h
src/include/replication/walsender_private.h