Fix various concurrency issues in logical replication worker launching
author Peter Eisentraut <peter_e@gmx.net>
Wed, 26 Apr 2017 14:43:04 +0000 (10:43 -0400)
committer Peter Eisentraut <peter_e@gmx.net>
Wed, 26 Apr 2017 14:45:59 +0000 (10:45 -0400)
commit de4389712206d2686e09ad8d6dd112dc4b6c6d42
tree 4a9822ccae00a1a9b4da75420d4c95020b299355
parent 309191f66a947c5b63dd348a13aafa52b5847f94

The code was originally written with the assumption that the launcher is
the only process starting workers.  However, that hasn't been true since
commit 7c4f52409, which failed to modify the worker management code
adequately.

This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.
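
As a rough illustration of the in_use idea (not the code from this patch), the sketch below claims and releases slots in a fixed-size table while holding a lock. Everything except the in_use field name is an assumption made for the example: the WorkerSlot type, the subid field, the table size, and the claim_slot/release_slot helpers are hypothetical, and a pthread mutex stands in for whatever lock the server actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_SLOTS 4             /* hypothetical table size */

/* Simplified stand-in for a worker slot; only in_use is named in the commit. */
typedef struct WorkerSlot
{
    bool in_use;                /* has some process claimed this slot? */
    int  subid;                 /* hypothetical payload for the example */
} WorkerSlot;

static WorkerSlot slots[MAX_SLOTS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Claim the first free slot and return its index, or -1 if every slot is
 * in use.  The flag is read and set under the same lock, so two processes
 * launching workers at once cannot both claim the same slot.
 */
static int
claim_slot(int subid)
{
    int found = -1;

    pthread_mutex_lock(&slots_lock);
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            slots[i].subid = subid;
            found = i;
            break;
        }
    }
    pthread_mutex_unlock(&slots_lock);

    return found;
}

/* Give a slot back; the flag is cleared under the lock as well. */
static void
release_slot(int idx)
{
    assert(idx >= 0 && idx < MAX_SLOTS);

    pthread_mutex_lock(&slots_lock);
    slots[idx].in_use = false;
    pthread_mutex_unlock(&slots_lock);
}
```

The point of the sketch is only that the check and the update of the flag happen inside one critical section; a check outside the lock followed by a later write would reintroduce the race.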

However, if the parent process dies while the new worker is starting and
the new worker fails to attach to shared memory, this flag would never
get cleared.  We solve this rare corner case by adding a sort of garbage
collector for in_use slots.  This uses another field in the
LogicalRepWorker struct named launch_time that contains the time when
the worker was started.  If a request to start a new worker does not
find a free slot, we check for workers that were supposed to start but
took too long to actually do so, and reuse their slots.
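
The slot garbage collection described above could look roughly like the following sketch. It is not the patch's implementation: the GcSlot type, the attached flag (standing in for "the worker has attached to shared memory"), the 60-second GC_START_TIMEOUT_SECS value, and all function names are assumptions chosen for illustration. Only the in_use and launch_time field names come from the commit message. Locking is left out to keep the example short; the caller is assumed to hold the lock that protects the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define GC_MAX_SLOTS 2              /* hypothetical table size */
#define GC_START_TIMEOUT_SECS 60.0  /* hypothetical startup timeout */

typedef struct GcSlot
{
    bool   in_use;       /* slot has been claimed */
    bool   attached;     /* hypothetical: worker finished starting */
    time_t launch_time;  /* when the slot was claimed */
} GcSlot;

static GcSlot gc_slots[GC_MAX_SLOTS];

/* Return the index of the first free slot, or -1 if there is none. */
static int
gc_find_free(void)
{
    for (int i = 0; i < GC_MAX_SLOTS; i++)
        if (!gc_slots[i].in_use)
            return i;
    return -1;
}

/* Record that the worker in slot idx has finished starting. */
static void
gc_mark_attached(int idx)
{
    assert(idx >= 0 && idx < GC_MAX_SLOTS);
    gc_slots[idx].attached = true;
}

/*
 * Claim a slot at time "now".  If no slot is free, first release slots
 * whose worker was launched more than the timeout ago but never attached,
 * then look again.  Returns the slot index, or -1 if none can be reused.
 */
static int
gc_claim_slot(time_t now)
{
    int idx = gc_find_free();

    if (idx < 0)
    {
        for (int i = 0; i < GC_MAX_SLOTS; i++)
        {
            GcSlot *s = &gc_slots[i];

            if (s->in_use && !s->attached &&
                difftime(now, s->launch_time) > GC_START_TIMEOUT_SECS)
                s->in_use = false;      /* stale: never finished starting */
        }
        idx = gc_find_free();
    }

    if (idx >= 0)
    {
        gc_slots[idx].in_use = true;
        gc_slots[idx].attached = false;
        gc_slots[idx].launch_time = now;
    }
    return idx;
}
```

Sweeping only when no free slot is found keeps the common path cheap, which matches the commit's description of the collector as a fallback for a rare corner case.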

In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
src/backend/replication/logical/launcher.c
src/include/replication/worker_internal.h