From f332c89f3b5c9cae347be134a2a9bc480816517f Mon Sep 17 00:00:00 2001
From: Michael Friedrich
Date: Fri, 1 Aug 2014 16:47:14 +0200
Subject: [PATCH] Documentation: Basic cluster troubleshooting guide

Partly refs #6703
---
 doc/4-monitoring-remote-systems.md |  5 +++
 doc/7-troubleshooting.md           | 51 +++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/doc/4-monitoring-remote-systems.md b/doc/4-monitoring-remote-systems.md
index 8e4f90874..e1ed90dd2 100644
--- a/doc/4-monitoring-remote-systems.md
+++ b/doc/4-monitoring-remote-systems.md
@@ -157,6 +157,11 @@ An Icinga 2 cluster can be used for the following scenarios:
 * [Distributed Zones](#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
 * [Load Distribution](#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.

+> **Tip**
+>
+> If you're looking for troubleshooting cluster problems, check the general
+> [troubleshooting](#troubleshooting-cluster) section.
+
 Before you start configuring the diffent nodes it is necessary to setup the underlying
 communication layer based on SSL.

diff --git a/doc/7-troubleshooting.md b/doc/7-troubleshooting.md
index 91a388e03..e9eb142a7 100644
--- a/doc/7-troubleshooting.md
+++ b/doc/7-troubleshooting.md
@@ -97,6 +97,56 @@ You should add your own command definitions to a new file in `conf.d/` called `c
 or similar.


+## Cluster Troubleshooting
+
+You should configure the [cluster health checks](#cluster-health-check) if you haven't
+done so already.
+
+> **Note**
+>
+> Some problems just exist due to wrong file permissions or packet filters applied. Make
+> sure to check these in the first place.
+
+### Cluster Troubleshooting Connection Errors
+
+General connection errors normally lead you to one of the following problems:
+
+* Wrong network configuration
+* Packet loss on the connection
+* Firewall rules preventing traffic
+
+Use tools like `netstat`, `tcpdump`, `nmap`, etc to make sure that the cluster communication
+happens (default port is `5665`).
+
+    # tcpdump -n port 5665 -i any
+
+    # netstat -tulpen | grep icinga
+
+    # nmap yourclusternode.localdomain
+
+### Cluster Troubleshooting SSL Errors
+
+If the cluster communication fails with cryptic SSL error messages, make sure to check
+the following
+
+* File permissions on the SSL certificate files
+
+    # ls -la /etc/icinga2/pki
+
+* Does the used CA match for all cluster endpoints?
+
+
+### Cluster Troubleshooting Message Errors
+
+At some point, when the network connection is broken or gone, the Icinga 2 instances
+will be disconnected. If the connection can't be re-established between zones and endpoints,
+they remain in a Split-Brain-mode and history may differ.
+
+Although the Icinga 2 cluster protocol stores historical events in a replay log for later synchronisation,
+you should make sure to check why the network connection failed.
+
+
+
 ## Debug Icinga 2

 Make sure that the debug symbols are available for Icinga 2.
@@ -177,4 +227,3 @@ afterwards.
 If you want to delete all breakpoints, use `d` and select `yes`.

     (gdb) d
-
-- 
2.50.1