remote sender to push check results into the Icinga 2 `ExternalCommandListener`
feature.
-## <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability
+> **Note**
+>
+> This addon works in a fashion similar to the Icinga 1.x distributed model. If you
+> are looking for a real distributed architecture with Icinga 2, scroll down.
+
-An Icinga 2 cluster consists of two or more nodes and can reside on multiple
-architectures. The base concept of Icinga 2 is the possibility to add additional
-features using components. In case of a cluster setup you have to add the api feature
-to all nodes.
+## <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability
-An Icinga 2 cluster can be used for the following scenarios:
+Building distributed environments with high availability included is fairly easy with Icinga 2.
+The cluster feature is built-in and allows you to build many scenarios based on your requirements:
* [High Availability](#cluster-scenarios-high-availability). All instances in the `Zone` elect one active master and run as Active/Active cluster.
* [Distributed Zones](#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
* [Load Distribution](#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.
+You can combine these scenarios into a global setup fitting your requirements.
+
+Each instance has its own event scheduler and does not depend on a centralized master
+coordinating and distributing the events. In case of a cluster failure, all nodes
+continue to run independently. Be aware that when your cluster fails and a split-brain
+scenario is in effect, all surviving instances continue to do their job, and history
+will begin to differ.
+
+> **Note**
+>
+> Before you start, make sure to read the [requirements](#distributed-monitoring-requirements).
+
+
+### <a id="cluster-requirements"></a> Cluster Requirements
+
+Before you start deploying, keep the following things in mind:
+
+* Your [SSL CA and certificates](#certificate-authority-certificates) are mandatory for secure communication
+* Get pen and paper or a drawing board and design your nodes and zones!
+    * all nodes in a cluster zone provide high availability functionality and trust each other
+    * cluster zones can be built in a Top-Down design where the child trusts the parent
+    * communication between zones is bi-directional, which means that a DMZ-located node can still reach the master node, or vice versa
+* Update firewall rules and ACLs
+* Decide whether to use the built-in [configuration synchronization](#cluster-zone-config-sync) or an external tool (Puppet, Ansible, Chef, Salt, etc.) to manage the configuration deployment
+
+
> **Tip**
>
> If you're looking for troubleshooting cluster problems, check the general
> [troubleshooting](#troubleshooting-cluster) section.
-Before you start configuring the diffent nodes it is necessary to setup the underlying
-communication layer based on SSL.
+#### <a id="cluster-naming-convention"></a> Cluster Naming Convention
+
+The SSL certificate common name (CN) will be used by the [ApiListener](#objecttype-apilistener)
+object to determine the local authority. This name must match the local [Endpoint](#objecttype-endpoint)
+object name.
+
+Example:
+
+    # icinga2-build-key icinga2a
+    ...
+    Common Name (e.g. server FQDN or YOUR name) [icinga2a]:
+
+    # vim cluster.conf
+
+    object Endpoint "icinga2a" {
+      host = "icinga2a.icinga.org"
+    }
+
+The [Endpoint](#objecttype-endpoint) name is further referenced as `endpoints` attribute on the
+[Zone](#objecttype-zone) object.
+
+    object Endpoint "icinga2b" {
+      host = "icinga2b.icinga.org"
+    }
+
+    object Zone "config-ha-master" {
+      endpoints = [ "icinga2a", "icinga2b" ]
+    }
+
+Specifying the local node name using the [NodeName](#configure-nodename) variable requires
+the same name as used for the endpoint name and common name above. If not set, the FQDN is used.
+
+    const NodeName = "icinga2a"
+
### <a id="certificate-authority-certificates"></a> Certificate Authority and Certificates
Icinga 2 ships two scripts assisting with CA and node certificate creation
for your Icinga 2 cluster.
-The first step is the creation of CA running the following command:
-
- # icinga2-build-ca
+> **Note**
+>
+> You're free to use your own method to generate a valid CA and signed client
+> certificates.
Please make sure to export the environment variable `ICINGA_CA` pointing to
an empty folder for the newly created CA files:
    # export ICINGA_CA="/root/icinga-ca"
+The scripts will put all generated data and the required certificates in there.
+
+The first step is the creation of the certificate authority (CA) by running the
+following command:
+
+    # icinga2-build-ca
+
Now create a certificate and key file for each node by running the following command
(replace `icinga2a` with the required hostname):
    # icinga2-build-key icinga2a
-Repeat the step for all nodes in your cluster scenario. Save the CA key in case
-you want to set up certificates for additional nodes at a later time.
+Repeat the step for all nodes in your cluster scenario.
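+
+If you manage more than a handful of nodes, a small shell loop saves time. A
+sketch, assuming three hypothetical node names:
+
+    # for node in icinga2a icinga2b icinga2c; do icinga2-build-key $node; done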
+
+Save the CA key in a secure location in case you want to set up certificates for
+additional nodes at a later time.
+
+Navigate to the location of your newly generated certificate files, and manually
+copy/transfer them to `/etc/icinga2/pki` in your Icinga 2 configuration folder.
+
+> **Note**
+>
+> The certificate files must be readable by the user Icinga 2 is running as. Also,
+> the private key file must not be world-readable.
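+
+A sketch for the copy and permission steps, assuming the node `icinga2a` and
+the Icinga 2 daemon running as the `icinga` user (user and group names differ
+between distributions):
+
+    # scp $ICINGA_CA/ca.crt $ICINGA_CA/icinga2a.key $ICINGA_CA/icinga2a.crt root@icinga2a.icinga.org:/etc/icinga2/pki/
+    # ssh root@icinga2a.icinga.org "chown icinga:icinga /etc/icinga2/pki/* && chmod 600 /etc/icinga2/pki/icinga2a.key"
+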
Each node requires the following files in `/etc/icinga2/pki` (replace `fqdn-nodename` with
the host's FQDN):
* <fqdn-nodename>.key
+### <a id="cluster-configuration"></a> Cluster Configuration
+
+The following sections describe which configuration must be updated or created
+in order to get your cluster running with basic functionality.
+
+* [configure the node name](#configure-nodename)
+* [configure the ApiListener object](#configure-apilistener-object)
+* [configure cluster endpoints](#configure-cluster-endpoints)
+* [configure cluster zones](#configure-cluster-zones)
+
+Once you're finished with the basic setup, the following sections describe how to use
+[zone configuration synchronisation](#cluster-zone-config-sync) and how to configure
+[cluster scenarios](#cluster-scenarios).
-### <a id="configure-nodename"></a> Configure the Icinga Node Name
+#### <a id="configure-nodename"></a> Configure the Icinga Node Name
Instead of using the default FQDN as node name you can optionally set
that value using the [NodeName](#global-constants) constant.
+
+> **Note**
+>
+> Skip this step if your FQDN already matches the default `NodeName` set
+> in `/etc/icinga2/constants.conf`.
+
This setting must be unique for each node, and must also match
the name of the local [Endpoint](#objecttype-endpoint) object and the
-SSL certificate common name.
+SSL certificate common name as described in the
+[cluster naming convention](#cluster-naming-convention).
+
+    # vim /etc/icinga2/constants.conf
+
+    /* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
+     * This should be the common name from the API certificate.
+     */
    const NodeName = "icinga2a"
+
Read further about additional [naming conventions](#cluster-naming-convention).
Not specifying the node name will make Icinga 2 use the FQDN. Make sure that all
configured endpoint names and common names are in sync.
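+
+You can verify that the certificate common name and the endpoint name match,
+for example with OpenSSL (the path is an assumption based on the pki directory
+used above):
+
+    # openssl x509 -in /etc/icinga2/pki/icinga2a.crt -noout -subject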
-### <a id="cluster-naming-convention"></a> Cluster Naming Convention
-
-The SSL certificate common name (CN) will be used by the [ApiListener](#objecttype-apilistener)
-object to determine the local authority. This name must match the local [Endpoint](#objecttype-endpoint)
-object name.
-
-Example:
-
- # icinga2-build-key icinga2a
- ...
- Common Name (e.g. server FQDN or YOUR name) [icinga2a]:
-
- # vim cluster.conf
-
- object Endpoint "icinga2a" {
- host = "icinga2a.icinga.org"
- }
-
-The [Endpoint](#objecttype-endpoint) name is further referenced as `endpoints` attribute on the
-[Zone](objecttype-zone) object.
-
- object Endpoint "icinga2b" {
- host = "icinga2b.icinga.org"
- }
-
- object Zone "config-ha-master" {
- endpoints = [ "icinga2a", "icinga2b" ]
- }
-
-Specifying the local node name using the [NodeName](#global-constants) variable requires
-the same name as used for the endpoint name and common name above. If not set, the FQDN is used.
-
- const NodeName = "icinga2a"
-
-
-### <a id="configure-clusterlistener-object"></a> Configure the ApiListener Object
+#### <a id="configure-apilistener-object"></a> Configure the ApiListener Object
The [ApiListener](#objecttype-apilistener) object needs to be configured on
every node in the cluster with the following settings:
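+
+A minimal sketch of such an `ApiListener` object, using the pki directory from
+above (adjust the paths if your certificates live elsewhere):
+
+    object ApiListener "api" {
+      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
+      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
+      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
+      bind_port = 5665
+    }
+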
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.
-
-### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints
+#### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints
`Endpoint` objects specify the `host` and `port` settings for the cluster nodes.
This configuration can be the same on all nodes in the cluster only containing
If this endpoint object is reachable on a different port, you must configure the
`ApiListener` on the local `Endpoint` object accordingly too.
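+
+A sketch of an endpoint reachable on a non-default port (host name and port
+number are examples):
+
+    object Endpoint "icinga2b" {
+      host = "icinga2b.icinga.org"
+      port = 5666
+    }
+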
-
-### <a id="configure-cluster-zones"></a> Configure Cluster Zones
+#### <a id="configure-cluster-zones"></a> Configure Cluster Zones
`Zone` objects specify the endpoints located in a zone. That way your distributed setup can be
seen as zones connected together instead of multiple instances in that specific zone.
}
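+
+Child zones reference their parent zone using the `parent` attribute. A sketch
+for a satellite zone trusting the `master` zone (zone and endpoint names are
+examples):
+
+    object Zone "satellite" {
+      endpoints = [ "icinga2c" ]
+      parent = "master"
+    }
+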
-#### <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation
+### <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation
By default all objects for specific zones should be organized in
> determines the required include directory. This can be overridden using the
> [global constant](#global-constants) `ZonesDir`.
-#### <a id="zone-synchronisation-permissions"></a> Global Configuration Zone
+#### <a id="zone-global-config-templates"></a> Global Configuration Zone for Templates
If your zone configuration setup shares the same templates, groups, commands, timeperiods, etc.,
you would have to duplicate quite a lot of configuration objects while keeping the merged
configuration on your configuration master unique.
+> **Note**
+>
+> Only put templates, groups, etc. into this zone. DO NOT add checkable objects such as
+> hosts or services here. If they are checked by all instances globally, this will lead
+> to duplicated check results and an unclear state history, which is not easy to
+> troubleshoot either - you've been warned.
+
+This can be avoided by defining a global zone shipping all those templates. By setting
+`global = true` you ensure that this zone, serving common configuration templates, will be
+synchronized to all involved nodes (only if they accept configuration though).
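+
+A sketch of such a global zone (the zone name is an example):
+
+    object Zone "global-templates" {
+      global = true
+    }
+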
> **Note**
>
> If the remote node does not have this zone configured, it will ignore the configuration
-> update, if it accepts configuration.
+> update, even if it accepts synchronized configuration.
If you don't require any global configuration, skip this setting.
-#### <a id="zone-synchronisation-permissions"></a> Zone Configuration Permissions
+#### <a id="zone-config-sync-permissions"></a> Zone Configuration Synchronisation Permissions
Each [ApiListener](#objecttype-apilistener) object must have the `accept_config` attribute
set to `true` to receive configuration from the parent `Zone` members. Default value is `false`.
      accept_config = true
    }
-### <a id="initial-cluster-sync"></a> Initial Cluster Sync
+If `accept_config` is set to `false`, this instance won't accept configuration from remote
+master instances anymore.
-In order to make sure that all of your cluster nodes have the same state you will
-have to pick one of the nodes as your initial "master" and copy its state file
-to all the other nodes.
-
-You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
-the state file you should make sure that all your cluster nodes are properly shut
-down.
+> **Tip**
+>
+> Look into the [troubleshooting guides](#troubleshooting-cluster-config-sync) for debugging
+> problems with the configuration synchronisation.
### <a id="cluster-health-check"></a> Cluster Health Check
Example:
- apply Service "cluster" {
+ object Service "cluster" {
check_command = "cluster"
check_interval = 5s
retry_interval = 1s
- assign where host.name == "icinga2a"
+ host_name = "icinga2a"
}
Each cluster node should execute its own local cluster health check to
Example for the `checker` zone checking the connection to the `master` zone:
- apply Service "cluster-zone-master" {
+ object Service "cluster-zone-master" {
check_command = "cluster-zone"
check_interval = 5s
retry_interval = 1s
vars.cluster_zone = "master"
- assign where host.name == "icinga2b"
+ host_name = "icinga2b"
}
-### <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes
-
-Special scenarios might require multiple cluster nodes running on a single host.
-By default Icinga 2 and its features will place their runtime data below the prefix
-`LocalStateDir`. By default packages will set that path to `/var`.
-You can either set that variable as constant configuration
-definition in [icinga2.conf](#icinga2-conf) or pass it as runtime variable to
-the Icinga 2 daemon.
-
- # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var
-
-### <a id="high-availability-db-ido"></a> High Availability with DB IDO
-
-All instances within the same zone (e.g. the `master` zone as HA cluster) must
-have the DB IDO feature enabled.
-
-Example DB IDO MySQL:
-
- # icinga2-enable-feature ido-mysql
- The feature 'ido-mysql' is already enabled.
-
-By default the DB IDO feature only runs on the elected zone master. All other nodes
-disable the active IDO database connection at runtime.
-
-> **Note**
->
-> The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
-> for the [IdoMysqlConnection](#objecttype-idomysqlconnection) or
-> [IdoPgsqlConnection](#objecttype-idopgsqlconnection) object on all nodes in the
-> same zone.
->
-> All endpoints will enable the DB IDO feature then, connect to the configured
-> database and dump configuration, status and historical data on their own.
-
-If the instance with the active DB IDO connection dies, the HA functionality will
-re-enable the DB IDO connection on the newly elected zone master.
-
-The DB IDO feature will try to determine which cluster endpoint is currently writing
-to the database and bail out if another endpoint is active. You can manually verify that
-by running the following query:
-
- icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
- status_update_time | endpoint_name
- ------------------------+---------------
- 2014-08-15 15:52:26+02 | icinga2a
- (1 Zeile)
-
-This is useful when the cluster connection between endpoints breaks, and prevents
-data duplication in split-brain-scenarios. The failover timeout can be set for the
-`failover_timeout` attribute, but not lower than 60 seconds.
-
-
### <a id="cluster-scenarios"></a> Cluster Scenarios
All cluster nodes are full-featured Icinga 2 instances. You only need to enable
the features for their role (for example, a `Checker` node only requires the `checker`
feature enabled, but not the `notification` or `ido-mysql` features).
-Each instance got their own event scheduler, and does not depend on a centralized master
-coordinating and distributing the events. In case of a cluster failure, all nodes
-continue to run independently. Be alarmed when your cluster fails and a Split-Brain-scenario
-is in effect - all alive instances continue to do their job, and history will begin to differ.
+#### <a id="cluster-scenarios-security"></a> Security in Cluster Scenarios
+
+While there are certain capabilities to ensure the safe communication between all
+nodes (firewalls, policies, software hardening, etc.), the Icinga 2 cluster also provides
+additional security itself:
+
+* [SSL certificates](#certificate-authority-certificates) are mandatory for cluster communication.
+* Child zones only receive event updates (check results, commands, etc.) for their configured objects.
+* Zones cannot influence or interfere with other zones. Each checked object is assigned to only one zone.
+* All nodes in a zone trust each other.
+* [Configuration sync](#zone-config-sync-permissions) is disabled by default.
#### <a id="cluster-scenarios-features"></a> Features in Cluster Zones
re-schedule a check or acknowledge a problem on the master, and it gets replicated to the
actual slave checker node.
-DB IDO on the left, graphite on the right side - works.
+DB IDO on the left, Graphite on the right side - works (if you disable
+[DB IDO HA](#high-availability-db-ido)).
Icinga Web 2 on the left, checker and notifications on the right side - works too.
-Everything on the left and on the right side - make sure to deal with duplicated notifications
-and automated check distribution.
-
+Everything on the left and on the right side - make sure to deal with
+[load-balanced notifications and checks](#high-availability-features) in an
+[HA zone](#cluster-scenarios-high-availability).
#### <a id="cluster-scenarios-distributed-zones"></a> Distributed Zones
That scenario fits if your instances are spread over the globe and they all report
The `nuremberg-master` zone will only execute local checks, and receive
check results from the satellite nodes in the zones `berlin` and `vienna`.
-
#### <a id="cluster-scenarios-load-distribution"></a> Load Distribution
If you are planning to off-load the checks to a defined set of remote workers
      global = true
    }
-
-#### <a id="cluster-scenarios-high-availability"></a> High Availability
+#### <a id="cluster-scenarios-high-availability"></a> Cluster High Availability
High availability with Icinga 2 is possible by putting multiple nodes into
-a dedicated `Zone`. All nodes will elect their active master, and retry an
+a dedicated `Zone`. All nodes will elect one active master, and retry an
election once the current active master fails.
-Selected features (such as [DB IDO](#high-availability-db-ido)) will only be
-active on the current active master.
-All other passive nodes will pause the features without reload/restart.
-
+Selected features provide advanced [HA functionality](#high-availability-features).
Checks and notifications are load-balanced between nodes in the high availability
zone.
> configuration files in the `zones.d` directory. All other nodes must not
> have that directory populated. Details in the [Configuration Sync Chapter](#cluster-zone-config-sync).
-
#### <a id="cluster-scenarios-multiple-hierachies"></a> Multiple Hierachies
Your master zone collects all check results for reporting and graphing and also
The instances in the departments will serve a local interface, and allow the administrators
to reschedule checks or acknowledge problems for their services.
+
+
+### <a id="high-availability-features"></a> High Availability for Icinga 2 features
+
+All nodes in the same zone require the same features enabled for High Availability (HA)
+amongst them.
+
+By default the following features provide advanced HA functionality:
+
+* [Checks](#high-availability-checks) (load balanced, automated failover)
+* [Notifications](#high-availability-notifications) (load balanced, automated failover)
+* DB IDO (Run-Once, automated failover)
+
+#### <a id="high-availability-checks"></a> High Availability with Checks
+
+All nodes in the same zone automatically load-balance the check execution. When one instance
+fails, the other nodes will automatically take over the remaining checks.
+
+> **Note**
+>
+> If a node should not check anything, disable the `checker` feature explicitly and
+> reload Icinga 2.
+
+    # icinga2-disable-feature checker
+    # service icinga2 reload
+
+#### <a id="high-availability-notifications"></a> High Availability with Notifications
+
+Notifications are load balanced amongst all nodes in a zone. By default this functionality
+is enabled.
+If your nodes should notify independently of any other node (this will cause
+duplicated notifications if not properly handled!), you can set `enable_ha = false`
+in the [NotificationComponent](#objecttype-notificationcomponent) feature.
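+
+A sketch of the feature configuration with HA notifications disabled (the file
+path follows the features-enabled layout used by the packages):
+
+    # vim /etc/icinga2/features-enabled/notification.conf
+
+    library "notification"
+
+    object NotificationComponent "notification" {
+      enable_ha = false
+    }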
+
+#### <a id="high-availability-db-ido"></a> High Availability with DB IDO
+
+All instances within the same zone (e.g. the `master` zone as HA cluster) must
+have the DB IDO feature enabled.
+
+Example DB IDO MySQL:
+
+    # icinga2-enable-feature ido-mysql
+    The feature 'ido-mysql' is already enabled.
+
+By default the DB IDO feature only runs on the elected zone master. All other passive
+nodes disable the active IDO database connection at runtime.
+
+> **Note**
+>
+> The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
+> for the [IdoMysqlConnection](#objecttype-idomysqlconnection) or
+> [IdoPgsqlConnection](#objecttype-idopgsqlconnection) object on all nodes in the
+> same zone.
+>
+> All endpoints will enable the DB IDO feature then, connect to the configured
+> database and dump configuration, status and historical data on their own.
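+
+A sketch with DB IDO HA disabled (connection attributes reduced to a minimum -
+adjust credentials and host to your setup):
+
+    object IdoMysqlConnection "ido-mysql" {
+      user = "icinga"
+      password = "icinga"
+      host = "localhost"
+      database = "icinga"
+      enable_ha = false
+    }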
+
+If the instance with the active DB IDO connection dies, the HA functionality will
+re-enable the DB IDO connection on the newly elected zone master.
+
+The DB IDO feature will try to determine which cluster endpoint is currently writing
+to the database and bail out if another endpoint is active. You can manually verify that
+by running the following query:
+
+    icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
+     status_update_time     | endpoint_name
+    ------------------------+---------------
+     2014-08-15 15:52:26+02 | icinga2a
+    (1 row)
+
+This is useful when the cluster connection between endpoints breaks, and prevents
+data duplication in split-brain scenarios. The failover timeout can be set using the
+`failover_timeout` attribute, but not lower than 60 seconds.
+
+
+### <a id="cluster-add-node"></a> Add a new cluster endpoint
+
+These steps are required for integrating a new cluster endpoint:
+
+* generate a new [SSL client certificate](#certificate-authority-certificates)
+* identify its location in the zone hierarchy
+* update the `zones.conf` file on each involved node ([endpoint](#configure-cluster-endpoints), [zones](#configure-cluster-zones)) - as sketched below
+    * a new slave zone node requires updates for the master and slave zones
+* if the node requires the existing zone history: [initial cluster sync](#initial-cluster-sync)
+* add a [cluster health check](#cluster-health-check)
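+
+A sketch of the `zones.conf` additions for a new node in a slave zone (all
+names are examples):
+
+    object Endpoint "icinga2c" {
+      host = "icinga2c.icinga.org"
+    }
+
+    object Zone "checker" {
+      endpoints = [ "icinga2c" ]
+      parent = "master"
+    }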
+
+#### <a id="initial-cluster-sync"></a> Initial Cluster Sync
+
+In order to make sure that all of your cluster nodes have the same state you will
+have to pick one of the nodes as your initial "master" and copy its state file
+to all the other nodes.
+
+You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
+the state file you should make sure that all your cluster nodes are properly shut
+down.
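+
+A hedged example with `icinga2a` as the source and `icinga2b` as the target
+node (stop Icinga 2 on all nodes first):
+
+    # service icinga2 stop
+    # scp /var/lib/icinga2/icinga2.state root@icinga2b.icinga.org:/var/lib/icinga2/icinga2.state
+    # service icinga2 start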
+
+
+### <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes
+
+Special scenarios might require multiple cluster nodes running on a single host.
+By default, Icinga 2 and its features place their runtime data below the prefix
+`LocalStateDir`, which packages set to `/var`.
+You can either set that variable as a constant configuration
+definition in [icinga2.conf](#icinga2-conf) or pass it as a runtime variable to
+the Icinga 2 daemon.
+
+    # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var