# <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability

Building distributed environments with high availability included is fairly easy with Icinga 2.
The cluster feature is built-in and allows you to build many scenarios based on your requirements:

* [High Availability](13-distributed-monitoring-ha.md#cluster-scenarios-high-availability). All instances in the `Zone` run as an Active/Active cluster.
* [Distributed Zones](13-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
* [Load Distribution](13-distributed-monitoring-ha.md#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.

You can combine these scenarios into a global setup fitting your requirements.

Each instance has its own event scheduler and does not depend on a centralized master
coordinating and distributing the events. In case of a cluster failure, all nodes
continue to run independently. Be aware of when your cluster fails and a split-brain scenario
is in effect: all alive instances continue to do their job, and their history will begin to differ.


## <a id="cluster-requirements"></a> Cluster Requirements

Before you start deploying, keep the following things in mind:

Your [SSL CA and certificates](13-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for secure communication.

Communication between zones requires one of these connection directions:

* The parent zone nodes are able to connect to the child zone nodes (`parent => child`).
* The child zone nodes are able to connect to the parent zone nodes (`parent <= child`).
* Both connection directions work.

Update firewall rules and ACLs.

* Icinga 2 master, satellite and client instances communicate using the default TCP port `5665`.

Get pen and paper or a drawing board and design your nodes and zones!

* Keep the [naming convention](13-distributed-monitoring-ha.md#cluster-naming-convention) for nodes in mind.
* All nodes (endpoints) in a cluster zone provide high availability functionality and trust each other.
* Cluster zones can be built in a top-down design where the child trusts the parent.

Decide whether to use the built-in [configuration synchronisation](13-distributed-monitoring-ha.md#cluster-zone-config-sync) or use an external tool (Puppet, Ansible, Chef, Salt, etc.) to manage the configuration deployment.


> **Tip**
>
> If you need to troubleshoot cluster problems, check the general
> [troubleshooting](16-troubleshooting.md#troubleshooting-cluster) section.

## <a id="manual-certificate-generation"></a> Manual SSL Certificate Generation

Icinga 2 provides [CLI commands](8-cli-commands.md#cli-command-pki) assisting with CA
and node certificate creation for your Icinga 2 distributed setup.

You can also use the master and client setup wizards to install the cluster nodes
using CSR auto-signing.

The manual steps are helpful if you want to use your own and/or existing CA (for example,
the Puppet CA).

You're free to use your own method to generate a valid CA and signed client
certificates.

The first step is the creation of the certificate authority (CA) by running the
following command:

    # icinga2 pki new-ca

Now create a certificate and key file for each node by running the following commands
(replace `icinga2a` with the required hostname):

    # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
    # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt

Repeat these steps for all nodes in your cluster scenario.

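With more than a handful of nodes, it can help to generate these commands from a list of common names. The following Python sketch only builds and prints the two `icinga2 pki` commands shown above for each node; the node names are placeholders, and it assumes you run the printed commands on the host where you created the CA. Passing each list to `subprocess.run()` instead of printing it would execute them directly.

```python
import shlex


def pki_commands(common_names):
    """Build the `icinga2 pki` commands that create and sign one certificate per node."""
    commands = []
    for cn in common_names:
        # Create the private key and certificate signing request for this common name.
        commands.append(["icinga2", "pki", "new-cert", "--cn", cn,
                         "--key", f"{cn}.key", "--csr", f"{cn}.csr"])
        # Sign the CSR with the CA created by `icinga2 pki new-ca`.
        commands.append(["icinga2", "pki", "sign-csr",
                         "--csr", f"{cn}.csr", "--cert", f"{cn}.crt"])
    return commands


if __name__ == "__main__":
    # Placeholder node names: use the real FQDNs of your cluster nodes.
    for command in pki_commands(["icinga2a", "icinga2b", "icinga2c"]):
        print(shlex.join(command))
```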
Save the CA key in a secure location in case you want to set up certificates for
additional nodes at a later time.

Navigate to the location of your newly generated certificate files, and manually
copy or transfer them to `/etc/icinga2/pki` in your Icinga 2 configuration directory.

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.

Each node requires the following files in `/etc/icinga2/pki` (replace `fqdn-nodename` with
the host's FQDN):

* `ca.crt`
* `<fqdn-nodename>.crt`
* `<fqdn-nodename>.key`

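The file list above and the permission note can be checked with a short script. This Python sketch assumes the default `/etc/icinga2/pki` directory and the `<fqdn-nodename>` file naming; since `os.access()` tests the user running the script, run it as the user Icinga 2 runs as (that user's name depends on your distribution).

```python
import os
import stat
import sys
from pathlib import Path


def pki_problems(node_name, pki_dir="/etc/icinga2/pki"):
    """Report missing, unreadable or world-readable certificate files for one node."""
    problems = []
    for name in ("ca.crt", f"{node_name}.crt", f"{node_name}.key"):
        path = Path(pki_dir) / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif not os.access(path, os.R_OK):
            # Checks the user running this script, not necessarily the Icinga 2 user.
            problems.append(f"not readable: {path}")
    key = Path(pki_dir) / f"{node_name}.key"
    if key.is_file() and key.stat().st_mode & stat.S_IROTH:
        problems.append(f"world-readable private key: {key}")
    return problems


if __name__ == "__main__":
    node = sys.argv[1] if len(sys.argv) > 1 else "icinga2a"  # placeholder node name
    for problem in pki_problems(node):
        print(problem)
```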
If you're planning to use your existing CA and certificates, please note that you *must not*
use wildcard certificates. The common name (CN) is mandatory for the cluster communication and
therefore must be unique for each connecting instance.

## <a id="cluster-naming-convention"></a> Cluster Naming Convention

The SSL certificate common name (CN) will be used by the [ApiListener](6-object-types.md#objecttype-apilistener)
object to determine the local authority. This name must match the local [Endpoint](6-object-types.md#objecttype-endpoint)
object name.

Certificate generation for the host with the FQDN `icinga2a`:

    # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
    # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt

Add a new `Endpoint` object named `icinga2a`:

    # vim zones.conf

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
    }

The [Endpoint](6-object-types.md#objecttype-endpoint) name is then referenced in the `endpoints` attribute of the
[Zone](6-object-types.md#objecttype-zone) object.

    object Endpoint "icinga2b" {
      host = "icinga2b.icinga.org"
    }

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b" ]
    }

If you specify the local node name using the [NodeName](13-distributed-monitoring-ha.md#configure-nodename) constant, it must
be the same name as the endpoint name and common name above. If not set, the FQDN is used.

    const NodeName = "icinga2a"

If you're using the host's FQDN everywhere, you're on the safe side. The setup wizards
do the same.

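Because the certificate CN, the `NodeName` and the local `Endpoint` name must agree, a consistency check can be expressed in a few lines. In this Python sketch the inputs are assumed to have been collected by hand or by other tooling; the FQDN fallback via `socket.getfqdn()` only approximates the `hostname --fqdn` value Icinga 2 uses and may differ on some systems.

```python
import socket


def naming_problems(cert_cn, endpoint_names, node_name=None):
    """Check that certificate CN, NodeName and the local Endpoint name agree."""
    # Without an explicit NodeName, fall back to the FQDN (approximated here).
    local = node_name if node_name is not None else socket.getfqdn()
    problems = []
    if cert_cn != local:
        problems.append(f"certificate CN {cert_cn!r} does not match NodeName {local!r}")
    if local not in endpoint_names:
        problems.append(f"no Endpoint object named {local!r}")
    return problems
```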
## <a id="cluster-configuration"></a> Cluster Configuration

The following sections describe which configuration must be updated or created
in order to get your cluster running with basic functionality.

* [configure the node name](13-distributed-monitoring-ha.md#configure-nodename)
* [configure the ApiListener object](13-distributed-monitoring-ha.md#configure-apilistener-object)
* [configure cluster endpoints](13-distributed-monitoring-ha.md#configure-cluster-endpoints)
* [configure cluster zones](13-distributed-monitoring-ha.md#configure-cluster-zones)

Once you're finished with the basic setup, the following sections
describe how to use [zone configuration synchronisation](13-distributed-monitoring-ha.md#cluster-zone-config-sync)
and configure [cluster scenarios](13-distributed-monitoring-ha.md#cluster-scenarios).

### <a id="configure-nodename"></a> Configure the Icinga Node Name

Instead of using the default FQDN as node name, you can optionally set
that value using the [NodeName](18-language-reference.md#constants) constant.

> **Note**
>
> Skip this step if your FQDN already matches the default `NodeName` set
> in `/etc/icinga2/constants.conf`.

This setting must be unique for each node, and must also match
the name of the local [Endpoint](6-object-types.md#objecttype-endpoint) object and the
SSL certificate common name as described in the
[cluster naming convention](13-distributed-monitoring-ha.md#cluster-naming-convention).

    # vim /etc/icinga2/constants.conf

    /* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
     * This should be the common name from the API certificate.
     */
    const NodeName = "icinga2a"


Read further about additional [naming conventions](13-distributed-monitoring-ha.md#cluster-naming-convention).

If you don't specify the node name, Icinga 2 uses the FQDN. Make sure that all
configured endpoint names and common names are in sync.

### <a id="configure-apilistener-object"></a> Configure the ApiListener Object

The [ApiListener](6-object-types.md#objecttype-apilistener) object needs to be configured on
every node in the cluster. A sample configuration looks like this:

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
      accept_config = true
      accept_commands = true
    }

You can simply enable the `api` feature using:

    # icinga2 feature enable api

Edit `/etc/icinga2/features-enabled/api.conf` if you want to enable configuration
synchronisation for this node: set the `accept_config` attribute to `true`.

If you want to use this node as [remote client for command execution](11-icinga2-client.md#icinga2-client-configuration-command-bridge),
set the `accept_commands` attribute to `true`.

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.

### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints

`Endpoint` objects specify the `host` and `port` settings for the cluster node
connection information.
Because it only contains connection information, this configuration can be the same
on all nodes in the cluster.

A sample configuration looks like this:

    /**
     * Configure config master endpoint
     */

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
    }

If this endpoint is reachable on a port other than the default `5665`, set its `port`
attribute accordingly, and make sure the `ApiListener` on that node listens on the same port.

If you don't want the local instance to connect to the remote instance, remove the
`host` attribute locally. Keep in mind that the configuration then differs amongst
the instances and depends on each node's point of view.

### <a id="configure-cluster-zones"></a> Configure Cluster Zones

`Zone` objects specify the endpoints located in a zone. That way your distributed setup can be
seen as zones connected together rather than as a set of individual instances.

Zones can be used for [high availability](13-distributed-monitoring-ha.md#cluster-scenarios-high-availability),
[distributed setups](13-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones) and
[load distribution](13-distributed-monitoring-ha.md#cluster-scenarios-load-distribution).
Furthermore, zones are used for the [Icinga 2 remote client](11-icinga2-client.md#icinga2-client).

Each Icinga 2 `Endpoint` must be put into its respective `Zone`. In this example, you will
define the zone `config-ha-master` where the `icinga2a` and `icinga2b` endpoints
are located. The `check-satellite` zone consists of `icinga2c` only, but more nodes could
be added.

The `config-ha-master` zone acts as a high-availability setup: the Icinga 2 instances elect
one instance, for example `icinga2a`, to run a given check, notification or feature (such as DB IDO).
If the `icinga2a` instance fails, `icinga2b` takes over automatically.

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b" ]
    }

The `check-satellite` zone is a separate location and only sends its check results back to
the defined parent zone `config-ha-master`.

    object Zone "check-satellite" {
      endpoints = [ "icinga2c" ]
      parent = "config-ha-master"
    }


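Since the `Endpoint` and `Zone` definitions are usually identical on all nodes, some setups generate `zones.conf` from a single description of the topology. The following Python sketch is one hypothetical way to do that; the input structure is an assumption, and the output mirrors the object syntax used in the examples above. Setting an endpoint's host to `None` omits the `host` attribute, which stops the local instance from connecting to that endpoint, as described in the endpoint section.

```python
def render_zones_conf(endpoints, zones):
    """Render Endpoint and Zone objects in the style of the examples above.

    endpoints: dict of endpoint name -> host (None omits the host attribute)
    zones: list of dicts with "name" and optional "endpoints", "parent", "global"
    """
    lines = []
    for name, host in endpoints.items():
        lines.append(f'object Endpoint "{name}" {{')
        if host:
            lines.append(f'  host = "{host}"')
        lines.extend(["}", ""])
    for zone in zones:
        lines.append(f'object Zone "{zone["name"]}" {{')
        if zone.get("endpoints"):
            members = ", ".join(f'"{e}"' for e in zone["endpoints"])
            lines.append(f"  endpoints = [ {members} ]")
        if zone.get("parent"):
            lines.append(f'  parent = "{zone["parent"]}"')
        if zone.get("global"):
            lines.append("  global = true")
        lines.extend(["}", ""])
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_zones_conf(
        {"icinga2a": "icinga2a.icinga.org",
         "icinga2b": "icinga2b.icinga.org",
         "icinga2c": "icinga2c.icinga.org"},  # icinga2c's host is a placeholder
        [{"name": "config-ha-master", "endpoints": ["icinga2a", "icinga2b"]},
         {"name": "check-satellite", "endpoints": ["icinga2c"],
          "parent": "config-ha-master"}],
    ))
```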
## <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation

In case you are using the Icinga 2 API for creating, modifying and deleting objects
at runtime, please continue [here](9-icinga2-api.md#icinga2-api-config-objects-cluster-sync).

By default, all objects for a specific zone should be organized in

    /etc/icinga2/zones.d/<zonename>

on the configuration master.

Your child zones and endpoint members **must not** have their config copied to `zones.d`.
The built-in configuration synchronisation takes care of that if your nodes accept
configuration from the parent zone. You can define that in the
[ApiListener](13-distributed-monitoring-ha.md#configure-apilistener-object) object by configuring the `accept_config`
attribute accordingly.

You should remove the sample config included in `conf.d` by commenting out the `include_recursive`
statement in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf):

    //include_recursive "conf.d"

This applies to any other unused configuration directories as well (for example `repository.d`
if not used).

Instead, use a dedicated directory name such as `local` for local configuration, and
include it if your nodes require local configuration that is not synced to other nodes. This is
useful for local [health checks](13-distributed-monitoring-ha.md#cluster-health-check), for example.

> **Note**
>
> In a [high availability](13-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
> setup only one assigned node can act as configuration master. All other zone
> member nodes **must not** have the `/etc/icinga2/zones.d` directory populated.


The contents of these zone directories are then distributed to all nodes in the same zone, and
to the instances of their respective target zones.

Each zone directory must be named exactly like the configured zone it belongs to. The parent zone
syncs the configuration to the child zones if they allow it using the `accept_config`
attribute of the [ApiListener](13-distributed-monitoring-ha.md#configure-apilistener-object) object.

Config on node `icinga2a`:

    object Zone "master" {
      endpoints = [ "icinga2a" ]
    }

    object Zone "checker" {
      endpoints = [ "icinga2b" ]
      parent = "master"
    }

    /etc/icinga2/zones.d
      master
        health.conf
      checker
        health.conf
        demo.conf

Config on node `icinga2b`:

    object Zone "master" {
      endpoints = [ "icinga2a" ]
    }

    object Zone "checker" {
      endpoints = [ "icinga2b" ]
      parent = "master"
    }

    /etc/icinga2/zones.d
      EMPTY_IF_CONFIG_SYNC_ENABLED

If the local configuration is newer than the received update, Icinga 2 will skip the synchronisation
process.

> **Note**
>
> `zones.d` must not be included in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf). Icinga 2 automatically
> determines the required include directory. This can be overridden using the
> [global constant](18-language-reference.md#constants) `ZonesDir`.

### <a id="zone-global-config-templates"></a> Global Configuration Zone for Templates

If your zones share the same templates, groups, commands, timeperiods, etc.,
you would have to duplicate quite a lot of configuration objects and still keep them unique
in the merged configuration on your configuration master.

> **Note**
>
> Only put templates, groups, etc. into this zone. DO NOT add checkable objects such as
> hosts or services here. If they are checked by all instances globally, this will lead
> to duplicated check results and an unclear state history, which is hard to troubleshoot.
> You have been warned.

You can avoid this duplication by defining a global zone shipping all those templates. By setting
`global = true` you ensure that this zone serving common configuration templates will be
synchronized to all involved nodes (provided they accept configuration).

Config on the configuration master:

    /etc/icinga2/zones.d
      global-templates/
        templates.conf
        groups.conf
      master
        health.conf
      checker
        health.conf
        demo.conf

In this example, the global zone is called `global-templates` and must be defined in
the zone configuration of all nodes.

    object Zone "global-templates" {
      global = true
    }

If a remote node accepts synchronized configuration but does not have this zone configured,
it will ignore the configuration update.

If you do not require any global configuration, skip this setting.

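To catch checkable objects in a global zone before deploying, you could scan its directory for `Host` and `Service` object definitions. The following Python sketch uses a simple regular expression, so it is a heuristic rather than a configuration parser: it deliberately ignores apply rules (which the best practice section below allows in global zones) and can miss objects written in unusual formatting. The path assumes the `global-templates` zone from the example above.

```python
import re
from pathlib import Path

# Matches `object Host "..."` or `object Service "..."` at the start of a line.
# [ \t]* (not \s*) keeps each match on a single line, so line numbers stay accurate.
CHECKABLE = re.compile(r'^[ \t]*object[ \t]+(Host|Service)[ \t]+"', re.MULTILINE)


def checkable_lines(text):
    """Return (line number, object type) for each Host/Service object in text."""
    return [(text.count("\n", 0, m.start()) + 1, m.group(1))
            for m in CHECKABLE.finditer(text)]


def scan_global_zone(zone_dir):
    """Scan all *.conf files below a global zone directory."""
    findings = []
    for conf in sorted(Path(zone_dir).rglob("*.conf")):
        for line, kind in checkable_lines(conf.read_text()):
            findings.append((str(conf), line, kind))
    return findings


if __name__ == "__main__":
    for path, line, kind in scan_global_zone("/etc/icinga2/zones.d/global-templates"):
        print(f"{path}:{line}: {kind} object in a global zone")
```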
### <a id="zone-config-sync-permissions"></a> Zone Configuration Synchronisation Permissions

Each [ApiListener](6-object-types.md#objecttype-apilistener) object must have the `accept_config` attribute
set to `true` to receive configuration from the parent `Zone` members. The default value is `false`.

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
      accept_config = true
    }

If `accept_config` is set to `false`, this instance won't accept configuration from remote
master instances anymore.

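Taken together with the earlier notes on unknown zones and newer local configuration, the receiving side's decision can be summarised as a few checks. This Python sketch is a simplified model for illustration only: the `ConfigUpdate` type and the timestamp comparison are assumptions, and it only covers configuration pushed from the parent zone, not the distribution among members of the same zone.

```python
from dataclasses import dataclass


@dataclass
class ConfigUpdate:
    zone: str          # zones.d/<zone> directory contained in the update
    sender_zone: str   # zone of the endpoint that sent the update
    timestamp: float   # when the sender's copy was last changed (assumed field)


def should_apply(update, accept_config, local_zone, parents, known_zones,
                 local_timestamp=None):
    """Simplified model of the rules described above, not Icinga 2's actual code."""
    if not accept_config:
        return False   # accept_config defaults to false
    if update.sender_zone != parents.get(local_zone):
        return False   # only members of the parent zone push configuration here
    if update.zone not in known_zones:
        return False   # zone not configured locally: the update is ignored
    if local_timestamp is not None and local_timestamp > update.timestamp:
        return False   # local configuration is newer: synchronisation is skipped
    return True
```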
> **Tip**
>
> Look into the [troubleshooting guides](16-troubleshooting.md#troubleshooting-cluster-config-sync) for debugging
> problems with the configuration synchronisation.


### <a id="zone-config-sync-best-practice"></a> Zone Configuration Synchronisation Best Practice

The configuration synchronisation works with multiple hierarchies. The following example
illustrates a quite common setup where the master is responsible for configuration deployment:

* [High-Availability master zone](13-distributed-monitoring-ha.md#distributed-monitoring-high-availability)
* [Distributed satellites](13-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones)
* [Remote clients](11-icinga2-client.md#icinga2-client-scenarios) connected to the satellite

While you could use the clients with local configuration and service discovery on the satellite/master
(**bottom up**), the configuration sync is often more practical working **top down** in a cascaded scenario.

Take pen and paper and draw your network scenario including the involved zone and endpoint names.
Once you've added them to your `zones.conf` as connection and permission configuration, continue with
the actual configuration organization:

* Ensure that `command` object definitions are globally available. That way you can use the
  `command_endpoint` configuration more easily on clients as [command execution bridge](11-icinga2-client.md#icinga2-client-configuration-command-bridge).
* Generic `Templates`, `timeperiods`, `downtimes` should be synchronized in a global zone as well.
* [Apply rules](3-monitoring-basics.md#using-apply) can be synchronized globally. Keep in mind that they are evaluated on each instance,
  and might require additional filters (e.g. `match("icinga2*", NodeName)` or similar) based on the zone information.
* Host configuration must be put into the specific zone directory.
* Duplicated host and service objects (also generated by apply rules) will generate a configuration error.
* Consider using custom constants in your host/service configuration. Each instance may set its local value, e.g. for `PluginDir`.

This example specifies the following hierarchy over three levels:

* `ha-master` zone with two child zones `dmz1-checker` and `dmz2-checker`
* `dmz1-checker` has two client child zones `dmz1-client1` and `dmz1-client2`
* `dmz2-checker` has one client child zone `dmz2-client9`

The configuration tree could look like this:

    # tree /etc/icinga2/zones.d
    /etc/icinga2/zones.d
    ├── dmz1-checker
    │   └── health.conf
    ├── dmz1-client1
    │   └── hosts.conf
    ├── dmz1-client2
    │   └── hosts.conf
    ├── dmz2-checker
    │   └── health.conf
    ├── dmz2-client9
    │   └── hosts.conf
    ├── global-templates
    │   ├── apply_notifications.conf
    │   ├── apply_services.conf
    │   ├── commands.conf
    │   ├── groups.conf
    │   ├── templates.conf
    │   └── users.conf
    ├── ha-master
    │   └── health.conf
    └── README

    7 directories, 13 files

If you prefer a different naming scheme for directories or file names, go for it. If you
are unsure about the best method, join the [support channels](1-about.md#support) and discuss
with the community.

If you are planning to synchronize local service health checks inside a zone, look into the
[command endpoint](13-distributed-monitoring-ha.md#cluster-health-check-command-endpoint)
explanation.

[Apply rules](3-monitoring-basics.md#using-apply) in zone directories underneath `zones.d`
also match against objects defined outside of that particular zone directory.

To work around this, you can add an `assign where` condition that limits the apply rule to
a specific zone:

    assign where host.zone == "dmz1-checker"

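The effect of that `assign where` condition can be illustrated outside of Icinga 2. In the following Python sketch the host list and its zone values are made up; an unfiltered rule matches hosts from every zone directory, while the filtered rule only matches hosts in `dmz1-checker`.

```python
# Hypothetical hosts defined in different zone directories below zones.d.
HOSTS = [
    {"name": "web1", "zone": "dmz1-client1"},
    {"name": "db1", "zone": "dmz2-client9"},
    {"name": "dmz1-health", "zone": "dmz1-checker"},
]


def apply_targets(hosts, rule_zone=None):
    """Return the host names an apply rule would create objects for.

    rule_zone=None models a rule without a zone filter; otherwise the rule
    behaves like `assign where host.zone == rule_zone`.
    """
    return [h["name"] for h in hosts if rule_zone is None or h["zone"] == rule_zone]
```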
## <a id="cluster-health-check"></a> Cluster Health Check

The Icinga 2 [ITL](7-icinga-template-library.md#icinga-template-library) provides
an internal check command checking all configured `Endpoint` objects in the cluster setup.
The check result will become critical if one or more configured nodes are not connected.

Example:

    object Host "icinga2a" {
      display_name = "Health Checks on icinga2a"

      address = "192.168.33.10"
      check_command = "hostalive"
    }

    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s

      host_name = "icinga2a"
    }

Each cluster node should execute its own local cluster health check to
get an idea about network-related connection problems from different
points of view.

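The rule described above can be modelled in a few lines, which also shows why each node should run its own check: the set of connected endpoints differs per node. This Python sketch is a simplified model; excluding the local endpoint from the comparison is an assumption, and the real check's output text and performance data are not reproduced.

```python
def cluster_check(configured, connected, local):
    """Simplified model of the `cluster` check.

    Returns ("CRITICAL", missing) if any other configured endpoint is not
    connected, otherwise ("OK", []). `missing` is sorted for stable output.
    """
    missing = sorted(set(configured) - set(connected) - {local})
    return ("CRITICAL" if missing else "OK"), missing
```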
Additionally, you can monitor the connection from the local zone to the remote
connected zones.

Example for the `checker` zone checking the connection to the `master` zone:

    object Service "cluster-zone-master" {
      check_command = "cluster-zone"
      check_interval = 5s
      retry_interval = 1s
      vars.cluster_zone = "master"

      host_name = "icinga2b"
    }

## <a id="cluster-health-check-command-endpoint"></a> Cluster Health Check with Command Endpoints

If you are planning to sync the zone configuration inside a [High-Availability](13-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
cluster zone, you can also use the `command_endpoint` object attribute to
pin host/service checks to a specific endpoint inside the same zone.

This requires the `accept_commands` attribute of the [ApiListener](13-distributed-monitoring-ha.md#configure-apilistener-object)
object to be set to `true`, similar to the [remote client command execution bridge](11-icinga2-client.md#icinga2-client-configuration-command-bridge)
setup.

Make sure to set `command_endpoint` to the correct endpoint instance.
The example below assumes that the endpoint name is the same as the
host name configured for health checks. If it differs, define a host
custom attribute providing [this information](11-icinga2-client.md#icinga2-client-configuration-command-bridge-master-config).

    apply Service "cluster-ha" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s
      /* make sure host.name is the same as endpoint name */
      command_endpoint = host.name

      assign where regex("^icinga2[ab]", host.name)
    }


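Inside square brackets, `|` is a literal character rather than an alternation, so a character class like `[a|b]` also matches a pipe, while `[ab]` means "a or b". The following Python check illustrates this with Python's `re` module and made-up host names; Icinga 2's `regex()` function uses its own regular expression engine, but character classes behave the same way in common engines.

```python
import re

# Made-up host names for illustration.
HOSTS = ["icinga2a", "icinga2b", "icinga2c", "icinga2|test"]


def matching_hosts(pattern, hosts):
    """Return the hosts the pattern matches."""
    return [h for h in hosts if re.search(pattern, h)]


print(matching_hosts(r"^icinga2[ab]", HOSTS))   # ['icinga2a', 'icinga2b']
print(matching_hosts(r"^icinga2[a|b]", HOSTS))  # ['icinga2a', 'icinga2b', 'icinga2|test']
```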
## <a id="cluster-scenarios"></a> Cluster Scenarios

All cluster nodes are full-featured Icinga 2 instances. You only need to enable
the features required for their role (for example, a `Checker` node only requires the `checker`
feature enabled, but not `notification` or `ido-mysql` features).

> **Tip**
>
> There's a [Vagrant demo setup](https://github.com/Icinga/icinga-vagrant/tree/master/icinga2x-cluster)
> available featuring a two-node cluster showcasing several aspects (config sync,
> remote command execution, etc.).

### <a id="cluster-scenarios-master-satellite-clients"></a> Cluster with Master, Satellites and Remote Clients

You can combine "classic" cluster scenarios from HA to Master-Checker with the
Icinga 2 remote client modes. Each instance plays a certain role in that picture.

Imagine the following scenario:

* The master zone acts as High-Availability zone
* Remote satellite zones execute local checks and report them to the master
* All satellites query remote clients and receive check results (which they also relay to the master)
* All involved nodes share the same configuration logic: zones, endpoints, apilisteners

You'll need to think about the following:

* Deploy the entire configuration from the master to satellites and cascading remote clients? ("top down")
* Use local client configuration instead and report the inventory to satellites and cascading to the master? ("bottom up")
* Combine that with command execution bridges on remote clients and also satellites

In case you want to use [CSR Auto-Signing](11-icinga2-client.md#csr-autosigning-requirements) in
a three-level cluster, you'll need to ensure that the clients can connect to the master node once.
The setup wizard can still be configured to connect to the satellite node, as shown in the example
below.

    # icinga2 node wizard
    ...
    Please specify the master endpoint(s) this node should connect to:
    Master Common Name (CN from your master setup): icinga2-satellite1.localdomain
    Please fill out the master connection information:
    Master endpoint host (optional, your master's IP address or FQDN): icinga2-satellite1.localdomain
    ...
    Please specify the master connection for CSR auto-signing (defaults to master endpoint host):
    Host [icinga2-satellite1.localdomain]: icinga2-master1.localdomain

Alternatively, you can copy the CA directory from your master in `/var/lib/icinga2/ca` to your satellites
and connect to them using the client setup wizards.


### <a id="cluster-scenarios-security"></a> Security in Cluster Scenarios

While there are several ways to secure the communication between all
nodes (firewalls, policies, software hardening, etc.), the Icinga 2 cluster also provides
additional security itself:

* [SSL certificates](13-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for cluster communication.
* Child zones only receive event updates (check results, commands, etc.) for their configured objects.
* Zones cannot influence or interfere with other zones. Each checked object is assigned to only one zone.
* All nodes in a zone trust each other.
* [Configuration sync](13-distributed-monitoring-ha.md#zone-config-sync-permissions) is disabled by default.

### <a id="cluster-scenarios-features"></a> Features in Cluster Zones

Each cluster zone may use all available features. If you have multiple locations
or departments, they may write to their local database, or populate Graphite.
Furthermore, all commands are distributed amongst connected nodes. For example, you could
re-schedule a check or acknowledge a problem on the master, and it gets replicated to the
actual checker node.

> **Note**
>
> All features must be the same on all endpoints inside an [HA zone](13-distributed-monitoring-ha.md#cluster-scenarios-high-availability).
> There are additional [High-Availability-enabled features](13-distributed-monitoring-ha.md#high-availability-features) available.

### <a id="cluster-scenarios-distributed-zones"></a> Distributed Zones

This scenario fits if your instances are spread over the globe and they all report
to a master instance. Their network connection only works towards the master
(or the master is able to connect, depending on firewall policies), which means
remote instances won't see or connect to each other.

All events (check results, downtimes, comments, etc.) are synced to the master node,
but the remote nodes can still run local features such as a web interface, reporting,
graphing, etc. in their own specified zone.

Imagine the following example with a master node in Nuremberg, and two remote DMZ-based
instances in Berlin and Vienna. Additionally, you'll specify
[global templates](13-distributed-monitoring-ha.md#zone-global-config-templates) available in all zones.

The configuration tree on the master instance in the `nuremberg` zone could look like this:

    zones.d
      global-templates/
        templates.conf
        groups.conf
      nuremberg/
        local.conf
      berlin/
        hosts.conf
      vienna/
        hosts.conf

The configuration deployment will take care of automatically synchronising
the child zone configuration:

* The master node sends `zones.d/berlin` to the `berlin` child zone.
* The master node sends `zones.d/vienna` to the `vienna` child zone.
* The master node sends `zones.d/global-templates` to the `vienna` and `berlin` child zones.

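The three rules above can be written as a small function: a child zone receives its own directory plus every global zone. The following Python sketch models that. It also lets a zone receive the directories of zones below it, which is how this sketch models the cascading top-down setup from the best practice section; treat that part as an assumption rather than something this section specifies.

```python
def is_self_or_descendant(zone, ancestor, parents):
    """True if `zone` is `ancestor` or lies below it (assumes no parent cycles)."""
    while zone is not None:
        if zone == ancestor:
            return True
        zone = parents.get(zone)
    return False


def synced_dirs(receiving_zone, parents, global_zones):
    """zones.d directories a node in `receiving_zone` receives from its parent.

    parents: dict of zone name -> parent zone name (None for the top zone)
    global_zones: names of zones defined with `global = true`
    """
    own = {z for z in parents if is_self_or_descendant(z, receiving_zone, parents)}
    return own | set(global_zones)
```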
 659 The endpoint configuration would look like:
 660
 661     object Endpoint "nuremberg-master" {
 662       host = "nuremberg.icinga.org"
 663     }
 664
 665     object Endpoint "berlin-satellite" {
 666       host = "berlin.icinga.org"
 667     }
 668
 669     object Endpoint "vienna-satellite" {
 670       host = "vienna.icinga.org"
 671     }
 672
 673 The zones would look like:
 674
 675     object Zone "nuremberg" {
 676       endpoints = [ "nuremberg-master" ]
 677     }
 678
 679     object Zone "berlin" {
 680       endpoints = [ "berlin-satellite" ]
 681       parent = "nuremberg"
 682     }
 683
 684     object Zone "vienna" {
 685       endpoints = [ "vienna-satellite" ]
 686       parent = "nuremberg"
 687     }
 688
 689     object Zone "global-templates" {
 690       global = true
 691     }
 692
 693 The `nuremberg-master` zone will only execute local checks, and receive
 694 check results from the satellite nodes in the zones `berlin` and `vienna`.
 695
 696 > **Note**
 697 >
 698 > The child zones `berlin` and `vienna` will get their configuration synchronised
 699 > from the configuration master 'nuremberg'. The endpoints in the child
 700 > zones **must not** have their `zones.d` directory populated if this endpoint
 701 > [accepts synced configuration](13-distributed-monitoring-ha.md#zone-config-sync-permissions).
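
On the `berlin` and `vienna` satellites, accepting synced configuration is enabled on the
`ApiListener` object. The following is a minimal sketch that assumes the default object name
`api`. Keep the certificate attributes of your existing `api` feature configuration unchanged:

    object ApiListener "api" {
      // existing certificate attributes (cert_path, key_path, ca_path) stay as they are
      accept_config = true
    }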

### <a id="cluster-scenarios-load-distribution"></a> Load Distribution

If you are planning to off-load the checks to a defined set of remote workers,
you can achieve that by:

* Deploying the configuration on all nodes.
* Letting Icinga 2 distribute the load amongst all available nodes.

That way, all remote check instances receive the same configuration
but only execute their share of the checks. The master instance in the `master` zone
can execute checks as well, but you may also disable its `checker` feature.
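
If the master should not execute any checks itself, disable the `checker` feature on the
master node and restart Icinga 2 afterwards:

    # icinga2 feature disable checker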

Configuration on the master node:

    zones.d/
      global-templates/
      master/
      checker/
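
As an example, place a host object in a hypothetical `zones.d/checker/hosts.conf` file on the
master. It is synced to both checker endpoints, and its checks are distributed between
`checker1-node` and `checker2-node`. The host name and address are made up for illustration:

    object Host "web01" {
      address = "192.0.2.10"
      check_command = "hostalive"
    }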

If you want some checks to be executed by a specific set of checker nodes,
define additional zones and place those check objects in them.
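
The following is a minimal sketch of such an additional zone. It assumes a hypothetical
endpoint `checker3-node` reserved for special checks. The zone's check objects would then be
placed in `zones.d/checker-special/` on the master:

    object Endpoint "checker3-node" {
      host = "checker3.icinga.org"
    }

    object Zone "checker-special" {
      endpoints = [ "checker3-node" ]
      parent = "master"
    }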

Endpoints:

    object Endpoint "master-node" {
      host = "master.icinga.org"
    }

    object Endpoint "checker1-node" {
      host = "checker1.icinga.org"
    }

    object Endpoint "checker2-node" {
      host = "checker2.icinga.org"
    }

Zones:

    object Zone "master" {
      endpoints = [ "master-node" ]
    }

    object Zone "checker" {
      endpoints = [ "checker1-node", "checker2-node" ]
      parent = "master"
    }

    object Zone "global-templates" {
      global = true
    }

> **Note**
>
> The child zone `checker` gets its configuration synchronised
> from the configuration master `master`. The endpoints in the child
> zone **must not** have their `zones.d` directory populated if they
> [accept synced configuration](13-distributed-monitoring-ha.md#zone-config-sync-permissions).

### <a id="cluster-scenarios-high-availability"></a> Cluster High Availability

High availability with Icinga 2 is possible by putting multiple nodes into
a dedicated [zone](13-distributed-monitoring-ha.md#configure-cluster-zones). All nodes elect one
active master and hold a new election once the current active master is down.

Selected features provide advanced [HA functionality](13-distributed-monitoring-ha.md#high-availability-features).
Checks and notifications are load-balanced between nodes in the high availability
zone.

Connections from other zones are accepted by all active and passive nodes.
However, check results, commands, etc. are all forwarded to the current active master,
which processes them.

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b", "icinga2c" ]
    }
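
Each name listed in `endpoints` refers to an `Endpoint` object, which has to be defined as
well. The following sketch defines the three HA nodes. The host names are assumptions that
follow the naming used elsewhere in this chapter:

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
    }

    object Endpoint "icinga2b" {
      host = "icinga2b.icinga.org"
    }

    object Endpoint "icinga2c" {
      host = "icinga2c.icinga.org"
    }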

Two or more nodes in a high availability setup require an [initial cluster sync](13-distributed-monitoring-ha.md#initial-cluster-sync).

> **Note**
>
> Keep in mind that **only one node acts as configuration master** having the
> configuration files in the `zones.d` directory. All other nodes **must not**
> have that directory populated. Instead, they are required to
> [accept synced configuration](13-distributed-monitoring-ha.md#zone-config-sync-permissions).
> Details can be found in the [Configuration Sync Chapter](13-distributed-monitoring-ha.md#cluster-zone-config-sync).

### <a id="cluster-scenarios-multiple-hierarchies"></a> Multiple Hierarchies

Your master zone collects all check results for reporting and graphing, and also
sends some additional notifications.
Each customer has its own instances in a local DMZ zone. Customers can only read and write
their own services, but all events are replicated back to the master instance.
Within each DMZ, additional check instances serve interfaces for local
departments. The customer instances collect all results and send them back to
your master instance.
Additionally, the customer instance on the second (middle) level prevents you from
sending commands to the department nodes below it. You are only allowed to receive the
results and a subset of each customer's configuration.

Your master zone generates global reports, aggregates alert notifications, and checks
additional dependencies (for example, the customer's internet uplink and bandwidth usage).

The customer zone instances only check a subset of local services and delegate the rest
to each department. At the same time, they act as configuration master with a master dashboard
for all departments, managing the configuration tree that is then deployed to all
department instances. Furthermore, the master NOC is able to see what's going on.

The department instances serve a local interface and allow their administrators
to reschedule checks or acknowledge problems for their services.
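
The following is a minimal sketch of this hierarchy for one customer with one department.
The zone and endpoint names are hypothetical. Each level points to the zone above it with
the `parent` attribute. The matching `Endpoint` objects and the permission restrictions
described above are not part of this sketch:

    object Zone "master" {
      endpoints = [ "master-node" ]
    }

    object Zone "customer1" {
      endpoints = [ "customer1-node" ]
      parent = "master"
    }

    object Zone "customer1-department1" {
      endpoints = [ "customer1-department1-node" ]
      parent = "customer1"
    }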


## <a id="high-availability-features"></a> High Availability for Icinga 2 features

For High Availability (HA) to work, all nodes in the same zone must have the same
features enabled.

By default the following features provide advanced HA functionality:

* [Checks](13-distributed-monitoring-ha.md#high-availability-checks) (load balanced, automated failover)
* [Notifications](13-distributed-monitoring-ha.md#high-availability-notifications) (load balanced, automated failover)
* [DB IDO](13-distributed-monitoring-ha.md#high-availability-db-ido) (Run-Once, automated failover)

### <a id="high-availability-checks"></a> High Availability with Checks

All instances within the same zone (e.g. the `master` zone as HA cluster) must
have the `checker` feature enabled.

Example:

    # icinga2 feature enable checker

All nodes in the same zone load-balance the check execution. When one instance shuts down,
the other nodes automatically take over the remaining checks.
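
To check that the feature sets match, list the enabled and disabled features on each
node in the zone and compare the output:

    # icinga2 feature list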

### <a id="high-availability-notifications"></a> High Availability with Notifications

All instances within the same zone (e.g. the `master` zone as HA cluster) must
have the `notification` feature enabled.

Example:

    # icinga2 feature enable notification

Notifications are load-balanced amongst all nodes in a zone. This functionality
is enabled by default.
Your nodes may need to send notifications independently of each other. Note that this
causes duplicate notifications if not handled properly! In that case, set `enable_ha = false`
in the [NotificationComponent](6-object-types.md#objecttype-notificationcomponent) feature.
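
The following sketch shows the feature configuration with HA disabled. It assumes the object
name `notification` from the default feature configuration file. Apply it on every node that
should send notifications independently:

    object NotificationComponent "notification" {
      enable_ha = false
    }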

### <a id="high-availability-db-ido"></a> High Availability with DB IDO

All instances within the same zone (e.g. the `master` zone as HA cluster) must
have the DB IDO feature enabled.

Example DB IDO MySQL:

    # icinga2 feature enable ido-mysql

By default the DB IDO feature only runs on one node. All other nodes in the same zone disable
the active IDO database connection at runtime. The node with the active DB IDO connection is
not necessarily the zone master.

> **Note**
>
> The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
> for the [IdoMysqlConnection](6-object-types.md#objecttype-idomysqlconnection) or
> [IdoPgsqlConnection](6-object-types.md#objecttype-idopgsqlconnection) object on **all** nodes in the
> **same** zone.
>
> In that case, all endpoints enable the DB IDO feature, connect to the configured
> database, and dump configuration, status and historical data on their own.
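
The following sketch covers the MySQL variant and assumes the default object name
`ido-mysql`. The connection credentials are placeholders for your own values. The same
`enable_ha = false` setting has to be present on every node in the zone:

    object IdoMysqlConnection "ido-mysql" {
      user = "icinga"
      password = "icinga"
      host = "localhost"
      database = "icinga"
      enable_ha = false
    }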

If the instance with the active DB IDO connection dies, the HA functionality
automatically elects a new DB IDO master.

The DB IDO feature tries to determine which cluster endpoint is currently writing
to the database and bails out if another endpoint is active. You can manually verify that
by running the following query:

    icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
       status_update_time   | endpoint_name
    ------------------------+---------------
     2014-08-15 15:52:26+02 | icinga2a
    (1 row)

This is useful when the cluster connection between endpoints breaks, and prevents
data duplication in split-brain scenarios. The failover timeout is configured with the
`failover_timeout` attribute and cannot be lower than 60 seconds.
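
For example, the following raises the failover timeout to 90 seconds on a MySQL connection.
It assumes the default object name `ido-mysql` and leaves all other connection attributes
unchanged:

    object IdoMysqlConnection "ido-mysql" {
      // keep user, password, host and database as configured
      failover_timeout = 90s
    }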


## <a id="cluster-add-node"></a> Add a new cluster endpoint

These steps are required for integrating a new cluster endpoint:

* generate a new [SSL client certificate](13-distributed-monitoring-ha.md#manual-certificate-generation)
* identify its location in the zones
* update the `zones.conf` file on each involved node ([endpoint](13-distributed-monitoring-ha.md#configure-cluster-endpoints), [zones](13-distributed-monitoring-ha.md#configure-cluster-zones))
    * a new slave zone node requires updates for the master and slave zones
    * verify whether this endpoint requires [configuration synchronisation](13-distributed-monitoring-ha.md#cluster-zone-config-sync) to be enabled
* if the node requires the existing zone history: [initial cluster sync](13-distributed-monitoring-ha.md#initial-cluster-sync)
* add a [cluster health check](13-distributed-monitoring-ha.md#cluster-health-check)
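
As a sketch, adding a hypothetical fourth node `icinga2d` to the `config-ha-master` zone
requires two changes in `zones.conf` on every involved node: define its endpoint and extend
the zone's endpoint list. The cluster health check uses the `cluster` check command. The host
name `icinga2a` it is attached to is an assumption:

    object Endpoint "icinga2d" {
      host = "icinga2d.icinga.org"
    }

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b", "icinga2c", "icinga2d" ]
    }

    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s
      host_name = "icinga2a"
    }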

### <a id="initial-cluster-sync"></a> Initial Cluster Sync

To make sure that all of your cluster nodes have the same state, you
have to pick one of the nodes as your initial "master" and copy its state file
to all the other nodes.

You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
the state file, make sure that all your cluster nodes are properly shut
down.
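
The following sketch of the procedure makes three assumptions: `icinga2a` keeps its state,
`icinga2b` is one of the other nodes, and the daemon is controlled through the `service`
command. Use your distribution's service manager otherwise. First, stop Icinga 2 on every node:

    # service icinga2 stop

Then copy the state file from `icinga2a` to each of the other nodes and start all nodes again:

    # scp /var/lib/icinga2/icinga2.state icinga2b:/var/lib/icinga2/icinga2.state
    # service icinga2 start

Make sure the copied file is owned by the user the Icinga 2 daemon runs as.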


## <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes

Special scenarios might require multiple cluster nodes running on a single host.
By default, Icinga 2 and its features place their runtime data below the
`LocalStateDir` prefix, which packages set to `/var`.
You can either set that variable as a constant definition
in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) or pass it as a runtime variable to
the Icinga 2 daemon:

    # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var