   1 # <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability
   2
   3 Building distributed environments with high availability included is fairly easy with Icinga 2.
   4 The cluster feature is built-in and allows you to build many scenarios based on your requirements:
   5
   6 * [High Availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability). All instances in the `Zone` run as Active/Active cluster.
   7 * [Distributed Zones](12-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
   8 * [Load Distribution](12-distributed-monitoring-ha.md#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.
   9
  10 You can combine these scenarios into a global setup fitting your requirements.
  11
  12 Each instance has its own event scheduler and does not depend on a centralized master
  13 coordinating and distributing the events. In case of a cluster failure, all nodes
  14 continue to run independently. Be alarmed when your cluster fails and a split-brain scenario
  15 is in effect: all alive instances continue to do their job, and their histories begin to differ.
  16
  17
  18 ## <a id="cluster-requirements"></a> Cluster Requirements
  19
  20 Before you start deploying, keep the following things in mind:
  21
  22 * Your [SSL CA and certificates](12-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for secure communication
  23 * Get pen and paper or a drawing board and design your nodes and zones!
  24     * all nodes in a cluster zone are providing high availability functionality and trust each other
  25     * cluster zones can be built in a Top-Down-design where the child trusts the parent
  26     * communication between zones is bi-directional, which means that a DMZ-located node can still reach the master node, or vice versa
  27 * Update firewall rules and ACLs
  28 * Decide whether to use the built-in [configuration synchronization](12-distributed-monitoring-ha.md#cluster-zone-config-sync) or an external tool (Puppet, Ansible, Chef, Salt, etc.) to manage the configuration deployment
  29
  30
  31 > **Tip**
  32 >
  33 > If you're looking for help with troubleshooting cluster problems, check the general
  34 > [troubleshooting](16-troubleshooting.md#troubleshooting-cluster) section.
  35
  36 ## <a id="manual-certificate-generation"></a> Manual SSL Certificate Generation
  37
  38 Icinga 2 provides [CLI commands](8-cli-commands.md#cli-command-pki) assisting with CA
  39 and node certificate creation for your Icinga 2 distributed setup.
  40
  41 > **Tip**
  42 >
  43 > You can also use the master and client setup wizards to install the cluster nodes
  44 > using CSR-Autosigning.
  45 >
  46 > The manual steps are helpful if you want to use your own and/or existing CA (for example
  47 > Puppet CA).
  48
  49 > **Note**
  50 >
  51 > You're free to use your own method to generate a valid CA and signed client
  52 > certificates.
  53
  54 The first step is the creation of the certificate authority (CA) by running the
  55 following command:
  56
  57     # icinga2 pki new-ca
  58
  59 Now create a certificate and key file for each node by running the following commands
  60 (replace `icinga2a` with the required hostname):
  61
  62     # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
  63     # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt
  64
  65 Repeat the step for all nodes in your cluster scenario.
  66
  67 Save the CA key in a secure location in case you want to set up certificates for
  68 additional nodes at a later time.
  69
  70 Navigate to the location of your newly generated certificate files, and manually
  71 copy/transfer them to `/etc/icinga2/pki` in your Icinga 2 configuration folder.
  72
  73 > **Note**
  74 >
  75 > The certificate files must be readable by the user Icinga 2 is running as. Also,
  76 > the private key file must not be world-readable.
  77
  78 Each node requires the following files in `/etc/icinga2/pki` (replace `fqdn-nodename` with
  79 the host's FQDN):
  80
  81 * ca.crt
  82 * &lt;fqdn-nodename&gt;.crt
  83 * &lt;fqdn-nodename&gt;.key
  84
  85 If you're planning to use your existing CA and certificates please note that you *must not*
  86 use wildcard certificates. The common name (CN) is mandatory for the cluster communication and
  87 therefore must be unique for each connecting instance.
  88
  89 ### <a id="cluster-naming-convention"></a> Cluster Naming Convention
  90
  91 The SSL certificate common name (CN) will be used by the [ApiListener](6-object-types.md#objecttype-apilistener)
  92 object to determine the local authority. This name must match the local [Endpoint](6-object-types.md#objecttype-endpoint)
  93 object name.
  94
  95 Example:
  96
  97     # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
  98     # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt
  99
 100     # vim zones.conf
 101
 102     object Endpoint "icinga2a" {
 103       host = "icinga2a.icinga.org"
 104     }
 105
 106 The [Endpoint](6-object-types.md#objecttype-endpoint) name is further referenced as `endpoints` attribute on the
 107 [Zone](6-object-types.md#objecttype-zone) object.
 108
 109     object Endpoint "icinga2b" {
 110       host = "icinga2b.icinga.org"
 111     }
 112
 113     object Zone "config-ha-master" {
 114       endpoints = [ "icinga2a", "icinga2b" ]
 115     }
 116
 117 Specifying the local node name using the [NodeName](12-distributed-monitoring-ha.md#configure-nodename) variable requires
 118 the same name as used for the endpoint name and common name above. If not set, the FQDN is used.
 119
 120     const NodeName = "icinga2a"
 121
 122
 123 ## <a id="cluster-configuration"></a> Cluster Configuration
 124
 125 The following section describes which configuration must be updated or created
 126 in order to get your cluster running with basic functionality.
 127
 128 * [configure the node name](12-distributed-monitoring-ha.md#configure-nodename)
 129 * [configure the ApiListener object](12-distributed-monitoring-ha.md#configure-apilistener-object)
 130 * [configure cluster endpoints](12-distributed-monitoring-ha.md#configure-cluster-endpoints)
 131 * [configure cluster zones](12-distributed-monitoring-ha.md#configure-cluster-zones)
 132
 133 Once you're finished with the basic setup the following section will
 134 describe how to use [zone configuration synchronisation](12-distributed-monitoring-ha.md#cluster-zone-config-sync)
 135 and configure [cluster scenarios](12-distributed-monitoring-ha.md#cluster-scenarios).
 136
 137 ### <a id="configure-nodename"></a> Configure the Icinga Node Name
 138
 139 Instead of using the default FQDN as node name you can optionally set
 140 that value using the [NodeName](19-language-reference.md#constants) constant.
 141
 142 > **Note**
 143 >
 144 > Skip this step if your FQDN already matches the default `NodeName` set
 145 > in `/etc/icinga2/constants.conf`.
 146
 147 This setting must be unique for each node, and must also match
 148 the name of the local [Endpoint](6-object-types.md#objecttype-endpoint) object and the
 149 SSL certificate common name as described in the
 150 [cluster naming convention](12-distributed-monitoring-ha.md#cluster-naming-convention).
 151
 152     vim /etc/icinga2/constants.conf
 153
 154     /* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
 155      * This should be the common name from the API certificate.
 156      */
 157     const NodeName = "icinga2a"
 158
 159
 160 Read further about additional [naming conventions](12-distributed-monitoring-ha.md#cluster-naming-convention).
 161
 162 Not specifying the node name will make Icinga 2 use the FQDN. Make sure that all
 163 configured endpoint names and common names are in sync.
 164
 165 ### <a id="configure-apilistener-object"></a> Configure the ApiListener Object
 166
 167 The [ApiListener](6-object-types.md#objecttype-apilistener) object needs to be configured on
 168 every node in the cluster.
 169
 170 A sample configuration looks like this:
 171
 172     object ApiListener "api" {
 173       cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
 174       key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
 175       ca_path = SysconfDir + "/icinga2/pki/ca.crt"
 176       accept_config = true
 177       accept_commands = true
 178     }
 179
 180 You can simply enable the `api` feature using
 181
 182     # icinga2 feature enable api
 183
 184 Edit `/etc/icinga2/features-enabled/api.conf` if you require the configuration
 185 synchronisation enabled for this node. Set the `accept_config` attribute to `true`.
 186
 187 If you want to use this node as [remote client for command execution](10-icinga2-client.md#icinga2-client-configuration-command-bridge)
 188 set the `accept_commands` attribute to `true`.
 189
 190 > **Note**
 191 >
 192 > The certificate files must be readable by the user Icinga 2 is running as. Also,
 193 > the private key file must not be world-readable.
 194
 195 ### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints
 196
 197 `Endpoint` objects specify the `host` and `port` settings used to connect
 198 to a cluster node.
 199 Because they only contain connection information, this configuration can be
 200 the same on all nodes in the cluster.
 201
 202 A sample configuration looks like:
 203
 204     /**
 205      * Configure config master endpoint
 206      */
 207
 208     object Endpoint "icinga2a" {
 209       host = "icinga2a.icinga.org"
 210     }
 211
 212 If this endpoint is reachable on a different port, set the `port` attribute on its
 213 `Endpoint` object and configure the `ApiListener` on that node accordingly.
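
As a minimal sketch, assume that `icinga2b` listens on the non-default port `5670`. The port number is chosen only for illustration. The `Endpoint` object known to all nodes and the `ApiListener` object on `icinga2b` itself could then look like this:

```icinga2
/* zones.conf on all nodes: how to reach icinga2b */
object Endpoint "icinga2b" {
  host = "icinga2b.icinga.org"
  port = 5670 // assumed non-default port, for illustration only
}

/* features-enabled/api.conf on icinga2b only: listen on the same port */
object ApiListener "api" {
  cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
  key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
  ca_path = SysconfDir + "/icinga2/pki/ca.crt"
  bind_port = 5670
}
```

Both values have to match, otherwise the other nodes will try to connect to a port nobody listens on.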
 214
 215 If you don't want the local instance to connect to the remote instance, remove the
 216 `host` attribute locally. Keep in mind that the configuration then differs between
 217 instances and depends on each node's point of view.
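
For illustration, assume `icinga2a` should wait for `icinga2b` to open the connection. The endpoint definition on `icinga2a` then omits `host`, while `icinga2b` keeps the connection information for `icinga2a`. The local `Endpoint` object of each node is left out of this sketch:

```icinga2
/* zones.conf on icinga2a: no host attribute, so icinga2a does not connect to icinga2b */
object Endpoint "icinga2b" {
}

/* zones.conf on icinga2b: icinga2b actively connects to icinga2a */
object Endpoint "icinga2a" {
  host = "icinga2a.icinga.org"
}
```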
 218
 219 ### <a id="configure-cluster-zones"></a> Configure Cluster Zones
 220
 221 `Zone` objects specify the endpoints located in a zone. That way your distributed setup can be
 222 seen as zones connected together instead of multiple instances in that specific zone.
 223
 224 Zones can be used for [high availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability),
 225 [distributed setups](12-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones) and
 226 [load distribution](12-distributed-monitoring-ha.md#cluster-scenarios-load-distribution).
 227 Furthermore zones are used for the [Icinga 2 remote client](10-icinga2-client.md#icinga2-client).
 228
 229 Each Icinga 2 `Endpoint` must be put into its respective `Zone`. In this example, you will
 230 define the zone `config-ha-master` where the `icinga2a` and `icinga2b` endpoints
 231 are located. The `check-satellite` zone consists of `icinga2c` only, but more nodes could
 232 be added.
 233
 234 The `config-ha-master` zone acts as High-Availability setup - the Icinga 2 instances elect
 235 one instance running a check, notification or feature (DB IDO), for example `icinga2a`. In case of
 236 failure of the `icinga2a` instance, `icinga2b` will take over automatically.
 237
 238     object Zone "config-ha-master" {
 239       endpoints = [ "icinga2a", "icinga2b" ]
 240     }
 241
 242 The `check-satellite` zone is a separate location and only sends its check results back to
 243 the defined parent zone `config-ha-master`.
 244
 245     object Zone "check-satellite" {
 246       endpoints = [ "icinga2c" ]
 247       parent = "config-ha-master"
 248     }
 249
 250
 251 ## <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation
 252
 253 By default all objects for specific zones should be organized in
 254
 255     /etc/icinga2/zones.d/<zonename>
 256
 257 on the configuration master.
 258
 259 Your child zones and endpoint members **must not** have their config copied to `zones.d`.
 260 The built-in configuration synchronisation takes care of that if your nodes accept
 261 configuration from the parent zone. You can define that in the
 262 [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object) object by configuring the `accept_config`
 263 attribute accordingly.
 264
 265 You should remove the sample config included in `conf.d` by commenting out the `include_recursive`
 266 statement in [icinga2.conf](5-configuring-icinga-2.md#icinga2-conf):
 267
 268     //include_recursive "conf.d"
 269
 270 It is better to use a dedicated directory name like `cluster` or similar, and include that
 271 one if your nodes require local configuration that is not synced to other nodes. That's
 272 useful for local [health checks](12-distributed-monitoring-ha.md#cluster-health-check), for example.
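
A sketch of the corresponding lines in `icinga2.conf`, assuming the local directory uses the suggested name `cluster` and is located next to `icinga2.conf`:

```icinga2
/* icinga2.conf: skip the sample configuration ... */
//include_recursive "conf.d"

/* ... and load node-local objects (for example health checks) instead */
include_recursive "cluster"
```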
 273
 274 > **Note**
 275 >
 276 > In a [high availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
 277 > setup only one assigned node can act as configuration master. All other zone
 278 > member nodes **must not** have the `/etc/icinga2/zones.d` directory populated.
 279
 280 These zone packages are then distributed to all nodes in the same zone, and
 281 to their respective target zone instances.
 282
 283 Each configured zone must exist with the same directory name. The parent zone
 284 syncs the configuration to the child zones, if allowed using the `accept_config`
 285 attribute of the [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object) object.
 286
 287 Config on node `icinga2a`:
 288
 289     object Zone "master" {
 290       endpoints = [ "icinga2a" ]
 291     }
 292
 293     object Zone "checker" {
 294       endpoints = [ "icinga2b" ]
 295       parent = "master"
 296     }
 297
 298     /etc/icinga2/zones.d
 299       master
 300         health.conf
 301       checker
 302         health.conf
 303         demo.conf
 304
 305 Config on node `icinga2b`:
 306
 307     object Zone "master" {
 308       endpoints = [ "icinga2a" ]
 309     }
 310
 311     object Zone "checker" {
 312       endpoints = [ "icinga2b" ]
 313       parent = "master"
 314     }
 315
 316     /etc/icinga2/zones.d
 317       EMPTY_IF_CONFIG_SYNC_ENABLED
 318
 319 If the local configuration is newer than the received update Icinga 2 will skip the synchronisation
 320 process.
 321
 322 > **Note**
 323 >
 324 > `zones.d` must not be included in [icinga2.conf](5-configuring-icinga-2.md#icinga2-conf). Icinga 2 automatically
 325 > determines the required include directory. This can be overridden using the
 326 > [global constant](19-language-reference.md#constants) `ZonesDir`.
 327
 328 ### <a id="zone-global-config-templates"></a> Global Configuration Zone for Templates
 329
 330 If the configuration of your zones shares the same templates, groups, commands, timeperiods, etc.,
 331 you would have to duplicate quite a lot of configuration objects while keeping the merged configuration
 332 on your configuration master unique.
 333
 334 > **Note**
 335 >
 336 > Only put templates, groups, etc. into this zone. DO NOT add checkable objects such as
 337 > hosts or services here. If they are checked by all instances globally, this will lead
 338 > to duplicated check results and an unclear state history, which is hard to troubleshoot.
 339 > You've been warned.
 340
 341 You can avoid this duplication by defining a global zone shipping all those templates. By setting
 342 `global = true` you ensure that this zone serving common configuration templates will be
 343 synchronized to all involved nodes (only if they accept configuration though).
 344
 345 Config on configuration master:
 346
 347     /etc/icinga2/zones.d
 348       global-templates/
 349         templates.conf
 350         groups.conf
 351       master
 352         health.conf
 353       checker
 354         health.conf
 355         demo.conf
 356
 357 In this example, the global zone is called `global-templates` and must be defined in
 358 your zone configuration visible to all nodes.
 359
 360     object Zone "global-templates" {
 361       global = true
 362     }
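
As an illustration of what belongs in such a zone, `templates.conf` and `groups.conf` might contain objects like the following. The template and group names are made up for this sketch. Neither block defines a host or service object:

```icinga2
/* zones.d/global-templates/templates.conf */
template Service "generic-cluster-service" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s
}

/* zones.d/global-templates/groups.conf */
object HostGroup "cluster-nodes" {
  display_name = "Cluster Nodes"
}
```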
 363
 364 > **Note**
 365 >
 366 > If the remote node does not have this zone configured, it will ignore the configuration
 367 > update for this zone, even if it accepts synchronized configuration.
 368
 369 If you don't require any global configuration, skip this setting.
 370
 371 ### <a id="zone-config-sync-permissions"></a> Zone Configuration Synchronisation Permissions
 372
 373 Each [ApiListener](6-object-types.md#objecttype-apilistener) object must have the `accept_config` attribute
 374 set to `true` to receive configuration from the parent `Zone` members. Default value is `false`.
 375
 376     object ApiListener "api" {
 377       cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
 378       key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
 379       ca_path = SysconfDir + "/icinga2/pki/ca.crt"
 380       accept_config = true
 381     }
 382
 383 If `accept_config` is set to `false`, this instance won't accept configuration from remote
 384 master instances anymore.
 385
 386 > **Tip**
 387 >
 388 > Look into the [troubleshooting guides](16-troubleshooting.md#troubleshooting-cluster-config-sync) for debugging
 389 > problems with the configuration synchronisation.
 390
 391
 392 ## <a id="cluster-health-check"></a> Cluster Health Check
 393
 394 The Icinga 2 [ITL](7-icinga-template-library.md#icinga-template-library) provides
 395 an internal check command checking all configured `Endpoint` objects in the cluster setup.
 396 The check result will become critical if one or more configured nodes are not connected.
 397
 398 Example:
 399
 400     object Host "icinga2a" {
 401       display_name = "Health Checks on icinga2a"
 402
 403       address = "192.168.33.10"
 404       check_command = "hostalive"
 405     }
 406
 407     object Service "cluster" {
 408       check_command = "cluster"
 409       check_interval = 5s
 410       retry_interval = 1s
 411
 412         host_name = "icinga2a"
 413     }
 414
 415 Each cluster node should execute its own local cluster health check to
 416 get an idea about network related connection problems from different
 417 points of view.
 418
 419 Additionally you can monitor the connection from the local zone to the remote
 420 connected zones.
 421
 422 Example for the `checker` zone checking the connection to the `master` zone:
 423
 424     object Service "cluster-zone-master" {
 425       check_command = "cluster-zone"
 426       check_interval = 5s
 427       retry_interval = 1s
 428       vars.cluster_zone = "master"
 429
 430       host_name = "icinga2b"
 431     }
 432
 433 ## <a id="cluster-health-check-command-endpoint"></a> Cluster Health Check with Command Endpoints
 434
 435 If you are planning to sync the zone configuration inside a [High-Availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
 436 cluster zone, you can also use the `command_endpoint` object attribute to
 437 pin host/service checks to a specific endpoint inside the same zone.
 438
 439 This requires the `accept_commands` setting inside the [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object)
 440 object set to `true` similar to the [remote client command execution bridge](10-icinga2-client.md#icinga2-client-configuration-command-bridge)
 441 setup.
 442
 443 Make sure to set `command_endpoint` to the correct endpoint instance.
 444 The example below assumes that the endpoint name is the same as the
 445 host name configured for health checks. If it differs, define a host
 446 custom attribute providing [this information](10-icinga2-client.md#icinga2-client-configuration-command-bridge-master-config).
 447
 448     apply Service "cluster-ha" {
 449       check_command = "cluster"
 450       check_interval = 5s
 451       retry_interval = 1s
 452       /* make sure host.name is the same as endpoint name */
 453       command_endpoint = host.name
 454
 455       assign where regex("^icinga2[ab]", host.name)
 456     }
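
If the host name differs from the endpoint name, one possible sketch uses a host custom attribute. The attribute name `cluster_endpoint` and the host name `health-icinga2a` are assumptions made for this example, not names defined by Icinga 2:

```icinga2
object Host "health-icinga2a" {
  address = "192.168.33.10"
  check_command = "hostalive"

  /* hypothetical custom attribute holding the Endpoint object name */
  vars.cluster_endpoint = "icinga2a"
}

apply Service "cluster-ha" {
  check_command = "cluster"
  check_interval = 5s
  retry_interval = 1s
  command_endpoint = host.vars.cluster_endpoint

  /* only match hosts that define the custom attribute */
  assign where host.vars.cluster_endpoint
}
```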
 457
 458
 459 ## <a id="cluster-scenarios"></a> Cluster Scenarios
 460
 461 All cluster nodes are full-featured Icinga 2 instances. You only need to enable
 462 the features for their role (for example, a `Checker` node only requires the `checker`
 463 feature enabled, but not `notification` or `ido-mysql` features).
 464
 465 > **Tip**
 466 >
 467 > There's a [Vagrant demo setup](https://github.com/Icinga/icinga-vagrant/tree/master/icinga2x-cluster)
 468 > available featuring a two node cluster showcasing several aspects (config sync,
 469 > remote command execution, etc).
 470
 471 ### <a id="cluster-scenarios-master-satellite-clients"></a> Cluster with Master, Satellites and Remote Clients
 472
 473 You can combine "classic" cluster scenarios from HA to Master-Checker with the
 474 Icinga 2 Remote Client modes. Each instance plays a certain role in that picture.
 475
 476 Imagine the following scenario:
 477
 478 * The master zone acts as High-Availability zone
 479 * Remote satellite zones execute local checks and report them to the master
 480 * All satellites query remote clients and receive check results (which they also relay to the master)
 481 * All involved nodes share the same configuration logic: zones, endpoints, apilisteners
 482
 483 You'll need to think about the following:
 484
 485 * Deploy the entire configuration from the master to satellites and cascading remote clients? ("top down")
 486 * Use local client configuration instead and report the inventory to satellites and cascading to the master? ("bottom up")
 487 * Combine that with command execution bridges on remote clients and also satellites
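
The zone layout of such a scenario could be sketched as follows. All endpoint, zone and host names are hypothetical placeholders. How the configuration reaches each level ("top down" or "bottom up") is a separate decision and not shown here:

```icinga2
/* HA master zone with two endpoints */
object Endpoint "master1" {
  host = "master1.example.org"
}

object Endpoint "master2" {
  host = "master2.example.org"
}

object Zone "master" {
  endpoints = [ "master1", "master2" ]
}

/* satellite zone reporting to the master zone */
object Endpoint "satellite1" {
  host = "satellite1.example.org"
}

object Zone "satellite" {
  endpoints = [ "satellite1" ]
  parent = "master"
}

/* remote client reporting to the satellite zone */
object Endpoint "client1" {
  host = "client1.example.org"
}

object Zone "client1" {
  endpoints = [ "client1" ]
  parent = "satellite"
}
```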
 488
 489
 490
 491
 492 ### <a id="cluster-scenarios-security"></a> Security in Cluster Scenarios
 493
 494 While there are certain capabilities to ensure the safe communication between all
 495 nodes (firewalls, policies, software hardening, etc) the Icinga 2 cluster also provides
 496 additional security itself:
 497
 498 * [SSL certificates](12-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for cluster communication.
 499 * Child zones only receive event updates (check results, commands, etc.) for their configured objects.
 500 * Zones cannot influence or interfere with other zones. Each checked object is assigned to only one zone.
 501 * All nodes in a zone trust each other.
 502 * [Configuration sync](12-distributed-monitoring-ha.md#zone-config-sync-permissions) is disabled by default.
 503
 504 ### <a id="cluster-scenarios-features"></a> Features in Cluster Zones
 505
 506 Each cluster zone may use all available features. If you have multiple locations
 507 or departments, they may write to their local database, or populate graphite.
 508 Even further all commands are distributed amongst connected nodes. For example, you could
 509 re-schedule a check or acknowledge a problem on the master, and it gets replicated to the
 510 actual slave checker node.
 511
 512 DB IDO on the left, graphite on the right side - works (if you disable
 513 [DB IDO HA](12-distributed-monitoring-ha.md#high-availability-db-ido)).
 514 Icinga Web 2 on the left, checker and notifications on the right side - works too.
 515 Everything on the left and on the right side - make sure to deal with
 516 [load-balanced notifications and checks](12-distributed-monitoring-ha.md#high-availability-features) in a
 517 [HA zone](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability).
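
For the first combination, DB IDO HA has to be switched off on the node that should keep its database connection. A minimal sketch for `features-enabled/ido-mysql.conf`, with placeholder credentials:

```icinga2
object IdoMysqlConnection "ido-mysql" {
  /* placeholder connection details, adjust to your database */
  user = "icinga"
  password = "icinga"
  host = "localhost"
  database = "icinga"

  /* keep the IDO connection active on this node regardless of other zone members */
  enable_ha = false
}
```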
 518
 519
 520 ### <a id="cluster-scenarios-distributed-zones"></a> Distributed Zones
 521
 522 That scenario fits if your instances are spread over the globe and they all report
 523 to a master instance. Their network connection only works towards the master
 524 (or the master is able to connect, depending on firewall policies), which means
 525 remote instances won't see or connect to each other.
 526
 527 All events (check results, downtimes, comments, etc) are synced to the master node,
 528 but the remote nodes can still run local features such as a web interface, reporting,
 529 graphing, etc. in their own specified zone.
 530
 531 Imagine the following example with a master node in Nuremberg, and two remote DMZ
 532 based instances in Berlin and Vienna. Additionally you'll specify
 533 [global templates](12-distributed-monitoring-ha.md#zone-global-config-templates) available in all zones.
 534
 535 The configuration tree on the master instance `nuremberg` could look like this:
 536
 537     zones.d
 538       global-templates/
 539         templates.conf
 540         groups.conf
 541       nuremberg/
 542         local.conf
 543       berlin/
 544         hosts.conf
 545       vienna/
 546         hosts.conf
 547
 548 The configuration deployment will take care of automatically synchronising
 549 the child zone configuration:
 550
 551 * The master node sends `zones.d/berlin` to the `berlin` child zone.
 552 * The master node sends `zones.d/vienna` to the `vienna` child zone.
 553 * The master node sends `zones.d/global-templates` to the `vienna` and `berlin` child zones.
 554
 555 The endpoint configuration would look like:
 556
 557     object Endpoint "nuremberg-master" {
 558       host = "nuremberg.icinga.org"
 559     }
 560
 561     object Endpoint "berlin-satellite" {
 562       host = "berlin.icinga.org"
 563     }
 564
 565     object Endpoint "vienna-satellite" {
 566       host = "vienna.icinga.org"
 567     }
 568
 569 The zones would look like:
 570
 571     object Zone "nuremberg" {
 572       endpoints = [ "nuremberg-master" ]
 573     }
 574
 575     object Zone "berlin" {
 576       endpoints = [ "berlin-satellite" ]
 577       parent = "nuremberg"
 578     }
 579
 580     object Zone "vienna" {
 581       endpoints = [ "vienna-satellite" ]
 582       parent = "nuremberg"
 583     }
 584
 585     object Zone "global-templates" {
 586       global = true
 587     }
 588
 589 The `nuremberg` zone will only execute local checks, and receive
 590 check results from the satellite nodes in the zones `berlin` and `vienna`.
 591
 592 > **Note**
 593 >
 594 > The child zones `berlin` and `vienna` will get their configuration synchronised
 595 > from the configuration master `nuremberg`. The endpoints in the child
 596 > zones **must not** have their `zones.d` directory populated if this endpoint
 597 > [accepts synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
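
To notice when a satellite zone loses its connection, the master could run `cluster-zone` checks for both child zones. This is a sketch only. The host object name `nuremberg-master` and the file location are assumptions for this example:

```icinga2
/* zones.d/nuremberg/local.conf on the master */
object Host "nuremberg-master" {
  address = "nuremberg.icinga.org"
  check_command = "hostalive"
}

object Service "cluster-zone-berlin" {
  host_name = "nuremberg-master"
  check_command = "cluster-zone"
  check_interval = 5s
  retry_interval = 1s
  vars.cluster_zone = "berlin"
}

object Service "cluster-zone-vienna" {
  host_name = "nuremberg-master"
  check_command = "cluster-zone"
  check_interval = 5s
  retry_interval = 1s
  vars.cluster_zone = "vienna"
}
```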
 598
 599 ### <a id="cluster-scenarios-load-distribution"></a> Load Distribution
 600
 601 If you are planning to off-load the checks to a defined set of remote workers
 602 you can achieve that by:
 603
 604 * Deploying the configuration on all nodes.
 605 * Letting Icinga 2 distribute the load amongst all available nodes.
 606
 607 That way all remote check instances will receive the same configuration
 608 but only execute their part. The master instance located in the `master` zone
 609 can also execute checks, but you may also disable the `Checker` feature.
 610
 611 Configuration on the master node:
 612
 613     zones.d/
 614       global-templates/
 615       master/
 616       checker/
 617
 618 If you are planning to have some checks executed by a specific set of checker nodes
 619 you have to define additional zones and define these check objects there.
 620
 621 Endpoints:
 622
 623     object Endpoint "master-node" {
 624       host = "master.icinga.org"
 625     }
 626
 627     object Endpoint "checker1-node" {
 628       host = "checker1.icinga.org"
 629     }
 630
 631     object Endpoint "checker2-node" {
 632       host = "checker2.icinga.org"
 633     }
 634
 635
 636 Zones:
 637
 638     object Zone "master" {
 639       endpoints = [ "master-node" ]
 640     }
 641
 642     object Zone "checker" {
 643       endpoints = [ "checker1-node", "checker2-node" ]
 644       parent = "master"
 645     }
 646
 647     object Zone "global-templates" {
 648       global = true
 649     }
 650
 651 > **Note**
 652 >
 653 > The child zone `checker` will get its configuration synchronised
 654 > from the configuration master `master`. The endpoints in the child
 655 > zone **must not** have their `zones.d` directory populated if this endpoint
 656 > [accepts synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
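
If some checks should only run on a specific set of checker nodes, an additional zone could be sketched like this. The endpoint name `checker3-node` and the zone name `checker-dedicated` are hypothetical. The check objects for that zone would then go into a matching `zones.d/checker-dedicated/` directory on the master:

```icinga2
object Endpoint "checker3-node" {
  host = "checker3.icinga.org"
}

/* additional zone for checks pinned to a dedicated set of checker nodes */
object Zone "checker-dedicated" {
  endpoints = [ "checker3-node" ]
  parent = "master"
}
```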
 657
 658 ### <a id="cluster-scenarios-high-availability"></a> Cluster High Availability
 659
 660 High availability with Icinga 2 is possible by putting multiple nodes into
 661 a dedicated [zone](12-distributed-monitoring-ha.md#configure-cluster-zones). All nodes will elect one
 662 active master, and retry an election once the current active master is down.
 663
 664 Selected features provide advanced [HA functionality](12-distributed-monitoring-ha.md#high-availability-features).
 665 Checks and notifications are load-balanced between nodes in the high availability
 666 zone.
 667
 668 Connections from other zones will be accepted by all active and passive nodes
 669 but all are forwarded to the current active master dealing with the check results,
 670 commands, etc.
 671
 672     object Zone "config-ha-master" {
 673       endpoints = [ "icinga2a", "icinga2b", "icinga2c" ]
 674     }
 675
 676 Two or more nodes in a high availability setup require an [initial cluster sync](12-distributed-monitoring-ha.md#initial-cluster-sync).
 677
 678 > **Note**
 679 >
 680 > Keep in mind that **only one node acts as configuration master** having the
 681 > configuration files in the `zones.d` directory. All other nodes **must not**
 682 > have that directory populated. Instead they are required to
 683 > [accept synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
 684 > Details in the [Configuration Sync Chapter](12-distributed-monitoring-ha.md#cluster-zone-config-sync).
 685
 686 ### <a id="cluster-scenarios-multiple-hierarchies"></a> Multiple Hierarchies
 687
 688 Your master zone collects all check results for reporting and graphing and also
 689 sends additional notifications.
 690 The customers have their own instances in their local DMZ zones. They are limited to reading and writing
 691 only their own services, but replicate all events back to the master instance.
 692 Within each DMZ there are additional check instances also serving interfaces for local
 693 departments. The customers' instances will collect all results, but also send them back to
 694 your master instance.
 695 Additionally, the customer's instance on the second level in the middle prohibits you from
 696 sending commands to the subjacent department nodes. You're only allowed to receive the
 697 results, and a subset of each customer's configuration too.
 698
 699 Your master zone will generate global reports, aggregate alert notifications, and check
 700 additional dependencies (for example, the customer's internet uplink and bandwidth usage).
 701
 702 The customers' zone instances will only check a subset of local services and delegate the rest
 703 to each department. They also act as configuration master with a master dashboard
 704 for all departments, managing their configuration tree which is then deployed to all
 705 department instances. Furthermore, the master NOC is able to see what's going on.
 706
 707 The instances in the departments will serve a local interface, and allow the administrators
 708 to reschedule checks or acknowledge problems for their services.
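
The three levels described above map to a chain of `parent` attributes. The following sketch uses hypothetical zone and endpoint names and omits the `Endpoint` objects for brevity:

```icinga2
/* first level: your master zone */
object Zone "master" {
  endpoints = [ "master-node" ]
}

/* second level: one zone per customer DMZ */
object Zone "customer-a" {
  endpoints = [ "customer-a-node" ]
  parent = "master"
}

/* third level: department zones below the customer zone */
object Zone "customer-a-department-1" {
  endpoints = [ "customer-a-department-1-node" ]
  parent = "customer-a"
}
```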
 709
 710
 711 ## <a id="high-availability-features"></a> High Availability for Icinga 2 features
 712
 713 All nodes in the same zone require the same features enabled for High Availability (HA)
 714 amongst them.
 715
 716 By default the following features provide advanced HA functionality:
 717
 718 * [Checks](12-distributed-monitoring-ha.md#high-availability-checks) (load balanced, automated failover)
 719 * [Notifications](12-distributed-monitoring-ha.md#high-availability-notifications) (load balanced, automated failover)
 720 * [DB IDO](12-distributed-monitoring-ha.md#high-availability-db-ido) (Run-Once, automated failover)
 721
 722 ### <a id="high-availability-checks"></a> High Availability with Checks
 723
 724 All nodes in the same zone load-balance the check execution. When one instance
 725 fails, the other nodes will automatically take over the remaining checks.
 726
 727 > **Note**
 728 >
 729 > If a node should not check anything, disable the `checker` feature explicitly and
 730 > reload Icinga 2.
 731
 732     # icinga2 feature disable checker
 733     # service icinga2 reload
 734
 735 ### <a id="high-availability-notifications"></a> High Availability with Notifications
 736
 737 Notifications are load balanced amongst all nodes in a zone. By default this functionality
 738 is enabled.
 739 If your nodes should notify independently of any other nodes (this will cause
 740 duplicated notifications if not properly handled!), you can set `enable_ha = false`
 741 in the [NotificationComponent](6-object-types.md#objecttype-notificationcomponent) feature.
 742
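As a minimal sketch of the setting described above, the fragment below disables HA for notifications. It assumes the object is named `notification` and lives in the `notification` feature configuration file, as a default package installation would typically have it; adjust the name if your setup differs.

```icinga2
object NotificationComponent "notification" {
  // Assumption: "notification" is the object name used by your feature file.
  // With HA disabled, this node sends notifications on its own,
  // regardless of what the other nodes in the zone do.
  enable_ha = false
}
```

Apply the same change on every node that should notify independently, and reload Icinga 2 afterwards.
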
 743 ### <a id="high-availability-db-ido"></a> High Availability with DB IDO
 744
 745 All instances within the same zone (e.g. the `master` zone as HA cluster) must
 746 have the DB IDO feature enabled.
 747
 748 Example DB IDO MySQL:
 749
 750     # icinga2 feature enable ido-mysql
 751     The feature 'ido-mysql' is already enabled.
 752
 753 By default the DB IDO feature only runs on one node. All other nodes in the same zone disable
 754 the active IDO database connection at runtime. The node with the active DB IDO connection is
 755 not necessarily the zone master.
 756
 757 > **Note**
 758 >
 759 > The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
 760 > for the [IdoMysqlConnection](6-object-types.md#objecttype-idomysqlconnection) or
 761 > [IdoPgsqlConnection](6-object-types.md#objecttype-idopgsqlconnection) object on **all** nodes in the
 762 > **same** zone.
 763 >
 764 > With HA disabled, all endpoints will enable the DB IDO feature, connect to the configured
 765 > database and dump configuration, status and historical data on their own.
 766
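The note above could look roughly like the following fragment for DB IDO MySQL. The object name `ido-mysql` and the credentials are placeholders, not values taken from this document; only the `enable_ha` line is the point of the example.

```icinga2
object IdoMysqlConnection "ido-mysql" {
  // Placeholder connection settings (assumptions); keep your existing values.
  user = "icinga"
  password = "icinga"
  host = "localhost"
  database = "icinga"

  // Disable DB IDO HA. Set this on all nodes in the same zone.
  enable_ha = false
}
```
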
 767 If the instance with the active DB IDO connection dies, the HA functionality will
 768 automatically elect a new DB IDO master.
 769
 770 The DB IDO feature will try to determine which cluster endpoint is currently writing
 771 to the database and bail out if another endpoint is active. You can manually verify that
 772 by running the following query:
 773
 774     icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
 775        status_update_time   | endpoint_name
 776     ------------------------+---------------
 777      2014-08-15 15:52:26+02 | icinga2a
 778     (1 row)
 779
 780 This is useful when the cluster connection between endpoints breaks, and prevents
 781 data duplication in split-brain scenarios. The failover timeout can be set with the
 782 `failover_timeout` attribute, but not lower than 60 seconds.
 783
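A hedged sketch of raising the failover timeout for DB IDO PostgreSQL follows. The object name `ido-pgsql` and the value of 120 seconds are illustrative assumptions; the existing connection settings are omitted and stay as they are.

```icinga2
object IdoPgsqlConnection "ido-pgsql" {
  // Existing connection settings (user, password, host, database) go here.

  // Illustrative value; must not be lower than 60 seconds.
  failover_timeout = 120s
}
```
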
 784
 785 ## <a id="cluster-add-node"></a> Add a new cluster endpoint
 786
 787 These steps are required for integrating a new cluster endpoint:
 788
 789 * generate a new [SSL client certificate](12-distributed-monitoring-ha.md#manual-certificate-generation)
 790 * identify its location in the zones
 791 * update the `zones.conf` file on each involved node ([endpoint](12-distributed-monitoring-ha.md#configure-cluster-endpoints), [zones](12-distributed-monitoring-ha.md#configure-cluster-zones))
 792     * a new slave zone node requires updates for the master and slave zones
 793     * verify whether this endpoint requires [configuration synchronisation](12-distributed-monitoring-ha.md#cluster-zone-config-sync) enabled
 794 * if the node requires the existing zone history: [initial cluster sync](12-distributed-monitoring-ha.md#initial-cluster-sync)
 795 * add a [cluster health check](12-distributed-monitoring-ha.md#cluster-health-check)
 796
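The `zones.conf` update from the list above might look like the following sketch when the new endpoint forms its own child zone. All names (`icinga2b`, `icinga2c`, `satellite`, the host address) are hypothetical, and the existing `Endpoint` objects for the master nodes are assumed to be defined already.

```icinga2
// New endpoint (hypothetical name and address).
object Endpoint "icinga2c" {
  host = "icinga2c.localdomain"
}

// Existing master zone, shown for context only.
object Zone "master" {
  endpoints = [ "icinga2a", "icinga2b" ]
}

// New child zone that contains the new endpoint and trusts the master zone.
object Zone "satellite" {
  endpoints = [ "icinga2c" ]
  parent = "master"
}
```

The same definitions need to be present on each involved node, in both the master zone and the new zone.
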
 797 ### <a id="initial-cluster-sync"></a> Initial Cluster Sync
 798
 799 In order to make sure that all of your cluster nodes have the same state you will
 800 have to pick one of the nodes as your initial "master" and copy its state file
 801 to all the other nodes.
 802
 803 You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
 804 the state file you should make sure that all your cluster nodes are properly shut
 805 down.
 806
 807
 808 ## <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes
 809
 810 Special scenarios might require multiple cluster nodes running on a single host.
 811 By default Icinga 2 and its features will place their runtime data below the prefix
 812 `LocalStateDir`. By default packages will set that path to `/var`.
 813 You can either set that variable as a constant
 814 in [icinga2.conf](5-configuring-icinga-2.md#icinga2-conf) or pass it as a runtime variable to
 815 the Icinga 2 daemon.
 816
 817     # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var