granicus.if.org Git - icinga2/blob - doc/3-monitoring-basics.md

   1 # <a id="monitoring-basics"></a> Monitoring Basics
   2
   3 This part of the Icinga 2 documentation provides an overview of all the basic
   4 monitoring concepts you need to know to run Icinga 2.
   5
   6 ## <a id="hosts-services"></a> Hosts and Services
   7
   8 Icinga 2 can be used to monitor the availability of hosts and services. Hosts
   9 and services can be virtually anything which can be checked in some way:
  10
  11 * Network services (HTTP, SMTP, SNMP, SSH, etc.)
  12 * Printers
  13 * Switches / routers
  14 * Temperature sensors
  15 * Other local or network-accessible services
  16
  17 Host objects provide a mechanism to group services that are running
  18 on the same physical device.
  19
  20 Here is an example of a host object which defines two child services:
  21
  22     object Host "my-server1" {
  23       address = "10.0.0.1"
  24       check_command = "hostalive"
  25     }
  26
  27     object Service "ping4" {
  28       host_name = "my-server1"
  29       check_command = "ping4"
  30     }
  31
  32     object Service "http" {
  33       host_name = "my-server1"
  34       check_command = "http"
  35     }
  36
  37 The example creates two services `ping4` and `http` which belong to the
  38 host `my-server1`.
  39
  40 It also specifies that the host should perform its own check using the `hostalive`
  41 check command.
  42
  43 The `address` attribute is used by check commands to determine which network
  44 address is associated with the host object.
  45
  46 Details on troubleshooting check problems can be found [here](13-troubleshooting.md#troubleshooting).
  47
  48 ### <a id="host-states"></a> Host States
  49
  50 Hosts can be in any of the following states:
  51
  52   Name        | Description
  53   ------------|--------------
  54   UP          | The host is available.
  55   DOWN        | The host is unavailable.
  56
  57 ### <a id="service-states"></a> Service States
  58
  59 Services can be in any of the following states:
  60
  61   Name        | Description
  62   ------------|--------------
  63   OK          | The service is working properly.
  64   WARNING     | The service is experiencing some problems but is still considered to be in working condition.
  65   CRITICAL    | The service is in a critical state.
  66   UNKNOWN     | The check could not determine the service's state.
  67
  68 ### <a id="hard-soft-states"></a> Hard and Soft States
  69
  70 When detecting a problem with a host/service Icinga re-checks the object a number of
  71 times (based on the `max_check_attempts` and `retry_interval` settings) before sending
  72 notifications. This ensures that no unnecessary notifications are sent for
  73 transient failures. During this time the object is in a `SOFT` state.
  74
  75 After all re-checks have been executed and the object is still in a non-OK
  76 state the host/service switches to a `HARD` state and notifications are sent.
  77
  78   Name        | Description
  79   ------------|--------------
  80   HARD        | The host/service's state hasn't recently changed.
  81   SOFT        | The host/service has recently changed state and is being re-checked.
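
As a minimal sketch, the re-check behaviour described above could be configured as follows. The service name `http-retry-demo` is made up for illustration; the host `my-server1` and the `http` check command are taken from the earlier example.

```
object Service "http-retry-demo" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m      // interval between regular checks
  retry_interval = 1m      // interval between re-checks while the state is SOFT
  max_check_attempts = 3   // check attempts before a problem state becomes HARD
}
```

With these values a failing service would be re-checked at one-minute intervals, and notifications would only be sent once the third consecutive check attempt still reports a problem.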
  82
  83 ### <a id="host-service-checks"></a> Host and Service Checks
  84
  85 Hosts and services determine their state by running checks at regular intervals.
  86
  87     object Host "router" {
  88       check_command = "hostalive"
  89       address = "10.0.0.1"
  90     }
  91
  92 The `hostalive` command is one of several built-in check commands. It sends ICMP
  93 echo requests to the IP address specified in the `address` attribute to determine
  94 whether a host is online.
  95
  96 A number of other [built-in check commands](#plugin-check-commands) are also
  97 available. In addition to these commands the next few chapters will explain in
  98 detail how to set up your own check commands.
  99
 100
 101 ## <a id="object-inheritance-using-templates"></a> Templates
 102
 103 Templates may be used to apply a set of identical attributes to more than one
 104 object:
 105
 106     template Service "generic-service" {
 107       max_check_attempts = 3
 108       check_interval = 5m
 109       retry_interval = 1m
 110       enable_perfdata = true
 111     }
 112
 113     apply Service "ping4" {
 114       import "generic-service"
 115
 116       check_command = "ping4"
 117
 118       assign where host.address
 119     }
 120
 121     apply Service "ping6" {
 122       import "generic-service"
 123
 124       check_command = "ping6"
 125
 126       assign where host.address6
 127     }
 128
 129
 130 In this example the `ping4` and `ping6` services inherit properties from the
 131 template `generic-service`.
 132
 133 Objects as well as templates themselves can import an arbitrary number of
 134 other templates. Attributes inherited from a template can be overridden in the
 135 object if necessary.
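
The following sketch illustrates overriding an inherited attribute. It assumes the `generic-service` template shown above and the host `my-server1` from the first example; the service name `ping4-frequent` is hypothetical.

```
object Service "ping4-frequent" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "ping4"

  // Overrides the 5m inherited from "generic-service";
  // all other template attributes are kept as they are.
  check_interval = 1m
}
```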
 136
 137 You can also import existing non-template objects. Note that templates
 138 and objects share the same namespace, i.e. you can't define a template
 139 that has the same name as an object.
 140
 141
 142 ## <a id="custom-attributes"></a> Custom Attributes
 143
 144 In addition to built-in attributes you can define your own attributes:
 145
 146     object Host "localhost" {
 147       vars.ssh_port = 2222
 148     }
 149
 150 Valid values for custom attributes include:
 151
 152 * Strings and numbers
 153 * Arrays and dictionaries
 154 * Functions
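
As an illustration of the value types listed above, a host could carry custom attributes like these. The host name, address and all attribute names are invented for this sketch.

```
object Host "custom-attributes-demo" {
  check_command = "hostalive"
  address = "192.0.2.10"

  vars.os = "Linux"               // string
  vars.ssh_port = 2222            // number
  vars.disks = [ "/", "/var" ]    // array
  vars.location = {               // dictionary
    room = "dc1"
    rack = "A3"
  }
}
```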
 155
 156 ### <a id="custom-attributes-functions"></a> Functions as Custom Attributes
 157
 158 Icinga lets you specify functions for custom attributes. The special case here
 159 is that whenever Icinga needs the value for such a custom attribute it runs
 160 the function and uses whatever value the function returns:
 161
 162     object CheckCommand "random-value" {
 163       import "plugin-check-command"
 164
 165       command = [ PluginDir + "/check_dummy", "0", "$text$" ]
 166
 167       vars.text = {{ Math.random() * 100 }}
 168     }
 169
 170 This example uses the [abbreviated lambda syntax](16-language-reference.md#nullary-lambdas).
 171
 172 These functions have access to a number of variables:
 173
 174   Variable     | Description
 175   -------------|---------------
 176   user         | The User object (for notifications).
 177   service      | The Service object (for service checks/notifications/event handlers).
 178   host         | The Host object.
 179   command      | The command object (e.g. a CheckCommand object for checks).
 180
 181 Here's an example:
 182
 183     vars.text = {{ host.check_interval }}
 184
 185 In addition to these variables the `macro` function can be used to retrieve the
 186 value of arbitrary macro expressions:
 187
 188     vars.text = {{
 189       if (macro("$address$") == "127.0.0.1") {
 190         log("Running a check for localhost!")
 191       }
 192
 193       return "Some text"
 194     }}
 195
 196 The [Object Accessor Functions](17-library-reference.md#object-accessor-functions) can be used to retrieve references
 197 to other objects by name.
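
A short sketch of such a lookup is shown below. It assumes that a host object named `router` exists (as in the examples in this chapter) and uses the `get_host` accessor function to read its `address` attribute.

```
// Evaluates to the address of the host "router" each time the value is needed.
vars.text = {{ get_host("router").address }}
```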
 198
 199 ## <a id="runtime-macros"></a> Runtime Macros
 200
 201 Macros can be used to access other objects' attributes at runtime. For example they
 202 are used in command definitions to figure out which IP address a check should be
 203 run against:
 204
 205     object CheckCommand "my-ping" {
 206       import "plugin-check-command"
 207
 208       command = [ PluginDir + "/check_ping", "-H", "$address$" ]
 209
 210       arguments = {
 211         "-w" = "$ping_wrta$,$ping_wpl$%"
 212         "-c" = "$ping_crta$,$ping_cpl$%"
 213         "-p" = "$ping_packets$"
 214       }
 215
 216       vars.ping_wrta = 100
 217       vars.ping_wpl = 5
 218
 219       vars.ping_crta = 250
 220       vars.ping_cpl = 10
 221
 222       vars.ping_packets = 5
 223     }
 224
 225     object Host "router" {
 226       check_command = "my-ping"
 227       address = "10.0.0.1"
 228     }
 229
 230 In this example we are using the `$address$` macro to refer to the host's `address`
 231 attribute.
 232
 233 We can also directly refer to custom attributes, e.g. by using `$ping_wrta$`. Icinga
 234 automatically tries to find the closest match for the attribute you specified. The
 235 exact rules for this are explained in the next section.
 236
 237
 238 ### <a id="macro-evaluation-order"></a> Evaluation Order
 239
 240 When executing commands Icinga 2 checks the following objects in this order to look
 241 up macros and their respective values:
 242
 243 1. User object (only for notifications)
 244 2. Service object
 245 3. Host object
 246 4. Command object
 247 5. Global custom attributes in the `Vars` constant
 248
 249 This evaluation order allows you to define default values for custom attributes
 250 in your command objects.
 251
 252 Here's how you can override the custom attribute `ping_packets` from the previous
 253 example:
 254
 255     object Service "ping" {
 256       host_name = "localhost"
 257       check_command = "my-ping"
 258
 259       vars.ping_packets = 10 // Overrides the default value of 5 given in the command
 260     }
 261
 262 If a custom attribute isn't defined anywhere an empty value is used and a warning is
 263 written to the Icinga 2 log.
 264
 265 You can also directly refer to a specific attribute - thereby ignoring these evaluation
 266 rules - by specifying the full attribute name:
 267
 268     $service.vars.ping_wrta$
 269
 270 This retrieves the value of the `ping_wrta` custom attribute for the service. It
 271 returns an empty value if the service does not have such a custom attribute, no matter
 272 whether another object such as the host has this attribute.
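
To illustrate the difference, the following sketch uses fully qualified names for the thresholds. The command and service names are hypothetical; the threshold values are copied from the `my-ping` example.

```
object CheckCommand "my-ping-service-only" {
  import "plugin-check-command"

  command = [ PluginDir + "/check_ping", "-H", "$address$" ]

  arguments = {
    // Only the service's own custom attributes are consulted here.
    // Values set on the host or on the command are not used as a fallback.
    "-w" = "$service.vars.ping_wrta$,$service.vars.ping_wpl$%"
    "-c" = "$service.vars.ping_crta$,$service.vars.ping_cpl$%"
  }
}

object Service "ping-service-only" {
  host_name = "router"
  check_command = "my-ping-service-only"

  vars.ping_wrta = 100
  vars.ping_wpl = 5
  vars.ping_crta = 250
  vars.ping_cpl = 10
}
```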
 273
 274
 275 ### <a id="host-runtime-macros"></a> Host Runtime Macros
 276
 277 The following host macros are available in all commands that are executed for
 278 hosts or services:
 279
 280   Name                         | Description
 281   -----------------------------|--------------
 282   host.name                    | The name of the host object.
 283   host.display_name            | The value of the `display_name` attribute.
 284   host.state                   | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
 285   host.state_id                | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
 286   host.state_type              | The host's current state type. Can be one of `SOFT` and `HARD`.
 287   host.check_attempt           | The current check attempt number.
 288   host.max_check_attempts      | The maximum number of checks which are executed before changing to a hard state.
 289   host.last_state              | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
 290   host.last_state_id           | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
 291   host.last_state_type         | The host's previous state type. Can be one of `SOFT` and `HARD`.
 292   host.last_state_change       | The last state change's timestamp.
 293   host.downtime_depth          | The number of active downtimes.
 294   host.duration_sec            | The time since the last state change.
 295   host.latency                 | The host's check latency.
 296   host.execution_time          | The host's check execution time.
 297   host.output                  | The last check's output.
 298   host.perfdata                | The last check's performance data.
 299   host.last_check              | The timestamp when the last check was executed.
 300   host.check_source            | The monitoring instance that performed the last check.
 301   host.num_services            | Number of services associated with the host.
 302   host.num_services_ok         | Number of services associated with the host which are in an `OK` state.
 303   host.num_services_warning    | Number of services associated with the host which are in a `WARNING` state.
 304   host.num_services_unknown    | Number of services associated with the host which are in an `UNKNOWN` state.
 305   host.num_services_critical   | Number of services associated with the host which are in a `CRITICAL` state.
 306
 307 ### <a id="service-runtime-macros"></a> Service Runtime Macros
 308
 309 The following service macros are available in all commands that are executed for
 310 services:
 311
 312   Name                       | Description
 313   ---------------------------|--------------
 314   service.name               | The short name of the service object.
 315   service.display_name       | The value of the `display_name` attribute.
 316   service.check_command      | The short name of the command along with any arguments to be used for the check.
 317   service.state              | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
 318   service.state_id           | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
 319   service.state_type         | The service's current state type. Can be one of `SOFT` and `HARD`.
 320   service.check_attempt      | The current check attempt number.
 321   service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
 322   service.last_state         | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
 323   service.last_state_id      | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
 324   service.last_state_type    | The service's previous state type. Can be one of `SOFT` and `HARD`.
 325   service.last_state_change  | The last state change's timestamp.
 326   service.downtime_depth     | The number of active downtimes.
 327   service.duration_sec       | The time since the last state change.
 328   service.latency            | The service's check latency.
 329   service.execution_time     | The service's check execution time.
 330   service.output             | The last check's output.
 331   service.perfdata           | The last check's performance data.
 332   service.last_check         | The timestamp when the last check was executed.
 333   service.check_source       | The monitoring instance that performed the last check.
 334
 335 ### <a id="command-runtime-macros"></a> Command Runtime Macros
 336
 337 The following macros are available in all commands:
 338
 339   Name                   | Description
 340   -----------------------|--------------
 341   command.name           | The name of the command object.
 342
 343 ### <a id="user-runtime-macros"></a> User Runtime Macros
 344
 345 The following macros are available in all commands that are executed for
 346 users:
 347
 348   Name                   | Description
 349   -----------------------|--------------
 350   user.name              | The name of the user object.
 351   user.display_name      | The value of the `display_name` attribute.
 352
 353 ### <a id="notification-runtime-macros"></a> Notification Runtime Macros
 354
 355   Name                   | Description
 356   -----------------------|--------------
 357   notification.type      | The type of the notification.
 358   notification.author    | The author of the notification comment, if existing.
 359   notification.comment   | The comment of the notification, if existing.
 360
 361 ### <a id="global-runtime-macros"></a> Global Runtime Macros
 362
 363 The following macros are available in all executed commands:
 364
 365   Name                   | Description
 366   -----------------------|--------------
 367   icinga.timet           | Current UNIX timestamp.
 368   icinga.long_date_time  | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
 369   icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
 370   icinga.date            | Current date. Example: `2014-01-03`
 371   icinga.time            | Current time including timezone information. Example: `11:23:08 +0000`
 372   icinga.uptime          | Current uptime of the Icinga 2 process.
 373
 374 The following macros provide global statistics:
 375
 376   Name                              | Description
 377   ----------------------------------|--------------
 378   icinga.num_services_ok            | Current number of services in state 'OK'.
 379   icinga.num_services_warning       | Current number of services in state 'Warning'.
 380   icinga.num_services_critical      | Current number of services in state 'Critical'.
 381   icinga.num_services_unknown       | Current number of services in state 'Unknown'.
 382   icinga.num_services_pending       | Current number of pending services.
 383   icinga.num_services_unreachable   | Current number of unreachable services.
 384   icinga.num_services_flapping      | Current number of flapping services.
 385   icinga.num_services_in_downtime   | Current number of services in downtime.
 386   icinga.num_services_acknowledged  | Current number of acknowledged service problems.
 387   icinga.num_hosts_up               | Current number of hosts in state 'Up'.
 388   icinga.num_hosts_down             | Current number of hosts in state 'Down'.
 389   icinga.num_hosts_unreachable      | Current number of unreachable hosts.
 390   icinga.num_hosts_flapping         | Current number of flapping hosts.
 391   icinga.num_hosts_in_downtime      | Current number of hosts in downtime.
 392   icinga.num_hosts_acknowledged     | Current number of acknowledged host problems.
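
As a rough sketch of how these macros are typically consumed, a notification command could pass several of them to a script through environment variables. The command name and the script path are hypothetical, and the example assumes that the `plugin-notification-command` template is available.

```
object NotificationCommand "log-service-notification" {
  import "plugin-notification-command"

  // Hypothetical script path; replace it with your own notification script.
  command = [ SysconfDir + "/icinga2/scripts/notify-example.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME = "$host.name$"
    SERVICENAME = "$service.name$"
    SERVICESTATE = "$service.state$"
    SERVICEOUTPUT = "$service.output$"
    LONGDATETIME = "$icinga.long_date_time$"
    USERNAME = "$user.name$"
  }
}
```

Passing values through `env` rather than through command line arguments keeps the script invocation short and avoids quoting problems with check output.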
 393
 394
 395
 396
 397 ## <a id="using-apply"></a> Apply Rules
 398
 399 Instead of assigning each object ([Service](6-object-types.md#objecttype-service),
 400 [Notification](6-object-types.md#objecttype-notification), [Dependency](6-object-types.md#objecttype-dependency),
 401 [ScheduledDowntime](6-object-types.md#objecttype-scheduleddowntime))
 402 based on attribute identifiers such as `host_name`, objects can be [applied](16-language-reference.md#apply) by rule.
 403
 404 Before you start using the apply rules keep the following in mind:
 405
 406 * Define the best match.
 407     * A set of unique [custom attributes](#custom-attributes-apply) for these hosts/services?
 408     * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
 409     * A generic pattern [match](16-language-reference.md#function-calls) on the host/service name?
 410     * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](16-language-reference.md#expression-operators)
 411 * All expressions must return a boolean value (for example, an empty string evaluates to `false`).
 412
 413 > **Note**
 414 >
 415 > You can set/override object attributes in apply rules using the respectively available
 416 > objects in that scope (host and/or service objects).
 417
 418 [Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
 419 not only for matching on their existence or values in apply expressions, but also to assign
 420 ("inherit") their values to the objects generated from apply rules.
 421
 422 * [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
 423 * [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
 424 * [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
 425 * [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)
 426
 427 A more advanced example is using [apply with for loops on arrays or
 428 dictionaries](#using-apply-for) for example provided by
 429 [custom attributes](#custom-attributes-apply) or groups.
 430
 431 > **Tip**
 432 >
 433 > Building configuration in that dynamic way requires detailed information
 434 > of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
 435 > after successful [configuration validation](8-cli-commands.md#config-validation).
 436
 437
 438 ### <a id="using-apply-expressions"></a> Apply Rules Expressions
 439
 440 You can use simple or advanced combinations of apply rule expressions. Each
 441 expression must evaluate to the boolean value `true` for the rule to match. An empty
 442 string, for instance, is interpreted as `false`. Similarly, undefined
 443 attributes evaluate to `false`.
 444
 445 Returns `false`:
 446
 447     assign where host.vars.attribute_does_not_exist
 448
 449 Multiple `assign where` rows are combined with a logical `OR`.
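
A minimal sketch of this behaviour, assuming hosts carry an `os` custom attribute and the `generic-service` template exists (the service name `ssh-unix` is made up):

```
apply Service "ssh-unix" {
  import "generic-service"

  check_command = "ssh"

  // Either condition is sufficient for a host to match.
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"
}
```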
 450
 451 You can combine multiple expressions to match only a subset of objects. In some cases
 452 you need more than one `assign where` or `ignore where` expression to describe
 453 a specific condition. To achieve this you can use the logical `&&` (and) and `||` (or) operators.
 454
 455
 456 The following example matches all hosts whose name matches the `*mysql*` pattern and (`&&`) whose custom attribute `prod_mysql_db`
 457 matches the `db-*` pattern. Hosts with the custom attribute `test_server` set to `true`
 458 are ignored, as are hosts whose name matches the `*internal` pattern.
 459
 460     object HostGroup "mysql-server" {
 461       display_name = "MySQL Server"
 462
 463       assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
 464       ignore where host.vars.test_server == true
 465       ignore where match("*internal", host.name)
 466     }
 467
 468 A similar example for advanced notification apply rule filters: the rule matches if the service
 469 attribute `notes` contains the string `has gold support 24x7` `AND` one of the
 470 following two conditions holds: either the host custom attribute `customer` is set to `customer-xy`
 471 `OR` the host custom attribute `always_notify` is set to `true`.
 472
 473 The notification is ignored for services whose host name matches `*internal`,
 474 `OR` whose `priority` custom attribute is [less than](16-language-reference.md#expression-operators) `2` `AND` whose host has the custom attribute `is_clustered` set to `true`.
 475
 476     template Notification "cust-xy-notification" {
 477       users = [ "noc-xy", "mgmt-xy" ]
 478       command = "mail-service-notification"
 479     }
 480
 481     apply Notification "notify-cust-xy-mysql" to Service {
 482       import "cust-xy-notification"
 483
 484       assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
 485       ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
 486     }
 487
 488
 489
 490
 491 ### <a id="using-apply-services"></a> Apply Services to Hosts
 492
 493 The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf)
 494 and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.
 495
 496 The example for `ssh` applies a service object to all hosts with the `address`
 497 attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.
 498
 499     apply Service "ssh" {
 500       import "generic-service"
 501
 502       check_command = "ssh"
 503
 504       assign where host.address && host.vars.os == "Linux"
 505     }
 506
 507
 508 Other detailed scenario examples are used in their respective chapters, for example
 509 [apply services with custom command arguments](#using-apply-services-command-arguments).
 510
 511 ### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
 512
 513 Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
 514 manner:
 515
 516
 517     apply Notification "mail-noc" to Service {
 518       import "mail-service-notification"
 519
 520       user_groups = [ "noc" ]
 521
 522       assign where host.vars.notification.mail
 523     }
 524
 525
 526 In this example the `mail-noc` notification will be created as an object for all services whose host has the
 527 `notification.mail` custom attribute defined. The notification command is set to `mail-service-notification`
 528 and all members of the user group `noc` will get notified.
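
The same pattern can be sketched for the `Host` target. The example below assumes a `mail-host-notification` template comparable to the service template used above; the host name and address are made up for illustration.

```
object Host "mail-demo-host" {
  check_command = "hostalive"
  address = "192.0.2.20"

  // Defining this dictionary is what makes the apply rule below match.
  vars.notification["mail"] = {
    groups = [ "noc" ]
  }
}

apply Notification "mail-noc-host" to Host {
  import "mail-host-notification"

  user_groups = [ "noc" ]

  assign where host.vars.notification.mail
}
```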
 529
 530 ### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
 531
 532 Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.
 533
 534 ### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
 535
 536 The sample configuration includes an example in [downtimes.conf](5-configuring-icinga-2.md#downtimes-conf).
 537
 538 Detailed examples can be found in the [recurring downtimes](4-advanced-topics.md#recurring-downtimes) chapter.
 539
 540
 541 ### <a id="using-apply-for"></a> Using Apply For Rules
 542
 543 In addition to standard apply rules you may need to generate
 544 objects from a set of values (an array or a dictionary). Apply for rules save you
 545 a lot of duplicated apply rules by combining them into one generic rule that generates
 546 the object names with or without a prefix.
 547
 548 The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf)
 549 and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.
 550
 551 Imagine a different example: You are monitoring your switch (hosts) with many
 552 interfaces (services). The following requirements/problems apply:
 553
 554 * Each interface service check should be named with a prefix and a running number
 555 * Each interface has its own vlan tag
 556 * Some interfaces have QoS enabled
 557 * Additional attributes such as `display_name`, `notes`, `notes_url` and `action_url` must be
 558 dynamically generated
 559
 560 By defining the `interfaces` dictionary with three example interfaces on the `core-switch`
 561 host object, you provide the data that the for loop in the service apply
 562 rule iterates over.
 563
 564
 565     object Host "core-switch" {
 566       import "generic-host"
 567       address = "127.0.0.1"
 568
 569       vars.interfaces["0"] = {
 570         port = 1
 571         vlan = "internal"
 572         address = "127.0.0.2"
 573         qos = "enabled"
 574       }
 575       vars.interfaces["1"] = {
 576         port = 2
 577         vlan = "mgmt"
 578         address = "127.0.1.2"
 579       }
 580       vars.interfaces["2"] = {
 581         port = 3
 582         vlan = "remote"
 583         address = "127.0.2.2"
 584       }
 585     }
 586
 587 In the apply rule shown below you can also omit the `"if-"` prefix string. All generated service names are then
 588 taken directly from the `if_name` variable value.
 589
 590 The `config` dictionary contains all key-value pairs for the specific interface in one
 591 loop cycle, such as `port`, `vlan`, `address` and `qos` for the `0` interface.
 592
 593 By defining a default value for the custom attribute `qos` in the `vars` dictionary
 594 before adding the `config` dictionary we ensure that this attribute is always defined.
 595
 596 After `vars` is fully populated, all object attributes can be set. For strings, you can use
 597 string concatenation with the `+` operator.
 598
 599 You can also specify the check command that way.
 600
 601     apply Service "if-" for (if_name => config in host.vars.interfaces) {
 602       import "generic-service"
 603       check_command = "ping4"
 604
 605       vars.qos = "disabled"
 606       vars += config
 607
 608       display_name = "if-" + if_name + "-" + vars.vlan
 609
 610       notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
 611       notes_url = "http://foreman.company.com/hosts/" + host.name
 612       action_url = "http://snmp.checker.company.com/" + host.name + "/if-" + if_name
 613     }
 614
 615 Note that numbers must be explicitly cast to strings when they are concatenated with strings.
 616 This can be achieved by wrapping them in the [string()](16-language-reference.md#function-calls) function.
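
For comparison, here is a sketch of the same rule without the `"if-"` prefix. It assumes the `core-switch` host and the `generic-service` template from above; the `display_name` format is only an illustration.

```
apply Service for (if_name => config in host.vars.interfaces) {
  import "generic-service"
  check_command = "ping4"

  vars.qos = "disabled"   // default, overridden if the interface sets "qos"
  vars += config

  // "port" is a number and therefore needs an explicit string() cast.
  display_name = "Port " + string(vars.port) + " (" + vars.vlan + ")"
}
```

With the `interfaces` dictionary defined on `core-switch`, the generated services would be named `0`, `1` and `2`, since the names are taken from the dictionary keys.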
 617
 618 > **Tip**
 619 >
 620 > Building configuration in that dynamic way requires detailed information
 621 > of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
 622 > after successful [configuration validation](8-cli-commands.md#config-validation).
 623
 624
 625 ### <a id="using-apply-object attributes"></a> Use Object Attributes in Apply Rules
 626
 627 Since apply rules are evaluated after the generic objects, you
 628 can reference existing host and/or service object attributes as
 629 values for any object attribute specified in that apply rule.
 630
 631     object Host "opennebula-host" {
 632       import "generic-host"
 633       address = "10.1.1.2"
 634
 635       vars.hosting["xyz"] = {
 636         http_uri = "/shop"
 637         customer_name = "Customer xyz"
 638         customer_id = "7568"
 639         support_contract = "gold"
 640       }
 641       vars.hosting["abc"] = {
 642         http_uri = "/shop"
 643         customer_name = "Customer xyz"
 644         customer_id = "7568"
 645         support_contract = "silver"
 646       }
 647     }
 648
 649     apply Service for (customer => config in host.vars.hosting) {
 650       import "generic-service"
 651       check_command = "ping4"
 652
 653       vars.qos = "disabled"
 654
 655       vars += config
 656
 657       vars.http_uri = "/" + customer + config.http_uri // customer is the loop key, e.g. "xyz"
 658
 659       display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id
 660
 661       notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."
 662
 663       notes_url = "http://foreman.company.com/hosts/" + host.name
 664       action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
 665     }
 666
 667 ## <a id="groups"></a> Groups
 668
 669 A group is a collection of similar objects. Groups are primarily used as a
 670 visualization aid in web interfaces.
 671
 672 Group membership is defined at the respective object itself. If
 673 you have a host group named `windows`, for example, and want to assign
 674 specific hosts to this group so that you can later view the group on your
 675 alert dashboard, first create a HostGroup object:
 676
 677     object HostGroup "windows" {
 678       display_name = "Windows Servers"
 679     }
 680
 681 Then add your hosts to this group:
 682
 683     template Host "windows-server" {
 684       groups += [ "windows" ]
 685     }
 686
 687     object Host "mssql-srv1" {
 688       import "windows-server"
 689
 690       vars.mssql_port = 1433
 691     }
 692
 693     object Host "mssql-srv2" {
 694       import "windows-server"
 695
 696       vars.mssql_port = 1433
 697     }
 698
 699 This can be done for service and user groups the same way:
 700
 701     object UserGroup "windows-mssql-admins" {
 702       display_name = "Windows MSSQL Admins"
 703     }
 704
 705     template User "generic-windows-mssql-users" {
 706       groups += [ "windows-mssql-admins" ]
 707     }
 708
 709     object User "win-mssql-noc" {
 710       import "generic-windows-mssql-users"
 711
 712       email = "noc@example.com"
 713     }
 714
 715     object User "win-mssql-ops" {
 716       import "generic-windows-mssql-users"
 717
 718       email = "ops@example.com"
 719     }
 720
 721 ### <a id="group-assign-intro"></a> Group Membership Assign
 722
 723 Instead of manually assigning each object to a group you can also assign objects
 724 to a group based on their attributes:
 725
 726     object HostGroup "prod-mssql" {
 727       display_name = "Production MSSQL Servers"
 728
 729       assign where host.vars.mssql_port && host.vars.prod_mysql_db
 730       ignore where host.vars.test_server == true
 731       ignore where match("*internal", host.name)
 732     }
 733
 734 In this example all hosts with the `vars` attributes `mssql_port` and
 735 `prod_mysql_db` will be added as members to the host group `prod-mssql`. However,
 736 all `*internal` hosts and hosts with the `test_server` attribute set to `true` are
 737 not added to this group.
 738
 739 Details on the `assign where` syntax can be found in the
 740 [Language Reference](16-language-reference.md#apply).
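
The same mechanism works for other group types. As a minimal sketch, the
following service group collects all services whose name starts with `http`.
The group name `http-checks` and the name pattern are assumptions chosen for
illustration:

    object ServiceGroup "http-checks" {
      display_name = "HTTP Checks"

      // service.name is compared against a wildcard pattern
      assign where match("http*", service.name)
    }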
 741
 742 ## <a id="notifications"></a> Notifications
 743
 744 Notifications for service and host problems are an integral part of your
 745 monitoring setup.
 746
 747 When a host or service is in a downtime, a problem has been acknowledged or
 748 the dependency logic determined that the host/service is unreachable, no
 749 notifications are sent. You can configure additional type and state filters
 750 to refine which notifications are actually sent.
 751
 752 There are many ways of sending notifications, e.g. by e-mail, XMPP,
 753 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
 754 Instead it relies on external mechanisms such as shell scripts to notify users.
 755
 756 A notification specification requires one or more users (and/or user groups)
 757 who will be notified in case of problems. These users must have all custom
 758 attributes defined which will be used in the `NotificationCommand` on execution.
 759
 760 The user `icingaadmin` in the example below will get notified only on `Warning` and
 761 `Critical` problems and on recoveries (the `OK` state in `states` lets `Recovery` notifications pass).
 762
 763     object User "icingaadmin" {
 764       display_name = "Icinga 2 Admin"
 765       enable_notifications = true
 766       states = [ OK, Warning, Critical ]
 767       types = [ Problem, Recovery ]
 768       email = "icinga@localhost"
 769     }
 770
 771 If you don't set the `states` and `types` configuration attributes for the `User`
 772 object, notifications for all states and types will be sent.
 773
 774 Details on troubleshooting notification problems can be found [here](13-troubleshooting.md#troubleshooting).
 775
 776 > **Note**
 777 >
 778 > Make sure that the [notification](8-cli-commands.md#features) feature is enabled
 779 > in order to execute notification commands.
 780
 781 You should choose which information you (and your notified users) are interested in
 782 in case of an emergency, and also which information does not provide any value to you and
 783 your environment.
 784
 785 An example notification command is explained [here](3-monitoring-basics.md#notification-commands).
 786
 787 You can add all shared attributes to a `Notification` template which is inherited
 788 by the defined notifications. That way you avoid duplicating attributes in each
 789 `Notification` object. Attributes can be overridden locally.
 790
 791     template Notification "generic-notification" {
 792       interval = 15m
 793
 794       command = "mail-service-notification"
 795
 796       states = [ Warning, Critical, Unknown ]
 797       types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
 798                 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
 799
 800       period = "24x7"
 801     }
 802
 803 The time period `24x7` is included as example configuration with Icinga 2.
 804
 805 Use the `apply` keyword to create `Notification` objects for your services:
 806
 807     apply Notification "notify-cust-xy-mysql" to Service {
 808       import "generic-notification"
 809
 810       users = [ "noc-xy", "mgmt-xy" ]
 811
 812       assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
 813       ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
 814     }
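
Notifications can be applied to hosts in the same way. The sketch below assumes
that a `NotificationCommand` named `mail-host-notification` exists in your
configuration, and the notification name is hypothetical. Because the
`generic-notification` template above only lists service states, the host
states are set explicitly:

    apply Notification "notify-cust-xy-hosts" to Host {
      import "generic-notification"

      // assumption: a host-specific notification command is defined elsewhere
      command = "mail-host-notification"

      // override the service states inherited from the template
      states = [ Up, Down ]

      users = [ "noc-xy", "mgmt-xy" ]

      assign where host.vars.customer == "customer-xy"
    }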
 815
 816
 817 Instead of assigning users to notifications, you can also add the `user_groups`
 818 attribute with a list of user groups to the `Notification` object. Icinga 2 will
 819 send notifications to all group members.
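
A minimal sketch of such a notification, reusing the `windows-mssql-admins`
user group defined in the Groups section (the notification name and the
`assign where` condition are assumptions for illustration):

    apply Notification "notify-mssql-admins" to Service {
      import "generic-notification"

      // notify every member of the group instead of listing single users
      user_groups = [ "windows-mssql-admins" ]

      assign where host.vars.mssql_port
    }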
 820
 821 > **Note**
 822 >
 823 > Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
 824 > states for services, `Down` for hosts) will receive `Recovery` notifications.
 825
 826 ### <a id="notification-escalations"></a> Notification Escalations
 827
 828 When a problem notification is sent and a problem still exists at the time of re-notification
 829 you may want to escalate the problem to the next support level. A different approach
 830 is to configure the default notification by email, and escalate the problem via SMS
 831 if not already solved.
 832
 833 You can define notification start and end times as additional configuration
 834 attributes making the `Notification` object a so-called `notification escalation`.
 835 Using templates you can share the basic notification attributes such as users or the
 836 `interval` (and override them for the escalation then).
 837
 838 Using the example from above, you can define additional users being escalated for SMS
 839 notifications between start and end time.
 840
 841     object User "icinga-oncall-2nd-level" {
 842       display_name = "Icinga 2nd Level"
 843
 844       vars.mobile = "+1 555 424642"
 845     }
 846
 847     object User "icinga-oncall-1st-level" {
 848       display_name = "Icinga 1st Level"
 849
 850       vars.mobile = "+1 555 424642"
 851     }
 852
 853 Define an additional [NotificationCommand](3-monitoring-basics.md#notification-commands) for SMS notifications.
 854
 855 > **Note**
 856 >
 857 > The example is not complete as there are many different SMS providers.
 858 > Please note that sending SMS notifications will require an SMS provider
 859 > or local hardware with a SIM card active.
 860
 861     object NotificationCommand "sms-notification" {
 862       command = [
 863         PluginDir + "/send_sms_notification",
 864         "$mobile$",
 865         "..." ]
 866     }
 867
 868 The two new notification escalations are applied to every `ping4` service
 869 using the `generic-notification` template.
 870 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
 871 command) after `30m` until `1h`.
 872
 873 > **Note**
 874 >
 875 > The `interval` was set to 15m in the `generic-notification`
 876 > template example. Lower that value in your escalations by using a secondary
 877 > template or by overriding the attribute directly in the
 878 > `escalation-sms-2nd-level` notification.
 879
 880 If the problem is neither resolved nor acknowledged (an acknowledgement prevents further
 881 notifications), the `icinga-oncall-1st-level` user will be notified `1h` after the initial
 882 problem notification, but only for one hour (`2h` as `end` key for the `times` dictionary).
 883
 884     apply Notification "mail" to Service {
 885       import "generic-notification"
 886
 887       command = "mail-notification"
 888       users = [ "icingaadmin" ]
 889
 890       assign where service.name == "ping4"
 891     }
 892
 893     apply Notification "escalation-sms-2nd-level" to Service {
 894       import "generic-notification"
 895
 896       command = "sms-notification"
 897       users = [ "icinga-oncall-2nd-level" ]
 898
 899       times = {
 900         begin = 30m
 901         end = 1h
 902       }
 903
 904       assign where service.name == "ping4"
 905     }
 906
 907     apply Notification "escalation-sms-1st-level" to Service {
 908       import "generic-notification"
 909
 910       command = "sms-notification"
 911       users = [ "icinga-oncall-1st-level" ]
 912
 913       times = {
 914         begin = 1h
 915         end = 2h
 916       }
 917
 918       assign where service.name == "ping4"
 919     }
 920
 921 ### <a id="notification-delay"></a> Notification Delay
 922
 923 Sometimes the problem in question should not be notified when the notification is due
 924 (the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
 925 you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
 926 postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
 927 Icinga 2 will not check against any end time for this notification. Make sure to
 928 specify a relatively low notification `interval` to get notified soon enough again.
 929
 930     apply Notification "mail" to Service {
 931       import "generic-notification"
 932
 933       command = "mail-notification"
 934       users = [ "icingaadmin" ]
 935
 936       interval = 5m
 937
 938       times.begin = 15m // delay notification window
 939
 940       assign where service.name == "ping4"
 941     }
 942
 943 ### <a id="disable-renotification"></a> Disable Re-notifications
 944
 945 If you prefer to be notified only once, you can disable re-notifications by setting the
 946 `interval` attribute to `0`.
 947
 948     apply Notification "notify-once" to Service {
 949       import "generic-notification"
 950
 951       command = "mail-notification"
 952       users = [ "icingaadmin" ]
 953
 954       interval = 0 // disable re-notification
 955
 956       assign where service.name == "ping4"
 957     }
 958
 959 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
 960
 961 If there are no notification state and type filter attributes defined at the `Notification`
 962 or `User` object Icinga 2 assumes that all states and types are being notified.
 963
 964 Available state and type filters for notifications are:
 965
 966     template Notification "generic-notification" {
 967
 968       states = [ Warning, Critical, Unknown ]
 969       types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
 970                 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
 971     }
 972
 973 If you are familiar with Icinga 1.x `notification_options` please note that they have been split
 974 into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
 975 You can filter for acknowledgements and custom notifications too.
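
As a sketch of a narrower filter, the following notification only reports
`Critical` problems and their recoveries. The notification name is
hypothetical, and `OK` is listed on the assumption that `Recovery`
notifications must pass the state filter, mirroring the `icingaadmin` user
example above:

    apply Notification "critical-problems-only" to Service {
      import "generic-notification"

      users = [ "icingaadmin" ]

      // overrides the broader filters inherited from the template
      states = [ OK, Critical ]
      types = [ Problem, Recovery ]

      assign where service.name == "ping4"
    }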
 976
 977
 978 ## <a id="commands"></a> Commands
 979
 980 Icinga 2 uses three different command object types to specify how
 981 checks should be performed, notifications should be sent, and
 982 events should be handled.
 983
 984 ### <a id="check-commands"></a> Check Commands
 985
 986 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects define the command line
 987 used to call a check.
 988
 989 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects are referenced by
 990 [Host](6-object-types.md#objecttype-host) and [Service](6-object-types.md#objecttype-service) objects
 991 using the `check_command` attribute.
 992
 993 > **Note**
 994 >
 995 > Make sure that the [checker](8-cli-commands.md#features) feature is enabled in order to
 996 > execute checks.
 997
 998 #### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
 999
1000 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects require the [ITL template](7-icinga-template-library.md#itl-plugin-check-command)
1001 `plugin-check-command` to support native plugin based check methods.
1002
1003 Unless you have done so already, download your check plugin and put it
1004 into the [PluginDir](5-configuring-icinga-2.md#constants-conf) directory. The following example uses the
1005 `check_disk` plugin contained in the Monitoring Plugins package.
1006
1007 The plugin path and all command arguments are passed as a list of
1008 double-quoted string arguments for proper shell escaping.
1009
1010 Call the `check_disk` plugin with the `--help` parameter to see
1011 all available options. Our example defines warning (`-w`) and
1012 critical (`-c`) thresholds for the disk usage. Without any
1013 partition defined (`-p`) it will check all local partitions.
1014
1015     icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
1016     ...
1017     This plugin checks the amount of used disk space on a mounted file system
1018     and generates an alert if free space is less than one of the threshold values
1019
1020
1021     Usage:
1022      check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
1023     [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
1024     [-t timeout] [-u unit] [-v] [-X type] [-N type]
1025     ...
1026
1027 > **Note**
1028 >
1029 > Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
1030
1031 The next step is to understand how command parameters are being passed from
1032 a host or service object, and add a [CheckCommand](6-object-types.md#objecttype-checkcommand)
1033 definition based on these required parameters and/or default values.
1034
1035 #### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
1036
1037 Check command parameters are defined as custom attributes which can be accessed as runtime macros
1038 by the executed check command.
1039
1040 Define the default check command custom attributes `disk_wfree` and `disk_cfree`
1041 (freely definable naming schema) and their default threshold values. You can
1042 then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
1043 on the command line.
1044
1045 > **Tip**
1046 >
1047 > Use a common command type as prefix for your command arguments to increase
1048 > readability. `disk_wfree` helps understanding the context better than just
1049 > `wfree` as argument.
1050
1051 The default custom attributes can be overridden by the custom attributes
1052 defined in the service using the check command `my-disk`. The custom attributes
1053 can also be inherited from a parent template using additive inheritance (`+=`).
1054
1055     object CheckCommand "my-disk" {
1056       import "plugin-check-command"
1057
1058       command = [ PluginDir + "/check_disk" ]
1059
1060       arguments = {
1061         "-w" = "$disk_wfree$%"
1062         "-c" = "$disk_cfree$%"
1063         "-W" = "$disk_inode_wfree$%"
1064         "-K" = "$disk_inode_cfree$%"
1065         "-p" = "$disk_partitions$"
1066         "-x" = "$disk_partitions_excluded$"
1067       }
1068
1069       vars.disk_wfree = 20
1070       vars.disk_cfree = 10
1071     }
1072
1073 > **Note**
1074 >
1075 > A proper example for the `check_disk` plugin is already shipped with Icinga 2
1076 > ready to use with the [plugin check commands](7-icinga-template-library.md#plugin-check-command-disk).
1077
1078 The host `my-server` with the applied service `basic-partitions` checks a basic set of disk partitions
1079 with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
1080 free disk space).
1081
1082 The custom attribute `disk_partitions` can either hold a single string or an array of
1083 string values for passing multiple partitions to the `check_disk` check plugin.
1084
1085     object Host "my-server" {
1086       import "generic-host"
1087       address = "127.0.0.1"
1088       address6 = "::1"
1089
1090       vars.local_disks["basic-partitions"] = {
1091         disk_partitions = [ "/", "/tmp", "/var", "/home" ]
1092       }
1093     }
1094
1095     apply Service for (disk => config in host.vars.local_disks) {
1096       import "generic-service"
1097       check_command = "my-disk"
1098
1099       vars += config
1100
1101       vars.disk_wfree = 10
1102       vars.disk_cfree = 5
1103     }
1104
1105
1106 More details on using arrays in custom attributes can be found in
1107 [this chapter](#runtime-custom-attributes).
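
For comparison, here is a sketch of a plain service object that passes a
single string instead of an array and overrides the thresholds directly. The
service name `root-partition` and the threshold values are assumptions for
illustration:

    object Service "root-partition" {
      import "generic-service"
      host_name = "my-server"
      check_command = "my-disk"

      // a single string instead of an array of partitions
      vars.disk_partitions = "/"

      // override the defaults (20 and 10) defined in the my-disk command
      vars.disk_wfree = 15
      vars.disk_cfree = 8
    }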
1108
1109
1110 #### <a id="command-arguments"></a> Command Arguments
1111
1112 By defining a check command line using the `command` attribute Icinga 2
1113 will resolve all macros in the static string or array. Sometimes it is
1114 required to extend the arguments list based on a condition evaluated
1115 at command execution, or to make arguments optional so that they are only
1116 set if the macro value can be resolved by Icinga 2.
1117
1118     object CheckCommand "check_http" {
1119       import "plugin-check-command"
1120
1121       command = [ PluginDir + "/check_http" ]
1122
1123       arguments = {
1124         "-H" = "$http_vhost$"
1125         "-I" = "$http_address$"
1126         "-u" = "$http_uri$"
1127         "-p" = "$http_port$"
1128         "-S" = {
1129           set_if = "$http_ssl$"
1130         }
1131         "--sni" = {
1132           set_if = "$http_sni$"
1133         }
1134         "-a" = {
1135           value = "$http_auth_pair$"
1136           description = "Username:password on sites with basic authentication"
1137         }
1138         "--no-body" = {
1139           set_if = "$http_ignore_body$"
1140         }
1141         "-r" = "$http_expect_body_regex$"
1142         "-w" = "$http_warn_time$"
1143         "-c" = "$http_critical_time$"
1144         "-e" = "$http_expect$"
1145       }
1146
1147       vars.http_address = "$address$"
1148       vars.http_ssl = false
1149       vars.http_sni = false
1150     }
1151
1152 The example shows the `check_http` check command defining the most common
1153 arguments. Each of them is optional by default and will be omitted if
1154 the value is not set. For example if the service calling the check command
1155 does not have `vars.http_port` set, it won't get added to the command
1156 line.
1157
1158 If the `vars.http_ssl` custom attribute is set in the service, host or command
1159 object definition, Icinga 2 will add the `-S` argument based on the `set_if`
1160 numeric value to the command line. String values are not supported.
1161
1162 If the macro value cannot be resolved, Icinga 2 will not add the defined argument
1163 to the final command argument array. Empty strings for macro values won't omit
1164 the argument.
1165
1166 That way you can use the `check_http` command definition for checks both with and
1167 without SSL enabled, which saves you from duplicating command definitions.
1168
1169 Details on all available options can be found in the
1170 [CheckCommand object definition](6-object-types.md#objecttype-checkcommand).
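
A service using this command could look like the following sketch. The
service name and the virtual host are assumptions. With these settings `-H`
and `-S` are added to the command line, `-I` is taken from the default
`http_address`, and arguments such as `-p` or `-u` are omitted because their
macros are not set:

    object Service "https-vhost" {
      import "generic-service"
      host_name = "my-server"
      check_command = "check_http"

      vars.http_vhost = "www.example.com"

      // set_if for "-S" now evaluates to true, so the flag is added
      vars.http_ssl = true
    }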
1171
1172
1173 ### <a id="notification-commands"></a> Notification Commands
1174
1175 [NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
1176 interfaces (E-Mail, XMPP, IRC, Twitter, etc.).
1177
1178 [NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects are referenced by
1179 [Notification](6-object-types.md#objecttype-notification) objects using the `command` attribute.
1180
1181 `NotificationCommand` objects require the [ITL template](7-icinga-template-library.md#itl-plugin-notification-command)
1182 `plugin-notification-command` to support native plugin-based notifications.
1183
1184 > **Note**
1185 >
1186 > Make sure that the [notification](8-cli-commands.md#features) feature is enabled
1187 > in order to execute notification commands.
1188
1189 Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
1190 the current check output) sending an email to the user(s) associated with the
1191 notification itself (`$user.email$`).
1192
1193 If you want to specify default values for some of the custom attribute definitions,
1194 you can add a `vars` dictionary as shown for the `CheckCommand` object.
1195
1196     object NotificationCommand "mail-service-notification" {
1197       import "plugin-notification-command"
1198
1199       command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
1200
1201       env = {
1202         NOTIFICATIONTYPE = "$notification.type$"
1203         SERVICEDESC = "$service.name$"
1204         HOSTALIAS = "$host.display_name$"
1205         HOSTADDRESS = "$address$"
1206         SERVICESTATE = "$service.state$"
1207         LONGDATETIME = "$icinga.long_date_time$"
1208         SERVICEOUTPUT = "$service.output$"
1209         NOTIFICATIONAUTHORNAME = "$notification.author$"
1210         NOTIFICATIONCOMMENT = "$notification.comment$"
1211         HOSTDISPLAYNAME = "$host.display_name$"
1212         SERVICEDISPLAYNAME = "$service.display_name$"
1213         USEREMAIL = "$user.email$"
1214       }
1215     }
1216
1217 The command attribute in the `mail-service-notification` command refers to the following
1218 shell script. The macros specified in the `env` dictionary are exported
1219 as environment variables and can be used in the notification script:
1220
1221     #!/usr/bin/env bash
1222     template=$(cat <<TEMPLATE
1223     ***** Icinga  *****
1224
1225     Notification Type: $NOTIFICATIONTYPE
1226
1227     Service: $SERVICEDESC
1228     Host: $HOSTALIAS
1229     Address: $HOSTADDRESS
1230     State: $SERVICESTATE
1231
1232     Date/Time: $LONGDATETIME
1233
1234     Additional Info: $SERVICEOUTPUT
1235
1236     Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
1237     TEMPLATE
1238     )
1239
1240     /usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"
1241
1242 > **Note**
1243 >
1244 > This example is for `exim` only. Requires changes for `sendmail` and
1245 > other MTAs.
1246
1247 While it's possible to specify the entire notification command right
1248 in the NotificationCommand object it is generally advisable to create a
1249 shell script in the `/etc/icinga2/scripts` directory and have the
1250 NotificationCommand object refer to that.
1251
1252 ### <a id="event-commands"></a> Event Commands
1253
1254 Unlike notifications, event commands for hosts/services are called on every
1255 check execution if one of these conditions matches:
1256
1257 * The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
1258 * The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
1259 * The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)
1260
1261 [EventCommand](6-object-types.md#objecttype-eventcommand) objects are referenced by
1262 [Host](6-object-types.md#objecttype-host) and [Service](6-object-types.md#objecttype-service) objects
1263 using the `event_command` attribute.
1264
1265 Therefore the `EventCommand` object should define a command line
1266 evaluating the current service state and other service runtime attributes
1267 available through runtime macros. Runtime macros such as `$service.state_type$`
1268 and `$service.state$` are resolved by Icinga 2 and help you control precisely
1269 which events are triggered.
1270
1271 Common use case scenarios are a failing HTTP check requiring an immediate
1272 restart via event command, or if an application is locked and requires
1273 a restart upon detection.
1274
1275 `EventCommand` objects require the ITL template `plugin-event-command`
1276 to support native plugin-based event handling.
1277
1278 #### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
1279
1280 The following example will trigger a restart of the `httpd` daemon
1281 via ssh when the `http` service check fails. If the service state is
1282 `OK`, it will not trigger any event action.
1283
1284 Requirements:
1285
1286 * ssh connection
1287 * icinga user with public key authentication
1288 * icinga user with sudo permissions for restarting the httpd daemon.
1289
1290 Example on Debian:
1291
1292     # ls /home/icinga/.ssh/
1293     authorized_keys
1294
1295     # visudo
1296     icinga  ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
1297
1298
1299 Define a generic [EventCommand](6-object-types.md#objecttype-eventcommand) object `event_by_ssh`
1300 which can be used for all event commands triggered using ssh:
1301
1302     /* pass event commands through ssh */
1303     object EventCommand "event_by_ssh" {
1304       import "plugin-event-command"
1305
1306       command = [ PluginDir + "/check_by_ssh" ]
1307
1308       arguments = {
1309         "-H" = "$event_by_ssh_address$"
1310         "-p" = "$event_by_ssh_port$"
1311         "-C" = "$event_by_ssh_command$"
1312         "-l" = "$event_by_ssh_logname$"
1313         "-i" = "$event_by_ssh_identity$"
1314         "-q" = {
1315           set_if = "$event_by_ssh_quiet$"
1316         }
1317         "-w" = "$event_by_ssh_warn$"
1318         "-c" = "$event_by_ssh_crit$"
1319         "-t" = "$event_by_ssh_timeout$"
1320       }
1321
1322       vars.event_by_ssh_address = "$address$"
1323       vars.event_by_ssh_quiet = false
1324     }
1325
1326 The actual event command only passes the `event_by_ssh_command` attribute.
1327 The `event_by_ssh_service` custom attribute takes care of passing the correct
1328 daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
1329 is only restarted when the service is in a non-`OK` state.
1330
1331
1332     object EventCommand "event_by_ssh_restart_service" {
1333       import "event_by_ssh"
1334
1335       //only restart the daemon if state > 0 (not-ok)
1336       //requires sudo permissions for the icinga user
1337       vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
1338     }
1339
1340
1341 Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
1342 which service should be restarted using the `event_by_ssh_service` attribute.
1343
1344     object Service "http" {
1345       import "generic-service"
1346       host_name = "remote-http-host"
1347       check_command = "http"
1348
1349       event_command = "event_by_ssh_restart_service"
1350       vars.event_by_ssh_service = "$host.vars.httpd_name$"
1351
1352       //vars.event_by_ssh_logname = "icinga"
1353       //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa"
1354     }
1355
1356
1357 Each host with this service then must define the `httpd_name` custom attribute
1358 (for example generated from your CMDB):
1359
1360     object Host "remote-http-host" {
1361       import "generic-host"
1362       address = "192.168.1.100"
1363
1364       vars.httpd_name = "apache2"
1365     }
1366
1367 You can test-drive this example by manually stopping the `httpd` daemon
1368 on your `remote-http-host`. Enable the `debuglog` feature and tail the
1369 `/var/log/icinga2/debug.log` file.
1370
1371 Remote Host Terminal:
1372
1373     # date; service apache2 status
1374     Mon Sep 15 18:57:39 CEST 2014
1375     Apache2 is running (pid 23651).
1376     # date; service apache2 stop
1377     Mon Sep 15 18:57:47 CEST 2014
1378     [ ok ] Stopping web server: apache2 ... waiting .
1379
1380 Icinga 2 Host Terminal:
1381
1382     [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
1383     [2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
1384     [2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
1385     [2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
1386     [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
1387     [2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0
1388
1389 Remote Host Terminal:
1390
1391     # date; service apache2 status
1392     Mon Sep 15 18:58:44 CEST 2014
1393     Apache2 is running (pid 24908).
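
The log above shows the restart being triggered on a soft state change. If
that is too aggressive for your environment, a stricter variant could also
test the `$service.state_type$` runtime macro. This is only a sketch: the
object name is hypothetical, and it assumes the macro resolves to the literal
string `HARD` for hard states.

    object EventCommand "event_by_ssh_restart_service_hard" {
      import "event_by_ssh"

      //only restart the daemon if state > 0 (not-ok) and the state is hard
      //requires sudo permissions for the icinga user
      vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && test $service.state_type$ = HARD && sudo /etc/init.d/$event_by_ssh_service$ restart"
    }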
1394
1395
1396 ## <a id="dependencies"></a> Dependencies
1397
1398 Icinga 2 uses host and service [Dependency](6-object-types.md#objecttype-dependency) objects
1399 for determining their network reachability.
1400
1401 A service can depend on a host, and vice versa. A service has an implicit
1402 dependency (parent) to its host. A host to host dependency acts implicitly
1403 as host parent relation.
1404 When dependencies are calculated, not only the immediate parent is taken into
1405 account but all parents are inherited.
1406
1407 The `parent_host_name` and `parent_service_name` attributes are mandatory for
1408 service dependencies, `parent_host_name` is required for host dependencies.
1409 [Apply rules](3-monitoring-basics.md#using-apply) will allow you to
1410 [determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
1411 dynamic fashion if required.
1412
1413     parent_host_name = "core-router"
1414     parent_service_name = "uplink-port"
1415
1416 Notifications are suppressed by default if a host or service becomes unreachable.
1417 You can control that option by defining the `disable_notifications` attribute.
1418
1419     disable_notifications = false
1420
1421 The dependency state filter must be defined based on the parent object being
1422 either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).
1423
1424 The following example will make the dependency fail and trigger it if the parent
1425 object is **not** in one of these states:
1426
1427     states = [ OK, Critical, Unknown ]
1428
1429 Rephrased: If the parent service object changes into the `Warning` state, this
1430 dependency will fail and render all child objects (hosts or services) unreachable.
1431
1432 You can determine the child's reachability by querying the `is_reachable` attribute,
1433 for example in [DB IDO](19-apendix.md#schema-db-ido-extensions).
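
Put together, the attribute fragments above could form a complete object like
the following sketch. It assumes that a host `core-router` with a service
`uplink-port` exists, and it uses the `my-server1` host and its `http`
service from the beginning of this chapter as the child. The dependency name
is hypothetical:

    object Dependency "http-needs-uplink" {
      parent_host_name = "core-router"
      parent_service_name = "uplink-port"

      child_host_name = "my-server1"
      child_service_name = "http"

      // the dependency fails if the parent is in any other state (Warning)
      states = [ OK, Critical, Unknown ]

      // keep sending notifications even if the child becomes unreachable
      disable_notifications = false
    }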
1434
1435 ### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
1436
1437 Icinga 2 automatically adds an implicit dependency for services on their host. That way
1438 service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
1439 does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
1440 `states = [ Up ]` for all service objects.
1441
1442 Service checks are still executed. If you want to prevent them from happening, you can
1443 apply the following dependency to all services setting their host as `parent_host_name`
1444 and disabling the checks. `assign where true` matches on all `Service` objects.
1445
1446     apply Dependency "disable-host-service-checks" to Service {
1447       disable_checks = true
1448       assign where true
1449     }
1450
1451 ### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
1452
1453 A common scenario is the Icinga 2 server behind a router. Checking internet
1454 access by pinging the Google DNS server `google-dns` is a common method, but
1455 will fail in case the `dsl-router` host is down. Therefore the example below
1456 defines a host dependency which acts implicitly as parent relation too.
1457
1458 Furthermore the host may be reachable but ping probes are dropped by the
1459 router's firewall. In case the `ping4` service check on `dsl-router` fails, all
1460 further checks for the `ping4` service on host `google-dns` should
1461 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
1462
1463     object Host "dsl-router" {
1464       import "generic-host"
1465       address = "192.168.1.1"
1466     }
1467
1468     object Host "google-dns" {
1469       import "generic-host"
1470       address = "8.8.8.8"
1471     }
1472
1473     apply Service "ping4" {
1474       import "generic-service"
1475
1476       check_command = "ping4"
1477
1478       assign where host.address
1479     }
1480
1481     apply Dependency "internet" to Host {
1482       parent_host_name = "dsl-router"
1483       disable_checks = true
1484       disable_notifications = true
1485
1486       assign where host.name != "dsl-router"
1487     }
1488
1489     apply Dependency "internet" to Service {
1490       parent_host_name = "dsl-router"
1491       parent_service_name = "ping4"
1492       disable_checks = true
1493
1494       assign where host.name != "dsl-router"
1495     }
1496
1497 ### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
1498
You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
child attributes, e.g. `parent_host_name`, to other objects'
attributes.

A common example is virtual machines hosted on a master. The object
name of that master is auto-generated from your CMDB or VMware inventory
into the host's custom attributes (or a generic template for your
cloud).
1507
1508 Define your master host object:
1509
1510     /* your master */
1511     object Host "master.example.com" {
1512       import "generic-host"
1513     }
1514
1515 Add a generic template defining all common host attributes:
1516
1517     /* generic template for your virtual machines */
1518     template Host "generic-vm" {
1519       import "generic-host"
1520     }
1521
1522 Add a template for all hosts on your example.com cloud setting
1523 custom attribute `vm_parent` to `master.example.com`:
1524
1525     template Host "generic-vm-example.com" {
1526       import "generic-vm"
1527       vars.vm_parent = "master.example.com"
1528     }
1529
1530 Define your guest hosts:
1531
    object Host "www.example1.com" {
      import "generic-vm-example.com"
    }

    object Host "www.example2.com" {
      import "generic-vm-example.com"
    }
1539
1540 Apply the host dependency to all child hosts importing the
1541 `generic-vm` template and set the `parent_host_name`
1542 to the previously defined custom attribute `host.vars.vm_parent`.
1543
1544     apply Dependency "vm-host-to-parent-master" to Host {
1545       parent_host_name = host.vars.vm_parent
1546       assign where "generic-vm" in host.templates
1547     }
1548
1549 You can extend this example, and make your services depend on the
1550 `master.example.com` host too. Their local scope allows you to use
1551 `host.vars.vm_parent` similar to the example above.
1552
1553     apply Dependency "vm-service-to-parent-master" to Service {
1554       parent_host_name = host.vars.vm_parent
1555       assign where "generic-vm" in host.templates
1556     }
1557
That way you don't need to wait for your guest hosts to become
unreachable when the master host goes down. Instead, the services
detect their reachability immediately when executing checks.
1561
1562 > **Note**
1563 >
1564 > This method with setting locally scoped variables only works in
1565 > apply rules, but not in object definitions.
1566
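For comparison, a plain object definition has no `host` variable in scope, so
the parent and child have to be named literally. A minimal sketch for a single
guest, assuming the hosts defined above (the dependency name
`vm-www-example1-to-master` is made up for illustration):

    /* object definition: no apply scope, so names are spelled out */
    object Dependency "vm-www-example1-to-master" {
      parent_host_name = "master.example.com"
      child_host_name = "www.example1.com"
    }
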
1567
1568 ### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
1569
Another classic example is agent-based checks. You would define a health check
for the agent daemon responding to your requests, and make all other services
querying that daemon depend on that health check.

The following configuration defines two NRPE-based service checks, `nrpe-load`
and `nrpe-disk`, applied to the `nrpe-server` host. The health check is defined as
the `nrpe-health` service.
1577
1578     apply Service "nrpe-health" {
1579       import "generic-service"
1580       check_command = "nrpe"
1581       assign where match("nrpe-*", host.name)
1582     }
1583
1584     apply Service "nrpe-load" {
1585       import "generic-service"
1586       check_command = "nrpe"
1587       vars.nrpe_command = "check_load"
1588       assign where match("nrpe-*", host.name)
1589     }
1590
1591     apply Service "nrpe-disk" {
1592       import "generic-service"
1593       check_command = "nrpe"
1594       vars.nrpe_command = "check_disk"
1595       assign where match("nrpe-*", host.name)
1596     }
1597
1598     object Host "nrpe-server" {
1599       import "generic-host"
1600       address = "192.168.1.5"
1601     }
1602
1603     apply Dependency "disable-nrpe-checks" to Service {
1604       parent_service_name = "nrpe-health"
1605
1606       states = [ OK ]
1607       disable_checks = true
1608       disable_notifications = true
1609       assign where service.check_command == "nrpe"
1610       ignore where service.name == "nrpe-health"
1611     }
1612
The `disable-nrpe-checks` dependency is applied to all services
on the `nrpe-server` host that use the `nrpe` check command,
but not to the `nrpe-health` service itself.
1616
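Because the services and the dependency are assigned by apply rules, adding
another agent host should not require any new service or dependency
definitions. As a sketch, assuming a second host whose name matches the
`nrpe-*` pattern (the name `nrpe-db` and its address are hypothetical):

    /* hypothetical second agent host; matched by the "nrpe-*" apply rules */
    object Host "nrpe-db" {
      import "generic-host"
      address = "192.168.1.6"
    }

With this host in place, the `nrpe-health`, `nrpe-load` and `nrpe-disk`
services should be applied to it as well, and its `nrpe-load` and `nrpe-disk`
services should depend on its own `nrpe-health` service.
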
1617
1618