   1 # <a id="monitoring-basics"></a> Monitoring Basics
   2
   3 This part of the Icinga 2 documentation provides an overview of all the basic
   4 monitoring concepts you need to know to run Icinga 2.
   5
   6 ## <a id="hosts-services"></a> Hosts and Services
   7
   8 Icinga 2 can be used to monitor the availability of hosts and services. Hosts
   9 and services can be virtually anything which can be checked in some way:
  10
  11 * Network services (HTTP, SMTP, SNMP, SSH, etc.)
  12 * Printers
  13 * Switches / routers
  14 * Temperature sensors
  15 * Other local or network-accessible services
  16
  17 Host objects provide a mechanism to group services that are running
  18 on the same physical device.
  19
  20 Here is an example of a host object which defines two child services:
  21
  22     object Host "my-server1" {
  23       address = "10.0.0.1"
  24       check_command = "hostalive"
  25     }
  26
  27     object Service "ping4" {
  28       host_name = "my-server1"
  29       check_command = "ping4"
  30     }
  31
  32     object Service "http" {
  33       host_name = "my-server1"
  34       check_command = "http"
  35     }
  36
  37 The example creates two services `ping4` and `http` which belong to the
  38 host `my-server1`.
  39
  40 It also specifies that the host should perform its own check using the `hostalive`
  41 check command.
  42
  43 The `address` attribute is used by check commands to determine which network
  44 address is associated with the host object.
  45
  46 Details on troubleshooting check problems can be found [here](8-troubleshooting.md#troubleshooting).
  47
  48 ### <a id="host-states"></a> Host States
  49
  50 Hosts can be in any of the following states:
  51
  52   Name        | Description
  53   ------------|--------------
  54   UP          | The host is available.
  55   DOWN        | The host is unavailable.
  56
  57 ### <a id="service-states"></a> Service States
  58
  59 Services can be in any of the following states:
  60
  61   Name        | Description
  62   ------------|--------------
  63   OK          | The service is working properly.
  64   WARNING     | The service is experiencing some problems but is still considered to be in working condition.
  65   CRITICAL    | The service is in a critical state.
  66   UNKNOWN     | The check could not determine the service's state.
  67
  68 ### <a id="hard-soft-states"></a> Hard and Soft States
  69
  70 When detecting a problem with a host/service Icinga re-checks the object a number of
  71 times (based on the `max_check_attempts` and `retry_interval` settings) before sending
  72 notifications. This ensures that no unnecessary notifications are sent for
  73 transient failures. During this time the object is in a `SOFT` state.
  74
  75 After all re-checks have been executed and the object is still in a non-OK
  76 state the host/service switches to a `HARD` state and notifications are sent.
  77
  78   Name        | Description
  79   ------------|--------------
  80   HARD        | The host/service's state hasn't recently changed.
  81   SOFT        | The host/service has recently changed state and is being re-checked.
  82
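As a minimal sketch of the settings mentioned above, the following host sets the re-check behaviour explicitly. The host name `my-server2`, its address and the concrete values are assumptions chosen for illustration only.

```icinga2
object Host "my-server2" {
  address = "10.0.0.2"
  check_command = "hostalive"

  max_check_attempts = 3 // number of check attempts before the state becomes HARD
  check_interval = 5m    // regular check interval
  retry_interval = 1m    // interval between re-checks while the state is SOFT
}
```

With these values a failing host would be re-checked at one-minute intervals while it is in a `SOFT` state, instead of waiting for the regular five-minute interval.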
  83 ### <a id="host-service-checks"></a> Host and Service Checks
  84
   85 Hosts and services determine their state from the check result that a check
   86 execution returns to the Icinga 2 application. By default the `generic-host` example template
   87 defines `hostalive` as the host check. If your host does not respond to ping, you should
   88 consider using a different check command, for instance the `http` check command, or if
   89 there is no check available, the `dummy` check command.
  90
  91     object Host "uncheckable-host" {
  92       check_command = "dummy"
   93       vars.dummy_state = 0 // 0 is treated as UP for hosts
  94       vars.dummy_text = "Pretending to be OK."
  95     }
  96
  97 Service checks could also use a `dummy` check, but the common strategy is to
  98 [integrate an existing plugin](3-monitoring-basics.md#command-plugin-integration) as
  99 [check command](3-monitoring-basics.md#check-commands) and [reference](3-monitoring-basics.md#command-passing-parameters)
 100 that in your [Service](12-object-types.md#objecttype-service) object definition.
 101
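The strategy described above could look roughly like the following sketch. It assumes that the `check_disk` plugin is installed in `PluginDir`. The names `my-disk`, `my_disk_wfree` and `my_disk_cfree` are hypothetical and only serve as an illustration. Depending on your Icinga 2 version the `plugin-check-command` import may not be required.

```icinga2
// Hypothetical check command wrapping the check_disk plugin.
object CheckCommand "my-disk" {
  import "plugin-check-command"

  command = [ PluginDir + "/check_disk" ]

  arguments = {
    "-w" = "$my_disk_wfree$"
    "-c" = "$my_disk_cfree$"
  }

  // Default thresholds, can be overridden in host or service objects.
  vars.my_disk_wfree = "20%"
  vars.my_disk_cfree = "10%"
}

// Service referencing the check command and overriding one parameter.
object Service "disk" {
  host_name = "my-server1"
  check_command = "my-disk"

  vars.my_disk_cfree = "5%"
}
```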
 102 ## <a id="configuration-best-practice"></a> Configuration Best Practice
 103
 104 The [Getting Started](2-getting-started.md#getting-started) chapter already introduced various aspects
 105 of the Icinga 2 configuration language. If you are ready to configure additional
  106 hosts, services, notifications, dependencies, etc., you should think about the
  107 requirements first and then decide on a possible strategy.
 108
 109 There are many ways of creating Icinga 2 configuration objects:
 110
 111 * Manually with your preferred editor, for example vi(m), nano, notepad, etc.
  112 * Generated by a [configuration management tool](2-getting-started.md#configuration-tools) such as Puppet, Chef, Ansible, etc.
 113 * A configuration addon for Icinga 2
 114 * A custom exporter script from your CMDB or inventory tool
 115 * your own.
 116
 117 In order to find the best strategy for your own configuration, ask yourself the following questions:
 118
  119 * Do your hosts share a common group of services (for example Linux hosts with disk, load, etc. checks)?
  120 * Does only a small set of users receive notifications and escalations for all hosts/services?
  121
  122 If you can answer at least one of these questions with yes, look into the [apply rules](3-monitoring-basics.md#using-apply) logic
  123 instead of defining objects on a per host and service basis.
  124
  125 * Are you required to define specific configuration for each host/service?
  126 * Does your configuration generation tool already know about the host-service-relationship?
  127
  128 If so, use object-specific configuration attributes such as `host_name` instead.
 129
 130 Finding the best files and directory tree for your configuration is up to you. Make sure that
 131 the [icinga2.conf](2-getting-started.md#icinga2-conf) configuration file includes them, and then think about:
 132
 133 * tree-based on locations, hostgroups, specific host attributes with sub levels of directories.
 134 * flat `hosts.conf`, `services.conf`, etc files for rule based configuration.
 135 * generated configuration with one file per host and a global configuration for groups, users, etc.
 136 * one big file generated from an external application (probably a bad idea for maintaining changes).
 137 * your own.
 138
  139 Whichever strategy you choose, you should additionally check the following:
 140
 141 * Are there any specific attributes describing the host/service you could set as `vars` custom attributes?
 142 You can later use them for applying assign/ignore rules, or export them into external interfaces.
 143 * Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules.
 144 * Use templates to store generic attributes for your objects and apply rules making your configuration more readable.
 145 Details can be found in the [using templates](3-monitoring-basics.md#object-inheritance-using-templates) chapter.
 146 * Apply rules may overlap. Keep a central place (for example, [services.conf](2-getting-started.md#services-conf) or [notifications.conf](2-getting-started.md#notifications-conf)) storing
 147 the configuration instead of defining apply rules deep in your configuration tree.
 148 * Every plugin used as check, notification or event command requires a `Command` definition.
 149 Further details can be looked up in the [check commands](3-monitoring-basics.md#check-commands) chapter.
 150
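The recommendations above about custom attributes, groups and templates can be combined as in the following sketch. The host name, the group name `linux-servers` and the `location` custom attribute are assumptions for illustration. The `generic-service` template is the one shown in the templates chapter.

```icinga2
object HostGroup "linux-servers" {
  display_name = "Linux Servers"
}

object Host "my-linux1" {
  address = "10.0.0.5"
  check_command = "hostalive"

  groups = [ "linux-servers" ]

  // Custom attributes describing the host.
  vars.os = "Linux"
  vars.location = "datacenter-1"
}

// One central apply rule instead of one service object per host.
apply Service "load" {
  import "generic-service"

  check_command = "load"

  assign where "linux-servers" in host.groups
}
```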
 151 If you happen to have further questions, do not hesitate to join the [community support channels](https://support.icinga.org)
 152 and ask community members for their experience and best practices.
 153
 154
 155 ### <a id="object-inheritance-using-templates"></a> Object Inheritance Using Templates
 156
 157 Templates may be used to apply a set of identical attributes to more than one
 158 object:
 159
 160     template Service "generic-service" {
 161       max_check_attempts = 3
 162       check_interval = 5m
 163       retry_interval = 1m
 164       enable_perfdata = true
 165     }
 166
  167     template Service "ipv6-service" {
 168       notes = "IPv6 critical != IPv4 broken."
 169     }
 170
 171     apply Service "ping4" {
 172       import "generic-service"
 173
 174       check_command = "ping4"
 175
 176       assign where host.address
 177     }
 178
 179     apply Service "ping6" {
 180       import "generic-service"
 181       import "ipv6-service"
 182
 183       check_command = "ping6"
 184
 185       assign where host.address6
 186     }
 187
 188
 189 In this example the `ping4` and `ping6` services inherit properties from the
 190 template `generic-service`. The `ping6` service additionally imports the `ipv6-service`
 191 template with the `notes` attribute.
 192
 193 Objects as well as templates themselves can import an arbitrary number of
 194 templates. Attributes inherited from a template can be overridden in the
 195 object if necessary.
 196
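A short sketch of overriding an inherited attribute: the service below imports `generic-service` from the example above and replaces its `check_interval`. The service name `http-frequent` is hypothetical.

```icinga2
object Service "http-frequent" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "http"

  check_interval = 1m // overrides the 5m inherited from "generic-service"
}
```

All other attributes, such as `max_check_attempts` and `retry_interval`, keep the values defined in the template.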
  197 You can also import existing non-template objects into objects. This
  198 requires you to use unique names for templates and objects sharing
  199 the same namespace.
 200
 201 Example for importing objects:
 202
 203     object CheckCommand "snmp-simple" {
 204       ...
 205       vars.snmp_defaults = ...
 206     }
 207
 208     object CheckCommand "snmp-advanced" {
 209       import "snmp-simple"
 210       ...
 211       vars.snmp_advanced = ...
 212     }
 213
 214 ### <a id="using-apply"></a> Apply objects based on rules
 215
 216 Instead of assigning each object ([Service](12-object-types.md#objecttype-service),
 217 [Notification](12-object-types.md#objecttype-notification), [Dependency](12-object-types.md#objecttype-dependency),
 218 [ScheduledDowntime](12-object-types.md#objecttype-scheduleddowntime))
  219 based on attribute identifiers, for example `host_name`, objects can be [applied](10-language-reference.md#apply).
 220
 221 Before you start using the apply rules keep the following in mind:
 222
 223 * Define the best match.
 224     * A set of unique [custom attributes](3-monitoring-basics.md#custom-attributes-apply) for these hosts/services?
 225     * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
 226     * A generic pattern [match](10-language-reference.md#function-calls) on the host/service name?
 227     * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](10-language-reference.md#expression-operators)
  228 * All expressions must return a boolean value (for example, an empty string is equal to `false`)
 229
 230 > **Note**
 231 >
 232 > You can set/override object attributes in apply rules using the respectively available
 233 > objects in that scope (host and/or service objects).
 234
  235 [Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
  236 not only for matching on their existence or values in apply expressions, but also for assigning
  237 ("inheriting") their values to the objects generated by apply rules.
 238
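The following sketch illustrates this with a nested custom attribute. It assumes that a `mail-service-notification` template and a user group `noc` exist in your configuration. The host name and the notification name are hypothetical.

```icinga2
object Host "my-server3" {
  address = "10.0.0.3"
  check_command = "hostalive"

  // Nested dictionary holding an array of user group names.
  vars.notification["mail"] = {
    groups = [ "noc" ]
  }
}

apply Notification "mail-by-host-attribute" to Service {
  import "mail-service-notification"

  // "Inherit" the value from the host's custom attribute.
  user_groups = host.vars.notification.mail.groups

  // Match on the existence of the nested attribute.
  assign where host.vars.notification.mail
}
```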
 239 * [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
 240 * [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
  241 * [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
 242 * [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)
 243
  244 A more advanced example is using [apply with for loops on arrays or
  245 dictionaries](#using-apply-for), provided for example by
  246 [custom attributes](3-monitoring-basics.md#custom-attributes-apply) or groups.
 247
 248 > **Tip**
 249 >
 250 > Building configuration in that dynamic way requires detailed information
 251 > of the generated objects. Use the `object list` [CLI command](5-cli-commands.md#cli-command-object)
 252 > after successful [configuration validation](5-cli-commands.md#config-validation).
 253
 254
 255 #### <a id="using-apply-expressions"></a> Apply Rules Expressions
 256
  257 You can use simple or advanced combinations of apply rule expressions. Each
  258 expression must evaluate to the boolean `true` value in order to match. An empty string
  259 is interpreted as `false`, for instance. In a similar fashion undefined
  260 attributes evaluate to `false`.
 261
 262 Returns `false`:
 263
 264     assign where host.vars.attribute_does_not_exist
 265
 266 Multiple `assign where` condition rows are evaluated as `OR` condition.
 267
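As a sketch of the `OR` behaviour, the following rule applies the service to a host if either of the two rows matches. The value `FreeBSD` for the `os` custom attribute is an assumption for illustration.

```icinga2
apply Service "ssh" {
  import "generic-service"

  check_command = "ssh"

  // Either row is sufficient for the rule to match.
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"
}
```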
  268 You can combine multiple expressions to match only a subset of objects. In some cases,
  269 you want to add more than one assign/ignore where expression which matches
  270 a specific condition. To achieve this you can use the logical AND (`&&`) and OR (`||`) operators.
  271
  272
  273 The following example matches all hosts whose name matches the `*mysql*` pattern and (`&&`) whose
  274 custom attribute `prod_mysql_db` matches the `db-*` pattern. Hosts with the custom attribute
  275 `test_server` set to `true` are ignored, as are hosts whose name matches the `*internal` pattern.
 276
 277     object HostGroup "mysql-server" {
 278       display_name = "MySQL Server"
 279
 280       assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
 281       ignore where host.vars.test_server == true
 282       ignore where match("*internal", host.name)
 283     }
 284
  285 A similar example for advanced notification apply rule filters: the rule matches if the service
  286 attribute `notes` contains the `has gold support 24x7` string `AND` one of the
  287 two following conditions passes: either the `customer` host custom attribute is set to `customer-xy`
  288 `OR` the host custom attribute `always_notify` is set to `true`.
  289
  290 The notification is ignored for services whose host name ends with `internal`
  291 `OR` whose `priority` custom attribute is [less than](10-language-reference.md#expression-operators) `2` `AND` whose host has the `is_clustered` custom attribute set to `true`.
 292
 293     template Notification "cust-xy-notification" {
 294       users = [ "noc-xy", "mgmt-xy" ]
 295       command = "mail-service-notification"
 296     }
 297
 298     apply Notification "notify-cust-xy-mysql" to Service {
 299       import "cust-xy-notification"
 300
  301       assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
 302       ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
 303     }
 304
 305
 306
 307
 308 #### <a id="using-apply-services"></a> Apply Services to Hosts
 309
 310 The sample configuration already ships a detailed example in [hosts.conf](2-getting-started.md#hosts-conf)
 311 and [services.conf](2-getting-started.md#services-conf) for this use case.
 312
 313 The example for `ssh` applies a service object to all hosts with the `address`
 314 attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.
 315
 316     apply Service "ssh" {
 317       import "generic-service"
 318
 319       check_command = "ssh"
 320
 321       assign where host.address && host.vars.os == "Linux"
 322     }
 323
 324
 325 Other detailed scenario examples are used in their respective chapters, for example
 326 [apply services with custom command arguments](3-monitoring-basics.md#using-apply-services-command-arguments).
 327
 328 #### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
 329
 330 Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
 331 manner:
 332
 333
 334     apply Notification "mail-noc" to Service {
 335       import "mail-service-notification"
 336
 337       user_groups = [ "noc" ]
 338
 339       assign where host.vars.notification.mail
 340     }
 341
 342
  343 In this example the `mail-noc` notification will be created as an object for all services whose host has the
  344 `notification.mail` custom attribute defined. The notification command is set to `mail-service-notification`
  345 and all members of the user group `noc` will get notified.
 346
 347 #### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
 348
 349 Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.
 350
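For orientation, a dependency apply rule could look like the following sketch. The parent host `dsl-router` is an assumption for illustration. The rule makes every other host depend on it.

```icinga2
apply Dependency "internet" to Host {
  parent_host_name = "dsl-router"

  // Suppress checks and notifications while the parent is down.
  disable_checks = true
  disable_notifications = true

  assign where host.name != "dsl-router"
}
```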
 351 #### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
 352
  353 The sample configuration ships an example in [downtimes.conf](2-getting-started.md#downtimes-conf).
 354
 355 Detailed examples can be found in the [recurring downtimes](3-monitoring-basics.md#recurring-downtimes) chapter.
 356
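A recurring downtime rule might be sketched as follows. The custom attribute `has_backup_window`, the author and the time ranges are assumptions chosen for illustration, not values taken from the sample configuration.

```icinga2
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  // Weekly recurring time ranges.
  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
  }

  assign where service.vars.has_backup_window == true
}
```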
 357
 358 #### <a id="using-apply-for"></a> Using Apply For Rules
 359
  360 In addition to the standard way of using apply rules, you may need to generate
  361 objects from apply rules based on a set (array or dictionary). That way you'll save quite
  362 a lot of duplicated apply rules by combining them into one generic rule which generates
  363 the object name with or without a prefix.
 364
 365 The sample configuration already ships a detailed example in [hosts.conf](2-getting-started.md#hosts-conf)
 366 and [services.conf](2-getting-started.md#services-conf) for this use case.
 367
 368 Imagine a different example: You are monitoring your switch (hosts) with many
 369 interfaces (services). The following requirements/problems apply:
 370
 371 * Each interface service check should be named with a prefix and a running number
 372 * Each interface has its own vlan tag
 373 * Some interfaces have QoS enabled
  374 * Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be
  375 dynamically generated
 376
  377 By defining the `interfaces` dictionary with three example interfaces on the `core-switch`
  378 host object, you provide the data which the for loop in the service apply
  379 rule iterates over.
 380
 381
 382     object Host "core-switch" {
 383       import "generic-host"
 384       address = "127.0.0.1"
 385
 386       vars.interfaces["0"] = {
 387         port = 1
 388         vlan = "internal"
 389         address = "127.0.0.2"
 390         qos = "enabled"
 391       }
 392       vars.interfaces["1"] = {
 393         port = 2
 394         vlan = "mgmt"
 395         address = "127.0.1.2"
 396       }
 397       vars.interfaces["2"] = {
 398         port = 3
 399         vlan = "remote"
 400         address = "127.0.2.2"
 401       }
 402     }
 403
  404 The service apply rule shown further below uses `"if-"` as name prefix. You can also omit the
  405 `"if-"` string; all generated service names are then taken directly from the `if_name` variable value.
  406
  407 The `config` dictionary contains all key-value pairs for the specific interface in one
  408 loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface.
  409
  410 By defining a default value for the custom attribute `qos` in the `vars` dictionary
  411 before adding the `config` dictionary we'll ensure that this attribute is always defined.
  412
  413 After `vars` is fully populated, all object attributes can be set. For strings, you can use
  414 string concatenation with the `+` operator.
  415
  416 You can also specify the check command that way.
 417
 418     apply Service "if-" for (if_name => config in host.vars.interfaces) {
 419       import "generic-service"
 420       check_command = "ping4"
 421
 422       vars.qos = "disabled"
 423       vars += config
 424
 425       display_name = "if-" + if_name + "-" + vars.vlan
 426
 427       notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
 428       notes_url = "http://foreman.company.com/hosts/" + host.name
  429       action_url = "http://snmp.checker.company.com/" + host.name + "/if-" + if_name
 430
 431       assign where host.vars.interfaces
 432     }
 433
  434 Note that numbers must be explicitly cast to strings when concatenating them with strings.
  435 This can be achieved by wrapping them into the [string()](10-language-reference.md#function-calls) function.
 436
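The example above iterates over a dictionary. A sketch of the array variant could look like this. The host name, the `vhosts` custom attribute and its values are assumptions for illustration, and `http_vhost` is assumed to be the custom attribute which the `http` check command uses for the virtual host name.

```icinga2
object Host "web-server" {
  import "generic-host"
  address = "10.0.0.4"

  vars.vhosts = [ "www.example.com", "shop.example.com" ]
}

// Generates the services "http-www.example.com" and "http-shop.example.com".
apply Service "http-" for (vhost in host.vars.vhosts) {
  import "generic-service"

  check_command = "http"
  vars.http_vhost = vhost

  assign where host.vars.vhosts
}
```

With an array there is only one loop variable, so each generated service name consists of the prefix and the array element.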
 437 > **Tip**
 438 >
 439 > Building configuration in that dynamic way requires detailed information
 440 > of the generated objects. Use the `object list` [CLI command](5-cli-commands.md#cli-command-object)
 441 > after successful [configuration validation](5-cli-commands.md#config-validation).
 442
 443
 444 #### <a id="using-apply-object attributes"></a> Use Object Attributes in Apply Rules
 445
 446 Since apply rules are evaluated after the generic objects, you
 447 can reference existing host and/or service object attributes as
 448 values for any object attribute specified in that apply rule.
 449
 450     object Host "opennebula-host" {
 451       import "generic-host"
 452       address = "10.1.1.2"
 453
 454       vars.hosting["xyz"] = {
 455         http_uri = "/shop"
 456         customer_name = "Customer xyz"
 457         customer_id = "7568"
 458         support_contract = "gold"
 459       }
 460       vars.hosting["abc"] = {
 461         http_uri = "/shop"
 462         customer_name = "Customer xyz"
 463         customer_id = "7568"
 464         support_contract = "silver"
 465       }
 466     }
 467
 468     apply Service for (customer => config in host.vars.hosting) {
 469       import "generic-service"
 470       check_command = "ping4"
 471
 472       vars.qos = "disabled"
 473
 474       vars += config
 475
  476       vars.http_uri = "/" + customer + "/" + config.http_uri
 477
 478       display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id
 479
 480       notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."
 481
 482       notes_url = "http://foreman.company.com/hosts/" + host.name
 483       action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
 484
 485       assign where host.vars.hosting
 486     }
 487
 488 ### <a id="groups"></a> Groups
 489
 490 Groups are used for combining hosts, services, and users into
 491 accessible configuration attributes and views in external (web)
 492 interfaces.
 493
  494 Group membership is defined at the respective object itself. If
  495 you have a hostgroup named `windows` for example, and want to assign
  496 specific hosts to this group for later viewing the group on your
  497 alert dashboard, first create the hostgroup:
 498
 499     object HostGroup "windows" {
 500       display_name = "Windows Servers"
 501     }
 502
  503 Then add your hosts to this hostgroup:
 504
 505     template Host "windows-server" {
 506       groups += [ "windows" ]
 507     }
 508
 509     object Host "mssql-srv1" {
 510       import "windows-server"
 511
 512       vars.mssql_port = 1433
 513     }
 514
 515     object Host "mssql-srv2" {
 516       import "windows-server"
 517
 518       vars.mssql_port = 1433
 519     }
 520
 521 This can be done for service and user groups the same way. Additionally
 522 the user groups are associated as attributes in `Notification` objects.
 523
 524     object UserGroup "windows-mssql-admins" {
 525       display_name = "Windows MSSQL Admins"
 526     }
 527
 528     template User "generic-windows-mssql-users" {
 529       groups += [ "windows-mssql-admins" ]
 530     }
 531
 532     object User "win-mssql-noc" {
 533       import "generic-windows-mssql-users"
 534
 535       email = "noc@example.com"
 536     }
 537
 538     object User "win-mssql-ops" {
 539       import "generic-windows-mssql-users"
 540
 541       email = "ops@example.com"
 542     }
 543
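For service groups the same pattern could be sketched as follows. The group name `mssql-services`, the template name and the use of the `tcp` check command with its `tcp_port` custom attribute are assumptions for illustration.

```icinga2
object ServiceGroup "mssql-services" {
  display_name = "MSSQL Services"
}

template Service "mssql-service" {
  groups += [ "mssql-services" ]
}

// Every service importing the template becomes a member of the group.
apply Service "mssql-port" {
  import "generic-service"
  import "mssql-service"

  check_command = "tcp"
  vars.tcp_port = host.vars.mssql_port

  assign where host.vars.mssql_port
}
```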
 544 #### <a id="group-assign-intro"></a> Group Membership Assign
 545
 546 If there is a certain number of hosts, services, or users matching a pattern
 547 it's reasonable to assign the group object to these members.
  548 Details on the `assign where` syntax can be found [here](10-language-reference.md#apply).
 549
 550     object HostGroup "prod-mssql" {
 551       display_name = "Production MSSQL Servers"
 552       assign where host.vars.mssql_port && host.vars.prod_mysql_db
 553       ignore where host.vars.test_server == true
 554       ignore where match("*internal", host.name)
 555     }
 556
  557 In this example, modeled on the one above, all hosts with both the `mssql_port` and the `prod_mysql_db`
  558 custom attributes set will be added as members to the host group `prod-mssql`. All `*internal`
  559 hosts and all hosts with the `test_server` attribute set to `true` will be ignored.
 560
 561 ## <a id="notifications"></a> Notifications
 562
 563 Notifications for service and host problems are an integral part of your
 564 monitoring setup.
 565
 566 When a host or service is in a downtime, a problem has been acknowledged or
 567 the dependency logic determined that the host/service is unreachable, no
 568 notifications are sent. You can configure additional type and state filters
 569 refining the notifications being actually sent.
 570
 571 There are many ways of sending notifications, e.g. by e-mail, XMPP,
 572 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
 573 Instead it relies on external mechanisms such as shell scripts to notify users.
 574
 575 A notification specification requires one or more users (and/or user groups)
 576 who will be notified in case of problems. These users must have all custom
 577 attributes defined which will be used in the `NotificationCommand` on execution.
 578
  579 The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and
  580 `CRITICAL` states and `problem` and `recovery` notification types.
 581
 582     object User "icingaadmin" {
 583       display_name = "Icinga 2 Admin"
 584       enable_notifications = true
 585       states = [ OK, Warning, Critical ]
 586       types = [ Problem, Recovery ]
 587       email = "icinga@localhost"
 588     }
 589
 590 If you don't set the `states` and `types` configuration attributes for the `User`
 591 object, notifications for all states and types will be sent.
 592
 593 Details on troubleshooting notification problems can be found [here](8-troubleshooting.md#troubleshooting).
 594
 595 > **Note**
 596 >
 597 > Make sure that the [notification](5-cli-commands.md#features) feature is enabled on your master instance
 598 > in order to execute notification commands.
 599
  600 You should choose which information you (and your notified users) are interested in
  601 in case of an emergency, and also which information does not provide any value to you and
  602 your environment.
 603
 604 An example notification command is explained [here](3-monitoring-basics.md#notification-commands).
 605
  606 You can add all shared attributes to a `Notification` template which is imported
  607 by the defined notifications. That way you avoid duplicating attributes in each
  608 `Notification` object. Attributes can be overridden locally.
 609
 610     template Notification "generic-notification" {
 611       interval = 15m
 612
 613       command = "mail-service-notification"
 614
 615       states = [ Warning, Critical, Unknown ]
 616       types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
 617                 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
 618
 619       period = "24x7"
 620     }
 621
 622 The time period `24x7` is shipped as example configuration with Icinga 2.
 623
 624 Use the `apply` keyword to create `Notification` objects for your services:
 625
 626     apply Notification "notify-cust-xy-mysql" to Service {
 627       import "generic-notification"
 628
 629       users = [ "noc-xy", "mgmt-xy" ]
 630
  631       assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
 632       ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
 633     }
 634
 635
 636 Instead of assigning users to notifications, you can also add the `user_groups`
 637 attribute with a list of user groups to the `Notification` object. Icinga 2 will
 638 send notifications to all group members.
 639
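A sketch using `user_groups` instead of `users`, reusing the `windows-mssql-admins` user group and the `generic-notification` template from the examples above. The notification name and the `assign where` condition are assumptions for illustration.

```icinga2
apply Notification "notify-mssql-admins" to Service {
  import "generic-notification"

  // All members of the group are notified.
  user_groups = [ "windows-mssql-admins" ]

  assign where host.vars.mssql_port
}
```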
 640 > **Note**
 641 >
  642 > Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
  643 > states for services, `Down` for hosts) will receive `Recovery` notifications.
 644
 645 ### <a id="notification-escalations"></a> Notification Escalations
 646
 647 When a problem notification is sent and a problem still exists at the time of re-notification
 648 you may want to escalate the problem to the next support level. A different approach
 649 is to configure the default notification by email, and escalate the problem via SMS
 650 if not already solved.
 651
 652 You can define notification start and end times as additional configuration
 653 attributes making the `Notification` object a so-called `notification escalation`.
 654 Using templates you can share the basic notification attributes such as users or the
 655 `interval` (and override them for the escalation then).
 656
 657 Using the example from above, you can define additional users being escalated for SMS
 658 notifications between start and end time.
 659
 660     object User "icinga-oncall-2nd-level" {
 661       display_name = "Icinga 2nd Level"
 662
 663       vars.mobile = "+1 555 424642"
 664     }
 665
 666     object User "icinga-oncall-1st-level" {
 667       display_name = "Icinga 1st Level"
 668
 669       vars.mobile = "+1 555 424642"
 670     }
 671
 672 Define an additional [NotificationCommand](#notification) for SMS notifications.
 673
 674 > **Note**
 675 >
 676 > The example is not complete as there are many different SMS providers.
 677 > Please note that sending SMS notifications will require an SMS provider
 678 > or local hardware with a SIM card active.
 679
  680     object NotificationCommand "sms-notification" {
  681       command = [
  682         PluginDir + "/send_sms_notification",
  683         "$mobile$",
  684         "..." ]
  685     }
 686
 687 The two new notification escalations are added onto the local host
 688 and its service `ping4` using the `generic-notification` template.
 689 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
 690 command) after `30m` until `1h`.
 691
 692 > **Note**
 693 >
 694 > The `interval` was set to 15m in the `generic-notification`
 695 > template example. Lower that value in your escalations by using a secondary
 696 > template or by overriding the attribute directly in the `notifications` array
 697 > position for `escalation-sms-2nd-level`.
 698
  699 If the problem is neither resolved nor acknowledged (which would prevent further notifications),
  700 the `icinga-oncall-1st-level` user will be notified by the `escalation-sms-1st-level` escalation `1h` after the initial problem
  701 notification, but only for one hour (`2h` as `end` key for the `times` dictionary).
 702
 703     apply Notification "mail" to Service {
 704       import "generic-notification"
 705
 706       command = "mail-notification"
 707       users = [ "icingaadmin" ]
 708
 709       assign where service.name == "ping4"
 710     }
 711
 712     apply Notification "escalation-sms-2nd-level" to Service {
 713       import "generic-notification"
 714
 715       command = "sms-notification"
 716       users = [ "icinga-oncall-2nd-level" ]
 717
 718       times = {
 719         begin = 30m
 720         end = 1h
 721       }
 722
 723       assign where service.name == "ping4"
 724     }
 725
 726     apply Notification "escalation-sms-1st-level" to Service {
 727       import "generic-notification"
 728
 729       command = "sms-notification"
 730       users = [ "icinga-oncall-1st-level" ]
 731
 732       times = {
 733         begin = 1h
 734         end = 2h
 735       }
 736
 737       assign where service.name == "ping4"
 738     }
 739
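The note above suggests a secondary template for lowering the `interval`. A minimal sketch of that approach, assuming the `generic-notification` template from the earlier example (the template name `escalation-notification` and the `5m` value are hypothetical):

```
/* hypothetical secondary template for escalations */
template Notification "escalation-notification" {
  import "generic-notification"

  interval = 5m // overrides the 15m inherited from generic-notification
}
```

The escalation rules could then import `escalation-notification` instead of `generic-notification`.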
 740 ### <a id="notification-delay"></a> Notification Delay
 741
 742 Sometimes the problem in question should not be notified when the notification is due
 743 (the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
 744 you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
 745 postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
 746 Icinga 2 will not check against any end time for this notification. Make sure to
 747 specify a relatively low notification `interval` to get notified soon enough again.
 748
 749     apply Notification "mail" to Service {
 750       import "generic-notification"
 751
 752       command = "mail-notification"
 753       users = [ "icingaadmin" ]
 754
 755       interval = 5m
 756
 757       times.begin = 15m // delay notification window
 758
 759       assign where service.name == "ping4"
 760     }
 761
 762 ### <a id="disable-renotification"></a> Disable Re-notifications
 763
 764 If you prefer to be notified only once, you can disable re-notifications by setting the
 765 `interval` attribute to `0`.
 766
 767     apply Notification "notify-once" to Service {
 768       import "generic-notification"
 769
 770       command = "mail-notification"
 771       users = [ "icingaadmin" ]
 772
 773       interval = 0 // disable re-notification
 774
 775       assign where service.name == "ping4"
 776     }
 777
 778 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
 779
 780 If no notification state and type filter attributes are defined on the `Notification`
 781 or `User` object, Icinga 2 assumes that all states and types should be notified.
 782
 783 Available state and type filters for notifications are:
 784
 785     template Notification "generic-notification" {
 786
 787       states = [ Warning, Critical, Unknown ]
 788       types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
 789                 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
 790     }
 791
 792 If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
 793 into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
 794 You can filter for acknowledgements and custom notifications too.
 795
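Filters can also be narrowed per object. As a sketch, the following `User` would only receive problem notifications for the `Critical` state. The user name and e-mail address are placeholders and not part of the examples above:

```
/* hypothetical user with restrictive filters */
object User "oncall-critical-only" {
  display_name = "On-call (critical problems only)"
  email = "oncall@example.com"

  states = [ Critical ] // state filter
  types = [ Problem ]   // type filter
}
```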
 796
 797 ## <a id="timeperiods"></a> Time Periods
 798
 799 Time Periods define time ranges in Icinga during which event actions are
 800 triggered: for example, whether a service check is executed within
 801 the `check_period` attribute, or whether a notification is sent to
 802 users, filtered by the `period` and `notification_period`
 803 configuration attributes for `Notification` and `User` objects.
 804
 805 > **Note**
 806 >
 807 > If you are familiar with Icinga 1.x - these time period definitions
 808 > are called `legacy timeperiods` in Icinga 2.
 809 >
 810 > An Icinga 2 legacy timeperiod requires the `ITL` provided template
 811 > `legacy-timeperiod`.
 812
 813 The `TimePeriod` attribute `ranges` may contain multiple directives,
 814 including weekdays, days of the month, and calendar dates.
 815 These types may overlap/override other types in your ranges dictionary.
 816
 817 The descending order of precedence is as follows:
 818
 819 * Calendar date (2008-01-01)
 820 * Specific month date (January 1st)
 821 * Generic month date (Day 15)
 822 * Offset weekday of specific month (2nd Tuesday in December)
 823 * Offset weekday (3rd Monday)
 824 * Normal weekday (Tuesday)
 825
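As a rough illustration of mixing directive types in one `ranges` dictionary, the sketch below combines a normal weekday, a specific month date and a calendar date. The object name and time ranges are made up, and the key spellings assume that the Icinga 1.x directive syntax is accepted unchanged by the `legacy-timeperiod` template, so verify them against your Icinga 2 version:

```
/* hypothetical time period mixing several directive types */
object TimePeriod "support-hours" {
  import "legacy-timeperiod"

  display_name = "Support hours"
  ranges = {
    "monday"      = "09:00-17:00" // normal weekday
    "december 31" = "00:00-24:00" // specific month date
    "2008-01-01"  = "00:00-24:00" // calendar date
  }
}
```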
 826 If you don't set any `check_period` or `notification_period` attribute
 827 on your configuration objects Icinga 2 assumes `24x7` as time period
 828 as shown below.
 829
 830     object TimePeriod "24x7" {
 831       import "legacy-timeperiod"
 832
 833       display_name = "Icinga 2 24x7 TimePeriod"
 834       ranges = {
 835         "monday"    = "00:00-24:00"
 836         "tuesday"   = "00:00-24:00"
 837         "wednesday" = "00:00-24:00"
 838         "thursday"  = "00:00-24:00"
 839         "friday"    = "00:00-24:00"
 840         "saturday"  = "00:00-24:00"
 841         "sunday"    = "00:00-24:00"
 842       }
 843     }
 844
 845 If your operation staff should only be notified during work hours,
 846 create a new timeperiod named `workhours` defining a work day from
 847 09:00 to 17:00.
 848
 849     object TimePeriod "workhours" {
 850       import "legacy-timeperiod"
 851
 852       display_name = "Icinga 2 8x5 TimePeriod"
 853       ranges = {
 854         "monday"    = "09:00-17:00"
 855         "tuesday"   = "09:00-17:00"
 856         "wednesday" = "09:00-17:00"
 857         "thursday"  = "09:00-17:00"
 858         "friday"    = "09:00-17:00"
 859       }
 860     }
 861
 862 Use the `period` attribute to assign time periods to
 863 `Notification` and `Dependency` objects:
 864
 865     object Notification "mail" {
 866       import "generic-notification"
 867
 868       host_name = "localhost"
 869
 870       command = "mail-notification"
 871       users = [ "icingaadmin" ]
 872       period = "workhours"
 873     }
 874
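The same `workhours` period can also restrict check execution or a user's notification window. A hedged sketch, assuming the `generic-service` template and the `http` check command used elsewhere in this chapter; the `User` name and address are placeholders:

```
object Service "http" {
  import "generic-service"

  host_name = "localhost"
  check_command = "http"
  check_period = "workhours" // only execute checks during work hours
}

/* hypothetical user who is only notified during work hours */
object User "workhours-admin" {
  display_name = "Admin (work hours only)"
  email = "admin@example.com"
  period = "workhours"
}
```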
 875
 876 ## <a id="commands"></a> Commands
 877
 878 Icinga 2 uses three different command object types to specify how
 879 checks should be performed, notifications should be sent, and
 880 events should be handled.
 881
 882 ### <a id="command-environment-variables"></a> Environment Variables for Commands
 883
 884 Please check [Runtime Custom Attributes as Environment Variables](3-monitoring-basics.md#runtime-custom-attribute-env-vars).
 885
 886
 887 ### <a id="check-commands"></a> Check Commands
 888
 889 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects define the command line used
 890 to call a check.
 891
 892 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects are referenced by
 893 [Host](12-object-types.md#objecttype-host) and [Service](12-object-types.md#objecttype-service) objects
 894 using the `check_command` attribute.
 895
 896 > **Note**
 897 >
 898 > Make sure that the [checker](5-cli-commands.md#features) feature is enabled in order to
 899 > execute checks.
 900
 901 #### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
 902
 903 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects require the [ITL template](13-icinga-template-library.md#itl-plugin-check-command)
 904 `plugin-check-command` to support native plugin based check methods.
 905
 906 Unless you have done so already, download your check plugin and put it
 907 into the [PluginDir](2-getting-started.md#constants-conf) directory. The following example uses the
 908 `check_disk` plugin shipped with the Monitoring Plugins package.
 909
 910 The plugin path and all command arguments are passed as a list of
 911 double-quoted strings so that they are properly shell-escaped.
 912
 913 Call the `check_disk` plugin with the `--help` parameter to see
 914 all available options. Our example defines warning (`-w`) and
 915 critical (`-c`) thresholds for the disk usage. Without any
 916 partition defined (`-p`) it will check all local partitions.
 917
 918     icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
 919     ...
 920     This plugin checks the amount of used disk space on a mounted file system
 921     and generates an alert if free space is less than one of the threshold values
 922
 923
 924     Usage:
 925      check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
 926     [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
 927     [-t timeout] [-u unit] [-v] [-X type] [-N type]
 928     ...
 929
 930 > **Note**
 931 >
 932 > Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
 933
 934 The next step is to understand how command parameters are passed from
 935 a host or service object, and to add a [CheckCommand](12-object-types.md#objecttype-checkcommand)
 936 definition based on these required parameters and/or default values.
 937
 938 #### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
 939
 940 Check command parameters are defined as custom attributes which can be accessed as runtime macros
 941 by the executed check command.
 942
 943 Define the default check command custom attributes `disk_wfree` and `disk_cfree`
 944 (freely definable naming schema) and their default threshold values. You can
 945 then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
 946 on the command line.
 947
 948 > **Tip**
 949 >
 950 > Use a common command type as prefix for your command arguments to increase
 951 > readability. `disk_wfree` helps understanding the context better than just
 952 > `wfree` as argument.
 953
 954 The default custom attributes can be overridden by the custom attributes
 955 defined in the service using the check command `my-disk`. The custom attributes
 956 can also be inherited from a parent template using additive inheritance (`+=`).
 957
 958     object CheckCommand "my-disk" {
 959       import "plugin-check-command"
 960
 961       command = [ PluginDir + "/check_disk" ]
 962
 963       arguments = {
 964         "-w" = "$disk_wfree$%"
 965         "-c" = "$disk_cfree$%"
 966         "-W" = "$disk_inode_wfree$%"
 967         "-K" = "$disk_inode_cfree$%"
 968         "-p" = "$disk_partitions$"
 969         "-x" = "$disk_partitions_excluded$"
 970       }
 971
 972       vars.disk_wfree = 20
 973       vars.disk_cfree = 10
 974     }
 975
 976 > **Note**
 977 >
 978 > A proper example for the `check_disk` plugin is already shipped with Icinga 2
 979 > ready to use with the [plugin check commands](13-icinga-template-library.md#plugin-check-command-disk).
 980
 981 The host `my-server` with the applied service `basic-partitions` checks a basic set of disk partitions
 982 with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
 983 free disk space).
 984
 985 The custom attribute `disk_partitions` can either hold a single string or an array of
 986 string values for passing multiple partitions to the `check_disk` check plugin.
 987
 988     object Host "my-server" {
 989       import "generic-host"
 990       address = "127.0.0.1"
 991       address6 = "::1"
 992
 993       vars.local_disks["basic-partitions"] = {
 994         disk_partitions = [ "/", "/tmp", "/var", "/home" ]
 995       }
 996     }
 997
 998     apply Service for (disk => config in host.vars.local_disks) {
 999       import "generic-service"
1000       check_command = "my-disk"
1001
1002       vars += config
1003
1004       vars.disk_wfree = 10
1005       vars.disk_cfree = 5
1006
1007       assign where host.vars.local_disks
1008     }
1009
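Because `disk_partitions` may also hold a single string, a host with just one partition to check could look like the sketch below. The host name `my-server2` and the dictionary key `root-partition` are hypothetical; the `apply Service for` rule above would then generate a service named `root-partition` for this host:

```
/* hypothetical host passing a single partition as a string */
object Host "my-server2" {
  import "generic-host"
  address = "127.0.0.1"

  vars.local_disks["root-partition"] = {
    disk_partitions = "/" // single string instead of an array
  }
}
```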
1010
1011 More details on using arrays in custom attributes can be found in
1012 [this chapter](3-monitoring-basics.md#runtime-custom-attributes).
1013
1014
1015 #### <a id="command-arguments"></a> Command Arguments
1016
1017 When a check command line is defined using the `command` attribute, Icinga 2
1018 resolves all macros in the static string or array. Sometimes it is
1019 necessary to extend the argument list based on a condition evaluated
1020 at command execution, or to make arguments optional so that they are only
1021 set if Icinga 2 can resolve the macro value.
1022
1023     object CheckCommand "check_http" {
1024       import "plugin-check-command"
1025
1026       command = [ PluginDir + "/check_http" ]
1027
1028       arguments = {
1029         "-H" = "$http_vhost$"
1030         "-I" = "$http_address$"
1031         "-u" = "$http_uri$"
1032         "-p" = "$http_port$"
1033         "-S" = {
1034           set_if = "$http_ssl$"
1035         }
1036         "--sni" = {
1037           set_if = "$http_sni$"
1038         }
1039         "-a" = {
1040           value = "$http_auth_pair$"
1041           description = "Username:password on sites with basic authentication"
1042         }
1043         "--no-body" = {
1044           set_if = "$http_ignore_body$"
1045         }
1046         "-r" = "$http_expect_body_regex$"
1047         "-w" = "$http_warn_time$"
1048         "-c" = "$http_critical_time$"
1049         "-e" = "$http_expect$"
1050       }
1051
1052       vars.http_address = "$address$"
1053       vars.http_ssl = false
1054       vars.http_sni = false
1055     }
1056
1057 The example shows the `check_http` check command defining the most common
1058 arguments. Each of them is optional by default and will be omitted if
1059 the value is not set. For example if the service calling the check command
1060 does not have `vars.http_port` set, it won't get added to the command
1061 line.
1062
1063 If the `vars.http_ssl` custom attribute is set in the service, host or command
1064 object definition, Icinga 2 will add the `-S` argument based on the `set_if`
1065 numeric value to the command line. String values are not supported.
1066
1067 If the macro value cannot be resolved, Icinga 2 will not add the defined argument
1068 to the final command argument array. Empty strings for macro values won't omit
1069 the argument.
1070
1071 That way you can use the same `check_http` command definition for checks with and
1072 without SSL enabled, saving you duplicated command definitions.
1073
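To illustrate, the two services below share the `check_http` command definition from above and differ only in one custom attribute. This is a sketch: the host `my-server` and the `generic-service` template are assumed to exist as in the earlier examples.

```
object Service "http" {
  import "generic-service"
  host_name = "my-server"
  check_command = "check_http"
  // vars.http_ssl keeps the command default (false), so -S is omitted
}

object Service "https" {
  import "generic-service"
  host_name = "my-server"
  check_command = "check_http"

  vars.http_ssl = true // set_if evaluates to true, so -S is added
}
```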
1074 Details on all available options can be found in the
1075 [CheckCommand object definition](12-object-types.md#objecttype-checkcommand).
1076
1077 ### <a id="using-apply-services-command-arguments"></a> Apply Services with Custom Command Arguments
1078
1079 Imagine the following scenario: The `my-host1` host is reachable using the default port 22, while
1080 the `my-host2` host listens on a different port, 2222. Both hosts are in the hostgroup `my-linux-servers`.
1081
1082     object HostGroup "my-linux-servers" {
1083       display_name = "Linux Servers"
1084       assign where host.vars.os == "Linux"
1085     }
1086
1087     /* this one has port 22 opened */
1088     object Host "my-host1" {
1089       import "generic-host"
1090       address = "129.168.1.50"
1091       vars.os = "Linux"
1092     }
1093
1094     /* this one listens on a different ssh port */
1095     object Host "my-host2" {
1096       import "generic-host"
1097       address = "129.168.2.50"
1098       vars.os = "Linux"
1099       vars.custom_ssh_port = 2222
1100     }
1101
1102 All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an
1103 [apply rule](10-language-reference.md#apply). The optional `ssh_port` command argument should be inherited from the host
1104 the service is applied to. If not set, the check command `my-ssh` will omit the argument.
1105 The `host` argument is special: `skip_key` tells Icinga 2 to ignore the key, and directly put the
1106 value onto the command line. The `order` attribute specifies that this argument is the first one
1107 (`-1` is smaller than the other defaults).
1108
1109     object CheckCommand "my-ssh" {
1110       import "plugin-check-command"
1111
1112       command = [ PluginDir + "/check_ssh" ]
1113
1114       arguments = {
1115         "-p" = "$ssh_port$"
1116         "host" = {
1117           value = "$ssh_address$"
1118           skip_key = true
1119           order = -1
1120         }
1121       }
1122
1123       vars.ssh_address = "$address$"
1124     }
1125
1126     /* apply ssh service */
1127     apply Service "my-ssh" {
1128       import "generic-service"
1129       check_command = "my-ssh"
1130
1131       //set the command argument for ssh port with a custom host attribute, if set
1132       vars.ssh_port = "$host.vars.custom_ssh_port$"
1133
1134       assign where "my-linux-servers" in host.groups
1135     }
1136
1137 The `my-host1` host will get the `my-ssh` service checking on the default port:
1138
1139     [2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281
1140
1141 The `my-host2` host will pass its `custom_ssh_port` variable on to the service, which executes a different command:
1142
1143     [2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956
1144
1145
1146 ### <a id="notification-commands"></a> Notification Commands
1147
1148 [NotificationCommand](12-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
1149 interfaces (E-Mail, XMPP, IRC, Twitter, etc).
1150
1151 [NotificationCommand](12-object-types.md#objecttype-notificationcommand) objects are referenced by
1152 [Notification](12-object-types.md#objecttype-notification) objects using the `command` attribute.
1153
1154 `NotificationCommand` objects require the [ITL template](13-icinga-template-library.md#itl-plugin-notification-command)
1155 `plugin-notification-command` to support native plugin-based notifications.
1156
1157 > **Note**
1158 >
1159 > Make sure that the [notification](5-cli-commands.md#features) feature is enabled on your master instance
1160 > in order to execute notification commands.
1161
1162 Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
1163 the current check output) sending an email to the user(s) associated with the
1164 notification itself (`$user.email$`).
1165
1166 If you want to specify default values for some of the custom attribute definitions,
1167 you can add a `vars` dictionary as shown for the `CheckCommand` object.
1168
1169     object NotificationCommand "mail-service-notification" {
1170       import "plugin-notification-command"
1171
1172       command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
1173
1174       env = {
1175         NOTIFICATIONTYPE = "$notification.type$"
1176         SERVICEDESC = "$service.name$"
1177         HOSTALIAS = "$host.display_name$"
1178         HOSTADDRESS = "$address$"
1179         SERVICESTATE = "$service.state$"
1180         LONGDATETIME = "$icinga.long_date_time$"
1181         SERVICEOUTPUT = "$service.output$"
1182         NOTIFICATIONAUTHORNAME = "$notification.author$"
1183         NOTIFICATIONCOMMENT = "$notification.comment$"
1184         HOSTDISPLAYNAME = "$host.display_name$"
1185         SERVICEDISPLAYNAME = "$service.display_name$"
1186         USEREMAIL = "$user.email$"
1187       }
1188     }
1189
1190 The command attribute in the `mail-service-notification` command refers to the following
1191 shell script. The macros specified in the `env` dictionary are exported
1192 as environment variables and can be used in the notification script:
1193
1194     #!/usr/bin/env bash
1195     template=$(cat <<TEMPLATE
1196     ***** Icinga  *****
1197
1198     Notification Type: $NOTIFICATIONTYPE
1199
1200     Service: $SERVICEDESC
1201     Host: $HOSTALIAS
1202     Address: $HOSTADDRESS
1203     State: $SERVICESTATE
1204
1205     Date/Time: $LONGDATETIME
1206
1207     Additional Info: $SERVICEOUTPUT
1208
1209     Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
1210     TEMPLATE
1211     )
1212
1213     /usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" $USEREMAIL
1214
1215 > **Note**
1216 >
1217 > This example is for `exim` only. Requires changes for `sendmail` and
1218 > other MTAs.
1219
1220 While it's possible to specify the entire notification command right
1221 in the NotificationCommand object, it is generally advisable to create a
1222 shell script in the `/etc/icinga2/scripts` directory and have the
1223 NotificationCommand object refer to that.
1224
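For completeness, a `Notification` could reference the command through its `command` attribute roughly as follows. This is only a sketch reusing the `generic-notification` template, the `icingaadmin` user and the `ping4` service from earlier examples; the rule name `mail-service` is hypothetical:

```
/* hypothetical apply rule referencing the notification command */
apply Notification "mail-service" to Service {
  import "generic-notification"

  command = "mail-service-notification"
  users = [ "icingaadmin" ]

  assign where service.name == "ping4"
}
```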
1225 ### <a id="event-commands"></a> Event Commands
1226
1227 Unlike notifications, event commands for hosts/services are called on every
1228 check execution if one of these conditions matches:
1229
1230 * The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
1231 * The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
1232 * The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)
1233
1234 [EventCommand](12-object-types.md#objecttype-eventcommand) objects are referenced by
1235 [Host](12-object-types.md#objecttype-host) and [Service](12-object-types.md#objecttype-service) objects
1236 using the `event_command` attribute.
1237
1238 Therefore the `EventCommand` object should define a command line
1239 evaluating the current service state and other service runtime attributes
1240 available through runtime vars. Runtime macros such as `$service.state_type$`
1241 and `$service.state$` are resolved by Icinga 2, which helps to trigger
1242 events in a fine-grained manner.
1243
1244 Common use case scenarios are a failing HTTP check requiring an immediate
1245 restart via event command, or if an application is locked and requires
1246 a restart upon detection.
1247
1248 `EventCommand` objects require the ITL template `plugin-event-command`
1249 to support native plugin based checks.
1250
1251 #### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
1252
1253 The following example will trigger a restart of the `httpd` daemon
1254 via ssh when the `http` service check fails. If the service state is
1255 `OK`, it will not trigger any event action.
1256
1257 Requirements:
1258
1259 * ssh connection
1260 * icinga user with public key authentication
1261 * icinga user with sudo permissions for restarting the httpd daemon.
1262
1263 Example on Debian:
1264
1265     # ls /home/icinga/.ssh/
1266     authorized_keys
1267
1268     # visudo
1269     icinga  ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
1270
1271
1272 Define a generic [EventCommand](12-object-types.md#objecttype-eventcommand) object `event_by_ssh`
1273 which can be used for all event commands triggered using ssh:
1274
1275     /* pass event commands through ssh */
1276     object EventCommand "event_by_ssh" {
1277       import "plugin-event-command"
1278
1279       command = [ PluginDir + "/check_by_ssh" ]
1280
1281       arguments = {
1282         "-H" = "$event_by_ssh_address$"
1283         "-p" = "$event_by_ssh_port$"
1284         "-C" = "$event_by_ssh_command$"
1285         "-l" = "$event_by_ssh_logname$"
1286         "-i" = "$event_by_ssh_identity$"
1287         "-q" = {
1288           set_if = "$event_by_ssh_quiet$"
1289         }
1290         "-w" = "$event_by_ssh_warn$"
1291         "-c" = "$event_by_ssh_crit$"
1292         "-t" = "$event_by_ssh_timeout$"
1293       }
1294
1295       vars.event_by_ssh_address = "$address$"
1296       vars.event_by_ssh_quiet = false
1297     }
1298
1299 The actual event command only passes the `event_by_ssh_command` attribute.
1300 The `event_by_ssh_service` custom attribute takes care of passing the correct
1301 daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
1302 is only restarted when the service is in a non-`OK` state.
1303
1304
1305     object EventCommand "event_by_ssh_restart_service" {
1306       import "event_by_ssh"
1307
1308       //only restart the daemon if state > 0 (not-ok)
1309       //requires sudo permissions for the icinga user
1310       vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
1311     }
1312
1313
1314 Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
1315 which service should be restarted using the `event_by_ssh_service` attribute.
1316
1317     object Service "http" {
1318       import "generic-service"
1319       host_name = "remote-http-host"
1320       check_command = "http"
1321
1322       event_command = "event_by_ssh_restart_service"
1323       vars.event_by_ssh_service = "$host.vars.httpd_name$"
1324
1325       //vars.event_by_ssh_logname = "icinga"
1326       //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa"
1327     }
1328
1329
1330 Each host with this service must then define the `httpd_name` custom attribute
1331 (for example generated from your CMDB):
1332
1333     object Host "remote-http-host" {
1334       import "generic-host"
1335       address = "192.168.1.100"
1336
1337       vars.httpd_name = "apache2"
1338     }
1339
1340 You can test-drive this example by manually stopping the `httpd` daemon
1341 on your `remote-http-host`. Enable the `debuglog` feature and tail the
1342 `/var/log/icinga2/debug.log` file.
1343
1344 Remote Host Terminal:
1345
1346     # date; service apache2 status
1347     Mon Sep 15 18:57:39 CEST 2014
1348     Apache2 is running (pid 23651).
1349     # date; service apache2 stop
1350     Mon Sep 15 18:57:47 CEST 2014
1351     [ ok ] Stopping web server: apache2 ... waiting .
1352
1353 Icinga 2 Host Terminal:
1354
1355     [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
1356     [2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
1357     [2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
1358     [2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
1359     [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
1360     [2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0
1361
1362 Remote Host Terminal:
1363
1364     # date; service apache2 status
1365     Mon Sep 15 18:58:44 CEST 2014
1366     Apache2 is running (pid 24908).
1367
1368
1369
1370
1371 ## <a id="dependencies"></a> Dependencies
1372
1373 Icinga 2 uses host and service [Dependency](12-object-types.md#objecttype-dependency) objects
1374 for determining their network reachability.
1375
1376 A service can depend on a host, and vice versa. A service has an implicit
1377 dependency (parent) to its host. A host to host dependency acts implicitly
1378 as host parent relation.
1379 When dependencies are calculated, not only the immediate parent is taken into
1380 account but all parents are inherited.
1381
1382 The `parent_host_name` and `parent_service_name` attributes are mandatory for
1383 service dependencies; only `parent_host_name` is required for host dependencies.
1384 [Apply rules](3-monitoring-basics.md#using-apply) will allow you to
1385 [determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
1386 dynamic fashion if required.
1387
1388     parent_host_name = "core-router"
1389     parent_service_name = "uplink-port"
1390
1391 Notifications are suppressed by default if a host or service becomes unreachable.
1392 You can control that option by defining the `disable_notifications` attribute.
1393
1394     disable_notifications = false
1395
1396 The dependency state filter must be defined based on the parent object being
1397 either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).
1398
1399 The following example will make the dependency fail and trigger it if the parent
1400 object is **not** in one of these states:
1401
1402     states = [ OK, Critical, Unknown ]
1403
1404 Rephrased: If the parent service object changes into the `Warning` state, this
1405 dependency will fail and render all child objects (hosts or services) unreachable.
1406
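Put together, the fragments above could form a complete service dependency like the following sketch. The object name and the child names `my-server` and `ping4` are assumptions for illustration; `core-router` and `uplink-port` are taken from the snippet above:

```
/* hypothetical dependency combining the attributes discussed above */
object Dependency "uplink-reachability" {
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  child_host_name = "my-server"
  child_service_name = "ping4"

  states = [ OK, Critical, Unknown ] // fails if the parent service is in Warning
  disable_notifications = false      // do not suppress notifications for the child
}
```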
1407 You can determine the child's reachability by querying the `is_reachable` attribute,
1408 for example in [DB IDO](14-appendix.md#schema-db-ido-extensions).
1409
1410 ### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
1411
1412 Icinga 2 automatically adds an implicit dependency for services on their host. That way
1413 service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
1414 does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
1415 `states = [ Up ]` for all service objects.
1416
1417 Service checks are still executed. If you want to prevent them from happening, you can
1418 apply the following dependency to all services setting their host as `parent_host_name`
1419 and disabling the checks. `assign where true` matches on all `Service` objects.
1420
1421     apply Dependency "disable-host-service-checks" to Service {
1422       disable_checks = true
1423       assign where true
1424     }
1425
1426 ### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
1427
1428 A common scenario is the Icinga 2 server behind a router. Checking internet
1429 access by pinging the Google DNS server `google-dns` is a common method, but
1430 will fail in case the `dsl-router` host is down. Therefore the example below
1431 defines a host dependency which acts implicitly as parent relation too.
1432
1433 Furthermore the host may be reachable but ping probes are dropped by the
1434 router's firewall. In case the `dsl-router` `ping4` service check fails, all
1435 further checks for the `ping4` service on host `google-dns` should
1436 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
1437
1438     object Host "dsl-router" {
1439       import "generic-host"
1440       address = "192.168.1.1"
1441     }
1442
1443     object Host "google-dns" {
1444       import "generic-host"
1445       address = "8.8.8.8"
1446     }
1447
1448     apply Service "ping4" {
1449       import "generic-service"
1450
1451       check_command = "ping4"
1452
1453       assign where host.address
1454     }
1455
1456     apply Dependency "internet" to Host {
1457       parent_host_name = "dsl-router"
1458       disable_checks = true
1459       disable_notifications = true
1460
1461       assign where host.name != "dsl-router"
1462     }
1463
1464     apply Dependency "internet" to Service {
1465       parent_host_name = "dsl-router"
1466       parent_service_name = "ping4"
1467       disable_checks = true
1468
1469       assign where host.name != "dsl-router"
1470     }
1471
1472 ### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
1473
1474 You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
1475 child attributes, e.g. `parent_host_name`, to other objects'
1476 attributes.
1477
1478 A common example is virtual machines hosted on a master. The object
1479 name of that master is auto-generated from your CMDB or VMware inventory
1480 into the host's custom attributes (or a generic template for your
1481 cloud).
1482
1483 Define your master host object:
1484
1485     /* your master */
1486     object Host "master.example.com" {
1487       import "generic-host"
1488     }
1489
1490 Add a generic template defining all common host attributes:
1491
1492     /* generic template for your virtual machines */
1493     template Host "generic-vm" {
1494       import "generic-host"
1495     }
1496
1497 Add a template for all hosts on your example.com cloud setting
1498 custom attribute `vm_parent` to `master.example.com`:
1499
1500     template Host "generic-vm-example.com" {
1501       import "generic-vm"
1502       vars.vm_parent = "master.example.com"
1503     }
1504
1505 Define your guest hosts:
1506
1507     object Host "www.example1.com" {
1508       import "generic-vm-example.com"
1509     }
1510
1511     object Host "www.example2.com" {
1512       import "generic-vm-example.com"
1513     }
1514
1515 Apply the host dependency to all child hosts importing the
1516 `generic-vm` template and set the `parent_host_name`
1517 to the previously defined custom attribute `host.vars.vm_parent`.
1518
1519     apply Dependency "vm-host-to-parent-master" to Host {
1520       parent_host_name = host.vars.vm_parent
1521       assign where "generic-vm" in host.templates
1522     }
1523
1524 You can extend this example, and make your services depend on the
1525 `master.example.com` host too. Their local scope allows you to use
1526 `host.vars.vm_parent` similar to the example above.
1527
1528     apply Dependency "vm-service-to-parent-master" to Service {
1529       parent_host_name = host.vars.vm_parent
1530       assign where "generic-vm" in host.templates
1531     }
1532
1533 That way you don't need to wait for your guest hosts to become
1534 unreachable when the master host goes down. Instead the services
1535 will detect their reachability immediately when executing checks.
1536
1537 > **Note**
1538 >
1539 > This method with setting locally scoped variables only works in
1540 > apply rules, but not in object definitions.
1541
1542
1543 ### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
1544
1545 Another classic example is agent-based checks. You would define a health check
1546 for the agent daemon responding to your requests, and make all other services
1547 querying that daemon depend on that health check.
1548
1549 The following configuration defines two NRPE-based service checks `nrpe-load`
1550 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
1551 the `nrpe-health` service.
1552
1553     apply Service "nrpe-health" {
1554       import "generic-service"
1555       check_command = "nrpe"
1556       assign where match("nrpe-*", host.name)
1557     }
1558
1559     apply Service "nrpe-load" {
1560       import "generic-service"
1561       check_command = "nrpe"
1562       vars.nrpe_command = "check_load"
1563       assign where match("nrpe-*", host.name)
1564     }
1565
1566     apply Service "nrpe-disk" {
1567       import "generic-service"
1568       check_command = "nrpe"
1569       vars.nrpe_command = "check_disk"
1570       assign where match("nrpe-*", host.name)
1571     }
1572
1573     object Host "nrpe-server" {
1574       import "generic-host"
1575       address = "192.168.1.5"
1576     }
1577
1578     apply Dependency "disable-nrpe-checks" to Service {
1579       parent_service_name = "nrpe-health"
1580
1581       states = [ OK ]
1582       disable_checks = true
1583       disable_notifications = true
1584       assign where service.check_command == "nrpe"
1585       ignore where service.name == "nrpe-health"
1586     }
1587
1588 The `disable-nrpe-checks` dependency is applied to all services
1589 on the `nrpe-server` host using the `nrpe` check command,
1590 but not to the `nrpe-health` service itself.
1591
1592
1593 ## <a id="downtimes"></a> Downtimes
1594
1595 Downtimes can be scheduled for planned server maintenance or
1596 any other targeted service outage you are aware of in advance.
1597
1598 Downtimes will suppress any notifications, and may trigger other
1599 downtimes too. If the downtime was set by accident, or the duration
1600 exceeds the maintenance, you can manually cancel the downtime.
1601 Planned downtimes will also be taken into account for SLA reporting
1602 tools calculating the SLAs based on the state and downtime history.
1603
1604 Multiple downtimes for a single object may overlap. This is useful
1605 when you want to extend a maintenance window that takes longer than expected.
1606 If there are multiple downtimes triggered for one object, the overall downtime depth
1607 will be greater than `1`.
1608
1609
1610 If the downtime was scheduled after the problem changed to a critical hard
1611 state triggering a problem notification, and the service recovers during
1612 the downtime window, the recovery notification won't be suppressed.
1613
1614 ### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
1615
1616 A `fixed` downtime will be activated at the defined start time, and
1617 removed at the end time. During this time window the service state
1618 will change to `NOT-OK` and then actually trigger the downtime.
1619 Notifications are suppressed and the downtime depth is incremented.
1620
1621 Common scenarios are a planned distribution upgrade on your Linux
1622 servers, or database updates in your warehouse. The customer knows
1623 about a fixed downtime window between 23:00 and 24:00. After 24:00
1624 all problems should be alerted again. The solution is simple:
1625 schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
1626
1627 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
1628 by the state change in the time span defined by start and end time,
1629 and then last for the specified duration in minutes.
1630
1631 Imagine the following scenario: Your service is frequently polled
1632 by users trying to grab free deleted domains for immediate registration.
1633 Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
1634 a network outage visible to the monitoring. The service is still alive,
1635 but answering too slowly to Icinga 2 service checks.
1636 For that reason, you may want to schedule a downtime between 07:30 and
1637 08:00 with a duration of 15 minutes. The downtime will then last from
1638 its trigger time until the duration is over. After that, the downtime
1639 is removed (may happen before or after the actual end time!).
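The difference between the two downtime types can be modelled in a few lines of Python. This is a minimal sketch and not Icinga 2 code: the function name `downtime_window` and its parameters are made up for illustration, and it only captures the rules described above.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def downtime_window(
    start: datetime,
    end: datetime,
    fixed: bool,
    duration: timedelta = timedelta(0),
    trigger_time: Optional[datetime] = None,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (begin, end) span in which the downtime is in effect.

    Illustrative model only; names and signature are hypothetical.
    """
    if fixed:
        # A fixed downtime covers exactly the scheduled window;
        # the duration is ignored.
        return (start, end)
    # A flexible downtime only begins if a state change happens
    # inside the scheduled time span.
    if trigger_time is None or not (start <= trigger_time <= end):
        return None
    # Once triggered, it lasts for the full duration, even past `end`.
    return (trigger_time, trigger_time + duration)


day = datetime(2014, 1, 3)
start = day.replace(hour=7, minute=30)
end = day.replace(hour=8, minute=0)

# The service becomes NOT-OK at 07:50; the duration is 15 minutes.
window = downtime_window(
    start, end, fixed=False,
    duration=timedelta(minutes=15),
    trigger_time=day.replace(hour=7, minute=50),
)
print(window[1].strftime("%H:%M"))  # 08:05
```

With the values from the scenario above, a state change at 07:50 produces a downtime that lasts until 08:05, which is after the scheduled end time of 08:00.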
1640
1641 ### <a id="scheduling-downtime"></a> Scheduling a downtime
1642
1643 A downtime can be scheduled either through a web interface or by sending an [external command](3-monitoring-basics.md#external-commands)
1644 to the external command pipe provided by the `ExternalCommandListener` configuration.
1645
1646 Fixed downtimes require a start and end time (a duration will be ignored).
1647 Flexible downtimes need a start and end time for the time span, and a duration
1648 independent from that time span.
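As a rough illustration, the following Python sketch formats a `SCHEDULE_SVC_DOWNTIME` external command line. The field order (host, service, start, end, fixed, trigger id, duration, author, comment) follows the Icinga 1.x external command reference, where the duration is given in seconds. The helper name `schedule_svc_downtime` and all example values are assumptions made for this sketch.

```python
import time


def schedule_svc_downtime(host, service, start, end, fixed, duration,
                          author, comment, trigger_id=0, now=None):
    """Build one line for the external command pipe (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    fields = [
        "SCHEDULE_SVC_DOWNTIME",
        host,
        service,
        str(start),             # start of the time span (UNIX timestamp)
        str(end),               # end of the time span (UNIX timestamp)
        "1" if fixed else "0",  # 1 = fixed, 0 = flexible
        str(trigger_id),        # 0 = not triggered by another downtime
        str(duration),          # seconds; ignored for fixed downtimes
        author,
        comment,
    ]
    return "[{}] {}".format(now, ";".join(fields))


# Flexible downtime with a 15 minute duration; timestamps are example values.
line = schedule_svc_downtime(
    "localhost", "ping4",
    start=1388734200, end=1388736000,
    fixed=False, duration=15 * 60,
    author="icingaadmin", comment="Domain release rush",
    now=1388730000,
)
print(line)

# To submit the command, append the line to the command pipe, for example:
# with open("/var/run/icinga2/cmd/icinga2.cmd", "a") as pipe:
#     pipe.write(line + "\n")
```

Passing the id of an existing downtime as `trigger_id` is how a triggered downtime would be requested in this format; `0` leaves the downtime untriggered.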
1649
1650 ### <a id="triggered-downtimes"></a> Triggered Downtimes
1651
1652 Triggering is optional when scheduling a downtime. If there is already a downtime
1653 scheduled for a future maintenance, the current downtime can be triggered by
1654 that downtime. This is useful if you have scheduled a host downtime and
1655 are now scheduling a child host's downtime that is triggered by the parent
1656 downtime on a NOT-OK state change.
1657
1658 ### <a id="recurring-downtimes"></a> Recurring Downtimes
1659
1660 [ScheduledDowntime objects](12-object-types.md#objecttype-scheduleddowntime) can be used to set up
1661 recurring downtimes for services.
1662
1663 Example:
1664
1665     apply ScheduledDowntime "backup-downtime" to Service {
1666       author = "icingaadmin"
1667       comment = "Scheduled downtime for backup"
1668
1669       ranges = {
1670         monday = "02:00-03:00"
1671         tuesday = "02:00-03:00"
1672         wednesday = "02:00-03:00"
1673         thursday = "02:00-03:00"
1674         friday = "02:00-03:00"
1675         saturday = "02:00-03:00"
1676         sunday = "02:00-03:00"
1677       }
1678
1679       assign where "backup" in service.groups
1680     }
1681
1682
1683 ## <a id="comments-intro"></a> Comments
1684
1685 Comments can be added at runtime and are persistent over restarts. You can
1686 add useful information for others on repeating incidents (for example
1687 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
1688 is primarily accessible using web interfaces.
1689
1690 Comments can be added and deleted through the external command pipe
1691 provided with the `ExternalCommandListener` configuration. The caller must
1692 pass the comment id when manipulating an existing comment.
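The following Python sketch shows what adding and deleting a service comment could look like on the command pipe. The command names `ADD_SVC_COMMENT` and `DEL_SVC_COMMENT` and their field order come from the Icinga 1.x external command reference; the helper functions and the comment id `42` are hypothetical.

```python
import time


def add_svc_comment(host, service, author, comment, persistent=True, now=None):
    """Format an ADD_SVC_COMMENT line (illustrative helper)."""
    if now is None:
        now = int(time.time())
    return "[{}] ADD_SVC_COMMENT;{};{};{};{};{}".format(
        now, host, service, 1 if persistent else 0, author, comment)


def del_svc_comment(comment_id, now=None):
    """Format a DEL_SVC_COMMENT line; only the comment id is needed."""
    if now is None:
        now = int(time.time())
    return "[{}] DEL_SVC_COMMENT;{}".format(now, comment_id)


print(add_svc_comment("localhost", "ping4", "icingaadmin",
                      "stale nfs mount", now=1382014885))
print(del_svc_comment(42, now=1382014885))
```

Deleting needs nothing but the id, which is why the caller has to know the comment id of the comment it wants to remove.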
1693
1694
1695 ## <a id="acknowledgements"></a> Acknowledgements
1696
1697 If a problem has been alerted and notified, you can signal to the other notification
1698 recipients that you are aware of the problem and will handle it.
1699
1700 By sending an acknowledgement to Icinga 2 (using the external command pipe
1701 provided with `ExternalCommandListener` configuration) all future notifications
1702 are suppressed, a new comment is added with the provided description and
1703 a notification with the type `NotificationFilterAcknowledgement` is sent
1704 to all notified users.
1705
1706 ### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements
1707
1708 Once a problem is acknowledged it may disappear from your `handled problems`
1709 dashboard and no-one ever looks at it again since it will suppress
1710 notifications too.
1711
1712 This `fire-and-forget` action is quite common. If you're sure that a
1713 current problem should be resolved in the future at a defined time,
1714 you can define an expiration time when acknowledging the problem.
1715
1716 Icinga 2 will clear the acknowledgement when expired and start to
1717 re-notify if the problem persists.
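A minimal Python sketch of an expiring acknowledgement, assuming the `ACKNOWLEDGE_SVC_PROBLEM_EXPIRE` command and field order documented for Icinga 1.x (host, service, sticky, notify, persistent, expiry timestamp, author, comment). The helper name and the `expire_in` parameter are invented for this example.

```python
import time


def acknowledge_svc_problem_expire(host, service, author, comment,
                                   expire_in, sticky=1, notify=True,
                                   persistent=False, now=None):
    """Format an acknowledgement that expires `expire_in` seconds from now."""
    if now is None:
        now = int(time.time())
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM_EXPIRE",
        host,
        service,
        str(sticky),                 # 2 = sticky in the Icinga 1.x convention
        "1" if notify else "0",      # send an acknowledgement notification
        "1" if persistent else "0",  # keep the comment after the ack is removed
        str(now + expire_in),        # absolute expiry time (UNIX timestamp)
        author,
        comment,
    ]
    return "[{}] {}".format(now, ";".join(fields))


# Acknowledge the problem for four hours.
line = acknowledge_svc_problem_expire(
    "localhost", "ping4", "icingaadmin", "Working on it",
    expire_in=4 * 3600, now=1382014885,
)
print(line)
```

The expiry is passed as an absolute timestamp, so the helper adds the desired lifetime to the current time before formatting the command.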
1718
1719
1720
1721 ## <a id="custom-attributes"></a> Custom Attributes
1722
1723 ### <a id="custom-attributes-apply"></a> Using Custom Attributes for Apply Rules
1724
1725 Custom attributes are not only used at runtime in command definitions to pass
1726 command arguments, but are also a smart way to define patterns and groups
1727 for applying objects for dynamic config generation.
1728
1729 There are several ways of using custom attributes with [apply rules](3-monitoring-basics.md#using-apply):
1730
1731 * As simple attribute literal ([number](10-language-reference.md#numeric-literals), [string](10-language-reference.md#string-literals),
1732 [boolean](10-language-reference.md#boolean-literals)) for expression conditions (`assign where`, `ignore where`)
1733 * As [array](10-language-reference.md#array) or [dictionary](10-language-reference.md#dictionary) attribute with nested values
1734 (e.g. dictionaries in dictionaries) in [apply for](3-monitoring-basics.md#using-apply-for) rules.
1735
1736 Features like [DB IDO](3-monitoring-basics.md#db-ido), [Livestatus](#setting-up-livestatus) or [StatusData](#status-data)
1737 dump this column as an encoded JSON string, and set `is_json` resp. `cv_is_json` to `1`.
1738
1739 If arrays are used in runtime macros (for example `$host.groups$`) all entries
1740 are separated using the `;` character. If an entry contains a semi-colon itself,
1741 it is escaped like this: `entry1;ent\;ry2;entry3`.
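The joining and escaping rule can be written down as a short Python sketch. The function name `join_array_macro` is hypothetical; the sketch only reproduces the behaviour described above and is not taken from the Icinga 2 source.

```python
def join_array_macro(entries):
    """Join array entries with ';', escaping semicolons inside entries."""
    return ";".join(str(entry).replace(";", "\\;") for entry in entries)


print(join_array_macro(["entry1", "ent;ry2", "entry3"]))
# entry1;ent\;ry2;entry3
```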
1742
1743 ### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime
1744
1745 Custom attributes may be used in command definitions to dynamically change how the command
1746 is executed.
1747
1748 Additionally there are Icinga 2 features such as the [PerfDataWriter](3-monitoring-basics.md#performance-data) feature
1749 which use custom runtime attributes to format their output.
1750
1751 > **Tip**
1752 >
1753 > Custom attributes are identified by the `vars` dictionary attribute as short name.
1754 > Accessing the different attribute keys is possible using the [index accessor](10-language-reference.md#indexer) `.`.
1755
1756 Custom attributes in command definitions or performance data templates are evaluated at
1757 runtime when executing a command. These custom attributes cannot be used elsewhere,
1758 for example in other configuration attributes.
1759
1760 Custom attribute values must be either a string, a number, a boolean value or an array.
1761 Dictionaries cannot be used at the time of writing.
1762
1763 Arrays can be used to pass multiple arguments with or without repeating the key string.
1764 This helps passing multiple parameters to check plugins requiring them. Prominent
1765 plugin examples are:
1766
1767 * [check_disk -p](13-icinga-template-library.md#plugin-check-command-disk)
1768 * [check_nrpe -a](13-icinga-template-library.md#plugin-check-command-nrpe)
1769 * [check_nscp -l](13-icinga-template-library.md#plugin-check-command-nscp)
1770 * [check_dns -a](13-icinga-template-library.md#plugin-check-command-dns)
1771
1772 More details on how to use `repeat_key` and other command argument options can be
1773 found in [this section](12-object-types.md#objecttype-checkcommand-arguments).
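To make the effect of `repeat_key` concrete, here is a small Python model of how an array value might be expanded into command line arguments. It assumes that `repeat_key = true` repeats the key for every element and `repeat_key = false` prints the key once followed by all values; the function `expand_argument` is a hypothetical helper, not part of Icinga 2.

```python
def expand_argument(key, values, repeat_key=True):
    """Expand one argument key and an array of values into argv entries."""
    if repeat_key:
        argv = []
        for value in values:
            # The key is emitted again in front of every element.
            argv.extend([key, str(value)])
        return argv
    # The key is emitted once, followed by all elements.
    return [key] + [str(value) for value in values]


print(expand_argument("-p", ["/", "/var"]))
# ['-p', '/', '-p', '/var']
print(expand_argument("-p", ["/", "/var"], repeat_key=False))
# ['-p', '/', '/var']
```

Which form is correct depends on the plugin: some expect the option in front of every value, others expect a single option followed by a list.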
1774
1775 > **Note**
1776 >
1777 > If a macro value cannot be resolved, be it a single macro, or a recursive macro
1778 > containing an array of macros, the entire command argument is skipped.
1779
1780 This is an example of a command definition which uses user-defined custom attributes:
1781
1782     object CheckCommand "my-icmp" {
1783       import "plugin-check-command"
1784       command = [ "/bin/sudo", PluginDir + "/check_icmp" ]
1785
1786       arguments = {
1787         "-H" = {
1788           value = "$icmp_targets$"
1789           repeat_key = false
1790           order = 1
1791         }
1792         "-w" = "$icmp_wrta$,$icmp_wpl$%"
1793         "-c" = "$icmp_crta$,$icmp_cpl$%"
1794         "-s" = "$icmp_source$"
1795         "-n" = "$icmp_packets$"
1796         "-i" = "$icmp_packet_interval$"
1797         "-I" = "$icmp_target_interval$"
1798         "-m" = "$icmp_hosts_alive$"
1799         "-b" = "$icmp_data_bytes$"
1800         "-t" = "$icmp_timeout$"
1801       }
1802
1803       vars.icmp_wrta = 200.00
1804       vars.icmp_wpl = 40
1805       vars.icmp_crta = 500.00
1806       vars.icmp_cpl = 80
1807
1808       vars.notes = "Requires setuid root or sudo."
1809     }
1810
1811 Custom attribute names used at runtime must be enclosed in two `$` signs,
1812 for example `$address$`.
1813
1814 > **Note**
1815 >
1816 > When using the `$` sign as single character, you need to escape it with an
1817 > additional dollar sign (`$$`).
1818
1819 This example also makes use of the [command arguments](3-monitoring-basics.md#command-arguments) passed
1820 to the command line.
1821
1822 You can integrate the above example `CheckCommand` definition
1823 [passing command argument parameters](3-monitoring-basics.md#command-passing-parameters) like this:
1824
1825     object Host "my-icmp-host" {
1826       import "generic-host"
1827       address = "192.168.1.10"
1828       vars.address_mgmt = "192.168.2.10"
1829       vars.address_web = "192.168.10.10"
1830       vars.icmp_targets = [ "$address$", "$host.vars.address_mgmt$", "$host.vars.address_web$" ]
1831     }
1832
1833     apply Service "my-icmp" {
1834       check_command = "my-icmp"
1835       check_interval = 1m
1836       retry_interval = 30s
1837
1838       vars.icmp_targets = host.vars.icmp_targets
1839
1840       assign where host.vars.icmp_targets
1841     }
1842
1843 ### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order
1844
1845 When executing commands Icinga 2 checks the following objects in this order to look
1846 up custom attributes and their respective values:
1847
1848 1. User object (only for notifications)
1849 2. Service object
1850 3. Host object
1851 4. Command object
1852 5. Global custom attributes in the `vars` constant
1853
1854 This execution order allows you to define default values for custom attributes
1855 in your command objects. The `my-icmp` command shown above uses this to set
1856 default values for the latency and packet loss thresholds.
1857
1858 When using the `my-icmp` command you can override some or all of the custom
1859 attributes in the service definition like this:
1860
1861     object Service "ping" {
1862       host_name = "localhost"
1863       check_command = "my-icmp"
1864
1865       vars.icmp_wrta = 100 // Overrides the default value of 200.00 given in the command
1866     }
1867
1868 If a custom attribute isn't defined anywhere an empty value is used and a warning is
1869 emitted to the Icinga 2 log.
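The lookup order can be illustrated with a short Python sketch. It models each object's `vars` as a plain dictionary and walks them in the order listed above; the function name and the dictionaries are assumptions for this example, and the real implementation also logs a warning for undefined attributes.

```python
def resolve_custom_attribute(name, user=None, service=None, host=None,
                             command=None, global_vars=None):
    """Return the first value found in the documented lookup order."""
    for scope in (user, service, host, command, global_vars):
        if scope is not None and name in scope:
            return scope[name]
    # Not defined anywhere: an empty value is used.
    return ""


# Defaults as set in the `my-icmp` command definition above.
command_vars = {"icmp_wrta": 200.00, "icmp_wpl": 40,
                "icmp_crta": 500.00, "icmp_cpl": 80}
# A service overriding one of the defaults.
service_vars = {"icmp_wrta": 100}

for name in ("icmp_wrta", "icmp_cpl", "icmp_timeout"):
    value = resolve_custom_attribute(name, service=service_vars,
                                     command=command_vars)
    print(name, "=", repr(value))
```

Here `icmp_wrta` resolves to the service's `100`, `icmp_cpl` falls through to the command default `80`, and `icmp_timeout` is undefined and yields an empty value.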
1870
1871 > **Best Practice**
1872 >
1873 > By convention every host should have an `address` attribute. Hosts
1874 > which have an IPv6 address should also have an `address6` attribute.
1875
1876 ### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables
1877
1878 The `env` command object attribute specifies a list of environment variables with values calculated
1879 from either runtime macros or custom attributes which should be exported as environment variables
1880 prior to executing the command.
1881
1882 This is useful for example for hiding sensitive information on the command line output
1883 when passing credentials to database checks:
1884
1885     object CheckCommand "mysql-health" {
1886       import "plugin-check-command"
1887
1888       command = [
1889         PluginDir + "/check_mysql"
1890       ]
1891
1892       arguments = {
1893         "-H" = "$mysql_address$"
1894         "-d" = "$mysql_database$"
1895       }
1896
1897       vars.mysql_address = "$address$"
1898       vars.mysql_database = "icinga"
1899       vars.mysql_user = "icinga_check"
1900       vars.mysql_pass = "password"
1901
1902       env.MYSQLUSER = "$mysql_user$"
1903       env.MYSQLPASS = "$mysql_pass$"
1904     }
1905
1906 ### <a id="multiple-host-addresses-custom-attributes"></a> Multiple Host Addresses using Custom Attributes
1907
1908 The following example defines a `Host` with three different interface addresses defined as
1909 custom attributes in the `vars` dictionary. The `if-eth0` and `if-eth1` services will import
1910 these values into the `address` custom attribute. This attribute is available through the
1911 generic `$address$` runtime macro.
1912
1913     object Host "multi-ip" {
1914       check_command = "dummy"
1915       vars.address_lo = "127.0.0.1"
1916       vars.address_eth0 = "10.0.0.10"
1917       vars.address_eth1 = "192.168.1.10"
1918     }
1919
1920     apply Service "if-eth0" {
1921       import "generic-service"
1922
1923       vars.address = "$host.vars.address_eth0$"
1924       check_command = "my-generic-interface-check"
1925
1926       assign where host.vars.address_eth0 != ""
1927     }
1928
1929     apply Service "if-eth1" {
1930       import "generic-service"
1931
1932       vars.address = "$host.vars.address_eth1$"
1933       check_command = "my-generic-interface-check"
1934
1935       assign where host.vars.address_eth1 != ""
1936     }
1937
1938     object CheckCommand "my-generic-interface-check" {
1939       import "plugin-check-command"
1940
1941       command = "echo \"This would be the service $service.description$ using the address value: $address$\""
1942     }
1943
1944 The `CheckCommand` object is just an example to help you with testing and
1945 understanding the different custom attributes and runtime macros.
1946
1947 ### <a id="modified-attributes"></a> Modified Attributes
1948
1949 Icinga 2 allows you to modify object attributes at runtime so that they differ
1950 from the values in the local configuration. These modified attributes are
1951 stored as a bit-shifted value and made available in backends. Icinga 2 stores
1952 modified attributes in its state file and restores them on restart.
1953
1954 Modified Attributes can be reset using external commands.
1955
1956
1957 ## <a id="runtime-macros"></a> Runtime Macros
1958
1959 Next to custom attributes there are additional runtime macros made available by Icinga 2.
1960 These runtime macros reflect the current object state and may change over time while
1961 custom attributes are configured statically (but can be modified at runtime using
1962 external commands).
1963
1964 ### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order
1965
1966 Custom attributes can be accessed at [runtime](3-monitoring-basics.md#runtime-custom-attributes) using their
1967 identifier omitting the `vars.` prefix.
1968 There are special cases when those custom attributes are not set and Icinga 2 provides
1969 a fallback to existing object attributes, for example `host.address`.
1970
1971 In the following example the `$address$` macro will be resolved with the value of `vars.address`.
1972
1973     object Host "localhost" {
1974       import "generic-host"
1975       check_command = "my-host-macro-test"
1976       address = "127.0.0.1"
1977       vars.address = "127.2.2.2"
1978     }
1979
1980     object CheckCommand "my-host-macro-test" {
1981       command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
1982     }
1983
1984 The check command output will look like
1985
1986     "address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"
1987
1988 If you alter the host object and remove the `vars.address` line, Icinga 2 will fail to look up `$address$` in the
1989 custom attributes dictionary and then look for the host object's attribute.
1990
1991 The check command output will change to
1992
1993     "address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "
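The fallback described above can be sketched in Python. The helper `resolve_short_macro` is hypothetical and models the host as two dictionaries, one for its custom attributes (`vars`) and one for its regular object attributes.

```python
def resolve_short_macro(name, host_vars, host_attrs):
    """Resolve a short macro name such as 'address' for a host."""
    if name in host_vars:
        # The custom attribute wins if it is set.
        return host_vars[name]
    # Otherwise fall back to the object attribute of the same name.
    return host_attrs.get(name, "")


host_attrs = {"address": "127.0.0.1"}

print(resolve_short_macro("address", {"address": "127.2.2.2"}, host_attrs))
# 127.2.2.2
print(resolve_short_macro("address", {}, host_attrs))
# 127.0.0.1
```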
1994
1995
1996 The same example can be defined for services overriding the `address` field based on a specific host custom attribute.
1997
1998     object Host "localhost" {
1999       import "generic-host"
2000       address = "127.0.0.1"
2001       vars.macro_address = "127.3.3.3"
2002     }
2003
2004     apply Service "my-macro-test" to Host {
2005       import "generic-service"
2006       check_command = "my-service-macro-test"
2007       vars.address = "$host.vars.macro_address$"
2008
2009       assign where host.address
2010     }
2011
2012     object CheckCommand "my-service-macro-test" {
2013       command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
2014     }
2015
2016 When the service check is executed the output looks like
2017
2018     "address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"
2019
2020 That way you can easily override existing macros being accessed by their short name like `$address$` and refrain
2021 from defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).
2022
2023
2024 ### <a id="host-runtime-macros"></a> Host Runtime Macros
2025
2026 The following host macros are available in all commands that are executed for
2027 hosts or services:
2028
2029   Name                         | Description
2030   -----------------------------|--------------
2031   host.name                    | The name of the host object.
2032   host.display_name            | The value of the `display_name` attribute.
2033   host.state                   | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
2034   host.state_id                | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
2035   host.state_type              | The host's current state type. Can be one of `SOFT` and `HARD`.
2036   host.check_attempt           | The current check attempt number.
2037   host.max_check_attempts      | The maximum number of checks which are executed before changing to a hard state.
2038   host.last_state              | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
2039   host.last_state_id           | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
2040   host.last_state_type         | The host's previous state type. Can be one of `SOFT` and `HARD`.
2041   host.last_state_change       | The last state change's timestamp.
2042   host.duration_sec            | The time since the last state change.
2043   host.latency                 | The host's check latency.
2044   host.execution_time          | The host's check execution time.
2045   host.output                  | The last check's output.
2046   host.perfdata                | The last check's performance data.
2047   host.last_check              | The timestamp when the last check was executed.
2048   host.num_services            | Number of services associated with the host.
2049   host.num_services_ok         | Number of services associated with the host which are in an `OK` state.
2050   host.num_services_warning    | Number of services associated with the host which are in a `WARNING` state.
2051   host.num_services_unknown    | Number of services associated with the host which are in an `UNKNOWN` state.
2052   host.num_services_critical   | Number of services associated with the host which are in a `CRITICAL` state.
2053
2054 ### <a id="service-runtime-macros"></a> Service Runtime Macros
2055
2056 The following service macros are available in all commands that are executed for
2057 services:
2058
2059   Name                       | Description
2060   ---------------------------|--------------
2061   service.name               | The short name of the service object.
2062   service.display_name       | The value of the `display_name` attribute.
2063   service.check_command      | The short name of the command along with any arguments to be used for the check.
2064   service.state              | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
2065   service.state_id           | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
2066   service.state_type         | The service's current state type. Can be one of `SOFT` and `HARD`.
2067   service.check_attempt      | The current check attempt number.
2068   service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
2069   service.last_state         | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
2070   service.last_state_id      | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
2071   service.last_state_type    | The service's previous state type. Can be one of `SOFT` and `HARD`.
2072   service.last_state_change  | The last state change's timestamp.
2073   service.duration_sec       | The time since the last state change.
2074   service.latency            | The service's check latency.
2075   service.execution_time     | The service's check execution time.
2076   service.output             | The last check's output.
2077   service.perfdata           | The last check's performance data.
2078   service.last_check         | The timestamp when the last check was executed.
2079
2080 ### <a id="command-runtime-macros"></a> Command Runtime Macros
2081
2082 The following macros are available in all commands:
2083
2084   Name                   | Description
2085   -----------------------|--------------
2086   command.name           | The name of the command object.
2087
2088 ### <a id="user-runtime-macros"></a> User Runtime Macros
2089
2090 The following macros are available in all commands that are executed for
2091 users:
2092
2093   Name                   | Description
2094   -----------------------|--------------
2095   user.name              | The name of the user object.
2096   user.display_name      | The value of the `display_name` attribute.
2097
2098 ### <a id="notification-runtime-macros"></a> Notification Runtime Macros
2099
2100   Name                   | Description
2101   -----------------------|--------------
2102   notification.type      | The type of the notification.
2103   notification.author    | The author of the notification comment, if existing.
2104   notification.comment   | The comment of the notification, if existing.
2105
2106 ### <a id="global-runtime-macros"></a> Global Runtime Macros
2107
2108 The following macros are available in all executed commands:
2109
2110   Name                   | Description
2111   -----------------------|--------------
2112   icinga.timet           | Current UNIX timestamp.
2113   icinga.long_date_time  | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
2114   icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
2115   icinga.date            | Current date. Example: `2014-01-03`
2116   icinga.time            | Current time including timezone information. Example: `11:23:08 +0000`
2117   icinga.uptime          | Current uptime of the Icinga 2 process.
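For readers who want to reproduce these values in a script, the following Python sketch builds the date and time macros with `strftime`. The format strings are inferred from the examples in the table and are not taken from the Icinga 2 source; the function name `global_time_macros` is made up.

```python
from datetime import datetime, timezone


def global_time_macros(now):
    """Return the date/time macros for a timezone-aware datetime."""
    return {
        "icinga.timet": str(int(now.timestamp())),
        "icinga.long_date_time": now.strftime("%Y-%m-%d %H:%M:%S %z"),
        "icinga.short_date_time": now.strftime("%Y-%m-%d %H:%M:%S"),
        "icinga.date": now.strftime("%Y-%m-%d"),
        "icinga.time": now.strftime("%H:%M:%S %z"),
    }


now = datetime(2014, 1, 3, 11, 23, 8, tzinfo=timezone.utc)
macros = global_time_macros(now)
print(macros["icinga.long_date_time"])  # 2014-01-03 11:23:08 +0000
```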
2118
2119 The following macros provide global statistics:
2120
2121   Name                              | Description
2122   ----------------------------------|--------------
2123   icinga.num_services_ok            | Current number of services in state 'OK'.
2124   icinga.num_services_warning       | Current number of services in state 'Warning'.
2125   icinga.num_services_critical      | Current number of services in state 'Critical'.
2126   icinga.num_services_unknown       | Current number of services in state 'Unknown'.
2127   icinga.num_services_pending       | Current number of pending services.
2128   icinga.num_services_unreachable   | Current number of unreachable services.
2129   icinga.num_services_flapping      | Current number of flapping services.
2130   icinga.num_services_in_downtime   | Current number of services in downtime.
2131   icinga.num_services_acknowledged  | Current number of acknowledged service problems.
2132   icinga.num_hosts_up               | Current number of hosts in state 'Up'.
2133   icinga.num_hosts_down             | Current number of hosts in state 'Down'.
2134   icinga.num_hosts_unreachable      | Current number of unreachable hosts.
2135   icinga.num_hosts_flapping         | Current number of flapping hosts.
2136   icinga.num_hosts_in_downtime      | Current number of hosts in downtime.
2137   icinga.num_hosts_acknowledged     | Current number of acknowledged host problems.
2138
2139
2140 ## <a id="check-result-freshness"></a> Check Result Freshness
2141
2142 In Icinga 2 active check freshness is enabled by default. A check result is
2143 considered stale if no new result arrives within the `check_interval` period.
2144
2145     threshold = last check execution time + check interval
2146
2147 Passive check freshness is calculated from the `check_interval` attribute if set.
2148
2149     threshold = last check result time + check interval
2150
2151 If a check result is no longer fresh, a new check is executed using the command
2152 defined by the `check_command` attribute.
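The threshold arithmetic is simple enough to show as a Python sketch. The function names and the example timestamps are assumptions; all values are in seconds.

```python
def freshness_threshold(last_result_time, check_interval):
    """Point in time after which the last check result counts as stale."""
    return last_result_time + check_interval


def is_stale(now, last_result_time, check_interval):
    """True once the threshold has passed without a new check result."""
    return now > freshness_threshold(last_result_time, check_interval)


last_result = 1382014885   # time of the last check result
interval = 300             # check_interval = 5m

print(freshness_threshold(last_result, interval))     # 1382015185
print(is_stale(1382015000, last_result, interval))    # False
print(is_stale(1382015200, last_result, interval))    # True
```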
2153
2154
2155 ## <a id="check-flapping"></a> Check Flapping
2156
2157 The flapping algorithm used in Icinga 2 does not store the past states but
2158 calculates the flapping threshold from a single value based on counters and
2159 half-life values. Icinga 2 compares the value with a single flapping threshold
2160 configuration attribute named `flapping_threshold`.
2161
2162 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
2163
2164
2165 ## <a id="volatile-services"></a> Volatile Services
2166
2167 By default all services remain in a non-volatile state. When a problem
2168 occurs, the `SOFT` state applies and once `max_check_attempts` attribute
2169 is reached with the check counter, a `HARD` state transition happens.
2170 Notifications are only triggered by `HARD` state changes and are then
2171 re-sent at the interval defined by the `interval` attribute.
2172
2173 It may be reasonable to have a volatile service which stays in a `HARD`
2174 state type if the service stays in a `NOT-OK` state. That way each
2175 service recheck will automatically trigger a notification unless the
2176 service is acknowledged or in a scheduled downtime.
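The difference between the two modes can be summarised in a small Python model. This is a simplified sketch of the behaviour described above for a service that is currently in a `NOT-OK` state; the function `problem_state_type` is hypothetical and ignores recoveries, acknowledgements and downtimes.

```python
def problem_state_type(check_attempt, max_check_attempts, volatile=False):
    """State type of a service that is currently NOT-OK."""
    if volatile:
        # A volatile service is in a HARD state on every failing check.
        return "HARD"
    # A non-volatile service stays SOFT until the attempts are exhausted.
    return "HARD" if check_attempt >= max_check_attempts else "SOFT"


for attempt in (1, 2, 3):
    print(attempt,
          problem_state_type(attempt, 3),
          problem_state_type(attempt, 3, volatile=True))
```

With `max_check_attempts` set to 3, the non-volatile service reports `SOFT`, `SOFT`, `HARD` over three failing checks, while the volatile one reports `HARD` each time.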


## <a id="external-commands"></a> External Commands

Icinga 2 provides an external command pipe for processing commands
that trigger specific actions (for example rescheduling a service check
through the web interface).

In order to enable the `ExternalCommandListener` configuration use the
following command and restart Icinga 2 afterwards:

    # icinga2 feature enable command

Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:

    # /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd

    # tail -f /var/log/messages

    Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
    Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
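
An addon could build the same command line programmatically. The following Python
sketch mirrors the `echo` example above; the helper names `format_external_command`
and `send_external_command` are hypothetical, and the default pipe path is the one
mentioned above.

```python
import time


def format_external_command(name, args, timestamp=None):
    """Build one line in the form "[<unix time>] <NAME>;<arg1>;<arg2>"."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = ";".join([name] + [str(arg) for arg in args])
    return "[{}] {}\n".format(timestamp, fields)


def send_external_command(line, pipe_path="/var/run/icinga2/cmd/icinga2.cmd"):
    """Write the command line to the command pipe.

    Opening a FIFO for writing blocks until a reader (Icinga 2) is present.
    """
    with open(pipe_path, "w") as pipe:
        pipe.write(line)


now = 1382014885
print(format_external_command(
    "SCHEDULE_FORCED_SVC_CHECK", ["localhost", "ping4", now], timestamp=now), end="")
```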


### <a id="external-command-list"></a> External Command List

A list of currently supported external commands can be found [here](14-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).

## <a id="logging"></a> Logging

Icinga 2 supports three different types of logging:

* File logging
* Syslog (on *NIX-based operating systems)
* Console logging (`STDOUT` on tty)

You can enable or disable loggers using the `icinga2 feature enable`
and `icinga2 feature disable` commands:

Feature  | Description
---------|------------
debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
mainlog  | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
syslog   | Syslog (severity: `warning` or higher)

By default, the `mainlog` feature is enabled. When running Icinga 2
on a terminal, log messages with severity `information` or higher are
written to the console.
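
The "severity or higher" rule from the table can be sketched in Python. This is an
illustration only: it models just the three severities named above, and the names
`SEVERITIES`, `FEATURE_MIN_SEVERITY` and `features_logging` are hypothetical.

```python
# Severity order, lowest first. Only the levels named in the table are modelled.
SEVERITIES = ["debug", "information", "warning"]

# Minimum severity per logger feature, taken from the table above.
FEATURE_MIN_SEVERITY = {
    "debuglog": "debug",
    "mainlog": "information",
    "syslog": "warning",
}


def features_logging(message_severity, enabled_features):
    """Return the enabled features that would record a message of this severity."""
    level = SEVERITIES.index(message_severity)
    return [
        feature
        for feature in enabled_features
        if level >= SEVERITIES.index(FEATURE_MIN_SEVERITY[feature])
    ]


print(features_logging("information", ["debuglog", "mainlog", "syslog"]))
```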


## <a id="performance-data"></a> Performance Data

When a host or service check is executed, plugins should provide so-called
`performance data`. In addition, further check performance data
can be fetched using Icinga 2 runtime macros, such as the check latency
or the current service state (or additional custom attributes).

The performance data can be passed to external applications which aggregate and
store it in their backends. These tools usually generate graphs for historical
reporting and trending.

Well-known addons processing Icinga performance data are PNP4Nagios,
inGraph and Graphite.

### <a id="writing-performance-data-files"></a> Writing Performance Data Files

PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.

Therefore the Icinga 2 `PerfdataWriter` object allows you to define
the output template format for hosts and services using Icinga 2
runtime macros.

    host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$"
    service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.statetype$"

The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using

    # icinga2 feature enable perfdata

By default all performance data files are rotated every 15 seconds into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.
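
A collector might parse one line of such a file as sketched below. The sketch
assumes that the `\t` escapes in the templates end up as tab characters in the
written file and that every field has the `KEY::VALUE` form. The function name
`parse_perfdata_line` and the sample values are hypothetical.

```python
def parse_perfdata_line(line):
    """Split one tab-separated line of KEY::VALUE fields into a dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, separator, value = field.partition("::")
        if separator:  # ignore anything that is not a KEY::VALUE pair
            record[key] = value
    return record


sample = "\t".join([
    "DATATYPE::SERVICEPERFDATA",
    "TIMET::1382115688",
    "HOSTNAME::localhost",
    "SERVICEDESC::ping4",
    "SERVICEPERFDATA::rta=0.05ms",
])
record = parse_perfdata_line(sample)
print(record["HOSTNAME"], record["SERVICEDESC"], record["SERVICEPERFDATA"])
```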

### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer

While there are some Graphite collector scripts and daemons like Graphios available for
Icinga 1.x, it's more reasonable to directly process the check and plugin performance
data in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
write them to the defined Graphite Carbon daemon TCP socket.

You can enable the feature using

    # icinga2 feature enable graphite

By default the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
`127.0.0.1` on TCP port `2003`.

The current naming schema is

    icinga.<hostname>.<metricname>
    icinga.<hostname>.<servicename>.<metricname>

You can customize the metric prefix name by using the `host_name_template` and
`service_name_template` configuration attributes.

The example below uses [runtime macros](3-monitoring-basics.md#runtime-macros) and a
[global constant](10-language-reference.md#constants) named `GraphiteEnv`. The constant name
is freely definable and should be put in the [constants.conf](2-getting-started.md#constants-conf) file.

    const GraphiteEnv = "icinga.env1"

    object GraphiteWriter "graphite" {
      host_name_template = GraphiteEnv + ".$host.name$"
      service_name_template = GraphiteEnv + ".$host.name$.$service.name$"
    }

To make sure Icinga 2 writes a valid label into Graphite some characters are replaced
with `_` in the target name:

    \/.-  (and space)

For example, the host, service and metric names on the first line below might
result in the Graphite name on the second line:

    www-01 / http-cert / response time
    icinga.www_01.http_cert.response_time
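
The replacement rule can be reproduced with a few lines of Python. This sketch only
follows the character list given above and may differ from what Icinga 2 does
internally; the names `escape_label` and `graphite_name` are hypothetical.

```python
# Characters replaced with "_" according to the list above: \ / . - and space.
ESCAPED_CHARS = "\\/.- "
_TRANSLATION = str.maketrans({char: "_" for char in ESCAPED_CHARS})


def escape_label(label):
    """Replace every character from the list with an underscore."""
    return label.translate(_TRANSLATION)


def graphite_name(prefix, host, service, metric):
    """Join the prefix and the escaped labels with dots."""
    labels = [escape_label(part) for part in (host, service, metric)]
    return ".".join([prefix] + labels)


print(graphite_name("icinga", "www-01", "http-cert", "response time"))
```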

In addition to the performance data retrieved from the check plugin, Icinga 2 sends
internal check statistic data to Graphite:

  metric             | description
  -------------------|------------------------------------------
  current_attempt    | current check attempt
  max_check_attempts | maximum check attempts until the hard state is reached
  reachable          | checked object is reachable
  downtime_depth     | number of downtimes this object is in
  execution_time     | check execution time
  latency            | check latency
  state              | current state of the checked object
  state_type         | 0=SOFT, 1=HARD state

The following example illustrates how to configure the storage-schemas for Graphite Carbon
Cache. Please make sure that the order is correct because the first match wins.

    [icinga_internals]
    pattern = ^icinga\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
    retentions = 5m:7d

    [icinga_default]
    # intervals like PNP4Nagios uses them per default
    pattern = ^icinga\.
    retentions = 1m:2d,5m:10d,30m:90d,360m:4y
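
Why the order matters can be shown with a short Python sketch that applies the two
patterns in file order and returns the first section that matches. It only models
the "first match wins" rule described above and is not Carbon's own code; the
names `SCHEMAS` and `matching_schema` are hypothetical.

```python
import re

# Section name and pattern in file order, copied from the example above.
SCHEMAS = [
    ("icinga_internals",
     r"^icinga\..*\.(max_check_attempts|reachable|current_attempt"
     r"|execution_time|latency|state|state_type)"),
    ("icinga_default", r"^icinga\."),
]


def matching_schema(metric_name, schemas=SCHEMAS):
    """Return the name of the first section whose pattern matches."""
    for name, pattern in schemas:
        if re.search(pattern, metric_name):
            return name
    return None


print(matching_schema("icinga.www_01.http_cert.latency"))
print(matching_schema("icinga.www_01.http_cert.response_time"))
# With the sections swapped, the catch-all pattern would win for every metric.
print(matching_schema("icinga.www_01.http_cert.latency", list(reversed(SCHEMAS))))
```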

### <a id="gelfwriter"></a> GELF Writer

The `Graylog Extended Log Format` (short: [GELF](http://www.graylog2.org/resources/gelf))
can be used to send application logs directly to a TCP socket.

While it has been specified by the [graylog2](http://www.graylog2.org/) project as their
[input resource standard](http://www.graylog2.org/resources/gelf), other tools such as
[Logstash](http://www.logstash.net) also support `GELF` as an
[input type](http://logstash.net/docs/latest/inputs/gelf).

You can enable the feature using

    # icinga2 feature enable gelf

By default the `GelfWriter` object expects the GELF receiver to listen at `127.0.0.1` on TCP port `12201`.
The default value of the `source` attribute is `icinga2`. You can customize it if required.

Currently these events are processed:

* Check results
* State changes
* Notifications
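
To give an idea of what a GELF receiver sees, the following Python sketch builds a
generic GELF 1.1 message and frames it for TCP. It is not the payload Icinga 2
actually sends: mapping `source` to the GELF `host` field is an assumption, and the
additional fields `type` and `service` are hypothetical examples.

```python
import json
import time


def build_gelf_message(short_message, source="icinga2", timestamp=None, **extra):
    """Build a GELF 1.1 message dict with the mandatory fields."""
    message = {
        "version": "1.1",
        "host": source,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    # GELF requires additional fields to be prefixed with an underscore.
    for key, value in extra.items():
        message["_" + key] = value
    return message


def encode_for_tcp(message):
    """GELF over TCP separates messages with a null byte."""
    return json.dumps(message).encode("utf-8") + b"\x00"


message = build_gelf_message(
    "CRITICAL - critical test", timestamp=1382115731,
    type="CHECK RESULT", service="ping6")
print(encode_for_tcp(message))
```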


## <a id="status-data"></a> Status Data

Icinga 1.x writes object configuration data and status data at regular
intervals to its `objects.cache` and `status.dat` files. Icinga 2 provides
the `StatusDataWriter` object which dumps all configuration objects and
status updates at a regular interval.

    # icinga2 feature enable statusdata

Icinga 1.x Classic UI requires this data set as part of its backend.

> **Note**
>
> If you are not using any web interface or addon which uses these files
> you can safely disable this feature.


## <a id="compat-logging"></a> Compat Logging

The Icinga 1.x log format is called the `Compat Log` in Icinga 2
and is provided by the `CompatLogger` object.

These logs are not only used for informational representation in
external web interfaces parsing the logs, but also to generate
SLA reports and trends in Icinga 1.x Classic UI. Furthermore the
[Livestatus](7-livestatus.md#setting-up-livestatus) feature uses these logs for answering queries to
historical tables.

The `CompatLogger` object can be enabled with

    # icinga2 feature enable compatlog

By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.

The format cannot be changed without breaking compatibility with
existing log parsers.

    # tail -f /var/log/icinga2/compat/icinga.log

    [1382115688] LOG ROTATION: HOURLY
    [1382115688] LOG VERSION: 2.0
    [1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
    [1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
    [1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test
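
A log parser could split such lines as sketched below in Python. The sketch is based
only on the sample output above (`[timestamp] TYPE: field;field;...`) and is not a
complete parser for the format. The names `LOG_LINE` and `parse_compat_log_line` are
hypothetical.

```python
import re

# "[<unix time>] <ENTRY TYPE>: <semicolon-separated fields>"
LOG_LINE = re.compile(r"^\[(\d+)\] ([^:]+): (.*)$")


def parse_compat_log_line(line):
    """Return timestamp, entry type and fields, or None if the line differs."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, entry_type, payload = match.groups()
    # Note: plugin output that itself contains ";" is split as well.
    return {
        "timestamp": int(timestamp),
        "type": entry_type,
        "fields": payload.split(";"),
    }


entry = parse_compat_log_line(
    "[1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test")
print(entry["type"], entry["fields"])
```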


## <a id="db-ido"></a> DB IDO

The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all
configuration and status information into a database. The IDO database is used
by a number of projects including Icinga Web 1.x and 2.

Details on the installation can be found in the [Configuring DB IDO](2-getting-started.md#configuring-db-ido)
chapter. Details on the configuration can be found in the
[IdoMysqlConnection](12-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](12-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.
The DB IDO feature supports [High Availability](4-monitoring-remote-systems.md#high-availability-db-ido) in
the Icinga 2 cluster.

The current Icinga 2 instance writes its status to the DB IDO backend table
`icinga_programstatus` every 10 seconds. The following example query checks the
health of the instance by looking for a status update within the last 60 seconds,
which is a reasonable amount of time. Adjust it for your requirements. If the
condition is not met, the query returns an empty result.

> **Tip**
>
> Use [check plugins](6-addons-plugins.md#plugins) to monitor the backend.

Replace the `default` string with your instance name, if different.

Example for MySQL:

    # mysql -u root -p icinga -e "SELECT status_update_time FROM icinga_programstatus ps
      JOIN icinga_instances i ON ps.instance_id=i.instance_id
      WHERE (UNIX_TIMESTAMP(ps.status_update_time) > UNIX_TIMESTAMP(NOW())-60)
      AND i.instance_name='default';"

    +---------------------+
    | status_update_time  |
    +---------------------+
    | 2014-05-29 14:29:56 |
    +---------------------+


Example for PostgreSQL:

    # export PGPASSWORD=icinga; psql -U icinga -d icinga -c "SELECT ps.status_update_time FROM icinga_programstatus AS ps
      JOIN icinga_instances AS i ON ps.instance_id=i.instance_id
      WHERE (extract(epoch from ps.status_update_time) > extract(epoch from now())-60)
      AND i.instance_name='default';"

    status_update_time
    ------------------------
     2014-05-29 15:11:38+02
    (1 row)


A detailed list of the available table attributes can be found in the [DB IDO Schema documentation](14-appendix.md#schema-db-ido).


## <a id="check-result-files"></a> Check Result Files

Icinga 1.x writes its check result files to a temporary spool directory
where they are processed at a regular interval.
While this is extremely inefficient in terms of performance, it has
proven useful for passing passive check results directly into Icinga 1.x,
skipping the external command pipe.

Several clustered/distributed environments and check-aggregation addons
use that method. In order to support step-by-step migration of these
environments, Icinga 2 ships the `CheckResultReader` object.

There is no feature configuration available, so the object must be defined
on demand in your Icinga 2 objects configuration.

    object CheckResultReader "reader" {
      spool_dir = "/data/check-results"
    }