# <a id="monitoring-basics"></a> Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.
Keep in mind that these examples were written with a Linux server in mind. If you are
using Windows, you will need to change the services accordingly. See the [ITL reference](7-icinga-template-library.md#windows-plugins)
for further information.

## <a id="hosts-services"></a> Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Hosts
and services can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Other local or network-accessible services

Host objects provide a mechanism to group services that are running
on the same physical device.

Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      address = "10.0.0.1" // example address
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }

The example creates two services `ping4` and `http` which belong to the
host `my-server1`.

It also specifies that the host should perform its own check using the `hostalive`
check command.

The `address` attribute is used by check commands to determine which network
address is associated with the host object.

Details on troubleshooting check problems can be found [here](16-troubleshooting.md#troubleshooting).

### <a id="host-states"></a> Host States

Hosts can be in any of the following states:

Name | Description
------------|--------------
UP | The host is available.
DOWN | The host is unavailable.

### <a id="service-states"></a> Service States

Services can be in any of the following states:

Name | Description
------------|--------------
OK | The service is working properly.
WARNING | The service is experiencing some problems but is still considered to be in working condition.
CRITICAL | The service is in a critical state.
UNKNOWN | The check could not determine the service's state.

### <a id="hard-soft-states"></a> Hard and Soft States

When detecting a problem with a host/service, Icinga 2 re-checks the object a number of
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
notifications. This ensures that no unnecessary notifications are sent for
transient failures. During this time the object is in a `SOFT` state.

After all re-checks have been executed and the object is still in a non-OK
state, the host/service switches to a `HARD` state and notifications are sent.

Name | Description
------------|--------------
HARD | The host/service's state hasn't recently changed.
SOFT | The host/service has recently changed state and is being re-checked.

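
The interplay of these settings can be sketched with a single service. This is only an illustration: it assumes the `my-server1` host from the earlier example, and the interval values are arbitrary assumptions rather than recommendations.

```
object Service "http" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m     // assumed regular check interval
  retry_interval = 1m     // assumed shorter interval for re-checks in a SOFT state
  max_check_attempts = 3  // assumed number of attempts before the state becomes HARD
}
```

With these assumed values, a failing check would be repeated about once a minute, and the service would switch to a `HARD` state once the third consecutive check still reports a problem.
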
### <a id="host-service-checks"></a> Host and Service Checks

Hosts and services determine their state by running checks at regular intervals.

    object Host "router" {
      check_command = "hostalive"
      address = "10.0.0.1" // example address
    }

The `hostalive` command is one of several built-in check commands. It sends ICMP
echo requests to the IP address specified in the `address` attribute to determine
whether a host is online.

A number of other [built-in check commands](7-icinga-template-library.md#plugin-check-commands) are also
available. In addition to these commands the next few chapters will explain in
detail how to set up your own check commands.

## <a id="object-inheritance-using-templates"></a> Templates

Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      enable_perfdata = true
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Service "ping6" {
      import "generic-service"

      check_command = "ping6"

      assign where host.address6
    }

In this example the `ping4` and `ping6` services inherit properties from the
template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of
other templates. Attributes inherited from a template can be overridden in the
importing object.

You can also import existing non-template objects. Note that templates
and objects share the same namespace, i.e. you can't define a template
that has the same name as an object.

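
Template chaining and overriding could look like the following sketch. It assumes the `generic-service` template and the `my-server1` host from the examples above; the template name `important-service` and the overridden values are hypothetical.

```
// Hypothetical template that builds on "generic-service".
template Service "important-service" {
  import "generic-service"

  max_check_attempts = 5 // overrides the inherited value of 3
}

object Service "http" {
  import "important-service"

  host_name = "my-server1"
  check_command = "http"
  enable_perfdata = false // overrides the inherited value for this object only
}
```

Attributes that are not overridden keep the value they received from the imported templates.
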
## <a id="custom-attributes"></a> Custom Attributes

In addition to built-in attributes you can define your own attributes:

    object Host "localhost" {
      vars.ssh_port = 2222 // example custom attribute
    }

Valid values for custom attributes include:

* Strings and numbers
* Arrays and dictionaries

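
As a rough illustration of these value types, the following sketch defines one custom attribute of each kind. The host name and all attribute names and values are made up for this example.

```
object Host "custom-attribute-demo" {
  check_command = "hostalive"

  vars.os = "Linux"              // string
  vars.ssh_port = 2222           // number
  vars.disks = [ "/", "/var" ]   // array
  vars.location = {              // dictionary
    city = "Nuremberg"
    rack = 12
  }
}
```
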
### <a id="custom-attributes-functions"></a> Functions as Custom Attributes

Icinga 2 lets you specify functions for custom attributes. The special case here
is that whenever Icinga 2 needs the value for such a custom attribute it runs
the function and uses whatever value the function returns:

    object CheckCommand "random-value" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_dummy", "0", "$text$" ]

      vars.text = {{ Math.random() * 100 }}
    }

This example uses the [abbreviated lambda syntax](19-language-reference.md#nullary-lambdas).

These functions have access to a number of variables:

Variable | Description
-------------|---------------
user | The User object (for notifications).
service | The Service object (for service checks/notifications/event handlers).
host | The Host object.
command | The command object (e.g. a CheckCommand object for checks).

For example:

    vars.text = {{ host.check_interval }}

In addition to these variables the `macro` function can be used to retrieve the
value of arbitrary macro expressions:

    vars.text = {{
      if (macro("$address$") == "127.0.0.1") {
        log("Running a check for localhost!")
      }

      return "Some text" // example return value
    }}

Accessing object attributes at runtime inside these functions is described in the
[advanced topics](4-advanced-topics.md#access-object-attributes-at-runtime) chapter.

## <a id="runtime-macros"></a> Runtime Macros

Macros can be used to access other objects' attributes at runtime. For example they
are used in command definitions to figure out which IP address a check should be
run against:

    object CheckCommand "my-ping" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_ping", "-H", "$ping_address$" ]

      arguments = {
        "-w" = "$ping_wrta$,$ping_wpl$%"
        "-c" = "$ping_crta$,$ping_cpl$%"
        "-p" = "$ping_packets$"
      }

      vars.ping_address = "$address$"
      vars.ping_packets = 5
    }

    object Host "router" {
      check_command = "my-ping"
      address = "10.0.0.1" // example address
    }

In this example we are using the `$address$` macro to refer to the host's `address`
attribute.

We can also directly refer to custom attributes, e.g. by using `$ping_wrta$`. Icinga
automatically tries to find the closest match for the attribute you specified. The
exact rules for this are explained in the next section.

### <a id="macro-evaluation-order"></a> Evaluation Order

When executing commands Icinga 2 checks the following objects in this order to look
up macros and their respective values:

1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global custom attributes in the `Vars` constant

This evaluation order allows you to define default values for custom attributes
in your command objects.

Here's how you can override the custom attribute `ping_packets` from the previous
example:

    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"

      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }

If a custom attribute isn't defined anywhere, an empty value is used and a warning is
written to the Icinga 2 log.

You can also refer to a specific attribute directly, thereby ignoring these evaluation
rules, by specifying the full attribute name:

    $service.vars.ping_wrta$

This retrieves the value of the `ping_wrta` custom attribute for the service. It
returns an empty value if the service does not have such a custom attribute, no matter
whether another object such as the host has this attribute.

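
A fully qualified macro can be used anywhere a regular macro is allowed. The following sketch is a hypothetical variant of the `my-ping` command (the name `my-ping-service-only` is made up) in which the packet count is taken from the service only.

```
object CheckCommand "my-ping-service-only" {
  import "plugin-check-command"

  command = [ PluginDir + "/check_ping", "-H", "$address$" ]

  arguments = {
    // Only the service's own custom attribute is consulted here. A value
    // defined on the host or on the command is not used as a fallback.
    "-p" = "$service.vars.ping_packets$"
  }
}
```
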
### <a id="host-runtime-macros"></a> Host Runtime Macros

The following host macros are available in all commands that are executed for
hosts or services:

Name | Description
-----------------------------|--------------
host.name | The name of the host object.
host.display_name | The value of the `display_name` attribute.
host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`.
host.check_attempt | The current check attempt number.
host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`.
host.last_state_change | The last state change's timestamp.
host.downtime_depth | The number of active downtimes.
host.duration_sec | The time since the last state change.
host.latency | The host's check latency.
host.execution_time | The host's check execution time.
host.output | The last check's output.
host.perfdata | The last check's performance data.
host.last_check | The timestamp when the last check was executed.
host.check_source | The monitoring instance that performed the last check.
host.num_services | Number of services associated with the host.
host.num_services_ok | Number of services associated with the host which are in an `OK` state.
host.num_services_warning | Number of services associated with the host which are in a `WARNING` state.
host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state.
host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state.

### <a id="service-runtime-macros"></a> Service Runtime Macros

The following service macros are available in all commands that are executed for
services:

Name | Description
---------------------------|--------------
service.name | The short name of the service object.
service.display_name | The value of the `display_name` attribute.
service.check_command | The short name of the command along with any arguments to be used for the check.
service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`.
service.check_attempt | The current check attempt number.
service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`.
service.last_state_change | The last state change's timestamp.
service.downtime_depth | The number of active downtimes.
service.duration_sec | The time since the last state change.
service.latency | The service's check latency.
service.execution_time | The service's check execution time.
service.output | The last check's output.
service.perfdata | The last check's performance data.
service.last_check | The timestamp when the last check was executed.
service.check_source | The monitoring instance that performed the last check.

### <a id="command-runtime-macros"></a> Command Runtime Macros

The following macros are available in all commands:

Name | Description
-----------------------|--------------
command.name | The name of the command object.

### <a id="user-runtime-macros"></a> User Runtime Macros

The following macros are available in all commands that are executed for
users:

Name | Description
-----------------------|--------------
user.name | The name of the user object.
user.display_name | The value of the `display_name` attribute.

### <a id="notification-runtime-macros"></a> Notification Runtime Macros

Name | Description
-----------------------|--------------
notification.type | The type of the notification.
notification.author | The author of the notification comment, if existing.
notification.comment | The comment of the notification, if existing.

### <a id="global-runtime-macros"></a> Global Runtime Macros

The following macros are available in all executed commands:

Name | Description
-----------------------|--------------
icinga.timet | Current UNIX timestamp.
icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
icinga.date | Current date. Example: `2014-01-03`
icinga.time | Current time including timezone information. Example: `11:23:08 +0000`
icinga.uptime | Current uptime of the Icinga 2 process.

The following macros provide global statistics:

Name | Description
----------------------------------|--------------
icinga.num_services_ok | Current number of services in state 'OK'.
icinga.num_services_warning | Current number of services in state 'Warning'.
icinga.num_services_critical | Current number of services in state 'Critical'.
icinga.num_services_unknown | Current number of services in state 'Unknown'.
icinga.num_services_pending | Current number of pending services.
icinga.num_services_unreachable | Current number of unreachable services.
icinga.num_services_flapping | Current number of flapping services.
icinga.num_services_in_downtime | Current number of services in downtime.
icinga.num_services_acknowledged | Current number of acknowledged service problems.
icinga.num_hosts_up | Current number of hosts in state 'Up'.
icinga.num_hosts_down | Current number of hosts in state 'Down'.
icinga.num_hosts_unreachable | Current number of unreachable hosts.
icinga.num_hosts_flapping | Current number of flapping hosts.
icinga.num_hosts_in_downtime | Current number of hosts in downtime.
icinga.num_hosts_acknowledged | Current number of acknowledged host problems.

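
To show how macros from the tables above might be combined, here is a sketch of a notification command that passes several of them to a script as environment variables. The command name, the script path and the environment variable names are assumptions made for this example.

```
object NotificationCommand "my-service-notification" {
  import "plugin-notification-command"

  // Assumed script location, adjust to your installation.
  command = [ SysconfDir + "/icinga2/scripts/my-service-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME = "$host.name$"
    SERVICENAME = "$service.name$"
    SERVICESTATE = "$service.state$"
    SERVICEOUTPUT = "$service.output$"
    LONGDATETIME = "$icinga.long_date_time$"
    USERNAME = "$user.name$"
  }
}
```
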
## <a id="using-apply"></a> Apply Rules

Instead of assigning each object ([Service](6-object-types.md#objecttype-service),
[Notification](6-object-types.md#objecttype-notification), [Dependency](6-object-types.md#objecttype-dependency),
[ScheduledDowntime](6-object-types.md#objecttype-scheduleddowntime))
explicitly based on attribute identifiers such as `host_name`, objects can be [applied](19-language-reference.md#apply).

Before you start using the apply rules keep the following in mind:

* Define the best match.
    * A set of unique [custom attributes](#custom-attributes-apply) for these hosts/services?
    * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
    * A generic pattern [match](19-language-reference.md#function-calls) on the host/service name?
    * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](19-language-reference.md#expression-operators)
* All expressions must return a boolean value (e.g. an empty string is equal to `false`).

> You can set/override object attributes in apply rules using the respectively available
> objects in that scope (host and/or service objects).

[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
not only for matching on their existence or values in apply expressions, but also to assign
("inherit") their values to the objects generated from apply rules.

* [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
* [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
* [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)

A more advanced example is using [apply with for loops on arrays or
dictionaries](#using-apply-for), for example provided by
[custom attributes](#custom-attributes-apply) or groups.

> Building configuration in that dynamic way requires detailed information
> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
> after successful [configuration validation](8-cli-commands.md#config-validation).

### <a id="using-apply-expressions"></a> Apply Rules Expressions

You can use simple or advanced combinations of apply rule expressions. Each
expression must evaluate to the boolean `true` value in order to match. An empty string,
for instance, is interpreted as `false`. In a similar fashion undefined
attributes return `false`.

For example, the following expression evaluates to `false` if the attribute is not defined:

    assign where host.vars.attribute_does_not_exist

Multiple `assign where` condition rows are evaluated as an `OR` condition.

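
As a small sketch of this `OR` behaviour, the rule below uses two `assign where` rows. The `os` values are assumptions for this example, and `generic-service` is the template used elsewhere in this chapter.

```
apply Service "ssh" {
  import "generic-service"

  check_command = "ssh"

  // A host matching either row gets the service.
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"
}

// The same condition written as a single row with the || operator:
//   assign where host.vars.os == "Linux" || host.vars.os == "FreeBSD"
```
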

You can combine multiple expressions for matching only a subset of objects. In some cases,
you want to be able to add more than one assign/ignore where expression which matches
a specific condition. To achieve this you can use the logical `&&` (and) and `||` (or) operators.

The following example matches all hosts whose name matches the `*mysql*` pattern and (`&&`)
whose custom attribute `prod_mysql_db` matches the `db-*` pattern. Hosts with the custom
attribute `test_server` set to `true` are ignored, as are hosts whose name matches the
`*internal` pattern.

    object HostGroup "mysql-server" {
      display_name = "MySQL Server"

      assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

A similar example for advanced notification apply rule filters: the rule matches if the service
attribute `notes` contains the `has gold support 24x7` string `AND` one of the following
two conditions passes: either the `customer` host custom attribute is set to `customer-xy`
`OR` the host custom attribute `always_notify` is set to `true`.

The notification is ignored for services whose host name ends with `*internal`
`OR` whose `priority` custom attribute is [less than](19-language-reference.md#expression-operators) `2`
while the host custom attribute `is_clustered` is set to `true`.

    template Notification "cust-xy-notification" {
      users = [ "noc-xy", "mgmt-xy" ]
      command = "mail-service-notification"
    }

    apply Notification "notify-cust-xy-mysql" to Service {
      import "cust-xy-notification"

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

### <a id="using-apply-services"></a> Apply Services to Hosts

The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf)
and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.

The example for `ssh` applies a service object to all hosts with the `address`
attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.

    apply Service "ssh" {
      import "generic-service"

      check_command = "ssh"

      assign where host.address && host.vars.os == "Linux"
    }

Other detailed scenario examples are used in their respective chapters, for example
[apply services with custom command arguments](#using-apply-services-command-arguments).

### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services

Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"

      user_groups = [ "noc" ]

      assign where host.vars.notification.mail
    }

In this example the `mail-noc` notification will be created as an object for all services whose
host has the `notification.mail` custom attribute defined. The notification command is set to
`mail-service-notification` and all members of the user group `noc` will get notified.

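
For the rule above to match, the host needs a matching custom attribute. A minimal sketch of such a host follows; the host name, the address and the content of the dictionary are assumptions for this example.

```
object Host "web01" {
  import "generic-host"

  address = "192.0.2.10" // example address

  // The apply rule's `assign where host.vars.notification.mail` tests for this entry.
  vars.notification["mail"] = {
    groups = [ "noc" ]
  }
}
```
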
### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services

Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.

### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services

The sample configuration includes an example in [downtimes.conf](5-configuring-icinga-2.md#downtimes-conf).

Detailed examples can be found in the [recurring downtimes](4-advanced-topics.md#recurring-downtimes) chapter.

### <a id="using-apply-for"></a> Using Apply For Rules

In addition to the standard way of using [apply rules](3-monitoring-basics.md#using-apply),
you may need to generate objects from apply rules based on a set (array or
dictionary).

The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf)
and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.

Take the following example: A host provides the SNMP OIDs for different service check
types. This could look like the following:

    object Host "router-v6" {
      check_command = "hostalive"

      vars.oids["if01"] = "1.1.1.1.1"
      vars.oids["temp"] = "1.1.1.1.2"
      vars.oids["bgp"] = "1.1.1.1.5"
    }

Now we want to create service checks for `if01` and `temp` but not `bgp`.
Furthermore we want to pass the SNMP OID stored as dictionary value to the
custom attribute called `vars.snmp_oid`. This is the command argument required
by the [snmp](7-icinga-template-library.md#plugin-check-command-snmp) check command.
The service's `display_name` should be set to the identifier inside the dictionary.

    apply Service for (identifier => oid in host.vars.oids) {
      check_command = "snmp"
      display_name = identifier
      vars.snmp_oid = oid

      ignore where identifier == "bgp" // don't generate service for bgp checks
    }

Icinga 2 evaluates the `apply for` rule for all objects with the custom attribute
`oids` set. It then iterates over all list items inside the `for` loop and evaluates the
`assign/ignore where` expressions. You can access the loop variable
in these expressions, e.g. for ignoring certain values.
In this example we'd ignore the `bgp` identifier and avoid generating an unwanted service.
We could extend the configuration by also matching the `oid` value on certain regex/wildcard
patterns for example.

> You don't need an `assign where` expression only checking for existence
> of the custom attribute.

That way you avoid duplicated apply rules by combining them into one
generic `apply for` rule, which generates the object name with or without a prefix.

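
Filtering on the dictionary value instead of the key might look like the following sketch. It is a variant of the rule above, not an addition to it, and the wildcard pattern is an assumption chosen to match the `bgp` OID from the example host.

```
apply Service for (identifier => oid in host.vars.oids) {
  check_command = "snmp"
  display_name = identifier
  vars.snmp_oid = oid

  // Skip entries by OID value rather than by identifier.
  ignore where match("1.1.1.1.5*", oid)
}
```
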
#### <a id="using-apply-for-custom-attribute-override"></a> Apply For and Custom Attribute Override

Imagine a different, more advanced example: You are monitoring your switches (hosts) with many
interfaces (services). The following requirements/problems apply:

* Each interface service check should be named with a prefix and a running number
* Each interface has its own vlan tag
* Some interfaces have QoS enabled
* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be
dynamically generated

By defining the `interfaces` dictionary with three example interfaces on the `core-switch`
host object, you'll make sure to pass the storage required by the for loop in the service apply
rule.

    object Host "core-switch" {
      import "generic-host"
      address = "127.0.0.1"

      vars.interfaces["0"] = {
        address = "127.0.0.2"
      }
      vars.interfaces["1"] = {
        address = "127.0.1.2"
      }
      vars.interfaces["2"] = {
        address = "127.0.2.2"
      }
    }

You can also omit the `"if-"` string, then all generated service names are directly
taken from the `if_name` variable value.

The config dictionary contains all key-value pairs for the specific interface in one
loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface.

By defining a default value for the custom attribute `qos` in the `vars` dictionary
before adding the `config` dictionary we'll ensure that this attribute is always defined.

After `vars` is fully populated, all object attributes can be set. For strings, you can use
string concatenation with the `+` operator.

You can also specify the check command that way.

    apply Service "if-" for (if_name => config in host.vars.interfaces) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"
      vars += config

      display_name = "if-" + if_name + "-" + vars.vlan

      notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "if-" + if_name
    }

Note that numbers must be explicitly cast to strings when adding them to strings.
This can be achieved by wrapping them into the [string()](19-language-reference.md#function-calls) function.

> Building configuration in that dynamic way requires detailed information
> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
> after successful [configuration validation](8-cli-commands.md#config-validation).

### <a id="using-apply-object-attributes"></a> Use Object Attributes in Apply Rules

Since apply rules are evaluated after the generic objects, you
can reference existing host and/or service object attributes as
values for any object attribute specified in that apply rule.

    object Host "opennebula-host" {
      import "generic-host"

      vars.hosting["xyz"] = {
        customer_name = "Customer xyz"
        support_contract = "gold"
      }
      vars.hosting["abc"] = {
        customer_name = "Customer abc"
        support_contract = "silver"
      }
    }

    apply Service for (customer => config in host.vars.hosting) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"
      vars += config

      vars.http_uri = "/" + customer + "/" + config.http_uri

      display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id

      notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."

      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
    }

## <a id="groups"></a> Groups

A group is a collection of similar objects. Groups are primarily used as a
visualization aid in web interfaces.

Group membership is defined at the respective object itself. If
you have a hostgroup named `windows` for example, and want to assign
specific hosts to this group for later viewing the group on your
alert dashboard, first create a HostGroup object:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this group:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

This can be done for service and user groups the same way:

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

### <a id="group-assign-intro"></a> Group Membership Assign

Instead of manually assigning each object to a group you can also assign objects
to a group based on their attributes:

    object HostGroup "prod-mssql" {
      display_name = "Production MSSQL Servers"

      assign where host.vars.mssql_port && host.vars.prod_mysql_db
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

In this example all hosts with the custom attributes `mssql_port` and `prod_mysql_db` set
will be added as members to the host group `prod-mssql`. However, all `*internal`
hosts and all hosts with the `test_server` attribute set to `true` are not added to this
group.

Details on the `assign where` syntax can be found in the
[Language Reference](19-language-reference.md#apply).

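
The same mechanism can be sketched for service groups. In the following example the group name, display name and pattern are assumptions.

```
object ServiceGroup "ping" {
  display_name = "Ping Checks"

  // Every service whose name starts with "ping" becomes a member.
  assign where match("ping*", service.name)
}
```

With the `ping4` and `ping6` services from the templates example, both would end up in this group.
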
## <a id="notifications"></a> Notifications

Notifications for service and host problems are an integral part of your
monitoring setup.

When a host or service is in a downtime, a problem has been acknowledged or
the dependency logic determined that the host/service is unreachable, no
notifications are sent. You can configure additional type and state filters
refining the notifications being actually sent.

There are many ways of sending notifications, e.g. by e-mail, XMPP,
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
Instead it relies on external mechanisms such as shell scripts to notify users.

A notification specification requires one or more users (and/or user groups)
who will be notified in case of problems. These users must have all custom
attributes defined which will be used in the `NotificationCommand` on execution.

The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and
`CRITICAL` states and `problem` and `recovery` notification types.

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }

If you don't set the `states` and `types` configuration attributes for the `User`
object, notifications for all states and types will be sent.

Details on troubleshooting notification problems can be found [here](16-troubleshooting.md#troubleshooting).

> Make sure that the [notification](8-cli-commands.md#features) feature is enabled
> in order to execute notification commands.

You should choose which information you (and your notified users) are interested in
in case of emergency, and also which information does not provide any value to you and
your environment.

An example notification command is explained [here](3-monitoring-basics.md#notification-commands).

833 You can add all shared attributes to a `Notification` template which is inherited
834 to the defined notifications. That way you'll save duplicated attributes in each
835 `Notification` object. Attributes can be overridden locally.
837 template Notification "generic-notification" {
840 command = "mail-service-notification"
842 states = [ Warning, Critical, Unknown ]
843 types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
844 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
849 The time period `24x7` is included as example configuration with Icinga 2.
851 Use the `apply` keyword to create `Notification` objects for your services:
853 apply Notification "notify-cust-xy-mysql" to Service {
854 import "generic-notification"
856 users = [ "noc-xy", "mgmt-xy" ]
858 assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
859 ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
863 Instead of assigning users to notifications, you can also add the `user_groups`
864 attribute with a list of user groups to the `Notification` object. Icinga 2 will
865 send notifications to all group members.
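The following sketch shows one way this could look. The group name `cust-xy-contacts`, the notification name and the email addresses are assumptions made for this example; the users `noc-xy` and `mgmt-xy` are taken from the apply rule above.

```
object UserGroup "cust-xy-contacts" {
  display_name = "Customer XY Contacts"
}

object User "noc-xy" {
  groups = [ "cust-xy-contacts" ]
  email = "noc-xy@localhost"        // placeholder address
}

object User "mgmt-xy" {
  groups = [ "cust-xy-contacts" ]
  email = "mgmt-xy@localhost"       // placeholder address
}

apply Notification "notify-cust-xy-group" to Service {
  import "generic-notification"

  // notify every member of the group instead of listing users
  user_groups = [ "cust-xy-contacts" ]

  assign where host.vars.customer == "customer-xy"
}
```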
869 > Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
870 > states for services, `Down` for hosts) will receive `Recovery` notifications.
872 ### <a id="notification-escalations"></a> Notification Escalations
874 When a problem notification is sent and a problem still exists at the time of re-notification
875 you may want to escalate the problem to the next support level. A different approach
876 is to configure the default notification by email, and escalate the problem via SMS
877 if not already solved.
879 You can define notification start and end times as additional configuration
880 attributes making the `Notification` object a so-called `notification escalation`.
881 Using templates you can share the basic notification attributes such as users or the
882 `interval` (and override them for the escalation then).
884 Using the example from above, you can define additional users being escalated for SMS
885 notifications between start and end time.
887 object User "icinga-oncall-2nd-level" {
888 display_name = "Icinga 2nd Level"
890 vars.mobile = "+1 555 424642"
893 object User "icinga-oncall-1st-level" {
894 display_name = "Icinga 1st Level"
896 vars.mobile = "+1 555 424642"
899 Define an additional [NotificationCommand](#notification) for SMS notifications.
903 > The example is not complete as there are many different SMS providers.
904 > Please note that sending SMS notifications will require an SMS provider
905 > or local hardware with a SIM card active.
907 object NotificationCommand "sms-notification" {
909 PluginDir + "/send_sms_notification",
914 The two new notification escalations are added onto the local host
915 and its service `ping4` using the `generic-notification` template.
916 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
917 command) after `30m` until `1h`.
921 > The `interval` was set to 15m in the `generic-notification`
922 > template example. Lower that value in your escalations by using a secondary
923 > template or by overriding the attribute directly in the `notifications` array
924 > position for `escalation-sms-2nd-level`.
926 If the problem is neither resolved nor acknowledged (which would prevent further notifications),
927 the `icinga-oncall-1st-level` user will be notified via `escalation-sms-1st-level` `1h` after the initial problem was
928 notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
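As a rough sketch of the `times` dictionary described here, the escalation window could be kept in a small template of its own. The template name `sms-escalation-30m-1h` is hypothetical; the values match the `30m` until `1h` window mentioned above.

```
template Notification "sms-escalation-30m-1h" {
  import "generic-notification"

  command = "sms-notification"

  // escalation window: active from 30 minutes until 1 hour
  times = {
    begin = 30m
    end = 1h
  }
}
```

A notification importing such a template only needs to add its `users` and an `assign where` expression.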
930 apply Notification "mail" to Service {
931 import "generic-notification"
933 command = "mail-notification"
934 users = [ "icingaadmin" ]
936 assign where service.name == "ping4"
939 apply Notification "escalation-sms-2nd-level" to Service {
940 import "generic-notification"
942 command = "sms-notification"
943 users = [ "icinga-oncall-2nd-level" ]
950 assign where service.name == "ping4"
953 apply Notification "escalation-sms-1st-level" to Service {
954 import "generic-notification"
956 command = "sms-notification"
957 users = [ "icinga-oncall-1st-level" ]
964 assign where service.name == "ping4"
967 ### <a id="notification-delay"></a> Notification Delay
969 Sometimes the problem in question should not be notified when the notification is due
970 (the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
971 you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
972 postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
973 Icinga 2 will not check against any end time for this notification. Make sure to
974 specify a relatively low notification `interval` to get notified soon enough again.
976 apply Notification "mail" to Service {
977 import "generic-notification"
979 command = "mail-notification"
980 users = [ "icingaadmin" ]
984 times.begin = 15m // delay notification window
986 assign where service.name == "ping4"
989 ### <a id="disable-renotification"></a> Disable Re-notifications
991 If you prefer to be notified only once, you can disable re-notifications by setting the
992 `interval` attribute to `0`.
994 apply Notification "notify-once" to Service {
995 import "generic-notification"
997 command = "mail-notification"
998 users = [ "icingaadmin" ]
1000 interval = 0 // disable re-notification
1002 assign where service.name == "ping4"
1005 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
1007 If there are no notification state and type filter attributes defined at the `Notification`
1008 or `User` object Icinga 2 assumes that all states and types are being notified.
1010 Available state and type filters for notifications are:
1012 template Notification "generic-notification" {
1014 states = [ Warning, Critical, Unknown ]
1015 types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
1016 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
1019 If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
1020 into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
1021 You can filter for acknowledgements and custom notifications too.
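As an illustration, the following sketch restricts a notification to acknowledgements and custom notifications. The notification name is made up, and the example assumes the `generic-notification` template and the `icingaadmin` user from the earlier examples.

```
apply Notification "ack-and-custom-only" to Service {
  import "generic-notification"

  users = [ "icingaadmin" ]

  // override the type filter inherited from the template
  types = [ Acknowledgement, Custom ]

  assign where service.name == "ping4"
}
```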
1024 ## <a id="commands"></a> Commands
1026 Icinga 2 uses three different command object types to specify how
1027 checks should be performed, notifications should be sent, and
1028 events should be handled.
1030 ### <a id="check-commands"></a> Check Commands
1032 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects define the command line how
1035 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects are referenced by
1036 [Host](6-object-types.md#objecttype-host) and [Service](6-object-types.md#objecttype-service) objects
1037 using the `check_command` attribute.
1041 > Make sure that the [checker](8-cli-commands.md#features) feature is enabled in order to
1044 #### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
1046 [CheckCommand](6-object-types.md#objecttype-checkcommand) objects require the [ITL template](7-icinga-template-library.md#itl-plugin-check-command)
1047 `plugin-check-command` to support native plugin based check methods.
1049 Unless you have done so already, download your check plugin and put it
1050 into the [PluginDir](5-configuring-icinga-2.md#constants-conf) directory. The following example uses the
1051 `check_disk` plugin contained in the Monitoring Plugins package.
1053 The plugin path and all command arguments are passed as a list of
1054 double-quoted string arguments for proper shell escaping.
1056 Call the `check_disk` plugin with the `--help` parameter to see
1057 all available options. Our example defines warning (`-w`) and
1058 critical (`-c`) thresholds for the disk usage. Without any
1059 partition defined (`-p`) it will check all local partitions.
1061 icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
1063 This plugin checks the amount of used disk space on a mounted file system
1064 and generates an alert if free space is less than one of the threshold values
1068 check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
1069 [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
1070 [-t timeout] [-u unit] [-v] [-X type] [-N type]
1075 > Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
1077 The next step is to understand how command parameters are passed from
1078 a host or service object, and add a [CheckCommand](6-object-types.md#objecttype-checkcommand)
1079 definition based on these required parameters and/or default values.
1081 #### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
1083 Check command parameters are defined as custom attributes which can be accessed as runtime macros
1084 by the executed check command.
1086 Define the default check command custom attributes `disk_wfree` and `disk_cfree`
1087 (freely definable naming schema) and their default threshold values. You can
1088 then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
1089 on the command line.
1093 > Use a common command type as prefix for your command arguments to increase
1094 > readability. `disk_wfree` helps understanding the context better than just
1095 > `wfree` as argument.
1097 The default custom attributes can be overridden by the custom attributes
1098 defined in the service using the check command `my-disk`. The custom attributes
1099 can also be inherited from a parent template using additive inheritance (`+=`).
1101 object CheckCommand "my-disk" {
1102 import "plugin-check-command"
1104 command = [ PluginDir + "/check_disk" ]
1108 value = "$disk_wfree$"
1109 description = "Exit with WARNING status if less than INTEGER units of disk are free or Exit with WARNING status if less than PERCENT of disk space is free"
1113 value = "$disk_cfree$"
1114 description = "Exit with CRITICAL status if less than INTEGER units of disk are free or Exit with CRITICAL status if less than PERCENT of disk space is free"
1118 value = "$disk_inode_wfree$"
1119 description = "Exit with WARNING status if less than PERCENT of inode space is free"
1122 value = "$disk_inode_cfree$"
1123 description = "Exit with CRITICAL status if less than PERCENT of inode space is free"
1126 value = "$disk_partitions$"
1127 description = "Path or partition (may be repeated)"
1132 value = "$disk_partitions_excluded$"
1133 description = "Ignore device (only works if -p unspecified)"
1137 vars.disk_wfree = "20%"
1138 vars.disk_cfree = "10%"
1143 > A proper example for the `check_disk` plugin is already shipped with Icinga 2
1144 > ready to use with the [plugin check commands](7-icinga-template-library.md#plugin-check-command-disk).
1146 The host `localhost` with the applied service `basic-partitions` checks a basic set of disk partitions
1147 with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
1150 The custom attribute `disk_partitions` can either hold a single string or an array of
1151 string values for passing multiple partitions to the `check_disk` check plugin.
1153 object Host "my-server" {
1154 import "generic-host"
1155 address = "127.0.0.1"
1158 vars.local_disks["basic-partitions"] = {
1159 disk_partitions = [ "/", "/tmp", "/var", "/home" ]
1163 apply Service for (disk => config in host.vars.local_disks) {
1164 import "generic-service"
1165 check_command = "my-disk"
1169 vars.disk_wfree = "10%"
1170 vars.disk_cfree = "5%"
1174 More details on using arrays in custom attributes can be found in
1175 [this chapter](#runtime-custom-attributes).
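The override and inheritance behaviour described in this section can be sketched as follows. The template name `strict-disk-thresholds`, the service name `disk-var` and the threshold values are assumptions for illustration; `my-disk` and `my-server` refer to the examples above.

```
template Service "strict-disk-thresholds" {
  // these values override the defaults set in the "my-disk" CheckCommand
  vars.disk_wfree = "25%"
  vars.disk_cfree = "15%"
}

object Service "disk-var" {
  import "generic-service"
  import "strict-disk-thresholds"

  host_name = "my-server"
  check_command = "my-disk"

  // additive inheritance: keep the thresholds from the template
  // and add the partition list on top
  vars += {
    disk_partitions = [ "/var" ]
  }
}
```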
1178 #### <a id="command-arguments"></a> Command Arguments
1180 When you define a check command line using the `command` attribute, Icinga 2
1181 will resolve all macros in the static string or array. Sometimes it is
1182 required to extend the argument list based on a condition evaluated
1183 at command execution, or to make arguments optional so that they are only set if the
1184 macro value can be resolved by Icinga 2.
1186 object CheckCommand "check_http" {
1187 import "plugin-check-command"
1189 command = [ PluginDir + "/check_http" ]
1192 "-H" = "$http_vhost$"
1193 "-I" = "$http_address$"
1195 "-p" = "$http_port$"
1197 set_if = "$http_ssl$"
1200 set_if = "$http_sni$"
1203 value = "$http_auth_pair$"
1204 description = "Username:password on sites with basic authentication"
1207 set_if = "$http_ignore_body$"
1209 "-r" = "$http_expect_body_regex$"
1210 "-w" = "$http_warn_time$"
1211 "-c" = "$http_critical_time$"
1212 "-e" = "$http_expect$"
1215 vars.http_address = "$address$"
1216 vars.http_ssl = false
1217 vars.http_sni = false
1220 The example shows the `check_http` check command defining the most common
1221 arguments. Each of them is optional by default and will be omitted if
1222 the value is not set. For example if the service calling the check command
1223 does not have `vars.http_port` set, it won't get added to the command
1226 If the `vars.http_ssl` custom attribute is set in the service, host or command
1227 object definition, Icinga 2 will add the `-S` argument based on the `set_if`
1228 numeric value to the command line. String values are not supported.
1230 If the macro value cannot be resolved, Icinga 2 will not add the defined argument
1231 to the final command argument array. Empty strings for macro values won't omit
1234 That way you can use the `check_http` command definition for checks both with and
1235 without SSL enabled, which saves you from duplicating command definitions.
1237 Details on all available options can be found in the
1238 [CheckCommand object definition](6-object-types.md#objecttype-checkcommand).
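A service using the `check_http` command from the example above might look like the following sketch. The service name and the virtual host value are made up; `my-server1` is the host from the first example in this chapter.

```
object Service "https-vhost" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "check_http"

  vars.http_vhost = "www.example.com"   // passed as -H
  vars.http_ssl = true                  // set_if is true, so -S is added

  // vars.http_port is not set, so -p is omitted from the command line
}
```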
1241 #### <a id="command-environment-variables"></a> Environment Variables
1243 The `env` command object attribute specifies a list of environment variables with values calculated
1244 from either runtime macros or custom attributes which should be exported as environment variables
1245 prior to executing the command.
1247 This is useful, for example, for keeping sensitive information off the command line
1248 (where it is visible in process listings) when passing credentials to database checks:
1250 object CheckCommand "mysql-health" {
1251 import "plugin-check-command"
1254 PluginDir + "/check_mysql"
1258 "-H" = "$mysql_address$"
1259 "-d" = "$mysql_database$"
1262 vars.mysql_address = "$address$"
1263 vars.mysql_database = "icinga"
1264 vars.mysql_user = "icinga_check"
1265 vars.mysql_pass = "password"
1267 env.MYSQLUSER = "$mysql_user$"
1268 env.MYSQLPASS = "$mysql_pass$"
1273 ### <a id="notification-commands"></a> Notification Commands
1275 [NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
1276 interfaces (E-Mail, XMPP, IRC, Twitter, etc).
1278 [NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects are referenced by
1279 [Notification](6-object-types.md#objecttype-notification) objects using the `command` attribute.
1281 `NotificationCommand` objects require the [ITL template](7-icinga-template-library.md#itl-plugin-notification-command)
1282 `plugin-notification-command` to support native plugin-based notifications.
1286 > Make sure that the [notification](8-cli-commands.md#features) feature is enabled
1287 > in order to execute notification commands.
1289 Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
1290 the current check output) sending an email to the user(s) associated with the
1291 notification itself (`$user.email$`).
1293 If you want to specify default values for some of the custom attribute definitions,
1294 you can add a `vars` dictionary as shown for the `CheckCommand` object.
1296 object NotificationCommand "mail-service-notification" {
1297 import "plugin-notification-command"
1299 command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
1302 NOTIFICATIONTYPE = "$notification.type$"
1303 SERVICEDESC = "$service.name$"
1304 HOSTALIAS = "$host.display_name$"
1305 HOSTADDRESS = "$address$"
1306 SERVICESTATE = "$service.state$"
1307 LONGDATETIME = "$icinga.long_date_time$"
1308 SERVICEOUTPUT = "$service.output$"
1309 NOTIFICATIONAUTHORNAME = "$notification.author$"
1310 NOTIFICATIONCOMMENT = "$notification.comment$"
1311 HOSTDISPLAYNAME = "$host.display_name$"
1312 SERVICEDISPLAYNAME = "$service.display_name$"
1313 USEREMAIL = "$user.email$"
1317 The `command` attribute in the `mail-service-notification` command refers to the following
1318 shell script. The macros specified in the `env` dictionary are exported
1319 as environment variables and can be used in the notification script:
1322 template=$(cat <<TEMPLATE
1325 Notification Type: $NOTIFICATIONTYPE
1327 Service: $SERVICEDESC
1329 Address: $HOSTADDRESS
1330 State: $SERVICESTATE
1332 Date/Time: $LONGDATETIME
1334 Additional Info: $SERVICEOUTPUT
1336 Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
1340 /usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"
1344 > This example is for `exim` only. Requires changes for `sendmail` and
1347 While it's possible to specify the entire notification command right
1348 in the NotificationCommand object it is generally advisable to create a
1349 shell script in the `/etc/icinga2/scripts` directory and have the
1350 NotificationCommand object refer to that.
1352 ### <a id="event-commands"></a> Event Commands
1354 Unlike notifications, event commands for hosts/services are called on every
1355 check execution if one of these conditions matches:
1357 * The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
1358 * The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
1359 * The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)
1361 [EventCommand](6-object-types.md#objecttype-eventcommand) objects are referenced by
1362 [Host](6-object-types.md#objecttype-host) and [Service](6-object-types.md#objecttype-service) objects
1363 using the `event_command` attribute.
1365 Therefore the `EventCommand` object should define a command line
1366 evaluating the current service state and other service runtime attributes
1367 available through runtime macros. Runtime macros such as `$service.state_type$`
1368 and `$service.state$` are processed by Icinga 2 and allow you to control precisely
1369 which events are triggered.
1371 Common use case scenarios are a failing HTTP check requiring an immediate
1372 restart via event command, or if an application is locked and requires
1373 a restart upon detection.
1375 `EventCommand` objects require the ITL template `plugin-event-command`
1376 to support native plugin-based event commands.
1378 #### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
1380 The following example will trigger a restart of the `httpd` daemon
1381 via ssh when the `http` service check fails. If the service state is
1382 `OK`, it will not trigger any event action.
1387 * icinga user with public key authentication
1388 * icinga user with sudo permissions for restarting the httpd daemon.
1392 # ls /home/icinga/.ssh/
1396 icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
1399 Define a generic [EventCommand](6-object-types.md#objecttype-eventcommand) object `event_by_ssh`
1400 which can be used for all event commands triggered using ssh:
1402 /* pass event commands through ssh */
1403 object EventCommand "event_by_ssh" {
1404 import "plugin-event-command"
1406 command = [ PluginDir + "/check_by_ssh" ]
1409 "-H" = "$event_by_ssh_address$"
1410 "-p" = "$event_by_ssh_port$"
1411 "-C" = "$event_by_ssh_command$"
1412 "-l" = "$event_by_ssh_logname$"
1413 "-i" = "$event_by_ssh_identity$"
1415 set_if = "$event_by_ssh_quiet$"
1417 "-w" = "$event_by_ssh_warn$"
1418 "-c" = "$event_by_ssh_crit$"
1419 "-t" = "$event_by_ssh_timeout$"
1422 vars.event_by_ssh_address = "$address$"
1423 vars.event_by_ssh_quiet = false
1426 The actual event command only passes the `event_by_ssh_command` attribute.
1427 The `event_by_ssh_service` custom attribute takes care of passing the correct
1428 daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
1429 is only restarted when the service is not in an `OK` state.
1432 object EventCommand "event_by_ssh_restart_service" {
1433 import "event_by_ssh"
1435 //only restart the daemon if state > 0 (not-ok)
1436 //requires sudo permissions for the icinga user
1437 vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
1441 Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
1442 which service should be restarted using the `event_by_ssh_service` attribute.
1444 object Service "http" {
1445 import "generic-service"
1446 host_name = "remote-http-host"
1447 check_command = "http"
1449 event_command = "event_by_ssh_restart_service"
1450 vars.event_by_ssh_service = "$host.vars.httpd_name$"
1452 //vars.event_by_ssh_logname = "icinga"
1453 //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa"
1457 Each host with this service then must define the `httpd_name` custom attribute
1458 (for example generated from your cmdb):
1460 object Host "remote-http-host" {
1461 import "generic-host"
1462 address = "192.168.1.100"
1464 vars.httpd_name = "apache2"
1467 You can testdrive this example by manually stopping the `httpd` daemon
1468 on your `remote-http-host`. Enable the `debuglog` feature and tail the
1469 `/var/log/icinga2/debug.log` file.
1471 Remote Host Terminal:
1473 # date; service apache2 status
1474 Mon Sep 15 18:57:39 CEST 2014
1475 Apache2 is running (pid 23651).
1476 # date; service apache2 stop
1477 Mon Sep 15 18:57:47 CEST 2014
1478 [ ok ] Stopping web server: apache2 ... waiting .
1480 Icinga 2 Host Terminal:
1482 [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
1483 [2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
1484 [2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
1485 [2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
1486 [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
1487 [2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0
1489 Remote Host Terminal:
1491 # date; service apache2 status
1492 Mon Sep 15 18:58:44 CEST 2014
1493 Apache2 is running (pid 24908).
1496 ## <a id="dependencies"></a> Dependencies
1498 Icinga 2 uses host and service [Dependency](6-object-types.md#objecttype-dependency) objects
1499 for determining their network reachability.
1501 A service can depend on a host, and vice versa. A service has an implicit
1502 dependency on its host (the parent). A host-to-host dependency implicitly acts
1503 as a host parent relation.
1504 When dependencies are calculated, not only the immediate parent is taken into
1505 account but all parents are inherited.
1507 The `parent_host_name` and `parent_service_name` attributes are mandatory for
1508 service dependencies; only `parent_host_name` is required for host dependencies.
1509 [Apply rules](3-monitoring-basics.md#using-apply) will allow you to
1510 [determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
1511 dynamic fashion if required.
1513 parent_host_name = "core-router"
1514 parent_service_name = "uplink-port"
1516 Notifications are suppressed by default if a host or service becomes unreachable.
1517 You can control that option by defining the `disable_notifications` attribute.
1519 disable_notifications = false
1521 If the dependency should be triggered in the parent object's soft state, you
1522 need to set `ignore_soft_states` to `false`.
1524 The dependency state filter must be defined based on the parent object being
1525 either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).
1527 The following example will make the dependency fail and trigger it if the parent
1528 object is **not** in one of these states:
1530 states = [ OK, Critical, Unknown ]
1532 Rephrased: If the parent service object changes into the `Warning` state, this
1533 dependency will fail and render all child objects (hosts or services) unreachable.
1535 You can determine the child's reachability by querying the `is_reachable` attribute
1536 in, for example, the [DB IDO](22-appendix.md#schema-db-ido-extensions).
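Putting the attributes from this section together, a complete service dependency could be sketched as shown here. The dependency name and the child objects (`my-server1` and its `http` service) are assumptions for this example; it also assumes that the `core-router` host and its `uplink-port` service exist.

```
object Dependency "http-needs-uplink" {
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  child_host_name = "my-server1"
  child_service_name = "http"

  // the dependency fails if the parent service is NOT in one of these states
  states = [ OK, Critical, Unknown ]

  // also evaluate the parent's soft states
  ignore_soft_states = false

  // while the dependency fails, skip checks and suppress notifications
  disable_checks = true
  disable_notifications = true
}
```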
1538 ### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
1540 Icinga 2 automatically adds an implicit dependency for services on their host. That way
1541 service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
1542 does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
1543 `states = [ Up ]` for all service objects.
1545 Service checks are still executed. If you want to prevent them from happening, you can
1546 apply the following dependency to all services setting their host as `parent_host_name`
1547 and disabling the checks. `assign where true` matches on all `Service` objects.
1549 apply Dependency "disable-host-service-checks" to Service {
1550 disable_checks = true
1554 ### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
1556 A common scenario is the Icinga 2 server behind a router. Checking internet
1557 access by pinging the Google DNS server `google-dns` is a common method, but
1558 will fail in case the `dsl-router` host is down. Therefore the example below
1559 defines a host dependency which acts implicitly as parent relation too.
1561 Furthermore the host may be reachable but ping probes are dropped by the
1562 router's firewall. In case the `dsl-router`'s `ping4` service check fails, all
1563 further checks for the `ping4` service on host `google-dns` should
1564 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
1566 object Host "dsl-router" {
1567 import "generic-host"
1568 address = "192.168.1.1"
1571 object Host "google-dns" {
1572 import "generic-host"
1576 apply Service "ping4" {
1577 import "generic-service"
1579 check_command = "ping4"
1581 assign where host.address
1584 apply Dependency "internet" to Host {
1585 parent_host_name = "dsl-router"
1586 disable_checks = true
1587 disable_notifications = true
1589 assign where host.name != "dsl-router"
1592 apply Dependency "internet" to Service {
1593 parent_host_name = "dsl-router"
1594 parent_service_name = "ping4"
1595 disable_checks = true
1597 assign where host.name != "dsl-router"
1600 ### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
1602 You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
1603 child attributes e.g. `parent_host_name` to other object's
1606 A common example is virtual machines hosted on a master. The object
1607 name of that master is auto-generated from your CMDB or VMware inventory
1608 into the host's custom attributes (or a generic template for your
1611 Define your master host object:
1614 object Host "master.example.com" {
1615 import "generic-host"
1618 Add a generic template defining all common host attributes:
1620 /* generic template for your virtual machines */
1621 template Host "generic-vm" {
1622 import "generic-host"
1625 Add a template for all hosts on your example.com cloud setting
1626 custom attribute `vm_parent` to `master.example.com`:
1628 template Host "generic-vm-example.com" {
1630 vars.vm_parent = "master.example.com"
1633 Define your guest hosts:
1635 object Host "www.example1.com" {
1636 import "generic-vm-example.com"
1639 object Host "www.example2.com" {
1640 import "generic-vm-example.com"
1643 Apply the host dependency to all child hosts importing the
1644 `generic-vm` template and set the `parent_host_name`
1645 to the previously defined custom attribute `host.vars.vm_parent`.
1647 apply Dependency "vm-host-to-parent-master" to Host {
1648 parent_host_name = host.vars.vm_parent
1649 assign where "generic-vm" in host.templates
1652 You can extend this example, and make your services depend on the
1653 `master.example.com` host too. Their local scope allows you to use
1654 `host.vars.vm_parent` similar to the example above.
1656 apply Dependency "vm-service-to-parent-master" to Service {
1657 parent_host_name = host.vars.vm_parent
1658 assign where "generic-vm" in host.templates
1661 That way you don't need to wait for your guest hosts becoming
1662 unreachable when the master host goes down. Instead the services
1663 will detect their reachability immediately when executing checks.
1667 > This method with setting locally scoped variables only works in
1668 > apply rules, but not in object definitions.
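For contrast, here is a sketch of the same relation written as a plain object definition, where the parent and child names have to be spelled out literally. The dependency name is made up; the host names are taken from the example above.

```
object Dependency "www-example1-to-master" {
  // no `host` variable is available here, so names are given explicitly
  parent_host_name = "master.example.com"
  child_host_name = "www.example1.com"
}
```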
1671 ### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
1673 Another classic example is agent-based checks. You would define a health check
1674 for the agent daemon responding to your requests, and make all other services
1675 querying that daemon depend on that health check.
1677 The following configuration defines two nrpe based service checks `nrpe-load`
1678 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
1679 `nrpe-health` service.
1681 apply Service "nrpe-health" {
1682 import "generic-service"
1683 check_command = "nrpe"
1684 assign where match("nrpe-*", host.name)
1687 apply Service "nrpe-load" {
1688 import "generic-service"
1689 check_command = "nrpe"
1690 vars.nrpe_command = "check_load"
1691 assign where match("nrpe-*", host.name)
1694 apply Service "nrpe-disk" {
1695 import "generic-service"
1696 check_command = "nrpe"
1697 vars.nrpe_command = "check_disk"
1698 assign where match("nrpe-*", host.name)
1701 object Host "nrpe-server" {
1702 import "generic-host"
1703 address = "192.168.1.5"
1706 apply Dependency "disable-nrpe-checks" to Service {
1707 parent_service_name = "nrpe-health"
1710 disable_checks = true
1711 disable_notifications = true
1712 assign where service.check_command == "nrpe"
1713 ignore where service.name == "nrpe-health"
1716 The `disable-nrpe-checks` dependency is applied to all services
1717 on the `nrpe-server` host using the `nrpe` check_command attribute
1718 but not the `nrpe-health` service itself.