1 # <a id="monitoring-basics"></a> Monitoring Basics
3 This part of the Icinga 2 documentation provides an overview of all the basic
4 monitoring concepts you need to know to run Icinga 2.
6 ## <a id="hosts-services"></a> Hosts and Services
8 Icinga 2 can be used to monitor the availability of hosts and services. Hosts
9 and services can be virtually anything which can be checked in some way:
11 * Network services (HTTP, SMTP, SNMP, SSH, etc.)
15 * Other local or network-accessible services
17 Host objects provide a mechanism to group services that are running
18 on the same physical device.
20 Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      address = "192.0.2.1" // placeholder address
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }

The example creates two services `ping4` and `http` which belong to the
host `my-server1`.

It also specifies that the host should perform its own check using the `hostalive`
check command.
43 The `address` attribute is used by check commands to determine which network
44 address is associated with the host object.
46 Details on troubleshooting check problems can be found [here](8-troubleshooting.md#troubleshooting).
48 ### <a id="host-states"></a> Host States
50 Hosts can be in any of the following states:

  Name        | Description
  ------------|--------------
  UP          | The host is available.
  DOWN        | The host is unavailable.

57 ### <a id="service-states"></a> Service States
59 Services can be in any of the following states:

  Name        | Description
  ------------|--------------
  OK          | The service is working properly.
  WARNING     | The service is experiencing some problems but is still considered to be in working condition.
  CRITICAL    | The service is in a critical state.
  UNKNOWN     | The check could not determine the service's state.

68 ### <a id="hard-soft-states"></a> Hard and Soft States
When Icinga detects a problem with a host or service, it re-checks the object a number of
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
notifications. This avoids unnecessary notifications for transient failures.
During this time the object is in a `SOFT` state.

If the object is still in a non-OK state after all re-checks have been executed,
the host/service switches to a `HARD` state and notifications are sent.

  Name        | Description
  ------------|--------------
  HARD        | The host/service's state hasn't recently changed.
  SOFT        | The host/service has recently changed state and is being re-checked.

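
As a minimal sketch of how these settings interact, assuming the host `my-server1` from the example above exists: the service name `http-retry-example` and the interval values are illustrative only and not taken from the sample configuration.

```
object Service "http-retry-example" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m     // regular check interval
  retry_interval = 1m     // shorter interval used for re-checks while the state is SOFT
  max_check_attempts = 3  // the third consecutive non-OK result turns the state HARD
}
```

With these values a transient failure that recovers within the first two re-checks never reaches the `HARD` state and therefore should not trigger a notification.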
83 ### <a id="host-service-checks"></a> Host and Service Checks
Hosts and services determine their state from the check result that a check
execution returns to the Icinga 2 application. By default, the `generic-host` example template
defines `hostalive` as the host check. If your host does not respond to ping, you should
consider using a different check command, for instance the `http` check command, or, if
no suitable check is available, the `dummy` check command.

    object Host "uncheckable-host" {
      check_command = "dummy"
      vars.dummy_text = "Pretending to be OK."
    }

97 Service checks could also use a `dummy` check, but the common strategy is to
98 [integrate an existing plugin](3-monitoring-basics.md#command-plugin-integration) as
99 [check command](3-monitoring-basics.md#check-commands) and [reference](3-monitoring-basics.md#command-passing-parameters)
100 that in your [Service](12-object-types.md#objecttype-service) object definition.
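
A short sketch of what such a reference can look like, assuming the `http` check command from the Icinga Template Library and its `http_vhost` and `http_uri` custom attributes; the service name and the values are hypothetical.

```
object Service "http-intranet" {
  host_name = "my-server1"

  // Reference the CheckCommand object by name.
  check_command = "http"

  // Custom attributes are passed to the check command as parameters.
  vars.http_vhost = "intranet.example.com"
  vars.http_uri = "/status"
}
```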
102 ## <a id="configuration-best-practice"></a> Configuration Best Practice
104 The [Getting Started](2-getting-started.md#getting-started) chapter already introduced various aspects
105 of the Icinga 2 configuration language. If you are ready to configure additional
hosts, services, notifications, dependencies, etc., you should think about the
requirements first and then decide on a strategy.
109 There are many ways of creating Icinga 2 configuration objects:
111 * Manually with your preferred editor, for example vi(m), nano, notepad, etc.
* Generated by a [configuration management tool](2-getting-started.md#configuration-tools) such as Puppet, Chef, Ansible, etc.
113 * A configuration addon for Icinga 2
114 * A custom exporter script from your CMDB or inventory tool
117 In order to find the best strategy for your own configuration, ask yourself the following questions:
* Do your hosts share a common group of services (for example Linux hosts with disk, load, etc. checks)?
* Does only a small set of users receive notifications and escalations for all hosts/services?

If you can answer at least one of these questions with yes, consider using [apply rules](3-monitoring-basics.md#using-apply)
instead of defining objects on a per host and service basis.

* Are you required to define specific configuration for each host/service?
* Does your configuration generation tool already know about the host-service-relationship?

In that case, use object-specific configuration attributes such as `host_name` instead.
130 Finding the best files and directory tree for your configuration is up to you. Make sure that
131 the [icinga2.conf](2-getting-started.md#icinga2-conf) configuration file includes them, and then think about:
133 * tree-based on locations, hostgroups, specific host attributes with sub levels of directories.
134 * flat `hosts.conf`, `services.conf`, etc files for rule based configuration.
135 * generated configuration with one file per host and a global configuration for groups, users, etc.
136 * one big file generated from an external application (probably a bad idea for maintaining changes).
Whichever strategy you choose, you should additionally check the following:
141 * Are there any specific attributes describing the host/service you could set as `vars` custom attributes?
142 You can later use them for applying assign/ignore rules, or export them into external interfaces.
143 * Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules.
144 * Use templates to store generic attributes for your objects and apply rules making your configuration more readable.
145 Details can be found in the [using templates](3-monitoring-basics.md#object-inheritance-using-templates) chapter.
146 * Apply rules may overlap. Keep a central place (for example, [services.conf](2-getting-started.md#services-conf) or [notifications.conf](2-getting-started.md#notifications-conf)) storing
147 the configuration instead of defining apply rules deep in your configuration tree.
148 * Every plugin used as check, notification or event command requires a `Command` definition.
149 Further details can be looked up in the [check commands](3-monitoring-basics.md#check-commands) chapter.
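
The first two points can be sketched as follows. This is an illustration under assumptions: the host `web-01`, the group `linux-servers`, the custom attribute `location` and the service name are hypothetical, while `generic-host`, `generic-service` and the `ssh` check command are assumed to come from the sample configuration and the ITL.

```
object HostGroup "linux-servers" {
  display_name = "Linux Servers"
}

object Host "web-01" {
  import "generic-host"
  address = "192.0.2.20"

  groups = [ "linux-servers" ]

  // Custom attributes describing the host.
  vars.os = "Linux"
  vars.location = "dc-east"
}

// One rule in a central place instead of one Service object per host.
apply Service "ssh-linux-group" {
  import "generic-service"
  check_command = "ssh"

  assign where "linux-servers" in host.groups && host.vars.os == "Linux"
}
```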
151 If you happen to have further questions, do not hesitate to join the [community support channels](https://support.icinga.org)
152 and ask community members for their experience and best practices.
155 ### <a id="object-inheritance-using-templates"></a> Object Inheritance Using Templates
Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      enable_perfdata = true
    }

    template Service "ipv6-service" {
      notes = "IPv6 critical != IPv4 broken."
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Service "ping6" {
      import "generic-service"
      import "ipv6-service"

      check_command = "ping6"

      assign where host.address6
    }

189 In this example the `ping4` and `ping6` services inherit properties from the
190 template `generic-service`. The `ping6` service additionally imports the `ipv6-service`
191 template with the `notes` attribute.
193 Objects as well as templates themselves can import an arbitrary number of
templates. Attributes inherited from a template can be overridden in the
object that imports it.
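
A minimal sketch of such an override, assuming the `generic-service` template from the example above; the service name and the `slow_link` custom attribute are hypothetical.

```
apply Service "ping4-slow-link" {
  import "generic-service"

  check_command = "ping4"

  // Overrides the value 3 inherited from "generic-service".
  max_check_attempts = 5

  assign where host.vars.slow_link
}
```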
You can also import existing non-template objects into objects. This
requires you to use unique names for templates and objects, because they share
the same namespace.

Example for importing objects:

    object CheckCommand "snmp-simple" {
      vars.snmp_defaults = ...
    }

    object CheckCommand "snmp-advanced" {
      import "snmp-simple"
      vars.snmp_advanced = ...
    }

### <a id="using-apply"></a> Apply Objects Based on Rules

Instead of assigning each object ([Service](12-object-types.md#objecttype-service),
[Notification](12-object-types.md#objecttype-notification), [Dependency](12-object-types.md#objecttype-dependency),
[ScheduledDowntime](12-object-types.md#objecttype-scheduleddowntime))
based on attribute identifiers such as `host_name`, objects can be [applied](10-language-reference.md#apply).
221 Before you start using the apply rules keep the following in mind:
* Define the best match.
    * A set of unique [custom attributes](3-monitoring-basics.md#custom-attributes-apply) for these hosts/services?
    * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
    * A generic pattern [match](10-language-reference.md#function-calls) on the host/service name?
    * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](10-language-reference.md#expression-operators)
* All expressions must return a boolean value (for example, an empty string evaluates to `false`).
232 > You can set/override object attributes in apply rules using the respectively available
233 > objects in that scope (host and/or service objects).
[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
not only for matching their existence or values in apply expressions, but also to assign
("inherit") their values to the objects generated from apply rules.
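
The following sketch illustrates both uses at once. It assumes the `tcp` check command and its `tcp_port` custom attribute from the ITL; the host `db-01`, the `database` dictionary and its keys are hypothetical.

```
object Host "db-01" {
  import "generic-host"
  address = "192.0.2.30"

  // Nested dictionary stored as a custom attribute.
  vars.database = {
    port = 3306
    instance = "orders"
  }
}

apply Service "db-port" {
  import "generic-service"
  check_command = "tcp"

  // Copy values from the host's custom attributes into the generated service.
  vars.tcp_port = host.vars.database.port
  display_name = "Database instance " + host.vars.database.instance

  // Match on the existence of the dictionary.
  assign where host.vars.database
}
```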
239 * [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
240 * [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
242 * [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)
A more advanced example is using [apply with for loops on arrays or
dictionaries](#using-apply-for), for example provided by
[custom attributes](3-monitoring-basics.md#custom-attributes-apply) or groups.
250 > Building configuration in that dynamic way requires detailed information
251 > of the generated objects. Use the `object list` [CLI command](5-cli-commands.md#cli-command-object)
252 > after successful [configuration validation](5-cli-commands.md#config-validation).
255 #### <a id="using-apply-expressions"></a> Apply Rules Expressions
You can use simple or advanced combinations of apply rule expressions. Each
expression must evaluate to a boolean value, and an object is only matched if
the result is `true`. An empty string, for instance, is interpreted as `false`.
In a similar fashion, undefined attributes evaluate to `false`.

    assign where host.vars.attribute_does_not_exist

266 Multiple `assign where` condition rows are evaluated as `OR` condition.
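
A small sketch of this behaviour, assuming the `generic-service` template and the `ssh` check command; the service name and the `os` values are illustrative.

```
apply Service "ssh-unix" {
  import "generic-service"
  check_command = "ssh"

  // Either row is sufficient: the two rows are combined with OR.
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"

  // Hosts matching an ignore row are excluded even if an assign row matches.
  ignore where host.vars.test_server == true
}
```

The two `assign where` rows could equally be written as a single row using the `||` operator.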
You can combine multiple expressions for matching only a subset of objects. In some cases,
you want to be able to add more than one assign/ignore where expression which matches
a specific condition. To achieve this you can use the logical `&&` (and) and `||` (or) operators.

The following example matches all hosts whose name matches the `*mysql*` pattern and (`&&`)
whose custom attribute `prod_mysql_db` matches the `db-*` pattern. All hosts with the custom
attribute `test_server` set to `true` are ignored, as is any host whose name matches the `*internal` pattern.

    object HostGroup "mysql-server" {
      display_name = "MySQL Server"

      assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

A similar example for advanced notification apply rule filters: the notification is assigned
if the service attribute `notes` contains the string `has gold support 24x7` `AND` one of
two conditions holds: either the host custom attribute `customer` is set to `customer-xy`
`OR` the host custom attribute `always_notify` is set to `true`.

The notification is ignored for services whose host name matches `*internal`, `OR` whose
`priority` custom attribute is [less than](10-language-reference.md#expression-operators) `2`
`AND` whose host has the custom attribute `is_clustered` set to `true`.

    template Notification "cust-xy-notification" {
      users = [ "noc-xy", "mgmt-xy" ]
      command = "mail-service-notification"
    }

    apply Notification "notify-cust-xy-mysql" to Service {
      import "cust-xy-notification"

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

308 #### <a id="using-apply-services"></a> Apply Services to Hosts
310 The sample configuration already ships a detailed example in [hosts.conf](2-getting-started.md#hosts-conf)
311 and [services.conf](2-getting-started.md#services-conf) for this use case.
313 The example for `ssh` applies a service object to all hosts with the `address`
314 attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.

    apply Service "ssh" {
      import "generic-service"

      check_command = "ssh"

      assign where host.address && host.vars.os == "Linux"
    }

325 Other detailed scenario examples are used in their respective chapters, for example
326 [apply services with custom command arguments](3-monitoring-basics.md#using-apply-services-command-arguments).
328 #### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"

      user_groups = [ "noc" ]

      assign where host.vars.notification.mail
    }

In this example the `mail-noc` notification will be created as an object for all services whose host has the
`notification.mail` custom attribute defined. The notification command is set to `mail-service-notification`
and all members of the user group `noc` will get notified.
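
The same pattern can target hosts instead of services. The following sketch assumes that a `mail-host-notification` template exists alongside the `mail-service-notification` template used above; the notification name is hypothetical.

```
apply Notification "mail-noc-host" to Host {
  import "mail-host-notification"

  user_groups = [ "noc" ]

  // Only the host object is available in this scope.
  assign where host.vars.notification.mail
}
```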
347 #### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
349 Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.
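
As a brief, hedged sketch of the apply syntax for dependencies: the parent host `dsl-router` and the dependency name are hypothetical.

```
apply Dependency "internet" to Host {
  // All matched hosts depend on the router being reachable.
  parent_host_name = "dsl-router"

  disable_checks = true
  disable_notifications = true

  // Do not make the router depend on itself.
  assign where host.name != "dsl-router"
}
```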
351 #### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
The sample configuration ships an example in [downtimes.conf](2-getting-started.md#downtimes-conf).
355 Detailed examples can be found in the [recurring downtimes](3-monitoring-basics.md#recurring-downtimes) chapter.
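
A minimal sketch of an applied recurring downtime; it is not the example shipped in `downtimes.conf`, and the downtime name, the service group `backup` and the time ranges are assumptions made for illustration.

```
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  // Weekly recurring time windows.
  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
  }

  assign where "backup" in service.groups
}
```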
358 #### <a id="using-apply-for"></a> Using Apply For Rules
In addition to the standard way of using apply rules, you may need to generate
objects from a set (an array or a dictionary). Apply for rules cover this case: they
save quite a lot of duplicated apply rules by combining them into one generic rule
which generates the object name with or without a prefix.
365 The sample configuration already ships a detailed example in [hosts.conf](2-getting-started.md#hosts-conf)
366 and [services.conf](2-getting-started.md#services-conf) for this use case.
368 Imagine a different example: You are monitoring your switch (hosts) with many
369 interfaces (services). The following requirements/problems apply:
371 * Each interface service check should be named with a prefix and a running number
372 * Each interface has its own vlan tag
373 * Some interfaces have QoS enabled
* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be
375 dynamically generated
By defining the `interfaces` dictionary with three example interfaces on the `core-switch`
host object, you provide the data required by the for loop in the service apply
rule.

    object Host "core-switch" {
      import "generic-host"
      address = "127.0.0.1"

      vars.interfaces["0"] = {
        // further keys read by the apply rule, such as port, vlan and qos, belong here
        address = "127.0.0.2"
      }
      vars.interfaces["1"] = {
        address = "127.0.1.2"
      }
      vars.interfaces["2"] = {
        address = "127.0.2.2"
      }
    }

You can also omit the `"if-"` string; all generated service names are then taken
directly from the `if_name` variable value.

The `config` dictionary contains all key-value pairs for the specific interface in one
loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface.

By defining a default value for the custom attribute `qos` in the `vars` dictionary
before adding the `config` dictionary, we ensure that this attribute is always defined.

After `vars` is fully populated, all object attributes can be set. For strings, you can use
string concatenation with the `+` operator.

You can also specify the check command that way.

    apply Service "if-" for (if_name => config in host.vars.interfaces) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"
      vars += config // values from the interface dictionary override the default above

      display_name = "if-" + if_name + "-" + vars.vlan

      notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "/if-" + if_name

      assign where host.vars.interfaces
    }

Note that numbers must be explicitly cast to strings when concatenating them with strings.
This can be achieved by wrapping them into the [string()](10-language-reference.md#function-calls) function.
439 > Building configuration in that dynamic way requires detailed information
440 > of the generated objects. Use the `object list` [CLI command](5-cli-commands.md#cli-command-object)
441 > after successful [configuration validation](5-cli-commands.md#config-validation).
444 #### <a id="using-apply-object attributes"></a> Use Object Attributes in Apply Rules
446 Since apply rules are evaluated after the generic objects, you
447 can reference existing host and/or service object attributes as
448 values for any object attribute specified in that apply rule.

    object Host "opennebula-host" {
      import "generic-host"

      vars.hosting["xyz"] = {
        // further keys read by the apply rule (http_uri, customer_id) belong here
        customer_name = "Customer xyz"
        support_contract = "gold"
      }
      vars.hosting["abc"] = {
        customer_name = "Customer abc"
        support_contract = "silver"
      }
    }

    apply Service for (customer => config in host.vars.hosting) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"
      vars += config // copy the customer's dictionary into the service's custom attributes

      vars.http_uri = "/" + customer + "/" + config.http_uri

      display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id

      notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."

      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id

      assign where host.vars.hosting
    }

488 ### <a id="groups"></a> Groups
Groups are used for combining hosts, services, and users into
accessible configuration attributes and views in external (web)
interfaces.

Group membership is defined at the respective object itself. If
you have a hostgroup named `windows`, for example, and want to assign
specific hosts to this group for later viewing the group on your
alert dashboard, first create the hostgroup:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this hostgroup:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

521 This can be done for service and user groups the same way. Additionally
522 the user groups are associated as attributes in `Notification` objects.

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

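
For service groups the pattern looks much the same. The following is a sketch under assumptions: the group `mssql`, the template `mssql-service` and the service name are hypothetical, and the `tcp` check command with its `tcp_port` custom attribute is assumed to come from the ITL.

```
object ServiceGroup "mssql" {
  display_name = "MSSQL Services"
}

// Every service importing this template becomes a member of the group.
template Service "mssql-service" {
  import "generic-service"

  groups += [ "mssql" ]
}

apply Service "mssql-port" {
  import "mssql-service"
  check_command = "tcp"

  vars.tcp_port = host.vars.mssql_port

  assign where host.vars.mssql_port
}
```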
544 #### <a id="group-assign-intro"></a> Group Membership Assign
If there is a certain number of hosts, services, or users matching a pattern,
it's reasonable to assign the group object to these members.
Details on the `assign where` syntax can be found [here](10-language-reference.md#apply).

    object HostGroup "prod-mssql" {
      display_name = "Production MSSQL Servers"
      assign where host.vars.mssql_port && host.vars.prod_mysql_db
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

In this example, adapted from the one above, all hosts with the custom attributes `mssql_port`
and `prod_mysql_db` set will be added as members to the host group `prod-mssql`. Hosts whose
name matches `*internal` or whose `test_server` attribute is set to `true` will be ignored.
561 ## <a id="notifications"></a> Notifications
Notifications for service and host problems are an integral part of your
monitoring setup.
566 When a host or service is in a downtime, a problem has been acknowledged or
567 the dependency logic determined that the host/service is unreachable, no
568 notifications are sent. You can configure additional type and state filters
569 refining the notifications being actually sent.
571 There are many ways of sending notifications, e.g. by e-mail, XMPP,
572 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
573 Instead it relies on external mechanisms such as shell scripts to notify users.
575 A notification specification requires one or more users (and/or user groups)
576 who will be notified in case of problems. These users must have all custom
577 attributes defined which will be used in the `NotificationCommand` on execution.
The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and
`CRITICAL` states and `problem` and `recovery` notification types.

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }

590 If you don't set the `states` and `types` configuration attributes for the `User`
591 object, notifications for all states and types will be sent.
593 Details on troubleshooting notification problems can be found [here](8-troubleshooting.md#troubleshooting).
597 > Make sure that the [notification](5-cli-commands.md#features) feature is enabled on your master instance
598 > in order to execute notification commands.
You should choose which information you (and your notified users) are interested in
in case of emergency, and also which information does not provide any value to you and
your environment.
604 An example notification command is explained [here](3-monitoring-basics.md#notification-commands).
606 You can add all shared attributes to a `Notification` template which is inherited
607 to the defined notifications. That way you'll save duplicated attributes in each
608 `Notification` object. Attributes can be overridden locally.

    template Notification "generic-notification" {
      interval = 15m

      command = "mail-service-notification"

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }

622 The time period `24x7` is shipped as example configuration with Icinga 2.
624 Use the `apply` keyword to create `Notification` objects for your services:

    apply Notification "notify-cust-xy-mysql" to Service {
      import "generic-notification"

      users = [ "noc-xy", "mgmt-xy" ]

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

636 Instead of assigning users to notifications, you can also add the `user_groups`
637 attribute with a list of user groups to the `Notification` object. Icinga 2 will
638 send notifications to all group members.
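
A short sketch of this variant, assuming the `generic-notification` template and the `windows-mssql-admins` user group defined earlier in this chapter; the notification name is hypothetical.

```
apply Notification "notify-mssql-admins" to Service {
  import "generic-notification"

  // All members of the group are notified; no individual users are listed.
  user_groups = [ "windows-mssql-admins" ]

  assign where host.vars.mssql_port
}
```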
642 > Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
643 > states for services, `Down` for hosts) will receive `Recovery` notifications.
645 ### <a id="notification-escalations"></a> Notification Escalations
647 When a problem notification is sent and a problem still exists at the time of re-notification
648 you may want to escalate the problem to the next support level. A different approach
649 is to configure the default notification by email, and escalate the problem via SMS
650 if not already solved.
652 You can define notification start and end times as additional configuration
653 attributes making the `Notification` object a so-called `notification escalation`.
654 Using templates you can share the basic notification attributes such as users or the
655 `interval` (and override them for the escalation then).
657 Using the example from above, you can define additional users being escalated for SMS
658 notifications between start and end time.

    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"
      vars.mobile = "+1 555 424642"
    }

    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"
      vars.mobile = "+1 555 424642"
    }

672 Define an additional [NotificationCommand](#notification) for SMS notifications.
676 > The example is not complete as there are many different SMS providers.
677 > Please note that sending SMS notifications will require an SMS provider
678 > or local hardware with a SIM card active.

    object NotificationCommand "sms-notification" {
      command = [ PluginDir + "/send_sms_notification" ] // provider-specific arguments are not shown
    }

687 The two new notification escalations are added onto the local host
688 and its service `ping4` using the `generic-notification` template.
689 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
690 command) after `30m` until `1h`.
694 > The `interval` was set to 15m in the `generic-notification`
695 > template example. Lower that value in your escalations by using a secondary
696 > template or by overriding the attribute directly in the `notifications` array
697 > position for `escalation-sms-2nd-level`.
If the problem is neither resolved nor acknowledged (which would prevent further notifications),
the `icinga-oncall-1st-level` user will be notified through the `escalation-sms-1st-level`
notification `1h` after the initial problem notification, but only for one hour (`2h` as `end`
key for the `times` dictionary).

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]

      times = {
        begin = 30m
        end = 1h
      }

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]

      times = {
        begin = 1h
        end = 2h
      }

      assign where service.name == "ping4"
    }

740 ### <a id="notification-delay"></a> Notification Delay
742 Sometimes the problem in question should not be notified when the notification is due
743 (the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
744 you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
745 postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
746 Icinga 2 will not check against any end time for this notification. Make sure to
747 specify a relatively low notification `interval` to get notified soon enough again.

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      times.begin = 15m // delay notification window

      assign where service.name == "ping4"
    }

762 ### <a id="disable-renotification"></a> Disable Re-notifications
764 If you prefer to be notified only once, you can disable re-notifications by setting the
765 `interval` attribute to `0`.

    apply Notification "notify-once" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      interval = 0 // disable re-notification

      assign where service.name == "ping4"
    }

778 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
780 If there are no notification state and type filter attributes defined at the `Notification`
781 or `User` object Icinga 2 assumes that all states and types are being notified.
783 Available state and type filters for notifications are:

    template Notification "generic-notification" {
      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }

If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
You can filter for acknowledgements and custom notifications too.
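
The following sketch shows filters on both object types. The user `oncall-critical-only`, its e-mail address and the notification name are hypothetical, and a `mail-host-notification` notification command is assumed to exist in your configuration.

```
// A user who only wants to hear about critical problems.
object User "oncall-critical-only" {
  email = "oncall@example.com"

  states = [ Critical ]
  types = [ Problem ]
}

// Host notifications use the host states Up and Down in the state filter.
apply Notification "host-mail" to Host {
  command = "mail-host-notification"
  users = [ "icingaadmin" ]

  states = [ Up, Down ]
  types = [ Problem, Recovery ]

  assign where host.address
}
```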
797 ## <a id="timeperiods"></a> Time Periods
799 Time Periods define time ranges in Icinga where event actions are
800 triggered, for example whether a service check is executed or not within
801 the `check_period` attribute. Or a notification should be sent to
802 users or not, filtered by the `period` and `notification_period`
803 configuration attributes for `Notification` and `User` objects.
> If you are familiar with Icinga 1.x: these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.
813 The `TimePeriod` attribute `ranges` may contain multiple directives,
814 including weekdays, days of the month, and calendar dates.
815 These types may overlap/override other types in your ranges dictionary.
817 The descending order of precedence is as follows:
819 * Calendar date (2008-01-01)
820 * Specific month date (January 1st)
821 * Generic month date (Day 15)
822 * Offset weekday of specific month (2nd Tuesday in December)
823 * Offset weekday (3rd Monday)
824 * Normal weekday (Tuesday)
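
To illustrate the list above, here is a sketch of a `ranges` dictionary using one directive of each kind. The key formats are assumed to follow the legacy timeperiod syntax known from Icinga 1.x; the object name and the time ranges are made up for this example.

```
object TimePeriod "precedence-example" {
  import "legacy-timeperiod"

  display_name = "Range precedence example"
  ranges = {
    "2008-01-01"         = "00:00-24:00" // calendar date (highest precedence)
    "january 1"          = "10:00-14:00" // specific month date
    "day 15"             = "09:00-12:00" // generic month date
    "tuesday 2 december" = "09:00-17:00" // offset weekday of specific month
    "monday 3"           = "09:00-17:00" // offset weekday
    "tuesday"            = "09:00-17:00" // normal weekday (lowest precedence)
  }
}
```

On January 1st, 2008 (a Tuesday) three of these entries match the date, and following the order above the calendar date entry would be the one that applies.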
If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as time period,
as shown below:

    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }

If your operations staff should only be notified during work hours,
create a new timeperiod named `workhours` defining a work day from
09:00 to 17:00:

    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday"    = "09:00-17:00"
        "tuesday"   = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday"  = "09:00-17:00"
        "friday"    = "09:00-17:00"
      }
    }

862 Use the `period` attribute to assign time periods to
863 `Notification` and `Dependency` objects:

    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }

876 ## <a id="commands"></a> Commands
878 Icinga 2 uses three different command object types to specify how
879 checks should be performed, notifications should be sent, and
880 events should be handled.
882 ### <a id="command-environment-variables"></a> Environment Variables for Commands
884 Please check [Runtime Custom Attributes as Environment Variables](3-monitoring-basics.md#runtime-custom-attribute-env-vars).
887 ### <a id="check-commands"></a> Check Commands
889 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects define the command line how
892 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects are referenced by
893 [Host](12-object-types.md#objecttype-host) and [Service](12-object-types.md#objecttype-service) objects
894 using the `check_command` attribute.
898 > Make sure that the [checker](5-cli-commands.md#features) feature is enabled in order to
901 #### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
903 [CheckCommand](12-object-types.md#objecttype-checkcommand) objects require the [ITL template](13-icinga-template-library.md#itl-plugin-check-command)
904 `plugin-check-command` to support native plugin based check methods.
906 Unless you have done so already, download your check plugin and put it
907 into the [PluginDir](2-getting-started.md#constants-conf) directory. The following example uses the
908 `check_disk` plugin shipped with the Monitoring Plugins package.
The plugin path and all command arguments are passed as a list of
double-quoted strings so that they are escaped properly for the shell.
913 Call the `check_disk` plugin with the `--help` parameter to see
914 all available options. Our example defines warning (`-w`) and
915 critical (`-c`) thresholds for the disk usage. Without any
916 partition defined (`-p`) it will check all local partitions.
918 icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
920 This plugin checks the amount of used disk space on a mounted file system
921 and generates an alert if free space is less than one of the threshold values
925 check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
926 [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
927 [-t timeout] [-u unit] [-v] [-X type] [-N type]
932 > Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
The next step is to understand how command parameters are passed from
935 a host or service object, and add a [CheckCommand](12-object-types.md#objecttype-checkcommand)
936 definition based on these required parameters and/or default values.
938 #### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
940 Check command parameters are defined as custom attributes which can be accessed as runtime macros
941 by the executed check command.
Define the default check command custom attributes `disk_wfree` and `disk_cfree`
(the naming schema is freely definable) and their default threshold values. You can
945 then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
950 > Use a common command type as prefix for your command arguments to increase
951 > readability. `disk_wfree` helps understanding the context better than just
952 > `wfree` as argument.
954 The default custom attributes can be overridden by the custom attributes
955 defined in the service using the check command `my-disk`. The custom attributes
956 can also be inherited from a parent template using additive inheritance (`+=`).
958 object CheckCommand "my-disk" {
959 import "plugin-check-command"
961 command = [ PluginDir + "/check_disk" ]
964 "-w" = "$disk_wfree$%"
965 "-c" = "$disk_cfree$%"
966 "-W" = "$disk_inode_wfree$%"
967 "-K" = "$disk_inode_cfree$%"
968 "-p" = "$disk_partitions$"
969 "-x" = "$disk_partitions_excluded$"
978 > A proper example for the `check_disk` plugin is already shipped with Icinga 2
979 > ready to use with the [plugin check commands](13-icinga-template-library.md#plugin-check-command-disk).
The host `my-server` with the applied service `basic-partitions` checks a basic set of disk partitions
982 with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
The custom attribute `disk_partitions` can either hold a single string or an array of
986 string values for passing multiple partitions to the `check_disk` check plugin.
988 object Host "my-server" {
989 import "generic-host"
990 address = "127.0.0.1"
993 vars.local_disks["basic-partitions"] = {
994 disk_partitions = [ "/", "/tmp", "/var", "/home" ]
998 apply Service for (disk => config in host.vars.local_disks) {
999 import "generic-service"
1000 check_command = "my-disk"
1004 vars.disk_wfree = 10
1007 assign where host.vars.local_disks
1011 More details on using arrays in custom attributes can be found in
1012 [this chapter](3-monitoring-basics.md#runtime-custom-attributes).
1015 #### <a id="command-arguments"></a> Command Arguments
When you define a check command line using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes you need to
extend the arguments list based on a condition evaluated at command
execution, or to make arguments optional so that they are only set if the
macro value can be resolved by Icinga 2.
1023 object CheckCommand "check_http" {
1024 import "plugin-check-command"
1026 command = [ PluginDir + "/check_http" ]
1029 "-H" = "$http_vhost$"
1030 "-I" = "$http_address$"
1032 "-p" = "$http_port$"
1034 set_if = "$http_ssl$"
1037 set_if = "$http_sni$"
1040 value = "$http_auth_pair$"
1041 description = "Username:password on sites with basic authentication"
1044 set_if = "$http_ignore_body$"
1046 "-r" = "$http_expect_body_regex$"
1047 "-w" = "$http_warn_time$"
1048 "-c" = "$http_critical_time$"
1049 "-e" = "$http_expect$"
1052 vars.http_address = "$address$"
1053 vars.http_ssl = false
1054 vars.http_sni = false
1057 The example shows the `check_http` check command defining the most common
1058 arguments. Each of them is optional by default and will be omitted if
1059 the value is not set. For example if the service calling the check command
1060 does not have `vars.http_port` set, it won't get added to the command
1063 If the `vars.http_ssl` custom attribute is set in the service, host or command
1064 object definition, Icinga 2 will add the `-S` argument based on the `set_if`
1065 numeric value to the command line. String values are not supported.
1067 If the macro value cannot be resolved, Icinga 2 will not add the defined argument
1068 to the final command argument array. Empty strings for macro values won't omit
That way you can use a single `check_http` command definition for checks both
with and without SSL, which saves you from duplicating command definitions.
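
As a minimal sketch of how a service might switch on the optional `-S` argument, assume the `check_http` command defined above and the `my-server1` host from the first example in this chapter. The service name `https` and the vhost value are hypothetical.

```
object Service "https" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "check_http"

  // set_if evaluates this to true, so "-S" is added to the command line
  vars.http_ssl = true

  // optional argument: "-H" is only added because the value is set here
  vars.http_vhost = "www.example.com"
}
```

A plain HTTP service would use the same `check_command` and simply leave `vars.http_ssl` at its default of `false`.
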
1074 Details on all available options can be found in the
1075 [CheckCommand object definition](12-object-types.md#objecttype-checkcommand).
1077 ### <a id="using-apply-services-command-arguments"></a> Apply Services with Custom Command Arguments
1079 Imagine the following scenario: The `my-host1` host is reachable using the default port 22, while
the `my-host2` host listens on a different port, 2222. Both hosts are in the hostgroup `my-linux-servers`.
1082 object HostGroup "my-linux-servers" {
1083 display_name = "Linux Servers"
1084 assign where host.vars.os == "Linux"
1087 /* this one has port 22 opened */
1088 object Host "my-host1" {
1089 import "generic-host"
1090 address = "129.168.1.50"
1094 /* this one listens on a different ssh port */
1095 object Host "my-host2" {
1096 import "generic-host"
1097 address = "129.168.2.50"
1099 vars.custom_ssh_port = 2222
1102 All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an
1103 [apply rule](10-language-reference.md#apply). The optional `ssh_port` command argument should be inherited from the host
1104 the service is applied to. If not set, the check command `my-ssh` will omit the argument.
1105 The `host` argument is special: `skip_key` tells Icinga 2 to ignore the key, and directly put the
1106 value onto the command line. The `order` attribute specifies that this argument is the first one
1107 (`-1` is smaller than the other defaults).
1109 object CheckCommand "my-ssh" {
1110 import "plugin-check-command"
1112 command = [ PluginDir + "/check_ssh" ]
1117 value = "$ssh_address$"
1123 vars.ssh_address = "$address$"
1126 /* apply ssh service */
1127 apply Service "my-ssh" {
1128 import "generic-service"
1129 check_command = "my-ssh"
1131 //set the command argument for ssh port with a custom host attribute, if set
1132 vars.ssh_port = "$host.vars.custom_ssh_port$"
1134 assign where "my-linux-servers" in host.groups
The `my-host1` host will get the `my-ssh` service checking on the default port:
1139 [2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281
For `my-host2`, the service picks up the host's `custom_ssh_port` variable and executes a different command:
1143 [2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956
1146 ### <a id="notification-commands"></a> Notification Commands
1148 [NotificationCommand](12-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
interfaces (E-Mail, XMPP, IRC, Twitter, etc.).
1151 [NotificationCommand](12-object-types.md#objecttype-notificationcommand) objects are referenced by
1152 [Notification](12-object-types.md#objecttype-notification) objects using the `command` attribute.
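
The following sketch shows how a `Notification` object might reference a notification command by name. It assumes the `mail-service-notification` command defined further below and the `generic-notification` template and `icingaadmin` user used elsewhere in this chapter. The object name and the `notify_by_mail` custom attribute are hypothetical.

```
apply Notification "mail-icingaadmin" to Service {
  import "generic-notification"

  // name of the NotificationCommand object to execute
  command = "mail-service-notification"

  users = [ "icingaadmin" ]

  // hypothetical custom attribute, used here only to select services
  assign where service.vars.notify_by_mail == true
}
```
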
1154 `NotificationCommand` objects require the [ITL template](13-icinga-template-library.md#itl-plugin-notification-command)
1155 `plugin-notification-command` to support native plugin-based notifications.
1159 > Make sure that the [notification](5-cli-commands.md#features) feature is enabled on your master instance
1160 > in order to execute notification commands.
1162 Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
1163 the current check output) sending an email to the user(s) associated with the
1164 notification itself (`$user.email$`).
1166 If you want to specify default values for some of the custom attribute definitions,
1167 you can add a `vars` dictionary as shown for the `CheckCommand` object.
1169 object NotificationCommand "mail-service-notification" {
1170 import "plugin-notification-command"
1172 command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
1175 NOTIFICATIONTYPE = "$notification.type$"
1176 SERVICEDESC = "$service.name$"
1177 HOSTALIAS = "$host.display_name$"
1178 HOSTADDRESS = "$address$"
1179 SERVICESTATE = "$service.state$"
1180 LONGDATETIME = "$icinga.long_date_time$"
1181 SERVICEOUTPUT = "$service.output$"
1182 NOTIFICATIONAUTHORNAME = "$notification.author$"
1183 NOTIFICATIONCOMMENT = "$notification.comment$"
1184 HOSTDISPLAYNAME = "$host.display_name$"
1185 SERVICEDISPLAYNAME = "$service.display_name$"
1186 USEREMAIL = "$user.email$"
The `command` attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:
1195 template=$(cat <<TEMPLATE
1198 Notification Type: $NOTIFICATIONTYPE
1200 Service: $SERVICEDESC
1202 Address: $HOSTADDRESS
1203 State: $SERVICESTATE
1205 Date/Time: $LONGDATETIME
1207 Additional Info: $SERVICEOUTPUT
1209 Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
/usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"
1217 > This example is for `exim` only. Requires changes for `sendmail` and
1220 While it's possible to specify the entire notification command right
1221 in the NotificationCommand object it is generally advisable to create a
1222 shell script in the `/etc/icinga2/scripts` directory and have the
1223 NotificationCommand object refer to that.
1225 ### <a id="event-commands"></a> Event Commands
Unlike notifications, event commands for hosts/services are called on every
check execution if one of these conditions matches:
1230 * The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
1231 * The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
1232 * The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)
1234 [EventCommand](12-object-types.md#objecttype-eventcommand) objects are referenced by
1235 [Host](12-object-types.md#objecttype-host) and [Service](12-object-types.md#objecttype-service) objects
1236 using the `event_command` attribute.
Therefore the `EventCommand` object should define a command line that
evaluates the current service state and other service runtime attributes
available through runtime macros. Icinga 2 resolves runtime macros such as
`$service.state_type$` and `$service.state$`, which lets you control
precisely when an event is triggered.
1244 Common use case scenarios are a failing HTTP check requiring an immediate
1245 restart via event command, or if an application is locked and requires
1246 a restart upon detection.
1248 `EventCommand` objects require the ITL template `plugin-event-command`
1249 to support native plugin based checks.
1251 #### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
The following example will trigger a restart of the `httpd` daemon
1254 via ssh when the `http` service check fails. If the service state is
1255 `OK`, it will not trigger any event action.
1260 * icinga user with public key authentication
1261 * icinga user with sudo permissions for restarting the httpd daemon.
1265 # ls /home/icinga/.ssh/
1269 icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
1272 Define a generic [EventCommand](12-object-types.md#objecttype-eventcommand) object `event_by_ssh`
1273 which can be used for all event commands triggered using ssh:
1275 /* pass event commands through ssh */
1276 object EventCommand "event_by_ssh" {
1277 import "plugin-event-command"
1279 command = [ PluginDir + "/check_by_ssh" ]
1282 "-H" = "$event_by_ssh_address$"
1283 "-p" = "$event_by_ssh_port$"
1284 "-C" = "$event_by_ssh_command$"
1285 "-l" = "$event_by_ssh_logname$"
1286 "-i" = "$event_by_ssh_identity$"
1288 set_if = "$event_by_ssh_quiet$"
1290 "-w" = "$event_by_ssh_warn$"
1291 "-c" = "$event_by_ssh_crit$"
1292 "-t" = "$event_by_ssh_timeout$"
1295 vars.event_by_ssh_address = "$address$"
1296 vars.event_by_ssh_quiet = false
1299 The actual event command only passes the `event_by_ssh_command` attribute.
1300 The `event_by_ssh_service` custom attribute takes care of passing the correct
1301 daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
is only restarted when the service is in a non-`OK` state.
1305 object EventCommand "event_by_ssh_restart_service" {
1306 import "event_by_ssh"
1308 //only restart the daemon if state > 0 (not-ok)
1309 //requires sudo permissions for the icinga user
1310 vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
1314 Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
1315 which service should be restarted using the `event_by_ssh_service` attribute.
1317 object Service "http" {
1318 import "generic-service"
1319 host_name = "remote-http-host"
1320 check_command = "http"
1322 event_command = "event_by_ssh_restart_service"
1323 vars.event_by_ssh_service = "$host.vars.httpd_name$"
1325 //vars.event_by_ssh_logname = "icinga"
//vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa"
1330 Each host with this service then must define the `httpd_name` custom attribute
1331 (for example generated from your cmdb):
1333 object Host "remote-http-host" {
1334 import "generic-host"
1335 address = "192.168.1.100"
1337 vars.httpd_name = "apache2"
You can test this example by manually stopping the `httpd` daemon
1341 on your `remote-http-host`. Enable the `debuglog` feature and tail the
1342 `/var/log/icinga2/debug.log` file.
1344 Remote Host Terminal:
1346 # date; service apache2 status
1347 Mon Sep 15 18:57:39 CEST 2014
1348 Apache2 is running (pid 23651).
1349 # date; service apache2 stop
1350 Mon Sep 15 18:57:47 CEST 2014
1351 [ ok ] Stopping web server: apache2 ... waiting .
1353 Icinga 2 Host Terminal:
1355 [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
1356 [2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
1357 [2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
1358 [2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
1359 [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
1360 [2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0
1362 Remote Host Terminal:
1364 # date; service apache2 status
1365 Mon Sep 15 18:58:44 CEST 2014
1366 Apache2 is running (pid 24908).
1371 ## <a id="dependencies"></a> Dependencies
1373 Icinga 2 uses host and service [Dependency](12-object-types.md#objecttype-dependency) objects
for determining their network reachability.
1376 A service can depend on a host, and vice versa. A service has an implicit
dependency (parent) on its host. A host-to-host dependency acts implicitly
as a host parent relation.
When dependencies are calculated, not only the immediate parent is taken into
account but all parents further up the chain as well.
1382 The `parent_host_name` and `parent_service_name` attributes are mandatory for
1383 service dependencies, `parent_host_name` is required for host dependencies.
1384 [Apply rules](3-monitoring-basics.md#using-apply) will allow you to
1385 [determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
1386 dynamic fashion if required.
1388 parent_host_name = "core-router"
1389 parent_service_name = "uplink-port"
1391 Notifications are suppressed by default if a host or service becomes unreachable.
1392 You can control that option by defining the `disable_notifications` attribute.
1394 disable_notifications = false
1396 The dependency state filter must be defined based on the parent object being
1397 either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).
1399 The following example will make the dependency fail and trigger it if the parent
1400 object is **not** in one of these states:
1402 states = [ OK, Critical, Unknown ]
1404 Rephrased: If the parent service object changes into the `Warning` state, this
1405 dependency will fail and render all child objects (hosts or services) unreachable.
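
Putting the attributes above together, a complete service dependency might look like the following sketch. The object name `uplink-for-http` and the child objects `my-server1` and `http` are assumptions chosen for illustration (they reuse the host and service from the first example in this chapter); `core-router` and `uplink-port` are the parent names used above.

```
object Dependency "uplink-for-http" {
  // parent: the object the child depends on
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  // child: the object that becomes unreachable if the dependency fails
  child_host_name = "my-server1"
  child_service_name = "http"

  // the dependency fails if the parent service is NOT in one of these states
  states = [ OK, Critical, Unknown ]

  // keep sending notifications for the child even when it is unreachable
  disable_notifications = false
}
```
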
You can determine the child's reachability by querying the `is_reachable` attribute,
for example in [DB IDO](14-appendix.md#schema-db-ido-extensions).
1410 ### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
1412 Icinga 2 automatically adds an implicit dependency for services on their host. That way
1413 service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
1415 `states = [ Up ]` for all service objects.
1417 Service checks are still executed. If you want to prevent them from happening, you can
1418 apply the following dependency to all services setting their host as `parent_host_name`
1419 and disabling the checks. `assign where true` matches on all `Service` objects.

    apply Dependency "disable-host-service-checks" to Service {
      disable_checks = true
      assign where true
    }

1426 ### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
1428 A common scenario is the Icinga 2 server behind a router. Checking internet
1429 access by pinging the Google DNS server `google-dns` is a common method, but
1430 will fail in case the `dsl-router` host is down. Therefore the example below
1431 defines a host dependency which acts implicitly as parent relation too.
Furthermore the host may be reachable but ping probes are dropped by the
router's firewall. In case the `ping4` service check on `dsl-router` fails, all
further checks for the `ping4` service on host `google-dns` should
be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
1438 object Host "dsl-router" {
1439 import "generic-host"
1440 address = "192.168.1.1"
1443 object Host "google-dns" {
1444 import "generic-host"
1448 apply Service "ping4" {
1449 import "generic-service"
1451 check_command = "ping4"
1453 assign where host.address
1456 apply Dependency "internet" to Host {
1457 parent_host_name = "dsl-router"
1458 disable_checks = true
1459 disable_notifications = true
1461 assign where host.name != "dsl-router"
1464 apply Dependency "internet" to Service {
1465 parent_host_name = "dsl-router"
1466 parent_service_name = "ping4"
1467 disable_checks = true
1469 assign where host.name != "dsl-router"
1472 ### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
1474 You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
child attributes, e.g. `parent_host_name`, to other object's
A common example is virtual machines hosted on a master. The object
name of that master is auto-generated from your CMDB or VMware inventory
1480 into the host's custom attributes (or a generic template for your
1483 Define your master host object:
1486 object Host "master.example.com" {
1487 import "generic-host"
1490 Add a generic template defining all common host attributes:
1492 /* generic template for your virtual machines */
1493 template Host "generic-vm" {
1494 import "generic-host"
1497 Add a template for all hosts on your example.com cloud setting
1498 custom attribute `vm_parent` to `master.example.com`:
1500 template Host "generic-vm-example.com" {
1502 vars.vm_parent = "master.example.com"
1505 Define your guest hosts:

    object Host "www.example1.com" {
      import "generic-vm-example.com"
    }

    object Host "www.example2.com" {
      import "generic-vm-example.com"
    }

1515 Apply the host dependency to all child hosts importing the
1516 `generic-vm` template and set the `parent_host_name`
1517 to the previously defined custom attribute `host.vars.vm_parent`.
1519 apply Dependency "vm-host-to-parent-master" to Host {
1520 parent_host_name = host.vars.vm_parent
1521 assign where "generic-vm" in host.templates
1524 You can extend this example, and make your services depend on the
1525 `master.example.com` host too. Their local scope allows you to use
1526 `host.vars.vm_parent` similar to the example above.
1528 apply Dependency "vm-service-to-parent-master" to Service {
1529 parent_host_name = host.vars.vm_parent
1530 assign where "generic-vm" in host.templates
That way you don't need to wait for your guest hosts to become
1534 unreachable when the master host goes down. Instead the services
1535 will detect their reachability immediately when executing checks.
1539 > This method with setting locally scoped variables only works in
1540 > apply rules, but not in object definitions.
1543 ### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
Another classic example is agent-based checks. You would define a health check
1546 for the agent daemon responding to your requests, and make all other services
1547 querying that daemon depend on that health check.
1549 The following configuration defines two nrpe based service checks `nrpe-load`
1550 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
1551 `nrpe-health` service.
1553 apply Service "nrpe-health" {
1554 import "generic-service"
1555 check_command = "nrpe"
1556 assign where match("nrpe-*", host.name)
1559 apply Service "nrpe-load" {
1560 import "generic-service"
1561 check_command = "nrpe"
1562 vars.nrpe_command = "check_load"
1563 assign where match("nrpe-*", host.name)
1566 apply Service "nrpe-disk" {
1567 import "generic-service"
1568 check_command = "nrpe"
1569 vars.nrpe_command = "check_disk"
1570 assign where match("nrpe-*", host.name)
1573 object Host "nrpe-server" {
1574 import "generic-host"
1575 address = "192.168.1.5"
1578 apply Dependency "disable-nrpe-checks" to Service {
1579 parent_service_name = "nrpe-health"
1582 disable_checks = true
1583 disable_notifications = true
1584 assign where service.check_command == "nrpe"
1585 ignore where service.name == "nrpe-health"
1588 The `disable-nrpe-checks` dependency is applied to all services
on the `nrpe-server` host using the `nrpe` check_command attribute
1590 but not the `nrpe-health` service itself.
1593 ## <a id="downtimes"></a> Downtimes
1595 Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.
1598 Downtimes will suppress any notifications, and may trigger other
1599 downtimes too. If the downtime was set by accident, or the duration
1600 exceeds the maintenance, you can manually cancel the downtime.
1601 Planned downtimes will also be taken into account for SLA reporting
1602 tools calculating the SLAs based on the state and downtime history.
1604 Multiple downtimes for a single object may overlap. This is useful
1605 when you want to extend your maintenance window taking longer than expected.
1606 If there are multiple downtimes triggered for one object, the overall downtime depth
1607 will be greater than `1`.
1610 If the downtime was scheduled after the problem changed to a critical hard
1611 state triggering a problem notification, and the service recovers during
1612 the downtime window, the recovery notification won't be suppressed.
1614 ### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
1616 A `fixed` downtime will be activated at the defined start time, and
1617 removed at the end time. During this time window the service state
1618 will change to `NOT-OK` and then actually trigger the downtime.
1619 Notifications are suppressed and the downtime depth is incremented.
Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
1627 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
1628 by the state change in the time span defined by start and end time,
1629 and then last for the specified duration in minutes.
1631 Imagine the following scenario: Your service is frequently polled
1632 by users trying to grab free deleted domains for immediate registration.
1633 Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
1634 a network outage visible to the monitoring. The service is still alive,
but answers Icinga 2 service checks too slowly.
1636 For that reason, you may want to schedule a downtime between 07:30 and
1637 08:00 with a duration of 15 minutes. The downtime will then last from
1638 its trigger time until the duration is over. After that, the downtime
1639 is removed (may happen before or after the actual end time!).
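
A flexible downtime for this scenario could be sketched as a `ScheduledDowntime` object (see the recurring downtimes section below), assuming the scenario repeats on every working day. The object name, the comment text and the `domain-registration` service name are hypothetical.

```
apply ScheduledDowntime "registration-rush" to Service {
  author = "icingaadmin"
  comment = "Flexible downtime for the morning registration rush"

  // flexible: triggered by a state change inside the time span below
  fixed = false

  // once triggered, the downtime lasts for 15 minutes
  duration = 15m

  ranges = {
    monday    = "07:30-08:00"
    tuesday   = "07:30-08:00"
    wednesday = "07:30-08:00"
    thursday  = "07:30-08:00"
    friday    = "07:30-08:00"
  }

  // hypothetical service name used only to select the affected service
  assign where service.name == "domain-registration"
}
```
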
1641 ### <a id="scheduling-downtime"></a> Scheduling a downtime
1643 This can either happen through a web interface or by sending an [external command](3-monitoring-basics.md#external-commands)
1644 to the external command pipe provided by the `ExternalCommandListener` configuration.
1646 Fixed downtimes require a start and end time (a duration will be ignored).
1647 Flexible downtimes need a start and end time for the time span, and a duration
1648 independent from that time span.
1650 ### <a id="triggered-downtimes"></a> Triggered Downtimes
This is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime that should be triggered by the parent
downtime on a NOT-OK state change.
1658 ### <a id="recurring-downtimes"></a> Recurring Downtimes
1660 [ScheduledDowntime objects](12-object-types.md#objecttype-scheduleddowntime) can be used to set up
1661 recurring downtimes for services.
1665 apply ScheduledDowntime "backup-downtime" to Service {
1666 author = "icingaadmin"
1667 comment = "Scheduled downtime for backup"
1670 monday = "02:00-03:00"
1671 tuesday = "02:00-03:00"
1672 wednesday = "02:00-03:00"
1673 thursday = "02:00-03:00"
1674 friday = "02:00-03:00"
1675 saturday = "02:00-03:00"
1676 sunday = "02:00-03:00"
1679 assign where "backup" in service.groups
1683 ## <a id="comments-intro"></a> Comments
1685 Comments can be added at runtime and are persistent over restarts. You can
1686 add useful information for others on repeating incidents (for example
1687 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
is primarily accessible using web interfaces.
1690 Adding and deleting comment actions are possible through the external command pipe
1691 provided with the `ExternalCommandListener` configuration. The caller must
1692 pass the comment id in case of manipulating an existing comment.
1695 ## <a id="acknowledgements"></a> Acknowledgements
1697 If a problem is alerted and notified you may signal the other notification
1698 recipients that you are aware of the problem and will handle it.
1700 By sending an acknowledgement to Icinga 2 (using the external command pipe
1701 provided with `ExternalCommandListener` configuration) all future notifications
1702 are suppressed, a new comment is added with the provided description and
1703 a notification with the type `NotificationFilterAcknowledgement` is sent
1704 to all notified users.
1706 ### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements
1708 Once a problem is acknowledged it may disappear from your `handled problems`
1709 dashboard and no-one ever looks at it again since it will suppress
1712 This `fire-and-forget` action is quite common. If you're sure that a
1713 current problem should be resolved in the future at a defined time,
1714 you can define an expiration time when acknowledging the problem.
1716 Icinga 2 will clear the acknowledgement when expired and start to
1717 re-notify if the problem persists.
1721 ## <a id="custom-attributes"></a> Custom Attributes
1723 ### <a id="custom-attributes-apply"></a> Using Custom Attributes for Apply Rules
1725 Custom attributes are not only used at runtime in command definitions to pass
1726 command arguments, but are also a smart way to define patterns and groups
1727 for applying objects for dynamic config generation.
1729 There are several ways of using custom attributes with [apply rules](3-monitoring-basics.md#using-apply):
* As a simple attribute literal ([number](10-language-reference.md#numeric-literals), [string](10-language-reference.md#string-literals),
  [boolean](10-language-reference.md#boolean-literals)) for expression conditions (`assign where`, `ignore where`)
* As an [array](10-language-reference.md#array) or [dictionary](10-language-reference.md#dictionary) attribute with nested values
  (e.g. dictionaries in dictionaries) in [apply for](3-monitoring-basics.md#using-apply-for) rules.
Features like [DB IDO](3-monitoring-basics.md#db-ido), [Livestatus](#livestatus) or [StatusData](3-monitoring-basics.md#status-data)
dump these values as a JSON-encoded string and set `is_json` or `cv_is_json`, respectively, to `1`.
1739 If arrays are used in runtime macros (for example `$host.groups$`) all entries
1740 are separated using the `;` character. If an entry contains a semi-colon itself,
1741 it is escaped like this: `entry1;ent\;ry2;entry3`.
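The escaping rule can be sketched in a few lines. This is only an illustration of the format described above, not Icinga 2's own implementation; the function name `join_array_macro` is hypothetical.

```python
def join_array_macro(entries):
    # Escape literal semicolons inside an entry, then join entries with ';'.
    return ";".join(str(entry).replace(";", "\\;") for entry in entries)


print(join_array_macro(["entry1", "ent;ry2", "entry3"]))
```

A consumer of such a macro value has to split on unescaped `;` characters only.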
1743 ### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime
Custom attributes may be used in command definitions to dynamically change how the command
is executed.
Additionally, Icinga 2 features such as the [PerfdataWriter](3-monitoring-basics.md#performance-data) feature
use custom runtime attributes to format their output.
1753 > Custom attributes are identified by the `vars` dictionary attribute as short name.
1754 > Accessing the different attribute keys is possible using the [index accessor](10-language-reference.md#indexer) `.`.
Custom attributes in command definitions or performance data templates are evaluated at
runtime when executing a command. These custom attributes cannot be used elsewhere,
for example in other configuration attributes.
1760 Custom attribute values must be either a string, a number, a boolean value or an array.
1761 Dictionaries cannot be used at the time of writing.
Arrays can be used to pass multiple arguments with or without repeating the key string.
This helps when passing multiple parameters to check plugins that require them. Prominent
plugin examples are:
1767 * [check_disk -p](13-icinga-template-library.md#plugin-check-command-disk)
1768 * [check_nrpe -a](13-icinga-template-library.md#plugin-check-command-nrpe)
1769 * [check_nscp -l](13-icinga-template-library.md#plugin-check-command-nscp)
1770 * [check_dns -a](13-icinga-template-library.md#plugin-check-command-dns)
1772 More details on how to use `repeat_key` and other command argument options can be
1773 found in [this section](12-object-types.md#objecttype-checkcommand-arguments).
1777 > If a macro value cannot be resolved, be it a single macro, or a recursive macro
1778 > containing an array of macros, the entire command argument is skipped.
1780 This is an example of a command definition which uses user-defined custom attributes:
1782 object CheckCommand "my-icmp" {
1783 import "plugin-check-command"
1784 command = [ "/bin/sudo", PluginDir + "/check_icmp" ]
1788 value = "$icmp_targets$"
1792 "-w" = "$icmp_wrta$,$icmp_wpl$%"
1793 "-c" = "$icmp_crta$,$icmp_cpl$%"
1794 "-s" = "$icmp_source$"
1795 "-n" = "$icmp_packets$"
1796 "-i" = "$icmp_packet_interval$"
1797 "-I" = "$icmp_target_interval$"
1798 "-m" = "$icmp_hosts_alive$"
1799 "-b" = "$icmp_data_bytes$"
1800 "-t" = "$icmp_timeout$"
1803 vars.icmp_wrta = 200.00
1805 vars.icmp_crta = 500.00
1808 vars.notes = "Requires setuid root or sudo."
1811 Custom attribute names used at runtime must be enclosed in two `$` signs,
1812 for example `$address$`.
1816 > When using the `$` sign as single character, you need to escape it with an
1817 > additional dollar sign (`$$`).
1819 This example also makes use of the [command arguments](3-monitoring-basics.md#command-arguments) passed
1820 to the command line.
You can use the example `CheckCommand` definition above by
[passing command argument parameters](3-monitoring-basics.md#command-passing-parameters) like this:
1825 object Host "my-icmp-host" {
1826 import "generic-host"
1827 address = "192.168.1.10"
1828 vars.address_mgmt = "192.168.2.10"
1829 vars.address_web = "192.168.10.10"
1830 vars.icmp_targets = [ "$address$", "$host.vars.address_mgmt$", "$host.vars.address_web$" ]
1833 apply Service "my-icmp" {
1834 check_command = "my-icmp"
1836 retry_interval = 30s
1838 vars.icmp_targets = host.vars.icmp_targets
1840 assign where host.vars.icmp_targets
1843 ### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order
1845 When executing commands Icinga 2 checks the following objects in this order to look
1846 up custom attributes and their respective values:
1848 1. User object (only for notifications)
1852 5. Global custom attributes in the `vars` constant
This evaluation order allows you to define default values for custom attributes
in your command objects. The `my-ping` command shown above uses this to set
default values for some of the latency thresholds and timeouts.
1858 When using the `my-ping` command you can override some or all of the custom
1859 attributes in the service definition like this:
1861 object Service "ping" {
1862 host_name = "localhost"
1863 check_command = "my-ping"
1865 vars.ping_packets = 10 // Overrides the default value of 5 given in the command
1868 If a custom attribute isn't defined anywhere an empty value is used and a warning is
1869 emitted to the Icinga 2 log.
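The lookup behaviour can be pictured as a search through an ordered list of scopes. The following is a minimal sketch, not Icinga 2 code: it only models the service-before-command precedence and the empty-value fallback described above, and the names `lookup_custom_attribute`, `command_vars` and `service_vars` are hypothetical.

```python
import logging

log = logging.getLogger("macro-sketch")


def lookup_custom_attribute(name, scopes):
    """Return the first value found for `name`, searching `scopes` in order.

    `scopes` is a list of dictionaries, highest priority first. A missing
    attribute yields an empty string and a logged warning.
    """
    for scope in scopes:
        if name in scope:
            return scope[name]
    log.warning("custom attribute '%s' is not defined", name)
    return ""


command_vars = {"ping_packets": 5}    # default set in the command object
service_vars = {"ping_packets": 10}   # override from the service object

# The service is consulted before the command, so its value wins.
print(lookup_custom_attribute("ping_packets", [service_vars, command_vars]))
# Nothing defines this attribute, so an empty value is returned.
print(repr(lookup_custom_attribute("ping_timeout", [service_vars, command_vars])))
```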
1873 > By convention every host should have an `address` attribute. Hosts
1874 > which have an IPv6 address should also have an `address6` attribute.
1876 ### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables
1878 The `env` command object attribute specifies a list of environment variables with values calculated
1879 from either runtime macros or custom attributes which should be exported as environment variables
1880 prior to executing the command.
This is useful, for example, to keep sensitive information off the command line
when passing credentials to database checks:
1885 object CheckCommand "mysql-health" {
1886 import "plugin-check-command"
1889 PluginDir + "/check_mysql"
1893 "-H" = "$mysql_address$"
1894 "-d" = "$mysql_database$"
1897 vars.mysql_address = "$address$"
1898 vars.mysql_database = "icinga"
1899 vars.mysql_user = "icinga_check"
1900 vars.mysql_pass = "password"
1902 env.MYSQLUSER = "$mysql_user$"
1903 env.MYSQLPASS = "$mysql_pass$"
1906 ### <a id="multiple-host-addresses-custom-attributes"></a> Multiple Host Addresses using Custom Attributes
1908 The following example defines a `Host` with three different interface addresses defined as
1909 custom attributes in the `vars` dictionary. The `if-eth0` and `if-eth1` services will import
1910 these values into the `address` custom attribute. This attribute is available through the
1911 generic `$address$` runtime macro.
1913 object Host "multi-ip" {
1914 check_command = "dummy"
1915 vars.address_lo = "127.0.0.1"
1916 vars.address_eth0 = "10.0.0.10"
1917 vars.address_eth1 = "192.168.1.10"
1920 apply Service "if-eth0" {
1921 import "generic-service"
1923 vars.address = "$host.vars.address_eth0$"
1924 check_command = "my-generic-interface-check"
1926 assign where host.vars.address_eth0 != ""
1929 apply Service "if-eth1" {
1930 import "generic-service"
1932 vars.address = "$host.vars.address_eth1$"
1933 check_command = "my-generic-interface-check"
1935 assign where host.vars.address_eth1 != ""
1938 object CheckCommand "my-generic-interface-check" {
1939 import "plugin-check-command"
1941 command = "echo \"This would be the service $service.description$ using the address value: $address$\""
1944 The `CheckCommand` object is just an example to help you with testing and
1945 understanding the different custom attributes and runtime macros.
1947 ### <a id="modified-attributes"></a> Modified Attributes
Icinga 2 allows you to modify object attributes at runtime so that they differ
from the values defined in the local configuration. These modified attributes are
stored as a bit-shifted value and made available in backends. Icinga 2 stores
modified attributes in its state file and restores them on restart.
1954 Modified Attributes can be reset using external commands.
1957 ## <a id="runtime-macros"></a> Runtime Macros
Next to custom attributes there are additional runtime macros made available by Icinga 2.
These runtime macros reflect the current object state and may change over time while
custom attributes are configured statically (but can be modified at runtime using
external commands).
1964 ### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order
1966 Custom attributes can be accessed at [runtime](3-monitoring-basics.md#runtime-custom-attributes) using their
1967 identifier omitting the `vars.` prefix.
If such a custom attribute is not set, Icinga 2 falls back to an existing
object attribute of the same name, for example `host.address`.
1971 In the following example the `$address$` macro will be resolved with the value of `vars.address`.
1973 object Host "localhost" {
1974 import "generic-host"
1975 check_command = "my-host-macro-test"
1976 address = "127.0.0.1"
1977 vars.address = "127.2.2.2"
1980 object CheckCommand "my-host-macro-test" {
1981 command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
1984 The check command output will look like
1986 "address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"
If you alter the host object and remove the `vars.address` line, Icinga 2 will not find `$address$` in the
custom attributes dictionary and will fall back to the host object's `address` attribute.
1991 The check command output will change to
1993 "address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "
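The fallback can be reproduced with a small macro expander. This is a simplified sketch for illustration only: it handles host macros but not services, arrays or `$$` escaping, and the `expand` function and the dictionary layout are assumptions rather than Icinga 2 internals.

```python
import re

MACRO = re.compile(r"\$([A-Za-z0-9_.]+)\$")


def expand(template, host):
    """Expand $...$ macros for a host given as {'address': ..., 'vars': {...}}."""
    def resolve(match):
        name = match.group(1)
        if name.startswith("host.vars."):
            return str(host["vars"].get(name[len("host.vars."):], ""))
        if name.startswith("host."):
            return str(host.get(name[len("host."):], ""))
        # Short name: custom attribute first, then the object attribute.
        if name in host["vars"]:
            return str(host["vars"][name])
        return str(host.get(name, ""))
    return MACRO.sub(resolve, template)


template = ("address: $address$ host.address: $host.address$ "
            "host.vars.address: $host.vars.address$")
with_override = {"address": "127.0.0.1", "vars": {"address": "127.2.2.2"}}
without_override = {"address": "127.0.0.1", "vars": {}}

print(expand(template, with_override))
print(expand(template, without_override))
```

The two printed lines correspond to the two check command outputs shown above.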
1996 The same example can be defined for services overriding the `address` field based on a specific host custom attribute.
1998 object Host "localhost" {
1999 import "generic-host"
2000 address = "127.0.0.1"
2001 vars.macro_address = "127.3.3.3"
2004 apply Service "my-macro-test" to Host {
2005 import "generic-service"
2006 check_command = "my-service-macro-test"
2007 vars.address = "$host.vars.macro_address$"
2009 assign where host.address
2012 object CheckCommand "my-service-macro-test" {
2013 command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
2016 When the service check is executed the output looks like
2018 "address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"
2020 That way you can easily override existing macros being accessed by their short name like `$address$` and refrain
2021 from defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).
2024 ### <a id="host-runtime-macros"></a> Host Runtime Macros
The following host macros are available in all commands that are executed for
hosts or services:

Name | Description
-----------------------------|--------------
2031 host.name | The name of the host object.
2032 host.display_name | The value of the `display_name` attribute.
2033 host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
2034 host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
2035 host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`.
2036 host.check_attempt | The current check attempt number.
2037 host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
2038 host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
2039 host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
2040 host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`.
2041 host.last_state_change | The last state change's timestamp.
2042 host.duration_sec | The time since the last state change.
2043 host.latency | The host's check latency.
2044 host.execution_time | The host's check execution time.
2045 host.output | The last check's output.
2046 host.perfdata | The last check's performance data.
2047 host.last_check | The timestamp when the last check was executed.
2048 host.num_services | Number of services associated with the host.
2049 host.num_services_ok | Number of services associated with the host which are in an `OK` state.
2050 host.num_services_warning | Number of services associated with the host which are in a `WARNING` state.
2051 host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state.
2052 host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state.
2054 ### <a id="service-runtime-macros"></a> Service Runtime Macros
The following service macros are available in all commands that are executed for
services:

Name | Description
---------------------------|--------------
2061 service.name | The short name of the service object.
2062 service.display_name | The value of the `display_name` attribute.
2063 service.check_command | The short name of the command along with any arguments to be used for the check.
2064 service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
2065 service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
2066 service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`.
2067 service.check_attempt | The current check attempt number.
2068 service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
2069 service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
2070 service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
2071 service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`.
2072 service.last_state_change | The last state change's timestamp.
2073 service.duration_sec | The time since the last state change.
2074 service.latency | The service's check latency.
2075 service.execution_time | The service's check execution time.
2076 service.output | The last check's output.
2077 service.perfdata | The last check's performance data.
2078 service.last_check | The timestamp when the last check was executed.
2080 ### <a id="command-runtime-macros"></a> Command Runtime Macros
The following macros are available in all commands:

Name | Description
-----------------------|--------------
2086 command.name | The name of the command object.
2088 ### <a id="user-runtime-macros"></a> User Runtime Macros
The following user macros are available in all commands that are executed for
users:

Name | Description
-----------------------|--------------
2095 user.name | The name of the user object.
user.display_name | The value of the `display_name` attribute.
2098 ### <a id="notification-runtime-macros"></a> Notification Runtime Macros
Name | Description
-----------------------|--------------
2102 notification.type | The type of the notification.
2103 notification.author | The author of the notification comment, if existing.
2104 notification.comment | The comment of the notification, if existing.
2106 ### <a id="global-runtime-macros"></a> Global Runtime Macros
The following macros are available in all executed commands:

Name | Description
-----------------------|--------------
2112 icinga.timet | Current UNIX timestamp.
2113 icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
2114 icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
2115 icinga.date | Current date. Example: `2014-01-03`
2116 icinga.time | Current time including timezone information. Example: `11:23:08 +0000`
2117 icinga.uptime | Current uptime of the Icinga 2 process.
The following macros provide global statistics:

Name | Description
----------------------------------|--------------
2123 icinga.num_services_ok | Current number of services in state 'OK'.
2124 icinga.num_services_warning | Current number of services in state 'Warning'.
2125 icinga.num_services_critical | Current number of services in state 'Critical'.
2126 icinga.num_services_unknown | Current number of services in state 'Unknown'.
2127 icinga.num_services_pending | Current number of pending services.
2128 icinga.num_services_unreachable | Current number of unreachable services.
2129 icinga.num_services_flapping | Current number of flapping services.
2130 icinga.num_services_in_downtime | Current number of services in downtime.
2131 icinga.num_services_acknowledged | Current number of acknowledged service problems.
2132 icinga.num_hosts_up | Current number of hosts in state 'Up'.
2133 icinga.num_hosts_down | Current number of hosts in state 'Down'.
2134 icinga.num_hosts_unreachable | Current number of unreachable hosts.
2135 icinga.num_hosts_flapping | Current number of flapping hosts.
2136 icinga.num_hosts_in_downtime | Current number of hosts in downtime.
2137 icinga.num_hosts_acknowledged | Current number of acknowledged host problems.
2140 ## <a id="check-result-freshness"></a> Check Result Freshness
In Icinga 2 active check freshness is enabled by default. A check result is no
longer considered fresh if no new check result arrives within the period
defined by the `check_interval` attribute.
2145 threshold = last check execution time + check interval
2147 Passive check freshness is calculated from the `check_interval` attribute if set.
2149 threshold = last check result time + check interval
If a check result is no longer fresh, Icinga 2 executes a new check using the
command defined in the `check_command` attribute.
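The two formulas can be read as a simple comparison. A minimal sketch, assuming timestamps in UNIX seconds and treating a result as stale once the current time has passed the threshold; the function names are hypothetical and the boundary handling is an assumption:

```python
def freshness_threshold(last_check, check_interval):
    # threshold = last check (execution or result) time + check interval
    return last_check + check_interval


def is_fresh(last_check, check_interval, now):
    # A result counts as fresh until the threshold has passed.
    return now <= freshness_threshold(last_check, check_interval)


last_check = 1382014885   # example timestamp
check_interval = 300      # five minutes, in seconds

print(freshness_threshold(last_check, check_interval))
print(is_fresh(last_check, check_interval, now=last_check + 120))
print(is_fresh(last_check, check_interval, now=last_check + 600))
```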
2155 ## <a id="check-flapping"></a> Check Flapping
The flapping algorithm used in Icinga 2 does not store the past states but
calculates the flapping threshold from a single value based on counters and
half-life values. Icinga 2 compares the value with a single flapping threshold
configuration attribute named `flapping_threshold`.
2162 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
2165 ## <a id="volatile-services"></a> Volatile Services
By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies and once the check counter reaches the
value of the `max_check_attempts` attribute, a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent at the interval defined by the `interval` attribute.
2173 It may be reasonable to have a volatile service which stays in a `HARD`
2174 state type if the service stays in a `NOT-OK` state. That way each
2175 service recheck will automatically trigger a notification unless the
2176 service is acknowledged or in a scheduled downtime.
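The difference between the two modes can be summarised in a small decision function. This sketch only models the state type of a `NOT-OK` check result as described above; it is not Icinga 2's implementation, and the function name `problem_state_type` is hypothetical.

```python
def problem_state_type(check_attempt, max_check_attempts, volatile=False):
    """Return the state type for a NOT-OK check result."""
    # A volatile service is always in a HARD state while it is NOT-OK.
    if volatile or check_attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"


print(problem_state_type(1, 3))                  # non-volatile, first attempt
print(problem_state_type(3, 3))                  # non-volatile, last attempt
print(problem_state_type(1, 3, volatile=True))   # volatile, first attempt
```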
2179 ## <a id="external-commands"></a> External Commands
Icinga 2 provides an external command pipe for processing commands
that trigger specific actions (for example rescheduling a service check
through the web interface).
2185 In order to enable the `ExternalCommandListener` configuration use the
2186 following command and restart Icinga 2 afterwards:
2188 # icinga2 feature enable command
2190 Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
2191 using the default configuration.
2193 Web interfaces and other Icinga addons are able to send commands to
2194 Icinga 2 through the external command pipe, for example for rescheduling
2195 a forced service check:
2197 # /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
2199 # tail -f /var/log/messages
2201 Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
2202 Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
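An addon could send the same command programmatically. The following is a minimal sketch that assumes the default pipe path mentioned above; the helper names `format_external_command` and `send_external_command` are hypothetical, and the actual write is left commented out because it needs a running Icinga 2 instance with the command feature enabled.

```python
import time

COMMAND_PIPE = "/var/run/icinga2/cmd/icinga2.cmd"  # default path from the text


def format_external_command(name, args, timestamp=None):
    """Build a line of the form '[<timestamp>] <NAME>;<arg1>;<arg2>;...'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] %s;%s" % (timestamp, name, ";".join(str(a) for a in args))


def send_external_command(line, pipe_path=COMMAND_PIPE):
    # Each command is written as one newline-terminated line.
    with open(pipe_path, "w") as pipe:
        pipe.write(line + "\n")


line = format_external_command(
    "SCHEDULE_FORCED_SVC_CHECK", ["localhost", "ping4", 1382014885],
    timestamp=1382014885)
print(line)
# send_external_command(line)
```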
2205 ### <a id="external-command-list"></a> External Command List
2207 A list of currently supported external commands can be found [here](14-appendix.md#external-commands-list-detail).
Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).
2212 ## <a id="logging"></a> Logging
Icinga 2 supports three different types of logging:

* File logging
* Syslog (on *NIX-based operating systems)
* Console logging (`STDOUT` on tty)
You can enable or disable loggers using the `icinga2 feature enable`
and `icinga2 feature disable` commands:
2223 Feature | Description
2224 ---------|------------
2225 debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
2226 mainlog | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
2227 syslog | Syslog (severity: `warning` or higher)
By default the `mainlog` feature is enabled. When running Icinga 2
on a terminal, log messages with severity `information` or higher are
written to the console.
2234 ## <a id="performance-data"></a> Performance Data
When a host or service check is executed, plugins should provide so-called
`performance data`. In addition, further check performance data such as the
check latency or the current service state (or additional custom attributes)
can be fetched using Icinga 2 runtime macros.
2241 The performance data can be passed to external applications which aggregate and
2242 store them in their backends. These tools usually generate graphs for historical
2243 reporting and trending.
2245 Well-known addons processing Icinga performance data are PNP4Nagios,
2246 inGraph and Graphite.
2248 ### <a id="writing-performance-data-files"></a> Writing Performance Data Files
2250 PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
2251 the current performance files for their backend updates.
Therefore the Icinga 2 `PerfdataWriter` object allows you to define
the output template format for hosts and services using Icinga 2
runtime macros.
    host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
    service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
2260 The default templates are already provided with the Icinga 2 feature configuration
2261 which can be enabled using
2263 # icinga2 feature enable perfdata
By default all performance data files are rotated every 15 seconds into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
2268 External collectors need to parse the rotated performance data files and then
2269 remove the processed files.
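A collector's parsing step might look like the sketch below. It assumes the tab-separated `KEY::VALUE` layout of the format templates shown above; the function name `parse_perfdata_line` is hypothetical, and the sample line is a shortened, made-up record rather than real Icinga 2 output.

```python
def parse_perfdata_line(line):
    """Split one tab-separated line of KEY::VALUE pairs into a dictionary."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first '::' only, so values may contain further colons.
        key, _, value = field.partition("::")
        record[key] = value
    return record


# Shortened, hypothetical sample line in the service template layout.
sample = ("DATATYPE::SERVICEPERFDATA\tTIMET::1382014885\tHOSTNAME::localhost\t"
          "SERVICEDESC::ping4\tSERVICEPERFDATA::rta=0.1ms\tSERVICESTATE::OK")
record = parse_perfdata_line(sample)
print(record["HOSTNAME"], record["SERVICEDESC"], record["SERVICEPERFDATA"])
```

A collector would apply this to every line of a rotated file and remove the file once all lines have been processed.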
2271 ### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer
While there are some Graphite collector scripts and daemons like Graphios available for
Icinga 1.x, it's more reasonable to process the check and plugin performance data directly
in memory in Icinga 2. Once new metrics are available, Icinga 2 writes them directly
to the TCP socket of the configured Graphite Carbon daemon.
2278 You can enable the feature using
2280 # icinga2 feature enable graphite
2282 By default the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
2283 `127.0.0.1` on TCP port `2003`.
2285 The current naming schema is
2287 icinga.<hostname>.<metricname>
2288 icinga.<hostname>.<servicename>.<metricname>
2290 You can customize the metric prefix name by using the `host_name_template` and
2291 `service_name_template` configuration attributes.
2293 The example below uses [runtime macros](3-monitoring-basics.md#runtime-macros) and a
2294 [global constant](10-language-reference.md#constants) named `GraphiteEnv`. The constant name
2295 is freely definable and should be put in the [constants.conf](2-getting-started.md#constants-conf) file.
2297 const GraphiteEnv = "icinga.env1"
2299 object GraphiteWriter "graphite" {
2300 host_name_template = GraphiteEnv + ".$host.name$"
2301 service_name_template = GraphiteEnv + ".$host.name$.$service.name$"
2304 To make sure Icinga 2 writes a valid label into Graphite some characters are replaced
2305 with `_` in the target name:
2309 The resulting name in Graphite might look like:
2311 www-01 / http-cert / response time
2312 icinga.www_01.http_cert.response_time
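The mapping in this example can be sketched as follows. The set of replaced characters is an assumption here: the sketch only replaces spaces and hyphens, which is what the example above shows, and Icinga 2 may replace further characters. The function names are hypothetical.

```python
import re


def graphite_escape(name):
    # Assumption: only spaces and hyphens are replaced, as in the example.
    return re.sub(r"[ -]", "_", name)


def graphite_metric_path(host, service, metric, prefix="icinga"):
    parts = [graphite_escape(part) for part in (host, service, metric)]
    return ".".join([prefix] + parts)


print(graphite_metric_path("www-01", "http-cert", "response time"))
```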
2314 In addition to the performance data retrieved from the check plugin, Icinga 2 sends
2315 internal check statistic data to Graphite:
2317 metric | description
2318 -------------------|------------------------------------------
2319 current_attempt | current check attempt
2320 max_check_attempts | maximum check attempts until the hard state is reached
2321 reachable | checked object is reachable
2322 downtime_depth | number of downtimes this object is in
2323 execution_time | check execution time
2324 latency | check latency
2325 state | current state of the checked object
2326 state_type | 0=SOFT, 1=HARD state
2328 The following example illustrates how to configure the storage-schemas for Graphite Carbon
2329 Cache. Please make sure that the order is correct because the first match wins.
2332 pattern = ^icinga\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
2336 # intervals like PNP4Nagios uses them per default
2338 retentions = 1m:2d,5m:10d,30m:90d,360m:4y
2340 ### <a id="gelfwriter"></a> GELF Writer
2342 The `Graylog Extended Log Format` (short: [GELF](http://www.graylog2.org/resources/gelf))
2343 can be used to send application logs directly to a TCP socket.
2345 While it has been specified by the [graylog2](http://www.graylog2.org/) project as their
2346 [input resource standard](http://www.graylog2.org/resources/gelf), other tools such as
2347 [Logstash](http://www.logstash.net) also support `GELF` as
2348 [input type](http://logstash.net/docs/latest/inputs/gelf).
2350 You can enable the feature using
2352 # icinga2 feature enable gelf
2354 By default the `GelfWriter` object expects the GELF receiver to listen at `127.0.0.1` on TCP port `12201`.
2355 The default `source` attribute is set to `icinga2`. You can customize that for your needs if required.
2357 Currently these events are processed:
2363 ## <a id="status-data"></a> Status Data
2365 Icinga 1.x writes object configuration data and status data in a cyclic
2366 interval to its `objects.cache` and `status.dat` files. Icinga 2 provides
2367 the `StatusDataWriter` object which dumps all configuration objects and
2368 status updates in a regular interval.
2370 # icinga2 feature enable statusdata
2372 Icinga 1.x Classic UI requires this data set as part of its backend.
2376 > If you are not using any web interface or addon which uses these files
2377 > you can safely disable this feature.
2380 ## <a id="compat-logging"></a> Compat Logging
The Icinga 1.x log format is called the `Compat Log`
in Icinga 2 and is provided by the `CompatLogger` object.
2385 These logs are not only used for informational representation in
2386 external web interfaces parsing the logs, but also to generate
2387 SLA reports and trends in Icinga 1.x Classic UI. Furthermore the
2388 [Livestatus](#livestatus) feature uses these logs for answering queries to
2391 The `CompatLogger` object can be enabled with
2393 # icinga2 feature enable compatlog
By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.
The format cannot be changed without breaking compatibility with
existing log parsers.
2402 # tail -f /var/log/icinga2/compat/icinga.log
2404 [1382115688] LOG ROTATION: HOURLY
2405 [1382115688] LOG VERSION: 2.0
2406 [1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
2407 [1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
2408 [1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
2409 [1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
2410 [1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
2411 [1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
2412 [1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
2413 [1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
2414 [1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
2415 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
2416 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
2417 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
2418 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
2419 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
2420 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
2421 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
2422 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
2423 [1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
2424 [1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test
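Because the format is stable, such lines are straightforward to parse. The sketch below is an illustrative parser for lines shaped like the ones above (`[timestamp] TYPE: field;field;...`); the names `LOG_LINE` and `parse_compat_line` are hypothetical, and lines that do not match this shape are simply skipped.

```python
import re

LOG_LINE = re.compile(r"^\[(\d+)\] ([^:]+): (.*)$")


def parse_compat_line(line):
    """Return timestamp, entry type and ';'-separated fields, or None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, entry_type, payload = match.groups()
    return {
        "timestamp": int(timestamp),
        "type": entry_type,
        "fields": payload.split(";"),
    }


entry = parse_compat_line(
    "[1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test")
print(entry["type"], entry["fields"][1], entry["fields"][2])
```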
2429 ## <a id="db-ido"></a> DB IDO
2431 The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all
2432 configuration and status information into a database. The IDO database is used
2433 by a number of projects including Icinga Web 1.x and 2.
2435 Details on the installation can be found in the [Configuring DB IDO](2-getting-started.md#configuring-db-ido)
2436 chapter. Details on the configuration can be found in the
2437 [IdoMysqlConnection](12-object-types.md#objecttype-idomysqlconnection) and
2438 [IdoPgsqlConnection](12-object-types.md#objecttype-idopgsqlconnection)
2439 object configuration documentation.
2440 The DB IDO feature supports [High Availability](4-monitoring-remote-systems.md#high-availability-db-ido) in
2441 the Icinga 2 cluster.
Icinga 2 writes its current status to the DB IDO backend table `icinga_programstatus`
every 10 seconds. The following example queries check the health of the current
Icinga 2 instance by looking 60 seconds into the past, which is a reasonable
amount of time; adjust it to your requirements. If the condition is not met,
the query returns an empty result.
2451 > Use [check plugins](6-addons-plugins.md#plugins) to monitor the backend.
2453 Replace the `default` string with your instance name, if different.
2457 # mysql -u root -p icinga -e "SELECT status_update_time FROM icinga_programstatus ps
2458 JOIN icinga_instances i ON ps.instance_id=i.instance_id
2459 WHERE (UNIX_TIMESTAMP(ps.status_update_time) > UNIX_TIMESTAMP(NOW())-60)
2460 AND i.instance_name='default';"
2462 +---------------------+
2463 | status_update_time |
2464 +---------------------+
2465 | 2014-05-29 14:29:56 |
2466 +---------------------+
2469 Example for PostgreSQL:
2471 # export PGPASSWORD=icinga; psql -U icinga -d icinga -c "SELECT ps.status_update_time FROM icinga_programstatus AS ps
2472 JOIN icinga_instances AS i ON ps.instance_id=i.instance_id
2473 WHERE ((SELECT extract(epoch from status_update_time) FROM icinga_programstatus) > (SELECT extract(epoch from now())-60))
2474 AND i.instance_name='default'";
2477 ------------------------
2478 2014-05-29 15:11:38+02
2482 A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](14-appendix.md#schema-db-ido).
2485 ## <a id="check-result-files"></a> Check Result Files
Icinga 1.x writes its check result files to a temporary spool directory
where they are processed in a regular interval.
While this is extremely inefficient in terms of performance, it has
proven useful for passing passive check results directly into Icinga 1.x,
bypassing the external command pipe.
2493 Several clustered/distributed environments and check-aggregation addons
2494 use that method. In order to support step-by-step migration of these
2495 environments, Icinga 2 ships the `CheckResultReader` object.
No feature configuration is shipped for it; the object must be defined
on demand in your Icinga 2 object configuration.
2500 object CheckResultReader "reader" {
2501 spool_dir = "/data/check-results"