1 # <a id="monitoring-basics"></a> Monitoring Basics
3 This part of the Icinga 2 documentation provides an overview of all the basic
4 monitoring concepts you need to know to run Icinga 2.
6 ## <a id="hosts-services"></a> Hosts and Services
8 Icinga 2 can be used to monitor the availability of hosts and services. Hosts
9 and services can be virtually anything which can be checked in some way:
11 * Network services (HTTP, SMTP, SNMP, SSH, etc.)
15 * Other local or network-accessible services
17 Host objects provide a mechanism to group services that are running
18 on the same physical device.
20 Here is an example of a host object which defines two child services:
22 object Host "my-server1" {
24 check_command = "hostalive"
27 object Service "ping4" {
28 host_name = "my-server1"
29 check_command = "ping4"
32 object Service "http" {
33 host_name = "my-server1"
34 check_command = "http"
37 The example creates two services `ping4` and `http` which belong to the
40 It also specifies that the host should perform its own check using the `hostalive`
43 The `address` attribute is used by check commands to determine which network
44 address is associated with the host object.
46 Details on troubleshooting check problems can be found [here](#troubleshooting).
48 ### <a id="host-states"></a> Host States
50 Hosts can be in any of the following states:
53 ------------|--------------
54 UP | The host is available.
55 DOWN | The host is unavailable.
57 ### <a id="service-states"></a> Service States
59 Services can be in any of the following states:
62 ------------|--------------
63 OK | The service is working properly.
64 WARNING | The service is experiencing some problems but is still considered to be in working condition.
65 CRITICAL | The service is in a critical state.
66 UNKNOWN | The check could not determine the service's state.
68 ### <a id="hard-soft-states"></a> Hard and Soft States
70 When detecting a problem with a host/service Icinga re-checks the object a number of
71 times (based on the `max_check_attempts` and `retry_interval` settings) before sending
72 notifications. This ensures that no unnecessary notifications are sent for
73 transient failures. During this time the object is in a `SOFT` state.
75 After all re-checks have been executed and the object is still in a non-OK
76 state the host/service switches to a `HARD` state and notifications are sent.
79 ------------|--------------
80 HARD | The host/service's state hasn't recently changed.
81 SOFT | The host/service has recently changed state and is being re-checked.
83 ### <a id="host-service-checks"></a> Host and Service Checks
85 Hosts and Services determine their state from a check result returned from a check
86 execution to the Icinga 2 application. By default the `generic-host` example template
87 will define `hostalive` as host check. If your host is unreachable for ping, you should
88 consider using a different check command, for instance the `http` check command, or if
89 there is no check available, the `dummy` check command.
91 object Host "uncheckable-host" {
92 check_command = "dummy"
94 vars.dummy_text = "Pretending to be OK."
97 Service checks could also use a `dummy` check, but the common strategy is to
98 [integrate an existing plugin](#command-plugin-integration) as
99 [check command](#check-commands) and [reference](#command-passing-parameters)
100 that in your [Service](#objecttype-service) object definition.
102 ## <a id="configuration-best-practice"></a> Configuration Best Practice
104 The [Getting Started](#getting-started) chapter already introduced various aspects
105 of the Icinga 2 configuration language. If you are ready to configure additional
106 hosts, services, notifications, dependencies, etc, you should think about the
107 requirements first and then decide for a possible strategy.
109 There are many ways of creating Icinga 2 configuration objects:
111 * Manually with your preferred editor, for example vi(m), nano, notepad, etc.
112 * Generated by a configuration management tool such as Puppet, Chef, Ansible, etc.
113 * A configuration addon for Icinga 2
114 * A custom exporter script from your CMDB or inventory tool
117 In order to find the best strategy for your own configuration, ask yourself the following questions:
119 * Do your hosts share a common group of services (for example linux hosts with disk, load, etc checks)?
120 * Only a small set of users receives notifications and escalations for all hosts/services?
122 If you can at least answer one of these questions with yes, look for the [apply rules](#using-apply) logic
123 instead of defining objects on a per host and service basis.
125 * You are required to define specific configuration for each host/service?
126 * Does your configuration generation tool already know about the host-service-relationship?
128 Then you should look for the object specific configuration setting `host_name` etc accordingly.
130 Finding the best files and directory tree for your configuration is up to you. Make sure that
131 the [icinga2.conf](#icinga2-conf) configuration file includes them, and then think about:
133 * tree-based on locations, hostgroups, specific host attributes with sub levels of directories.
134 * flat `hosts.conf`, `services.conf`, etc files for rule based configuration.
135 * generated configuration with one file per host and a global configuration for groups, users, etc.
136 * one big file generated from an external application (probably a bad idea for maintaining changes).
139 In either way of choosing the right strategy you should additionally check the following:
141 * Are there any specific attributes describing the host/service you could set as `vars` custom attributes?
142 You can later use them for applying assign/ignore rules, or export them into external interfaces.
143 * Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules.
144 * Use templates to store generic attributes for your objects and apply rules making your configuration more readable.
145 Details can be found in the [using templates](#using-templates) chapter.
146 * Apply rules may overlap. Keep a central place (for example, `services.conf` or `notifications.conf`) storing
147 the configuration instead of defining apply rules deep in your configuration tree.
148 * Every plugin used as check, notification or event command requires a `Command` definition.
149 Further details can be looked up in the [check commands](#check-commands) chapter.
151 If you happen to have further questions, do not hesitate to join the [community support channels](https://support.icinga.org)
152 and ask community members for their experience and best practices.
155 ### <a id="object-inheritance-using-templates"></a> Object Inheritance Using Templates
157 Templates may be used to apply a set of identical attributes to more than one
160 template Service "generic-service" {
161 max_check_attempts = 3
164 enable_perfdata = true
167 object Service "ping4" {
168 import "generic-service"
170 host_name = "localhost"
171 check_command = "ping4"
174 object Service "ping6" {
175 import "generic-service"
177 host_name = "localhost"
178 check_command = "ping6"
181 In this example the `ping4` and `ping6` services inherit properties from the
182 template `generic-service`.
184 Objects as well as templates themselves can import an arbitrary number of
185 templates. Attributes inherited from a template can be overridden in the
188 ### <a id="using-apply"></a> Apply objects based on rules
190 Instead of assigning each object (`Service`, `Notification`, `Dependency`, `ScheduledDowntime`)
191 based on attribute identifiers for example `host_name` objects can be [applied](#apply).
193 Detailed scenario examples are used in their respective chapters, for example
194 [apply services with custom command arguments](#using-apply-services-command-arguments).
196 #### <a id="using-apply-services"></a> Apply Services to Hosts
198 apply Service "load" {
199 import "generic-service"
201 check_command = "load"
203 assign where "linux-server" in host.groups
204 ignore where host.vars.no_load_check
207 In this example the `load` service will be created as object for all hosts in the `linux-server`
208 host group. If the `no_load_check` custom attribute is set, the host will be
211 #### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
213 Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
216 apply Notification "mail-noc" to Service {
217 import "mail-service-notification"
218 command = "mail-service-notification"
219 user_groups = [ "noc" ]
221 assign where service.vars.sla == "24x7"
224 In this example the `mail-noc` notification will be created as object for all services having the
225 `sla` custom attribute set to `24x7`. The notification command is set to `mail-service-notification`
226 and all members of the user group `noc` will get notified.
228 #### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
230 Detailed examples can be found in the [dependencies](#dependencies) chapter.
232 ### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
234 Detailed examples can be found in the [recurring downtimes](#recurring-downtimes) chapter.
237 ### <a id="groups"></a> Groups
239 Groups are used for combining hosts, services, and users into
240 accessible configuration attributes and views in external (web)
243 Group membership is defined at the respective object itself. If
244 you have a hostgroup name `windows` for example, and want to assign
245 specific hosts to this group for later viewing the group on your
246 alert dashboard, first create the hostgroup:
248 object HostGroup "windows" {
249 display_name = "Windows Servers"
252 Then add your hosts to this hostgroup
254 template Host "windows-server" {
255 groups += [ "windows" ]
258 object Host "mssql-srv1" {
259 import "windows-server"
261 vars.mssql_port = 1433
264 object Host "mssql-srv2" {
265 import "windows-server"
267 vars.mssql_port = 1433
270 This can be done for service and user groups the same way. Additionally
271 the user groups are associated as attributes in `Notification` objects.
273 object UserGroup "windows-mssql-admins" {
274 display_name = "Windows MSSQL Admins"
277 template User "generic-windows-mssql-users" {
278 groups += [ "windows-mssql-admins" ]
281 object User "win-mssql-noc" {
282 import "generic-windows-mssql-users"
284 email = "noc@example.com"
287 object User "win-mssql-ops" {
288 import "generic-windows-mssql-users"
290 email = "ops@example.com"
293 #### <a id="group-assign"></a> Group Membership Assign
295 If there is a certain number of hosts, services, or users matching a pattern
296 it's reasonable to assign the group object to these members.
297 Details on the `assign where` syntax can be found [here](#apply)
299 object HostGroup "mssql" {
300 display_name = "MSSQL Servers"
301 assign where host.vars.mssql_port
304 In this inherited example from above all hosts with the `vars` attribute `mssql_port`
305 set will be added as members to the host group `mssql`.
307 ## <a id="notifications"></a> Notifications
309 Notifications for service and host problems are an integral part of your
312 When a host or service is in a downtime, a problem has been acknowledged or
313 the dependency logic determined that the host/service is unreachable, no
314 notifications are sent. You can configure additional type and state filters
315 refining the notifications being actually sent.
317 There are many ways of sending notifications, e.g. by e-mail, XMPP,
318 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
319 Instead it relies on external mechanisms such as shell scripts to notify users.
321 A notification specification requires one or more users (and/or user groups)
322 who will be notified in case of problems. These users must have all custom
323 attributes defined which will be used in the `NotificationCommand` on execution.
325 The user `icingaadmin` in the example below will get notified only on `WARNING` and
326 `CRITICAL` states and `problem` and `recovery` notification types.
328 object User "icingaadmin" {
329 display_name = "Icinga 2 Admin"
330 enable_notifications = true
331 states = [ OK, Warning, Critical ]
332 types = [ Problem, Recovery ]
333 email = "icinga@localhost"
336 If you don't set the `states` and `types` configuration attributes for the `User`
337 object, notifications for all states and types will be sent.
339 Details on troubleshooting notification problems can be found [here](#troubleshooting).
343 > Make sure that the [notification](#features) feature is enabled on your master instance
344 > in order to execute notification commands.
346 You should choose which information you (and your notified users) are interested in
347 case of emergency, and also which information does not provide any value to you and
350 An example notification command is explained [here](#notification-commands).
352 You can add all shared attributes to a `Notification` template which is inherited
353 to the defined notifications. That way you'll save duplicated attributes in each
354 `Notification` object. Attributes can be overridden locally.
356 template Notification "generic-notification" {
359 command = "mail-service-notification"
361 states = [ Warning, Critical, Unknown ]
362 types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
363 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
368 The time period `24x7` is shipped as example configuration with Icinga 2.
370 Use the `apply` keyword to create `Notification` objects for your services:
372 apply Notification "mail" to Service {
373 import "generic-notification"
375 command = "mail-notification"
376 users = [ "icingaadmin" ]
378 assign where service.name == "mysql"
381 Instead of assigning users to notifications, you can also add the `user_groups`
382 attribute with a list of user groups to the `Notification` object. Icinga 2 will
383 send notifications to all group members.
385 ### <a id="notification-escalations"></a> Notification Escalations
387 When a problem notification is sent and a problem still exists at the time of re-notification
388 you may want to escalate the problem to the next support level. A different approach
389 is to configure the default notification by email, and escalate the problem via SMS
390 if not already solved.
392 You can define notification start and end times as additional configuration
393 attributes making the `Notification` object a so-called `notification escalation`.
394 Using templates you can share the basic notification attributes such as users or the
395 `interval` (and override them for the escalation then).
397 Using the example from above, you can define additional users being escalated for SMS
398 notifications between start and end time.
400 object User "icinga-oncall-2nd-level" {
401 display_name = "Icinga 2nd Level"
403 vars.mobile = "+1 555 424642"
406 object User "icinga-oncall-1st-level" {
407 display_name = "Icinga 1st Level"
409 vars.mobile = "+1 555 424642"
412 Define an additional `NotificationCommand` for SMS notifications.
416 > The example is not complete as there are many different SMS providers.
417 > Please note that sending SMS notifications will require an SMS provider
418 > or local hardware with a SIM card active.
420 object NotificationCommand "sms-notification" {
422 PluginDir + "/send_sms_notification",
427 The two new notification escalations are added onto the host `localhost`
428 and its service `ping4` using the `generic-notification` template.
429 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
430 command) after `30m` until `1h`.
434 > The `interval` was set to 15m in the `generic-notification`
435 > template example. Lower that value in your escalations by using a secondary
436 > template or by overriding the attribute directly in the `notifications` array
437 > position for `escalation-sms-2nd-level`.
439 If the problem does not get resolved nor acknowledged preventing further notifications
440 the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was
441 notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
443 apply Notification "mail" to Service {
444 import "generic-notification"
446 command = "mail-notification"
447 users = [ "icingaadmin" ]
449 assign where service.name == "ping4"
452 apply Notification "escalation-sms-2nd-level" to Service {
453 import "generic-notification"
455 command = "sms-notification"
456 users = [ "icinga-oncall-2nd-level" ]
463 assign where service.name == "ping4"
466 apply Notification "escalation-sms-1st-level" to Service {
467 import "generic-notification"
469 command = "sms-notification"
470 users = [ "icinga-oncall-1st-level" ]
477 assign where service.name == "ping4"
480 ### <a id="notification-delay"></a> Notification Delay
482 Sometimes the problem in question should not be notified when the notification is due
483 (the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
484 you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
485 postpone the first notification for 15 minutes. Leave out the `end` key - if not set,
486 Icinga 2 will not check against any end time for this notification.
488 apply Notification "mail" to Service {
489 import "generic-notification"
491 command = "mail-notification"
492 users = [ "icingaadmin" ]
494 times.begin = 15m // delay first notification
496 assign where service.name == "ping4"
499 ### <a id="disable-renotification"></a> Disable Re-notifications
501 If you prefer to be notified only once, you can disable re-notifications by setting the
502 `interval` attribute to `0`.
504 apply Notification "notify-once" to Service {
505 import "generic-notification"
507 command = "mail-notification"
508 users = [ "icingaadmin" ]
510 interval = 0 // disable re-notification
512 assign where service.name == "ping4"
515 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
517 If there are no notification state and type filter attributes defined at the `Notification`
518 or `User` object Icinga 2 assumes that all states and types are being notified.
520 Available state and type filters for notifications are:
522 template Notification "generic-notification" {
524 states = [ Warning, Critical, Unknown ]
525 types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
526 FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
529 If you are familiar with Icinga 1.x `notification_options` please note that they have been split
530 into type and state to allow more fine granular filtering for example on downtimes and flapping.
531 You can filter for acknowledgements and custom notifications too.
534 ## <a id="timeperiods"></a> Time Periods
536 Time Periods define time ranges in Icinga where event actions are
537 triggered, for example whether a service check is executed or not within
538 the `check_period` attribute. Or a notification should be sent to
539 users or not, filtered by the `period` and `notification_period`
540 configuration attributes for `Notification` and `User` objects.
544 > If you are familar with Icinga 1.x - these time period definitions
545 > are called `legacy timeperiods` in Icinga 2.
547 > An Icinga 2 legacy timeperiod requires the `ITL` provided template
548 >`legacy-timeperiod`.
550 The `TimePeriod` attribute `ranges` may contain multiple directives,
551 including weekdays, days of the month, and calendar dates.
552 These types may overlap/override other types in your ranges dictionary.
554 The descending order of precedence is as follows:
556 * Calendar date (2008-01-01)
557 * Specific month date (January 1st)
558 * Generic month date (Day 15)
559 * Offset weekday of specific month (2nd Tuesday in December)
560 * Offset weekday (3rd Monday)
561 * Normal weekday (Tuesday)
563 If you don't set any `check_period` or `notification_period` attribute
564 on your configuration objects Icinga 2 assumes `24x7` as time period
567 object TimePeriod "24x7" {
568 import "legacy-timeperiod"
570 display_name = "Icinga 2 24x7 TimePeriod"
572 "monday" = "00:00-24:00"
573 "tuesday" = "00:00-24:00"
574 "wednesday" = "00:00-24:00"
575 "thursday" = "00:00-24:00"
576 "friday" = "00:00-24:00"
577 "saturday" = "00:00-24:00"
578 "sunday" = "00:00-24:00"
582 If your operation staff should only be notified during workhours
583 create a new timeperiod named `workhours` defining a work day from
586 object TimePeriod "workhours" {
587 import "legacy-timeperiod"
589 display_name = "Icinga 2 8x5 TimePeriod"
591 "monday" = "09:00-17:00"
592 "tuesday" = "09:00-17:00"
593 "wednesday" = "09:00-17:00"
594 "thursday" = "09:00-17:00"
595 "friday" = "09:00-17:00"
599 Use the `period` attribute to assign time periods to
600 `Notification` and `Dependency` objects:
602 object Notification "mail" {
603 import "generic-notification"
605 host_name = "localhost"
607 command = "mail-notification"
608 users = [ "icingaadmin" ]
613 ## <a id="commands"></a> Commands
615 Icinga 2 uses three different command object types to specify how
616 checks should be performed, notifications should be sent, and
617 events should be handled.
619 ### <a id="command-environment-variables"></a> Environment Variables for Commands
621 Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars).
624 ### <a id="check-commands"></a> Check Commands
626 `CheckCommand` objects define the command line how a check is called.
630 > Make sure that the [checker](#features) feature is enabled in order to
633 #### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
635 `CheckCommand` objects require the [ITL template](#itl-plugin-check-command)
636 `plugin-check-command` to support native plugin based check methods.
638 Unless you have done so already, download your check plugin and put it
639 into the `PluginDir` directory. The following example uses the
640 `check_disk` plugin shipped with the Monitoring Plugins package.
642 The plugin path and all command arguments are made a list of
643 double-quoted string arguments for proper shell escaping.
645 Call the `check_disk` plugin with the `--help` parameter to see
646 all available options. Our example defines warning (`-w`) and
647 critical (`-c`) thresholds for the disk usage. Without any
648 partition defined (`-p`) it will check all local partitions.
650 icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
652 This plugin checks the amount of used disk space on a mounted file system
653 and generates an alert if free space is less than one of the threshold values
657 check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
658 [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
659 [-t timeout] [-u unit] [-v] [-X type] [-N type]
664 > Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
666 Next step is to understand how command parameters are being passed from
667 a host or service object, and add a `CheckCommand` definition based on these
668 required parameters and/or default values.
670 #### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
672 Unlike Icinga 1.x check command parameters are defined as custom attributes
673 which can be accessed as runtime macros by the executed check command.
675 Define the default check command custom attribute `disk_wfree` and `disk_cfree`
676 (freely definable naming schema) and their default threshold values. You can
677 then use these custom attributes as runtime macros for [command arguments](#command-arguments)
680 The default custom attributes can be overridden by the custom attributes
681 defined in the service using the check command `my-disk`. The custom attributes
682 can also be inherited from a parent template using additive inheritance (`+=`).
685 object CheckCommand "my-disk" {
686 import "plugin-check-command"
688 command = [ PluginDir + "/check_disk" ]
691 "-w" = "$disk_wfree$%"
692 "-c" = "$disk_cfree$%"
700 The host `localhost` with the service `my-disk` checks all disks with modified
701 custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
704 object Host "localhost" {
705 import "generic-host"
707 address = "127.0.0.1"
711 object Service "my-disk" {
712 import "generic-service"
714 host_name = "localhost"
715 check_command = "my-disk"
721 #### <a id="command-arguments"></a> Command Arguments
723 By defining a check command line using the `command` attribute Icinga 2
724 will resolve all macros in the static string or array. Sometimes it is
725 required to extend the arguments list based on a met condition evaluated
726 at command execution. Or making arguments optional - only set if the
727 macro value can be resolved by Icinga 2.
729 object CheckCommand "check_http" {
730 import "plugin-check-command"
732 command = [ PluginDir + "/check_http" ]
735 "-H" = "$http_vhost$"
736 "-I" = "$http_address$"
740 set_if = "$http_ssl$"
743 set_if = "$http_sni$"
746 value = "$http_auth_pair$"
747 description = "Username:password on sites with basic authentication"
750 set_if = "$http_ignore_body$"
752 "-r" = "$http_expect_body_regex$"
753 "-w" = "$http_warn_time$"
754 "-c" = "$http_critical_time$"
755 "-e" = "$http_expect$"
758 vars.http_address = "$address$"
759 vars.http_ssl = false
760 vars.http_sni = false
763 The example shows the `check_http` check command defining the most common
764 arguments. Each of them is optional by default and will be omitted if
765 the value is not set. For example if the service calling the check command
766 does not have `vars.http_port` set, it won't get added to the command
769 If the `vars.http_ssl` custom attribute is set in the service, host or command
770 object definition, Icinga 2 will add the `-S` argument based on the `set_if`
771 numeric value to the command line. String values are not supported.
773 That way you can use the `check_http` command definition for both, with and
774 without SSL enabled checks saving you duplicated command definitions.
776 Details on all available options can be found in the
777 [CheckCommand object definition](#objecttype-checkcommand).
779 ### <a id="using-apply-services-command-arguments"></a> Apply Services with custom Command Arguments
781 Imagine the following scenario: The `my-host1` host is reachable using the default port 22, while
782 the `my-host2` host requires a different port on 2222. Both hosts are in the hostgroup `my-linux-servers`.
784 object HostGroup "my-linux-servers" {
785 display_name = "Linux Servers"
786 assign where host.vars.os == "Linux"
789 /* this one has port 22 opened */
790 object Host "my-host1" {
791 import "generic-host"
792 address = "129.168.1.50"
796 /* this one listens on a different ssh port */
797 object Host "my-host2" {
798 import "generic-host"
799 address = "129.168.2.50"
801 vars.custom_ssh_port = 2222
804 All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an
805 [apply rule](#apply). The optional `ssh_port` command argument should be inherited from the host
806 the service is applied to. If not set, the check command `my-ssh` will omit the argument.
807 The `host` argument is special: `skip_key` tells Icinga 2 to ignore the key, and directly put the
808 value onto the command line. The `order` attribute specifies that this argument is the first one
809 (`-1` is smaller than the other defaults).
811 object CheckCommand "my-ssh" {
812 import "plugin-check-command"
814 command = [ PluginDir + "/check_ssh" ]
819 value = "$ssh_address$"
825 vars.ssh_address = "$address$"
828 /* apply ssh service */
829 apply Service "my-ssh" {
830 import "generic-service"
831 check_command = "my-ssh"
833 //set the command argument for ssh port with a custom host attribute, if set
834 vars.ssh_port = "$host.vars.custom_ssh_port$"
836 assign where "my-linux-servers" in host.groups
839 The `my-host1` will get the `my-ssh` service checking on the default port:
841 [2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281
843 The `my-host2` will inherit the `custom_ssh_port` variable to the service and execute a different command:
845 [2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956
848 ### <a id="notification-commands"></a> Notification Commands
850 `NotificationCommand` objects define how notifications are delivered to external
851 interfaces (E-Mail, XMPP, IRC, Twitter, etc).
853 `NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command)
854 `plugin-notification-command` to support native plugin-based notifications.
858 > Make sure that the [notification](#features) feature is enabled on your master instance
859 > in order to execute notification commands.
861 Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
862 the current check output) sending an email to the user(s) associated with the
863 notification itself (`$user.email$`).
865 If you want to specify default values for some of the custom attribute definitions,
866 you can add a `vars` dictionary as shown for the `CheckCommand` object.
868 object NotificationCommand "mail-service-notification" {
869 import "plugin-notification-command"
871 command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
874 NOTIFICATIONTYPE = "$notification.type$"
875 SERVICEDESC = "$service.name$"
876 HOSTALIAS = "$host.display_name$"
877 HOSTADDRESS = "$address$"
878 SERVICESTATE = "$service.state$"
879 LONGDATETIME = "$icinga.long_date_time$"
880 SERVICEOUTPUT = "$service.output$"
881 NOTIFICATIONAUTHORNAME = "$notification.author$"
882 NOTIFICATIONCOMMENT = "$notification.comment$"
883 HOSTDISPLAYNAME = "$host.display_name$"
884 SERVICEDISPLAYNAME = "$service.display_name$"
885 USEREMAIL = "$user.email$"
889 The command attribute in the `mail-service-notification` command refers to the following
890 shell script. The macros specified in the `env` array are exported
891 as environment variables and can be used in the notification script:
894 template=$(cat <<TEMPLATE
897 Notification Type: $NOTIFICATIONTYPE
899 Service: $SERVICEDESC
901 Address: $HOSTADDRESS
904 Date/Time: $LONGDATETIME
906 Additional Info: $SERVICEOUTPUT
908 Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
912 /usr/bin/printf "%b" $template | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" $USEREMAIL
916 > This example is for `exim` only. Requires changes for `sendmail` and
919 While it's possible to specify the entire notification command right
920 in the NotificationCommand object it is generally advisable to create a
921 shell script in the `/etc/icinga2/scripts` directory and have the
922 NotificationCommand object refer to that.
924 ### <a id="event-commands"></a> Event Commands
926 Unlike notifications event commands are called on every host/service execution
927 if one of these conditions match:
929 * The host/service is in a [soft state](#hard-soft-states)
930 * The host/service state changes into a [hard state](#hard-soft-states)
931 * The host/service state recovers from a [soft or hard state](#hard-soft-states) to [OK](#service-states)/[Up](#host-states)
933 Therefore the `EventCommand` object should define a command line
934 evaluating the current service state and other service runtime attributes
935 available through runtime vars. Runtime macros such as `$service.state_type$`
936 and `$service.state$` will be processed by Icinga 2 helping on fine-granular
937 events being triggered.
939 Common use case scenarios are a failing HTTP check requiring an immediate
940 restart via event command, or if an application is locked and requires
941 a restart upon detection.
943 `EventCommand` objects require the ITL template `plugin-event-command`
944 to support native plugin based checks.
946 When the event command is triggered on a service state change, it will
947 send a check result using the `process_check_result` script forcibly
948 changing the service state back to `OK` (`-r 0`) providing some debug
949 information in the check output (`-o`).
951 object EventCommand "plugin-event-process-check-result" {
952 import "plugin-event-command"
955 PluginDir + "/process_check_result",
957 "-S", "$service.name$",
958 "-c", RunDir + "/icinga2/cmd/icinga2.cmd",
960 "-o", "Event Handler triggered in state '$service.state$' with output '$service.output$'."
965 ## <a id="dependencies"></a> Dependencies
967 Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects
968 for determing their network reachability.
969 The `parent_host_name` and `parent_service_name` attributes are mandatory for
970 service dependencies, `parent_host_name` is required for host dependencies.
972 A service can depend on a host, and vice versa. A service has an implicit
973 dependency (parent) to its host. A host to host dependency acts implicitly
974 as host parent relation.
975 When dependencies are calculated, not only the immediate parent is taken into
976 account but all parents are inherited.
978 Notifications are suppressed if a host or service becomes unreachable.
980 ### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
982 Icinga 2 automatically adds an implicit dependency for services on their host. That way
983 service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
984 does not overwrite other dependencies and implicitely sets `disable_notifications = true` and
985 `states = [ Up ]` for all service objects.
987 Service checks are still executed. If you want to prevent them from happening, you can
988 apply the following dependency to all services setting their host as `parent_host_name`
989 and disabling the checks. `assign where true` matches on all `Service` objects.
991 apply Dependency "disable-host-service-checks" to Service {
992 disable_checks = true
996 ### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
998 A common scenario is the Icinga 2 server behind a router. Checking internet
999 access by pinging the Google DNS server `google-dns` is a common method, but
1000 will fail in case the `dsl-router` host is down. Therefore the example below
1001 defines a host dependency which acts implicitly as parent relation too.
1003 Furthermore the host may be reachable but ping probes are dropped by the
1004 router's firewall. In case the `dsl-router``ping4` service check fails, all
1005 further checks for the `ping4` service on host `google-dns` service should
1006 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
1008 object Host "dsl-router" {
1009 address = "192.168.1.1"
1012 object Host "google-dns" {
1016 apply Service "ping4" {
1017 import "generic-service"
1019 check_command = "ping4"
1021 assign where host.address
1024 apply Dependency "internet" to Host {
1025 parent_host_name = "dsl-router"
1026 disable_checks = true
1027 disable_notifications = true
1029 assign where host.name != "dsl-router"
1032 apply Dependency "internet" to Service {
1033 parent_host_name = "dsl-router"
1034 parent_service_name = "ping4"
1035 disable_checks = true
1037 assign where host.name != "dsl-router"
1041 ### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
1043 Another classic example are agent based checks. You would define a health check
1044 for the agent daemon responding to your requests, and make all other services
1045 querying that daemon depend on that health check.
1047 The following configuration defines two nrpe based service checks `nrpe-load`
1048 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
1049 `nrpe-health` service.
1051 apply Service "nrpe-health" {
1052 import "generic-service"
1053 check_command = "nrpe"
1054 assign where match("nrpe-*", host.name)
1057 apply Service "nrpe-load" {
1058 import "generic-service"
1059 check_command = "nrpe"
1060 vars.nrpe_command = "check_load"
1061 assign where match("nrpe-*", host.name)
1064 apply Service "nrpe-disk" {
1065 import "generic-service"
1066 check_command = "nrpe"
1067 vars.nrpe_command = "check_disk"
1068 assign where match("nrpe-*", host.name)
1071 object Host "nrpe-server" {
1072 import "generic-host"
1073 address = "192.168.1.5"
1076 apply Dependency "disable-nrpe-checks" to Service {
1077 parent_service_name = "nrpe-health"
1080 disable_checks = true
1081 disable_notifications = true
1082 assign where service.check_command == "nrpe"
1083 ignore where service.name == "nrpe-health"
1086 The `disable-nrpe-checks` dependency is applied to all services
1087 on the `nrpe-service` host using the `nrpe` check_command attribute
1088 but not the `nrpe-health` service itself.
1091 ## <a id="downtimes"></a> Downtimes
1093 Downtimes can be scheduled for planned server maintenance or
1094 any other targetted service outage you are aware of in advance.
1096 Downtimes will suppress any notifications, and may trigger other
1097 downtimes too. If the downtime was set by accident, or the duration
1098 exceeds the maintenance, you can manually cancel the downtime.
1099 Planned downtimes will also be taken into account for SLA reporting
1100 tools calculating the SLAs based on the state and downtime history.
1102 Multiple downtimes for a single object may overlap. This is useful
1103 when you want to extend your maintenance window taking longer than expected.
1104 If there are multiple downtimes triggered for one object, the overall downtime depth
1105 will be greater than `1`.
1108 If the downtime was scheduled after the problem changed to a critical hard
1109 state triggering a problem notification, and the service recovers during
1110 the downtime window, the recovery notification won't be suppressed.
1112 ### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
1114 A `fixed` downtime will be activated at the defined start time, and
1115 removed at the end time. During this time window the service state
1116 will change to `NOT-OK` and then actually trigger the downtime.
1117 Notifications are suppressed and the downtime depth is incremented.
1119 Common scenarios are a planned distribution upgrade on your linux
1120 servers, or database updates in your warehouse. The customer knows
1121 about a fixed downtime window between 23:00 and 24:00. After 24:00
1122 all problems should be alerted again. Solution is simple -
1123 schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
1125 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
1126 by the state change in the time span defined by start and end time,
1127 and then last for the specified duration in minutes.
1129 Imagine the following scenario: Your service is frequently polled
1130 by users trying to grab free deleted domains for immediate registration.
1131 Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
1132 a network outage visible to the monitoring. The service is still alive,
1133 but answering too slow to Icinga 2 service checks.
1134 For that reason, you may want to schedule a downtime between 07:30 and
1135 08:00 with a duration of 15 minutes. The downtime will then last from
1136 its trigger time until the duration is over. After that, the downtime
1137 is removed (may happen before or after the actual end time!).
1139 ### <a id="scheduling-downtime"></a> Scheduling a downtime
1141 This can either happen through a web interface or by sending an [external command](#external-commands)
1142 to the external command pipe provided by the `ExternalCommandListener` configuration.
1144 Fixed downtimes require a start and end time (a duration will be ignored).
1145 Flexible downtimes need a start and end time for the time span, and a duration
1146 independent from that time span.
1148 ### <a id="triggered-downtimes"></a> Triggered Downtimes
1150 This is optional when scheduling a downtime. If there is already a downtime
1151 scheduled for a future maintenance, the current downtime can be triggered by
1152 that downtime. This renders useful if you have scheduled a host downtime and
1153 are now scheduling a child host's downtime getting triggered by the parent
1154 downtime on NOT-OK state change.
1156 ### <a id="recurring-downtimes"></a> Recurring Downtimes
1158 [ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up
1159 recurring downtimes for services.
1163 apply ScheduledDowntime "backup-downtime" to Service {
1164 author = "icingaadmin"
1165 comment = "Scheduled downtime for backup"
1168 monday = "02:00-03:00"
1169 tuesday = "02:00-03:00"
1170 wednesday = "02:00-03:00"
1171 thursday = "02:00-03:00"
1172 friday = "02:00-03:00"
1173 saturday = "02:00-03:00"
1174 sunday = "02:00-03:00"
1177 assign where "backup" in service.groups
1181 ## <a id="comments"></a> Comments
1183 Comments can be added at runtime and are persistent over restarts. You can
1184 add useful information for others on repeating incidents (for example
1185 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
1186 is primarly accessible using web interfaces.
1188 Adding and deleting comment actions are possible through the external command pipe
1189 provided with the `ExternalCommandListener` configuration. The caller must
1190 pass the comment id in case of manipulating an existing comment.
1193 ## <a id="acknowledgements"></a> Acknowledgements
1195 If a problem is alerted and notified you may signal the other notification
1196 recipients that you are aware of the problem and will handle it.
1198 By sending an acknowledgement to Icinga 2 (using the external command pipe
1199 provided with `ExternalCommandListener` configuration) all future notifications
1200 are suppressed, a new comment is added with the provided description and
1201 a notification with the type `NotificationFilterAcknowledgement` is sent
1202 to all notified users.
1204 ### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements
1206 Once a problem is acknowledged it may disappear from your `handled problems`
1207 dashboard and no-one ever looks at it again since it will suppress
1210 This `fire-and-forget` action is quite common. If you're sure that a
1211 current problem should be resolved in the future at a defined time,
1212 you can define an expiration time when acknowledging the problem.
1214 Icinga 2 will clear the acknowledgement when expired and start to
1215 re-notify if the problem persists.
1219 ## <a id="custom-attributes"></a> Custom Attributes
1221 ### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime
1223 Custom attributes may be used in command definitions to dynamically change how the command
1226 Additionally there are Icinga 2 features such as the `PerfDataWriter` type
1227 which use custom attributes to format their output.
1231 > Custom attributes are identified by the 'vars' dictionary attribute as short name.
1232 > Accessing the different attribute keys is possible using the '.' accessor.
1234 Custom attributes in command definitions or performance data templates are evaluated at
1235 runtime when executing a command. These custom attributes cannot be used elsewhere
1236 (e.g. in other configuration attributes).
1238 Custom attribute values must be either a string, a number or a boolean value. Arrays
1239 and dictionaries cannot be used.
1241 Here is an example of a command definition which uses user-defined custom attributes:
1243 object CheckCommand "my-ping" {
1244 import "plugin-check-command"
1247 PluginDir + "/check_ping", "-4"
1251 "-H" = "$ping_address$"
1252 "-w" = "$ping_wrta$,$ping_wpl$%"
1253 "-c" = "$ping_crta$,$ping_cpl$%"
1254 "-p" = "$ping_packets$"
1255 "-t" = "$ping_timeout$"
1258 vars.ping_address = "$address$"
1259 vars.ping_wrta = 100
1261 vars.ping_crta = 200
1263 vars.ping_packets = 5
1264 vars.ping_timeout = 0
1267 Custom attribute names used at runtime must be enclosed in two `$` signs, e.g.
1268 `$address$`. When using the `$` sign as single character, you need to escape
1269 it with an additional dollar sign (`$$`). This example also makes use of the
1270 [command arguments](#command-arguments) passed to the command line. `-4` must
1271 be added as additional array key.
1273 ### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order
1275 When executing commands Icinga 2 checks the following objects in this order to look
1276 up custom attributes and their respective values:
1278 1. User object (only for notifications)
1282 5. Global custom attributes in the `vars` constant
1284 This execution order allows you to define default values for custom attributes
1285 in your command objects. The `my-ping` command shown above uses this to set
1286 default values for some of the latency thresholds and timeouts.
1288 When using the `my-ping` command you can override some or all of the custom
1289 attributes in the service definition like this:
1291 object Service "ping" {
1292 host_name = "localhost"
1293 check_command = "my-ping"
1295 vars.ping_packets = 10 // Overrides the default value of 5 given in the command
1298 If a custom attribute isn't defined anywhere an empty value is used and a warning is
1299 emitted to the Icinga 2 log.
1303 > By convention every host should have an `address` attribute. Hosts
1304 > which have an IPv6 address should also have an `address6` attribute.
1306 ### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables
1308 The `env` command object attribute specifies a list of environment variables with values calculated
1309 from either runtime macros or custom attributes which should be exported as environment variables
1310 prior to executing the command.
1312 This is useful for example for hiding sensitive information on the command line output
1313 when passing credentials to database checks:
1315 object CheckCommand "mysql-health" {
1316 import "plugin-check-command"
1319 PluginDir + "/check_mysql"
1323 "-H" = "$mysql_address$"
1324 "-d" = "$mysql_database$"
1327 vars.mysql_address = "$address$"
1328 vars.mysql_database = "icinga"
1329 vars.mysql_user = "icinga_check"
1330 vars.mysql_pass = "password"
1332 env.MYSQLUSER = "$mysql_user$"
1333 env.MYSQLPASS = "$mysql_pass$"
1336 ### <a id="multiple-host-addresses-custom-attributes"></a> Multiple Host Addresses using Custom Attributes
1338 The following example defines a `Host` with three different interface addresses defined as
1339 custom attributes in the `vars` dictionary. The `if-eth0` and `if-eth1` services will import
1340 these values into the `address` custom attribute. This attribute is available through the
1341 generic `$address$` runtime macro.
1343 object Host "multi-ip" {
1344 check_command = "dummy"
1345 vars.address_lo = "127.0.0.1"
1346 vars.address_eth0 = "10.0.0.10"
1347 vars.address_eth1 = "192.168.1.10"
1350 apply Service "if-eth0" {
1351 import "generic-service"
1353 vars.address = "$host.vars.address_eth0$"
1354 check_command = "my-generic-interface-check"
1356 assign where host.vars.address_eth0 != ""
1359 apply Service "if-eth1" {
1360 import "generic-service"
1362 vars.address = "$host.vars.address_eth1$"
1363 check_command = "my-generic-interface-check"
1365 assign where host.vars.address_eth1 != ""
1368 object CheckCommand "my-generic-interface-check" {
1369 import "plugin-check-command"
1371 command = "echo \"This would be the service $service.description$ using the address value: $address$\""
1374 The `CheckCommand` object is just an example to help you with testing and
1375 understanding the different custom attributes and runtime macros.
1377 ### <a id="modified-attributes"></a> Modified Attributes
1379 Icinga 2 allows you to modify defined object attributes at runtime different to
1380 the local configuration object attributes. These modified attributes are
1381 stored as bit-shifted-value and made available in backends. Icinga 2 stores
1382 modified attributes in its state file and restores them on restart.
1384 Modified Attributes can be reset using external commands.
1387 ## <a id="runtime-macros"></a> Runtime Macros
1389 Next to custom attributes there are additional runtime macros made available by Icinga 2.
1390 These runtime macros reflect the current object state and may change over time while
1391 custom attributes are configured statically (but can be modified at runtime using
1394 ### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order
1396 Custom attributes can be accessed at [runtime](#runtime-custom-attributes) using their
1397 identifier omitting the `vars.` prefix.
1398 There are special cases when those custom attributes are not set and Icinga 2 provides
1399 a fallback to existing object attributes for example `host.address`.
1401 In the following example the `$address$` macro will be resolved with the value of `vars.address`.
1403 object Host "localhost" {
1404 import "generic-host"
1405 check_command = "my-host-macro-test"
1406 address = "127.0.0.1"
1407 vars.address = "127.2.2.2"
1410 object CheckCommand "my-host-macro-test" {
1411 command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
1414 The check command output will look like
1416 "address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"
1418 If you alter the host object and remove the `vars.address` line, Icinga 2 will fail to look up `$address$` in the
1419 custom attributes dictionary and then look for the host object's attribute.
1421 The check command output will change to
1423 "address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "
1426 The same example can be defined for services overriding the `address` field based on a specific host custom attribute.
1428 object Host "localhost" {
1429 import "generic-host"
1430 address = "127.0.0.1"
1431 vars.macro_address = "127.3.3.3"
1434 apply Service "my-macro-test" to Host {
1435 import "generic-service"
1436 check_command = "my-service-macro-test"
1437 vars.address = "$host.vars.macro_address$"
1439 assign where host.address
1442 object CheckCommand "my-service-macro-test" {
1443 command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
1446 When the service check is executed the output looks like
1448 "address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"
1450 That way you can easily override existing macros being accessed by their short name like `$address$` and refrain
1451 from defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).
1454 ### <a id="host-runtime-macros"></a> Host Runtime Macros
1456 The following host custom attributes are available in all commands that are executed for
1460 -----------------------------|--------------
1461 host.name | The name of the host object.
1462 host.display_name | The value of the `display_name` attribute.
1463 host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
1464 host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
1465 host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`.
1466 host.check_attempt | The current check attempt number.
1467 host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
1468 host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
1469 host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
1470 host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`.
1471 host.last_state_change | The last state change's timestamp.
1472 host.duration_sec | The time since the last state change.
1473 host.latency | The host's check latency.
1474 host.execution_time | The host's check execution time.
1475 host.output | The last check's output.
1476 host.perfdata | The last check's performance data.
1477 host.last_check | The timestamp when the last check was executed.
1478 host.num_services | Number of services associated with the host.
1479 host.num_services_ok | Number of services associated with the host which are in an `OK` state.
1480 host.num_services_warning | Number of services associated with the host which are in a `WARNING` state.
1481 host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state.
1482 host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state.
1484 ### <a id="service-runtime-macros"></a> Service Runtime Macros
1486 The following service macros are available in all commands that are executed for
1490 ---------------------------|--------------
1491 service.name | The short name of the service object.
1492 service.display_name | The value of the `display_name` attribute.
1493 service.check_command | The short name of the command along with any arguments to be used for the check.
1494 service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
1495 service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
1496 service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`.
1497 service.check_attempt | The current check attempt number.
1498 service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
1499 service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
1500 service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
1501 service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`.
1502 service.last_state_change | The last state change's timestamp.
1503 service.duration_sec | The time since the last state change.
1504 service.latency | The service's check latency.
1505 service.execution_time | The service's check execution time.
1506 service.output | The last check's output.
1507 service.perfdata | The last check's performance data.
1508 service.last_check | The timestamp when the last check was executed.
1510 ### <a id="command-runtime-macros"></a> Command Runtime Macros
1512 The following custom attributes are available in all commands:
1515 -----------------------|--------------
1516 command.name | The name of the command object.
1518 ### <a id="user-runtime-macros"></a> User Runtime Macros
1520 The following custom attributes are available in all commands that are executed for
1524 -----------------------|--------------
1525 user.name | The name of the user object.
1526 user.display_name | The value of the display_name attribute.
1528 ### <a id="notification-runtime-macros"></a> Notification Runtime Macros
1531 -----------------------|--------------
1532 notification.type | The type of the notification.
1533 notification.author | The author of the notification comment, if existing.
1534 notification.comment | The comment of the notification, if existing.
1536 ### <a id="global-runtime-macros"></a> Global Runtime Macros
1538 The following macros are available in all executed commands:
1541 -----------------------|--------------
1542 icinga.timet | Current UNIX timestamp.
1543 icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
1544 icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
1545 icinga.date | Current date. Example: `2014-01-03`
1546 icinga.time | Current time including timezone information. Example: `11:23:08 +0000`
1547 icinga.uptime | Current uptime of the Icinga 2 process.
1549 The following macros provide global statistics:
1552 ----------------------------------|--------------
1553 icinga.num_services_ok | Current number of services in state 'OK'.
1554 icinga.num_services_warning | Current number of services in state 'Warning'.
1555 icinga.num_services_critical | Current number of services in state 'Critical'.
1556 icinga.num_services_unknown | Current number of services in state 'Unknown'.
1557 icinga.num_services_pending | Current number of pending services.
1558 icinga.num_services_unreachable | Current number of unreachable services.
1559 icinga.num_services_flapping | Current number of flapping services.
1560 icinga.num_services_in_downtime | Current number of services in downtime.
1561 icinga.num_services_acknowledged | Current number of acknowledged service problems.
1562 icinga.num_hosts_up | Current number of hosts in state 'Up'.
1563 icinga.num_hosts_down | Current number of hosts in state 'Down'.
1564 icinga.num_hosts_unreachable | Current number of unreachable hosts.
1565 icinga.num_hosts_flapping | Current number of flapping hosts.
1566 icinga.num_hosts_in_downtime | Current number of hosts in downtime.
1567 icinga.num_hosts_acknowledged | Current number of acknowledged host problems.
1570 ## <a id="check-result-freshness"></a> Check Result Freshness
1572 In Icinga 2 active check freshness is enabled by default. It is determined by the
1573 `check_interval` attribute and no incoming check results in that period of time.
1575 threshold = last check execution time + check interval
1577 Passive check freshness is calculated from the `check_interval` attribute if set.
1579 threshold = last check result time + check interval
1581 If the freshness checks are invalid, a new check is executed defined by the
1582 `check_command` attribute.
1585 ## <a id="check-flapping"></a> Check Flapping
1587 The flapping algorithm used in Icinga 2 does not store the past states but
1588 calculcates the flapping threshold from a single value based on counters and
1589 half-life values. Icinga 2 compares the value with a single flapping threshold
1590 configuration attribute named `flapping_threshold`.
1592 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
1595 ## <a id="volatile-services"></a> Volatile Services
1597 By default all services remain in a non-volatile state. When a problem
1598 occurs, the `SOFT` state applies and once `max_check_attempts` attribute
1599 is reached with the check counter, a `HARD` state transition happens.
1600 Notifications are only triggered by `HARD` state changes and are then
1601 re-sent defined by the `interval` attribute.
1603 It may be reasonable to have a volatile service which stays in a `HARD`
1604 state type if the service stays in a `NOT-OK` state. That way each
1605 service recheck will automatically trigger a notification unless the
1606 service is acknowledged or in a scheduled downtime.
1609 ## <a id="external-commands"></a> External Commands
1611 Icinga 2 provides an external command pipe for processing commands
1612 triggering specific actions (for example rescheduling a service check
1613 through the web interface).
1615 In order to enable the `ExternalCommandListener` configuration use the
1616 following command and restart Icinga 2 afterwards:
1618 # icinga2-enable-feature command
1620 Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
1621 using the default configuration.
1623 Web interfaces and other Icinga addons are able to send commands to
1624 Icinga 2 through the external command pipe, for example for rescheduling
1625 a forced service check:
1627 # /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
1629 # tail -f /var/log/messages
1631 Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
1632 Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
1635 ### <a id="external-command-list"></a> External Command List
1637 A list of currently supported external commands can be found [here](#external-commands-list-detail).
1639 Detailed information on the commands and their required parameters can be found
1640 on the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).
1642 ## <a id="logging"></a> Logging
1644 Icinga 2 supports three different types of logging:
1647 * Syslog (on *NIX-based operating systems)
1648 * Console logging (`STDOUT` on tty)
1650 You can enable additional loggers using the `icinga2-enable-feature`
1651 and `icinga2-disable-feature` commands to configure loggers:
1653 Feature | Description
1654 ---------|------------
1655 debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
1656 mainlog | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
1657 syslog | Syslog (severity: `warning` or higher)
1659 By default file the `mainlog` feature is enabled. When running Icinga 2
1660 on a terminal log messages with severity `information` or higher are
1661 written to the console.
1664 ## <a id="performance-data"></a> Performance Data
1666 When a host or service check is executed plugins should provide so-called
1667 `performance data`. Next to that additional check performance data
1668 can be fetched using Icinga 2 runtime macros such as the check latency
1669 or the current service state (or additional custom attributes).
1671 The performance data can be passed to external applications which aggregate and
1672 store them in their backends. These tools usually generate graphs for historical
1673 reporting and trending.
1675 Well-known addons processing Icinga performance data are PNP4Nagios,
1676 inGraph and Graphite.
1678 ### <a id="writing-performance-data-files"></a> Writing Performance Data Files
1680 PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
1681 the current performance files for their backend updates.
1683 Therefore the Icinga 2 `PerfdataWriter` object allows you to define
1684 the output template format for host and services backed with Icinga 2
1687 host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$"
1688 service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.statetype$"
1690 The default templates are already provided with the Icinga 2 feature configuration
1691 which can be enabled using
1693 # icinga2-enable-feature perfdata
1695 By default all performance data files are rotated in a 15 seconds interval into
1696 the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
1697 `service-perfdata.<timestamp>`.
1698 External collectors need to parse the rotated performance data files and then
1699 remove the processed files.
1701 ### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer
1703 While there are some Graphite collector scripts and daemons like Graphios available for
1704 Icinga 1.x it's more reasonable to directly process the check and plugin performance
1705 in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
1706 write them to the defined Graphite Carbon daemon tcp socket.
1708 You can enable the feature using
1710 # icinga2-enable-feature graphite
1712 By default the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
1713 `127.0.0.1` on port `2003`.
1715 The current naming schema is
1717 icinga.<hostname>.<metricname>
1718 icinga.<hostname>.<servicename>.<metricname>
1720 To make sure Icinga 2 writes a valid label into Graphite some characters are replaced
1721 with `_` in the target name:
1725 The resulting name in Graphite might look like:
1727 www-01 / http-cert / response time
1728 icinga.www_01.http_cert.response_time
1730 In addition to the performance data retrieved from the check plugin, Icinga 2 sends
1731 internal check statistic data to Graphite:
1733 metric | description
1734 -------------------|------------------------------------------
1735 current_attempt | current check attempt
1736 max_check_attempts | maximum check attempts until the hard state is reached
1737 reachable | checked object is reachable
1738 execution_time | check execution time
1739 latency | check latency
1740 state | current state of the checked object
1741 state_type | 0=SOFT, 1=HARD state
1743 The following example illustrates how to configure the storage-schemas for Graphite Carbon
1744 Cache. Please make sure that the order is correct because the first match wins.
1747 pattern = ^icinga\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
1751 # intervals like PNP4Nagios uses them per default
1753 retentions = 1m:2d,5m:10d,30m:90d,360m:4y
1755 ## <a id="status-data"></a> Status Data
1757 Icinga 1.x writes object configuration data and status data in a cyclic
1758 interval to its `objects.cache` and `status.dat` files. Icinga 2 provides
1759 the `StatusDataWriter` object which dumps all configuration objects and
1760 status updates in a regular interval.
1762 # icinga2-enable-feature statusdata
1764 Icinga 1.x Classic UI requires this data set as part of its backend.
1768 > If you are not using any web interface or addon which uses these files
1769 > you can safely disable this feature.
1772 ## <a id="compat-logging"></a> Compat Logging
1774 The Icinga 1.x log format is considered being the `Compat Log`
1775 in Icinga 2 provided with the `CompatLogger` object.
1777 These logs are not only used for informational representation in
1778 external web interfaces parsing the logs, but also to generate
1779 SLA reports and trends in Icinga 1.x Classic UI. Furthermore the
1780 [Livestatus](#livestatus) feature uses these logs for answering queries to
1783 The `CompatLogger` object can be enabled with
1785 # icinga2-enable-feature compatlog
1787 By default, the Icinga 1.x log file called `icinga.log` is located
1788 in `/var/log/icinga2/compat`. Rotated log files are moved into
1789 `var/log/icinga2/compat/archives`.
1791 The format cannot be changed without breaking compatibility to
1792 existing log parsers.
1794 # tail -f /var/log/icinga2/compat/icinga.log
1796 [1382115688] LOG ROTATION: HOURLY
1797 [1382115688] LOG VERSION: 2.0
1798 [1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
1799 [1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
1800 [1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
1801 [1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
1802 [1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
1803 [1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
1804 [1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
1805 [1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
1806 [1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
1807 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
1808 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
1809 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
1810 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
1811 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
1812 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
1813 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
1814 [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
1815 [1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
1816 [1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test
1821 ## <a id="db-ido"></a> DB IDO
1823 The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all
1824 configuration and status information into a database. The IDO database is used
1825 by a number of projects including Icinga Web 1.x and 2.
1827 Details on the installation can be found in the [Getting Started](#configuring-ido)
1828 chapter. Details on the configuration can be found in the
1829 [IdoMysqlConnection](#objecttype-idomysqlconnection) and
1830 [IdoPgsqlConnection](#objecttype-idoPgsqlconnection)
1831 object configuration documentation.
1832 The DB IDO feature supports [High Availability](##high-availability-db-ido) in
1833 the Icinga 2 cluster.
1835 The following example query checks the health of the current Icinga 2 instance
1836 writing its current status to the DB IDO backend table `icinga_programstatus`
1837 every 10 seconds. By default it checks 60 seconds into the past which is a reasonable
1838 amount of time - adjust it for your requirements. If the condition is not met,
1839 the query returns an empty result.
1843 > Use [check plugins](#plugins) to monitor the backend.
1845 Replace the `default` string with your instance name, if different.
1849 # mysql -u root -p icinga -e "SELECT status_update_time FROM icinga_programstatus ps
1850 JOIN icinga_instances i ON ps.instance_id=i.instance_id
1851 WHERE (UNIX_TIMESTAMP(ps.status_update_time) > UNIX_TIMESTAMP(NOW())-60)
1852 AND i.instance_name='default';"
1854 +---------------------+
1855 | status_update_time |
1856 +---------------------+
1857 | 2014-05-29 14:29:56 |
1858 +---------------------+
1861 Example for PostgreSQL:
1863 # export PGPASSWORD=icinga; psql -U icinga -d icinga -c "SELECT ps.status_update_time FROM icinga_programstatus AS ps
1864 JOIN icinga_instances AS i ON ps.instance_id=i.instance_id
1865 WHERE ((SELECT extract(epoch from status_update_time) FROM icinga_programstatus) > (SELECT extract(epoch from now())-60))
1866 AND i.instance_name='default'";
1869 ------------------------
1870 2014-05-29 15:11:38+02
1874 A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](#schema-db-ido).
1877 ## <a id="livestatus"></a> Livestatus
1879 The [MK Livestatus](http://mathias-kettner.de/checkmk_livestatus.html) project
1880 implements a query protocol that lets users query their Icinga instance for
1881 status information. It can also be used to send commands.
1883 Details on the installation can be found in the [Getting Started](#setting-up-livestatus)
1886 ### <a id="livestatus-sockets"></a> Livestatus Sockets
1888 Other to the Icinga 1.x Addon, Icinga 2 supports two socket types
1890 * Unix socket (default)
1893 Details on the configuration can be found in the [LivestatusListener](#objecttype-livestatuslistener)
1894 object configuration.
1896 ### <a id="livestatus-get-queries"></a> Livestatus GET Queries
1900 > All Livestatus queries require an additional empty line as query end identifier.
1901 > The `unixcat` tool is either available by the MK Livestatus project or as separate
1904 There also is a Perl module available in CPAN for accessing the Livestatus socket
1905 programmatically: [Monitoring::Livestatus](http://search.cpan.org/~nierlein/Monitoring-Livestatus-0.74/)
1908 Example using the unix socket:
1910 # echo -e "GET services\n" | unixcat /var/run/icinga2/cmd/livestatus
1912 Example using the tcp socket listening on port `6558`:
1914 # echo -e 'GET services\n' | netcat 127.0.0.1 6558
1916 # cat servicegroups <<EOF
1921 (cat servicegroups; sleep 1) | netcat 127.0.0.1 6558
1924 ### <a id="livestatus-command-queries"></a> Livestatus COMMAND Queries
1926 A list of available external commands and their parameters can be found [here](#external-commands-list-detail)
1928 $ echo -e 'COMMAND <externalcommandstring>' | netcat 127.0.0.1 6558
1931 ### <a id="livestatus-filters"></a> Livestatus Filters
1935 Operator | Negate | Description
1936 ----------|------------------------
1938 ~ | !~ | Regex match
1939 =~ | !=~ | Equality ignoring case
1940 ~~ | !~~ | Regex ignoring case
1943 <= | | Less than or equal
1944 >= | | Greater than or equal
1947 ### <a id="livestatus-stats"></a> Livestatus Stats
1949 Schema: "Stats: aggregatefunction aggregateattribute"
1951 Aggregate Function | Description
1952 -------------------|--------------
1957 std | standard deviation
1958 suminv | sum (1 / value)
1959 avginv | suminv / count
1960 count | ordinary default for any stats query if not aggregate function defined
1965 Filter: has_been_checked = 1
1966 Filter: check_type = 0
1967 Stats: sum execution_time
1969 Stats: sum percent_state_change
1970 Stats: min execution_time
1972 Stats: min percent_state_change
1973 Stats: max execution_time
1975 Stats: max percent_state_change
1977 ResponseHeader: fixed16
1979 ### <a id="livestatus-output"></a> Livestatus Output
1983 CSV Output uses two levels of array separators: The members array separator
1984 is a comma (1st level) while extra info and host|service relation separator
1985 is a pipe (2nd level).
1987 Separators can be set using ASCII codes like:
1989 Separators: 10 59 44 124
1995 ### <a id="livestatus-error-codes"></a> Livestatus Error Codes
1998 ----------|--------------
2000 404 | Table does not exist
2001 452 | Exception on query
2003 ### <a id="livestatus-tables"></a> Livestatus Tables
2005 Table | Join |Description
2006 --------------|-----------|----------------------------
2007 hosts | | host config and status attributes, services counter
2008 hostgroups | | hostgroup config, status attributes and host/service counters
2009 services | hosts | service config and status attributes
2010 servicegroups | | servicegroup config, status attributes and service counters
2011 contacts | | contact config and status attributes
2012 contactgroups | | contact config, members
2013 commands | | command name and line
2014 status | | programstatus, config and stats
2015 comments | services | status attributes
2016 downtimes | services | status attributes
2017 timeperiods | | name and is inside flag
2018 endpoints | | config and status attributes
2019 log | services, hosts, contacts, commands | parses [compatlog](#objecttype-compatlogger) and shows log attributes
2020 statehist | hosts, services | parses [compatlog](#objecttype-compatlogger) and aggregates state change attributes
2022 The `commands` table is populated with `CheckCommand`, `EventCommand` and `NotificationCommand` objects.
2024 A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](#schema-livestatus).
2027 ## <a id="check-result-files"></a> Check Result Files
2029 Icinga 1.x writes its check result files to a temporary spool directory
2030 where they are processed in a regular interval.
2031 While this is extremely inefficient in performance regards it has been
2032 rendered useful for passing passive check results directly into Icinga 1.x
2033 skipping the external command pipe.
2035 Several clustered/distributed environments and check-aggregation addons
2036 use that method. In order to support step-by-step migration of these
2037 environments, Icinga 2 ships the `CheckResultReader` object.
2039 There is no feature configuration available, but it must be defined
2040 on-demand in your Icinga 2 objects configuration.
2042 object CheckResultReader "reader" {
2043 spool_dir = "/data/check-results"