# Advanced Topics <a id="advanced-topics"></a>

This chapter covers a number of advanced topics. If you're new to Icinga, you
can safely skip over things you're not interested in.

## Downtimes <a id="downtimes"></a>

Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.

Downtimes suppress notifications and may trigger other
downtimes too. If a downtime was set by accident, or its duration
exceeds the maintenance, you can cancel the downtime manually.
Planned downtimes are also taken into account by SLA reporting
tools that calculate SLAs based on the state and downtime history.

Multiple downtimes for a single object may overlap. This is useful
when maintenance takes longer than expected and you need to extend the window.
If multiple downtimes are triggered for one object, the overall downtime depth
will be greater than `1`.

If the downtime was scheduled after the problem changed to a critical hard
state and triggered a problem notification, and the service recovers during
the downtime window, the recovery notification won't be suppressed.

### Fixed and Flexible Downtimes <a id="fixed-flexible-downtimes"></a>

A `fixed` downtime is activated at the defined start time and
removed at the end time. When the service state changes to `NOT-OK`
during this time window, the downtime is triggered:
notifications are suppressed and the downtime depth is incremented.

Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.

Unlike a `fixed` downtime, a `flexible` downtime is triggered
by a state change within the time span defined by start and end time,
and then lasts for the specified duration in minutes.

Imagine the following scenario: Your service is frequently polled
by users trying to grab free deleted domains for immediate registration.
Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
a network outage visible to the monitoring. The service is still alive,
but answers too slowly to Icinga 2 service checks.
For that reason, you may want to schedule a downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime will then last from
its trigger time until the duration is over. After that, the downtime
is removed (this may happen before or after the actual end time!).

### Scheduling a downtime <a id="scheduling-downtime"></a>

You can schedule a downtime either by using the Icinga 2 API action
[schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) or
by sending an [external command](14-features.md#external-commands).

#### Fixed Downtime <a id="fixed-downtime"></a>

If the host/service changes into a NOT-OK state between the start and
end time, the downtime is marked as `in effect` and
the downtime depth counter is increased.

#### Flexible Downtime <a id="flexible-downtime"></a>

A flexible downtime defines a time window in which the downtime may be
triggered by a host/service NOT-OK state change. It then lasts
until the specified duration is reached. That way the downtime end time
may already have passed while the downtime is still in effect,
because it ends at `trigger time + duration`.

    start   |        end               actual end time
            |--------------duration--------|

### Triggered Downtimes <a id="triggered-downtimes"></a>

This is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime that should be triggered by the parent
downtime on a `NOT-OK` state change.

### Recurring Downtimes <a id="recurring-downtimes"></a>

[ScheduledDowntime objects](09-object-types.md#objecttype-scheduleddowntime) can be used to set up
recurring downtimes for services:

    apply ScheduledDowntime "backup-downtime" to Service {
      author = "icingaadmin"
      comment = "Scheduled downtime for backup"

      ranges = {
        monday = "02:00-03:00"
        tuesday = "02:00-03:00"
        wednesday = "02:00-03:00"
        thursday = "02:00-03:00"
        friday = "02:00-03:00"
        saturday = "02:00-03:00"
        sunday = "02:00-03:00"
      }

      assign where "backup" in service.groups
    }

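
Recurring downtimes can also be flexible. The following is a minimal sketch, assuming the `fixed` and `duration` attributes of the `ScheduledDowntime` object type; it reuses the 07:30 to 08:00 window and the 15 minute duration from the flexible downtime scenario described earlier. The object name, the comment text and the `domain-registration` service group are hypothetical.

```
apply ScheduledDowntime "registration-rush" to Service {
  author = "icingaadmin"
  comment = "Flexible downtime for the daily registration rush"

  /* Assumption: fixed = false turns this into a flexible downtime. */
  fixed = false

  /* The downtime lasts 15 minutes, counted from its trigger time. */
  duration = 15m

  ranges = {
    monday = "07:30-08:00"
    tuesday = "07:30-08:00"
    wednesday = "07:30-08:00"
    thursday = "07:30-08:00"
    friday = "07:30-08:00"
  }

  /* Hypothetical service group name. */
  assign where "domain-registration" in service.groups
}
```

With these settings the downtime should only take effect once the service enters a `NOT-OK` state inside the window, and then last 15 minutes from that point.
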
## Comments <a id="comments-intro"></a>

Comments can be added at runtime and are persistent over restarts. You can
add useful information for others on repeating incidents (for example
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
is primarily accessible using web interfaces.

You can add a comment either by using the Icinga 2 API action
[add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
by sending an [external command](14-features.md#external-commands).

## Acknowledgements <a id="acknowledgements"></a>

If a problem persists and notifications have been sent, you can
acknowledge the problem. That way other users will get
a notification that you're aware of the issue and are probably
already working on a fix.

Note: Acknowledgements also add a new [comment](08-advanced-topics.md#comments-intro)
which contains the author and text fields.

You can send an acknowledgement either by using the Icinga 2 API action
[acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
by sending an [external command](14-features.md#external-commands).

### Sticky Acknowledgements <a id="sticky-acknowledgements"></a>

The acknowledgement is removed if a state change occurs or if the host/service
recovers (OK/Up state).

If you acknowledge a problem once you've received a `Critical` notification,
the acknowledgement will be removed if there is a state transition to `Warning`.

    OK -> WARNING -> CRITICAL -> WARNING -> OK

If you prefer to keep the acknowledgement until the problem is resolved (`OK`
recovery) you need to enable the `sticky` parameter.

### Expiring Acknowledgements <a id="expiring-acknowledgements"></a>

Once a problem is acknowledged it may disappear from your `handled problems`
dashboard and no-one ever looks at it again since it will suppress
notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved in the future at a defined time,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement once it has expired and start to
re-notify if the problem persists.

## Time Periods <a id="timeperiods"></a>

[Time Periods](09-object-types.md#objecttype-timeperiod) define
time ranges in Icinga in which event actions are triggered, for
example whether a service check is executed within the range set by
the `check_period` attribute, or whether a notification is sent to
users, filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects.

> If you are familiar with Icinga 1.x, these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.

The `TimePeriod` attribute `ranges` may contain multiple directives,
including weekdays, days of the month, and calendar dates.
These types may overlap/override other types in your ranges dictionary.

The descending order of precedence is as follows:

* Calendar date (2008-01-01)
* Specific month date (January 1st)
* Generic month date (Day 15)
* Offset weekday of specific month (2nd Tuesday in December)
* Offset weekday (3rd Monday)
* Normal weekday (Tuesday)

If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as the time period,
as shown below:

    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday" = "00:00-24:00"
        "tuesday" = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday" = "00:00-24:00"
        "friday" = "00:00-24:00"
        "saturday" = "00:00-24:00"
        "sunday" = "00:00-24:00"
      }
    }

If your operations staff should only be notified during work hours,
create a new timeperiod named `workhours` defining a work day from
09:00 to 17:00:

    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday" = "09:00-17:00"
        "tuesday" = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday" = "09:00-17:00"
        "friday" = "09:00-17:00"
      }
    }

Furthermore, if you wish to specify a notification period across midnight,
you can define it the following way:

    object TimePeriod "across-midnight" {
      import "legacy-timeperiod"

      display_name = "Nightly Notification"
      ranges = {
        "saturday" = "22:00-24:00"
        "sunday" = "00:00-03:00"
      }
    }

Below you can see another example for configuring timeperiods across several
days, weeks or months. This can be useful when taking components offline
for a distinct period of time.

    object TimePeriod "standby" {
      import "legacy-timeperiod"

      display_name = "Standby"
      ranges = {
        "2016-09-30 - 2016-10-30" = "00:00-24:00"
      }
    }

Please note that the spaces before and after the dash are mandatory.

Once your time period is configured you can use the `period` attribute
to assign time periods to `Notification` and `Dependency` objects:

    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }

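
The `check_period` attribute mentioned at the beginning of this section can be set in a similar way. The following is a minimal sketch that restricts check execution to the `workhours` time period; the service name, the `dummy` placeholder command and the `host.vars.office_only` custom attribute are hypothetical.

```
apply Service "office-printer-queue" {
  import "generic-service"

  /* Placeholder: replace "dummy" with your actual CheckCommand. */
  check_command = "dummy"

  /* Only execute checks inside the "workhours" time period. */
  check_period = "workhours"

  /* Hypothetical custom attribute used to select hosts. */
  assign where host.vars.office_only
}
```
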
### Time Periods Inclusion and Exclusion <a id="timeperiods-includes-excludes"></a>

Sometimes it is necessary to exclude certain time ranges from
your default time period definitions, for example, if you don't
want to send out any notification during the holiday season,
or if you only want to allow small time windows for executed checks.

The [TimePeriod object](09-object-types.md#objecttype-timeperiod)
provides the `includes` and `excludes` attributes to solve this issue.
`prefer_includes` defines whether included or excluded time periods are
preferred.

The following example defines a time period called `holidays` where
notifications should be suppressed:

    object TimePeriod "holidays" {
      import "legacy-timeperiod"

      ranges = {
        "january 1" = "00:00-24:00"             //new year's day
        "july 4" = "00:00-24:00"                //independence day
        "december 25" = "00:00-24:00"           //christmas
        "december 31" = "18:00-24:00"           //new year's eve (6pm+)
        "2017-04-16" = "00:00-24:00"            //easter 2017
        "monday -1 may" = "00:00-24:00"         //memorial day (last monday in may)
        "monday 1 september" = "00:00-24:00"    //labor day (1st monday in september)
        "thursday 4 november" = "00:00-24:00"   //thanksgiving (4th thursday in november)
      }
    }

In addition to that, the time period `weekends-excluded` defines an additional
time window which should be excluded from notifications:

    object TimePeriod "weekends-excluded" {
      import "legacy-timeperiod"

      ranges = {
        "saturday" = "00:00-09:00,18:00-24:00"
        "sunday" = "00:00-09:00,18:00-24:00"
      }
    }

The time period `prod-notification` defines the default time ranges
and adds the excluded time period names as an array.

    object TimePeriod "prod-notification" {
      import "legacy-timeperiod"

      excludes = [ "holidays", "weekends-excluded" ]

      ranges = {
        "monday" = "00:00-24:00"
        "tuesday" = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday" = "00:00-24:00"
        "friday" = "00:00-24:00"
        "saturday" = "00:00-24:00"
        "sunday" = "00:00-24:00"
      }
    }

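
The `includes` and `prefer_includes` attributes can be combined with `excludes` in the same object. The following is a hedged sketch, assuming that `prefer_includes` defaults to `true` and decides which side wins when an included and an excluded time period overlap. The time period names `prod-notification-strict` and `oncall-override` are hypothetical.

```
object TimePeriod "prod-notification-strict" {
  import "legacy-timeperiod"

  /* Hypothetical time period defined elsewhere. */
  includes = [ "oncall-override" ]
  excludes = [ "holidays" ]

  /* Assumption: with false, excluded periods are preferred over
     included ones wherever the two overlap. */
  prefer_includes = false

  ranges = {
    "monday" = "09:00-17:00"
    "tuesday" = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday" = "09:00-17:00"
    "friday" = "09:00-17:00"
  }
}
```
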
## External Check Results <a id="external-check-results"></a>

Hosts or services which do not actively execute a check plugin to receive
the state and output are called "passive checks" or "external check results".
In this scenario an external client or script is sending in check results.

You can feed check results into Icinga 2 with the following transport methods:

* [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
* External command sent via command pipe (local only)

Each time a new check result is received, the next expected check time
is updated. This means that if no check results are received from
the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).

> The REST API action allows you to specify the `check_source` attribute
> which helps to identify the external sender. This is also visible
> in Icinga Web 2 and the REST API queries.

## Check Result Freshness <a id="check-result-freshness"></a>

In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute and the absence of incoming check results in that period of time.

The threshold is calculated based on the last check execution time for actively executed checks:

    (last check execution time + check interval) > current time

If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
the threshold is based on the last time a check result was received:

    (last check result time + check interval) > current time

> The [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) REST API
> action allows you to override the pre-defined check interval with a specified TTL in Icinga 2 v2.9+.

If the freshness checks fail, Icinga 2 will execute the defined check command.

Best practice is to define a [dummy](10-icinga-template-library.md#itl-dummy) `check_command` which gets
executed when freshness checks fail.

    apply Service "external-check" {
      check_command = "dummy"

      /* Set the state to UNKNOWN (3) if freshness checks fail. */
      vars.dummy_state = 3

      /* Use a runtime function to retrieve the last check time and more details. */
      vars.dummy_text = {{
        var service = get_service(macro("$host.name$"), macro("$service.name$"))
        var lastCheck = DateTime(service.last_check).to_string()

        return "No check results received. Last result time: " + lastCheck
      }}

      assign where "external" in host.vars.services
    }

References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).

Example output in Icinga Web 2:

![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)

## Check Flapping <a id="check-flapping"></a>

Icinga 2 supports optional detection of hosts and services that are "flapping".

Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
recovery notifications. With flapping detection enabled, a flapping notification is sent while other notifications are
suppressed until the object calms down after receiving the same status from checks a few times. Flapping detection can help detect
configuration problems (wrong thresholds), troublesome services, or network problems.

Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
The `flapping_threshold_high` and `flapping_threshold_low` attributes allow you to specify the thresholds that control
when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.

The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold, a
host or service is considered flapping until the value drops below the low flapping threshold.

`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
[notifications](03-monitoring-basics.md#alert-notifications) for details.

> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
> will be sent out regardless of the object's state.

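
As a hedged sketch of how the attributes above fit together, the following configuration enables flapping detection for a service with the default thresholds spelled out, and adds a notification limited to the `FlappingStart` and `FlappingEnd` types. The object names, the `host.vars.webshop` custom attribute and the `mail-service-notification` command are assumptions for illustration and need to match your own setup.

```
apply Service "webshop-http" {
  import "generic-service"
  check_command = "http"

  enable_flapping = true

  /* Thresholds in percent; these mirror the documented defaults. */
  flapping_threshold_high = 30
  flapping_threshold_low = 25

  /* Hypothetical custom attribute used to select hosts. */
  assign where host.vars.webshop
}

apply Notification "flapping-mail" to Service {
  /* Assumed to exist as a NotificationCommand in your configuration. */
  command = "mail-service-notification"
  users = [ "icingaadmin" ]

  /* Only notify when flapping starts or ends. */
  types = [ FlappingStart, FlappingEnd ]

  assign where service.enable_flapping
}
```
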
### How it works <a id="check-flapping-how-it-works"></a>

Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:

![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)

All the states are weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
states in between are fairly distributed. The final flapping value is the sum of the weighted state changes divided by the total
count of 20.

In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
considered flapping.

If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold
of 25% and the host or service would therefore recover from flapping.

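
The arithmetic of this example can be retraced in the `icinga2 console`. The following is only a sketch of the calculation described above, not the internal implementation of Icinga 2; the variable names and the `flapping-demo` log facility are made up for illustration.

```
/* Weights of the eight state changes taken from the example above. */
var weights = [ 0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12, 1.18 ]

/* Add up the weighted state changes. */
var sum = 0
for (w in weights) {
  sum += w
}

/* Divide by the total count of 20 and convert to a percentage. */
var flapping = sum / 20 * 100

/* Compare against the default upper threshold of 30%. */
log(LogInformation, "flapping-demo", "Flapping value: " + flapping + "%, flapping: " + (flapping > 30))
```
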
## Volatile Services and Hosts <a id="volatile-services-hosts"></a>

The `volatile` option, if enabled for a host or service, makes it treat every [state change](03-monitoring-basics.md#hard-soft-states)
as a `HARD` state change. It is comparable to `max_check_attempts = 1`. With this, any `NOT-OK` result will
ignore `max_check_attempts` and trigger notifications etc. It will further cause any additional `NOT-OK`
result to re-send notifications.

It may be reasonable to have a volatile service which stays in a `HARD` state if the service stays in a `NOT-OK`
state. That way each service recheck will automatically trigger a notification unless the service is acknowledged or
in a scheduled downtime.

Common examples are security checks where each `NOT-OK` check result should immediately trigger a notification.

The default for this option is `false`. It should only be enabled when required.

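
A minimal sketch of a volatile service could look as follows. The service name, the `dummy` placeholder command and the `host.vars.security_audit` custom attribute are hypothetical; only the `volatile` attribute is taken from the description above.

```
apply Service "security-audit" {
  import "generic-service"

  /* Placeholder: replace "dummy" with the CheckCommand of your security check. */
  check_command = "dummy"

  /* Treat every state change as a HARD state change. */
  volatile = true

  /* Hypothetical custom attribute used to select hosts. */
  assign where host.vars.security_audit
}
```
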
## Monitoring Icinga 2 <a id="monitoring-icinga"></a>

Why should you do that? Icinga and its components run like any other
service application on your server. There are predictable issues
such as "disk space is running low" and your monitoring suffers from just
that.

You would also like to ensure that features and backends are running
and storing required data. Be it the database backend where Icinga Web 2
presents fancy dashboards, forwarded metrics to Graphite or InfluxDB or
the entire distributed setup.

This list isn't complete but should help with your own setup.
Windows client specific checks are highlighted.

Type | Description | Plugins and CheckCommands
----------------|-------------------------------|-----------------------------------------------------
System | Filesystem | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Memory, Swap | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Hardware | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
System | Virtualization | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
System | Processes | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | System Activity Reports | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
System | I/O | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
System | Network interfaces | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
System | Users | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Logs | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
System | NTP | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
System | Updates | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
Icinga | Status & Stats | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
Icinga | Cluster & Clients | [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks)
Database | MySQL | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
Database | PostgreSQL | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
Database | Housekeeping | Check the database size and growth and analyse metrics to examine trends.
Database | DB IDO | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
Webserver | Apache2, Nginx, etc. | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
Webserver | Certificates | [http](10-icinga-template-library.md#plugin-check-command-http)
Webserver | Authorization | [http](10-icinga-template-library.md#plugin-check-command-http)
Notifications | Mail (queue) | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
Notifications | SMS (GSM modem) | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
Notifications | Messengers, Cloud services | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
Metrics | PNP, RRDTool | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
Metrics | Graphite | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
Metrics | InfluxDB | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
Metrics | Elastic Stack | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
Metrics | Graylog | [Graylog integration](14-features.md#graylog-integration)

The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
Icinga 2. You can forward them to your preferred graphing solution.
If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
your own custom check plugin. Alternatively, you can use the built-in [object accessor functions](08-advanced-topics.md#access-object-attributes-at-runtime)
to calculate stats in-memory.

There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
which provides additional metrics for the IDO database.

    apply Service "ido-mysql" {
      check_command = "ido"

      vars.ido_type = "IdoMysqlConnection"
      vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf

      assign where match("master*.localdomain", host.name)
    }

More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.

Distributed setups should include specific [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks).
You might also want to add additional checks for SSL certificate expiration.

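
The following is a hedged sketch of two such self-monitoring checks: the `icinga` CheckCommand applied to the local node, and a certificate expiration check based on the `http` CheckCommand. It assumes the `NodeName` constant matches the name of the local host object and that the `http` CheckCommand accepts the `http_vhost`, `http_ssl` and `http_certificate` custom attributes. The service names, the vhost `icingaweb2.example.com` and the `host.vars.webserver_tls` custom attribute are hypothetical.

```
apply Service "icinga" {
  import "generic-service"
  check_command = "icinga"

  /* Assumption: the local host object is named like the NodeName constant. */
  assign where host.name == NodeName
}

apply Service "web-certificate" {
  import "generic-service"
  check_command = "http"

  /* Hypothetical virtual host name. */
  vars.http_vhost = "icingaweb2.example.com"
  vars.http_ssl = true

  /* Assumption: alert if the certificate is valid for fewer than 30 days. */
  vars.http_certificate = 30

  /* Hypothetical custom attribute used to select hosts. */
  assign where host.vars.webserver_tls
}
```
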
## Advanced Configuration Hints <a id="advanced-configuration-hints"></a>

### Advanced Use of Apply Rules <a id="advanced-use-of-apply-rules"></a>

[Apply rules](03-monitoring-basics.md#using-apply) can be used to create a rule set which is
entirely based on host objects and their attributes.
In addition to that, [apply for and custom attribute override](03-monitoring-basics.md#using-apply-for)
extend the possibilities.

The following example defines a dictionary on the host object which contains
configuration attributes for multiple web servers. This is then used to add three checks:

* A `ping4` check using the local IP `address` of the web server.
* A `tcp` check querying the TCP port on which the HTTP service is running.
* If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
  In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.

Host definition:

    object Host "webserver01" {
      import "generic-host"
      address = "192.168.56.200"

      /* port, url and ssl are example values */
      vars.webserver = {
        instance["status"] = {
          address = "192.168.56.201"
          port = "80"
          url = "/status"
        }
        instance["tomcat"] = {
          address = "192.168.56.202"
          port = "8080"
        }
        instance["icingaweb2"] = {
          address = "192.168.56.210"
          port = "443"
          url = "/icingaweb2"
          ssl = true
        }
      }
    }

Service apply for definitions:

    apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance
      check_command = "ping4"

      vars.ping_address = config.address

      assign where host.vars.webserver.instance
    }

    apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance + "_" + config.port
      check_command = "tcp"

      vars.tcp_address = config.address
      vars.tcp_port = config.port

      assign where host.vars.webserver.instance
    }

    apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance + "_" + config.url
      check_command = "http"

      vars.http_address = config.address
      vars.http_port = config.port
      vars.http_uri = config.url

      if (config.ssl) {
        vars.http_ssl = config.ssl
      }

      assign where config.url != ""
    }

The variables defined in the host dictionary are not using the typical custom attribute
prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
service checks in this example.
In addition to defining check parameters this way, you can also enrich the `display_name`
attribute with more details. This will be shown in Icinga Web 2, for example.

### Use Functions in Object Configuration <a id="use-functions-object-config"></a>

There is a limited scope where functions can be used as object attributes such as:

* As value for [Custom Attributes](03-monitoring-basics.md#custom-attributes-functions)
* Returning boolean expressions for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
* Returning a [command](08-advanced-topics.md#use-functions-command-attribute) array inside command objects

The other way around you can create objects dynamically using your own global functions.

> Functions called inside command objects share the same global scope as runtime macros.
> Therefore you can access host custom attributes like `host.vars.os`, or any other
> object attribute from inside the function definition used for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) or [command](08-advanced-topics.md#use-functions-command-attribute).

Tips when implementing functions:

* Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
  inside the `icinga2.log` file depending on your log severity.
* Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary).
* Build them step-by-step. You can always refactor your code later on.

#### Register and Use Global Functions <a id="use-functions-global-register"></a>

[Functions](17-language-reference.md#functions) can be registered into the global scope. This makes custom functions available
in objects and other functions. Keep in mind that these functions are not marked
as side-effect-free and as such are not available via the REST API.

Add a new configuration file `functions.conf` and include it in the [icinga2.conf](04-configuring-icinga-2.md#icinga2-conf)
configuration file at the very beginning, e.g. after `constants.conf`. You can also manage global
functions inside `constants.conf` if you prefer.

The following function converts a given state parameter into a returned string value:

    globals.state_to_string = function(state) {
      if (state == 2) {
        return "Critical"
      } else if (state == 1) {
        return "Warning"
      } else if (state == 0) {
        return "OK"
      } else if (state == 3) {
        return "Unknown"
      } else {
        log(LogWarning, "state_to_string", "Unknown state " + state + " provided.")
      }
    }

The important bits for registering it into the global scope are:

* `globals.<unique_function_name>` adds a new globals entry.
* `function()` specifies that a call to `state_to_string()` executes a function.
* Function parameters are defined inside the `function()` definition.

The else-condition allows for better error handling. This warning will be shown in the Icinga 2
log file once the function is called.

> If these functions are used in a distributed environment, you must ensure that they are deployed
> everywhere they are needed.

In order to test-drive the newly created function, restart Icinga 2 and use the [debug console](11-cli-commands.md#cli-command-console)
to connect to the REST API.

    $ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/'
    Icinga 2 (version: v2.8.1-373-g4bea6d25c)
    <1> => globals.state_to_string(1)
    "Warning"
    <2> => state_to_string(2)
    "Critical"

You can see that this function is now registered into the [global scope](17-language-reference.md#variable-scopes). The function call
`state_to_string()` can be used in any object at static config compile time or inside runtime
lambda functions.

The following service object example uses the service state and converts it to string output.
The function definition is not optimized and is written out in full for better readability, including a log message.

    object Service "state-test" {
      check_command = "dummy"
      host_name = NodeName

      /* example state for the dummy check */
      vars.dummy_state = 2

      vars.dummy_text = {{
        var h = macro("$host.name$")
        var s = macro("$service.name$")

        var state = get_service(h, s).state

        log(LogInformation, "dummy_state", "Host: " + h + " Service: " + s + " State: " + state)

        return state_to_string(state)
      }}
    }

#### Use Custom Functions as Attribute <a id="custom-functions-as-attribute"></a>

To use custom functions as attributes, the function must be defined in a
slightly unexpected way: it has to return another function. The following example shows how to assign values
depending on group membership. All hosts in the `slow-lan` host group use 300
as value for `ping_wrta`, all other hosts use 100.

    globals.group_specific_value = function(group, group_value, non_group_value) {
        return function() use (group, group_value, non_group_value) {
            if (group in host.groups) {
                return group_value
            } else {
                return non_group_value
            }
        }
    }

    apply Service "ping4" {
        import "generic-service"
        check_command = "ping4"

        vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
        vars.ping_crta = group_specific_value("slow-lan", 500, 200)

        assign where true
    }

764 #### Use Functions in Assign Where Expressions <a id="use-functions-assign-where"></a>
766 If a simple expression for matching a name or checking if an item
767 exists in an array or dictionary does not fit, you should consider
768 writing your own global [functions](17-language-reference.md#functions).
769 You can call them inside `assign where` and `ignore where` expressions
770 for [apply rules](03-monitoring-basics.md#using-apply-expressions) or
771 [group assignments](03-monitoring-basics.md#group-assign-intro) just like
any other global function, for example [match](18-library-reference.md#global-functions-match).
The following example adds the host `myprinter`
to the host group `printers-lexmark`, but only if the host uses
a template matching the name `lexmark*`.

    template Host "lexmark-printer-host" {
      vars.printer_type = "Lexmark"
    }

    object Host "myprinter" {
      import "generic-host"
      import "lexmark-printer-host"

      address = "192.168.1.1"
    }

    /* register a global function for the assign where call */
    globals.check_host_templates = function(host, search) {
      /* iterate over all host templates and check if the search matches */
      for (tmpl in host.templates) {
        if (match(search, tmpl)) {
          return true
        }
      }

      /* nothing matched */
      return false
    }

    object HostGroup "printers-lexmark" {
      display_name = "Lexmark Printers"
      /* call the global function and pass the arguments */
      assign where check_host_templates(host, "lexmark*")
    }

Consider a different, more complex example: all hosts with the
custom attribute `vars_app` as a nested dictionary should be
added to the host group `ABAP-app-server`, but only if at least
one entry has its `app_type` set to `ABAP`.
One might try to express this as a wildcard match for nested dictionaries, but the indexer looks up the literal key `*` and does not act as a wildcard:
816 where host.vars.vars_app["*"].app_type == "ABAP"
818 The solution for this problem is to register a global
819 function which checks the `app_type` for all hosts
820 with the `vars_app` dictionary.

    object Host "appserver01" {
      check_command = "dummy"
      vars.vars_app["ABC"] = { app_type = "ABAP" }
    }

    object Host "appserver02" {
      check_command = "dummy"
      vars.vars_app["DEF"] = { app_type = "ABAP" }
    }

    globals.check_app_type = function(host, type) {
      /* ensure that other hosts without the custom attribute do not match */
      if (typeof(host.vars.vars_app) != Dictionary) {
        return false
      }

      /* iterate over the vars_app dictionary */
      for (key => val in host.vars.vars_app) {
        /* match if the value is a dictionary and its app_type is the requested type */
        if (typeof(val) == Dictionary && val.app_type == type) {
          return true
        }
      }

      /* nothing matched */
      return false
    }

    object HostGroup "ABAP-app-server" {
      assign where check_app_type(host, "ABAP")
    }

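
The function above matches as soon as one entry in `vars_app` has the requested type. If every entry has to match instead, a stricter variant could be sketched as follows. The function name `check_app_type_all` and the group name `ABAP-only-app-server` are hypothetical, and treating an empty dictionary as "no match" is an assumption:

```
/* hypothetical stricter variant: all entries must have the requested app_type */
globals.check_app_type_all = function(host, type) {
  /* hosts without the custom attribute never match */
  if (typeof(host.vars.vars_app) != Dictionary) {
    return false
  }

  /* assumption: an empty dictionary does not count as a match */
  if (len(host.vars.vars_app) == 0) {
    return false
  }

  for (key => val in host.vars.vars_app) {
    /* bail out on the first entry which is not of the requested type */
    if (typeof(val) != Dictionary || val.app_type != type) {
      return false
    }
  }

  /* every entry matched */
  return true
}

object HostGroup "ABAP-only-app-server" {
  assign where check_app_type_all(host, "ABAP")
}
```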
854 #### Use Functions in Command Arguments set_if <a id="use-functions-command-arguments-setif"></a>
856 The `set_if` attribute inside the command arguments definition in the
857 [CheckCommand object definition](09-object-types.md#objecttype-checkcommand) is primarily used to
858 evaluate whether the command parameter should be set or not.
By default you can evaluate runtime macros for their existence. If the result is not an empty
string, the command parameter is passed. This becomes fairly complicated when you want to evaluate
multiple conditions and attributes.
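
For comparison, the plain macro form of `set_if` might look like the following sketch. The command name `my-http` and the custom attribute names `my_http_vhost` and `my_http_ssl` are hypothetical; `-H` and `-S` are existing options of the `check_http` plugin:

```
object CheckCommand "my-http" {
  command = [ PluginDir + "/check_http" ]

  arguments = {
    "-H" = "$my_http_vhost$"
    "-S" = {
      /* the flag is only added when the macro resolves to a true value, e.g. true or 1 */
      set_if = "$my_http_ssl$"
    }
  }
}
```

A single macro like this can only express one condition, which is why a function is the better fit once several attributes have to be checked.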
864 The following example was found on the community support channels. The user had defined a host
865 dictionary named `compellent` with the key `disks`. This was then used inside service apply for rules.
867 object Host "dict-host" {
868 check_command = "check_compellent"
869 vars.compellent["disks"] = {
870 file = "/var/lib/check_compellent/san_disks.0.json",
The more significant problem was to only add the command parameter `--disk` to the plugin call
when the dictionary `compellent` contains the key `disks`, and to omit it otherwise.
By defining `set_if` as an [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
and evaluating whether the host custom attribute `compellent` contains the key `disks`, this problem was
solved like this:
882 object CheckCommand "check_compellent" {
883 command = [ "/usr/bin/check_compellent" ]
887 var host_vars = host.vars
889 var compel = host_vars.compellent
891 compel.contains("disks")
897 This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
898 and will fail if `host.vars.compellent` is not of the type `Dictionary`.
899 Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.
901 You can test the types using the `icinga2 console`:
904 Icinga (version: v2.3.0-193-g3eb55ad)
905 <1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
907 <2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
909 <3> => typeof(srv_vars.compellent)
913 The more programmatic approach for `set_if` could look like this:

    set_if = {{
      var srv_vars = service.vars
      if (len(srv_vars) > 0) {
        if (typeof(srv_vars.compellent) == Dictionary) {
          return srv_vars.compellent.contains("disks")
        }
        log(LogInformation, "checkcommand set_if", "custom attribute compellent is not a dictionary, ignoring it.")
        return false
      }
      log(LogWarning, "checkcommand set_if", "empty custom attributes")
      return false
    }}

933 #### Use Functions as Command Attribute <a id="use-functions-command-attribute"></a>
This comes in handy for [NotificationCommands](09-object-types.md#objecttype-notificationcommand)
or [EventCommands](09-object-types.md#objecttype-eventcommand), which do not require
a returned check result including state and output.
939 The following example was taken from the community support channels. The requirement was to
940 specify a custom attribute inside the notification apply rule and decide which notification
941 script to call based on that.

    object User "short-dummy" {
    }

    object UserGroup "short-dummy-group" {
      assign where user.name == "short-dummy"
    }

    apply Notification "mail-admins-short" to Host {
      import "mail-host-notification"
      command = "mail-host-notification-test"
      user_groups = [ "short-dummy-group" ]
      vars.short = true
      assign where host.vars.notification.mail
    }

The solution is fairly simple: the `command` attribute is implemented as a function returning
the array that Icinga 2 requires as the caller.
The local variable `mailscript` sets the default value for the notification script location.
If the notification custom attribute `short` is set, it overrides the local variable `mailscript`.
The `mailscript` variable is then used to compute the final notification command array that is
returned.

You can omit the `log()` calls, they only help with debugging.

    object NotificationCommand "mail-host-notification-test" {
      command = {{
        log("command as function")
        var mailscript = "mail-host-notification-long.sh"
        if (notification.vars.short) {
          mailscript = "mail-host-notification-short.sh"
        }
        log("Running command")

        var cmd = [ SysconfDir + "/icinga2/scripts/" + mailscript ]
        log(LogCritical, "me", cmd)
        return cmd
      }}
    }

988 ### Access Object Attributes at Runtime <a id="access-object-attributes-at-runtime"></a>
990 The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
991 can be used to retrieve references to other objects by name.
This allows you to access configuration and runtime object attributes. A detailed
list of these attributes can be found in the [object types](09-object-types.md#object-types) chapter.
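
As a rough sketch of how these accessors behave, the following lines could be entered in a connected `icinga2 console` session. The object names are only placeholders and need to be replaced with objects that exist in your configuration; the log facility name `example` is arbitrary:

```
/* placeholder names: use objects which exist in your configuration */
var h = get_host("cluster-host-01")

/* get_host() returns null if no host with that name exists */
if (h) {
  log(LogInformation, "example", "Host " + h.name + " has state " + h.state)
}

var s = get_service("cluster-host-01", "ping4")

/* last_check_result is empty until the first check has been executed */
if (s && s.last_check_result) {
  log(LogInformation, "example", s.last_check_result.output)
}
```

Checking the returned reference before using it avoids errors for objects that do not exist or have not been checked yet.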
996 #### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>
998 This is a simple cluster example for accessing two host object states and calculating a virtual
999 cluster state and output:

    object Host "cluster-host-01" {
      check_command = "dummy"
      vars.dummy_state = 2
      vars.dummy_text = "This host is down."
    }

    object Host "cluster-host-02" {
      check_command = "dummy"
      vars.dummy_state = 0
      vars.dummy_text = "This host is up."
    }

    object Host "cluster" {
      check_command = "dummy"
      vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]

      vars.dummy_state = {{
        var up_count = 0
        var down_count = 0
        var cluster_nodes = macro("$cluster_nodes$")

        for (node in cluster_nodes) {
          if (get_host(node).state > 0) {
            down_count += 1
          } else {
            up_count += 1
          }
        }

        if (up_count >= down_count) {
          return 0 // at least as many nodes up as down -> UP
        } else {
          return 2 // something is broken
        }
      }}

      vars.dummy_text = {{
        var output = "Cluster hosts:\n"
        var cluster_nodes = macro("$cluster_nodes$")
        for (node in cluster_nodes) {
          output += node + ": " + get_host(node).last_check_result.output + "\n"
        }
        return output
      }}
    }

1051 #### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>
1053 The following example sets time dependent thresholds for the load check based on the current
1054 time of the day compared to the defined time period.
1057 object TimePeriod "backup" {
1058 import "legacy-timeperiod"
1061 monday = "02:00-03:00"
1062 tuesday = "02:00-03:00"
1063 wednesday = "02:00-03:00"
1064 thursday = "02:00-03:00"
1065 friday = "02:00-03:00"
1066 saturday = "02:00-03:00"
1067 sunday = "02:00-03:00"
1071 object Host "webserver-with-backup" {
1072 check_command = "hostalive"
1073 address = "127.0.0.1"
1076 object Service "webserver-backup-load" {
1077 check_command = "load"
1078 host_name = "webserver-with-backup"
1080 vars.load_wload1 = {{
1081 if (get_time_period("backup").is_inside) {
1087 vars.load_cload1 = {{
1088 if (get_time_period("backup").is_inside) {
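
The same idea can be combined with the function-as-attribute pattern from earlier in this chapter, so that the time period lookup is written only once. The following is a sketch under stated assumptions: the helper name `value_by_time_period`, the service name and all threshold values are hypothetical and only serve as illustration:

```
/* hypothetical helper: returns a function which picks a value depending on a time period */
globals.value_by_time_period = function(period, inside_value, outside_value) {
  return function() use (period, inside_value, outside_value) {
    var tp = get_time_period(period)

    /* fall back to the outside value if the time period does not exist */
    if (tp && tp.is_inside) {
      return inside_value
    }
    return outside_value
  }
}

object Service "webserver-backup-load-generic" {
  check_command = "load"
  host_name = "webserver-with-backup"

  /* illustrative thresholds only */
  vars.load_wload1 = value_by_time_period("backup", 15, 5)
  vars.load_cload1 = value_by_time_period("backup", 30, 10)
}
```

The returned function is evaluated at check execution time, so the threshold follows the time period without a configuration reload.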
1098 ## Advanced Value Types <a id="advanced-value-types"></a>
1100 In addition to the default value types Icinga 2 also uses a few other types
1101 to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).
1103 ### CheckResult <a id="advanced-value-types-checkresult"></a>
  Name                      | Type                  | Description
  --------------------------|-----------------------|----------------------------------
  exit\_status              | Number                | The exit status returned by the check execution.
  output                    | String                | The check output.
  performance\_data         | Array                 | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
  check\_source             | String                | Name of the node executing the check.
  state                     | Number                | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
  command                   | Value                 | Array of command with shell-escaped arguments or command line string.
  execution\_start          | Timestamp             | Check execution start time (as a UNIX timestamp).
  execution\_end            | Timestamp             | Check execution end time (as a UNIX timestamp).
  schedule\_start           | Timestamp             | Scheduled check execution start time (as a UNIX timestamp).
  schedule\_end             | Timestamp             | Scheduled check execution end time (as a UNIX timestamp).
  active                    | Boolean               | Whether the result is from an active or passive check.
  vars\_before              | Dictionary            | Internal attribute used for calculations.
  vars\_after               | Dictionary            | Internal attribute used for calculations.
  ttl                       | Number                | Time-to-live duration in seconds for this check result. The next expected check result is `now + ttl` where freshness checks are executed.

1122 ### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>
1124 Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).
  Name                      | Type                  | Description
  --------------------------|-----------------------|----------------------------------
  label                     | String                | Performance data label.
  value                     | Number                | Normalized performance data value without unit.
  counter                   | Boolean               | Enabled if the original value contains `c` as unit. Defaults to `false`.
  unit                      | String                | Unit of measurement (`seconds`, `bytes`, `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
  crit                      | Value                 | Critical threshold value.
  warn                      | Value                 | Warning threshold value.
  min                       | Value                 | Minimum value returned by the check.
  max                       | Value                 | Maximum value returned by the check.
1135 max | Value | Maximum value returned by the check.