1 # Advanced Topics <a id="advanced-topics"></a>
3 This chapter covers a number of advanced topics. If you're new to Icinga, you
4 can safely skip over things you're not interested in.
6 ## Downtimes <a id="downtimes"></a>
8 Downtimes can be scheduled for planned server maintenance or
9 any other targeted service outage you are aware of in advance.
11 Downtimes suppress notifications and can trigger other
12 downtimes too. If the downtime was set by accident, or the duration
13 exceeds the maintenance windows, you can manually cancel the downtime.
15 ### Scheduling a downtime <a id="scheduling-downtime"></a>
17 The most convenient way to schedule planned downtimes is to create
18 them in Icinga Web 2 inside the host/service detail view. Select
19 multiple hosts/services from the listing with the shift key to
20 schedule multiple downtimes.
22 ![Downtime in Icinga Web 2](images/advanced-topics/icingaweb2_downtime_handled.png)
24 In addition to that you can schedule a downtime by using the Icinga 2 API action
25 [schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime).
26 This is especially useful to schedule a downtime on-demand inside a (remote) backup
27 script, or create maintenance downtimes from a cron job for specific dates and intervals.
29 Multiple downtimes for a single object may overlap. This is useful
30 when you want to extend your maintenance window taking longer than expected.
31 If there are multiple downtimes triggered for one object, the overall downtime depth
32 will be greater than `1`.
34 If the downtime was scheduled after the problem changed to a critical hard
35 state triggering a problem notification, and the service recovers during
36 the downtime window, the recovery notification won't be suppressed.
38 Planned downtimes are also taken into account for SLA reporting
39 tools calculating the SLAs based on the state and downtime history.
41 ### Fixed and Flexible Downtimes <a id="fixed-flexible-downtimes"></a>
43 A `fixed` downtime will be activated at the defined start time, and
44 removed at the end time. During this time window the service state
45 will change to `NOT-OK` and then actually trigger the downtime.
46 Notifications are suppressed and the downtime depth is incremented.
48 Common scenarios are a planned distribution upgrade on your linux
49 servers, or database updates in your warehouse. The customer knows
50 about a fixed downtime window between 23:00 and 24:00. After 24:00
51 all problems should be alerted again. Solution is simple -
52 schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
54 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
55 by the state change in the time span defined by start and end time,
56 and then last for the specified duration in minutes.
58 Imagine the following scenario: Your service is frequently polled
59 by users trying to grab free deleted domains for immediate registration.
60 Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
61 a network outage visible to the monitoring. The service is still alive,
62 but answering too slow to Icinga 2 service checks.
63 For that reason, you may want to schedule a downtime between 07:30 and
64 08:00 with a duration of 15 minutes. The downtime will then last from
65 its trigger time until the duration is over. After that, the downtime
66 is removed (may happen before or after the actual end time!).
68 #### Fixed Downtime <a id="fixed-downtime"></a>
70 If the host/service changes into a NOT-OK state between the start and
71 end time window, the downtime will be marked as `in effect` and
72 increases the downtime depth counter.
80 #### Flexible Downtime <a id="flexible-downtime"></a>
82 A flexible downtime defines a time window where the downtime may be
83 triggered from a host/service NOT-OK state change. It will then last
84 until the specified time duration is reached. That way it can happen
85 that the downtime end time is already gone, but the downtime ends
86 at `trigger time + duration`.
91 start | end actual end time
92 |--------------duration--------|
97 ### Triggered Downtimes <a id="triggered-downtimes"></a>
99 This is optional when scheduling a downtime. If there is already a downtime
100 scheduled for a future maintenance, the current downtime can be triggered by
101 that downtime. This renders useful if you have scheduled a host downtime and
102 are now scheduling a child host's downtime getting triggered by the parent
103 downtime on `NOT-OK` state change.
105 ### Recurring Downtimes <a id="recurring-downtimes"></a>
107 [ScheduledDowntime objects](09-object-types.md#objecttype-scheduleddowntime) can be used to set up
108 recurring downtimes for services.
113 apply ScheduledDowntime "backup-downtime" to Service {
114 author = "icingaadmin"
115 comment = "Scheduled downtime for backup"
118 monday = "02:00-03:00"
119 tuesday = "02:00-03:00"
120 wednesday = "02:00-03:00"
121 thursday = "02:00-03:00"
122 friday = "02:00-03:00"
123 saturday = "02:00-03:00"
124 sunday = "02:00-03:00"
127 assign where "backup" in service.groups
131 Icinga 2 attempts to find the next possible segment from a ScheduledDowntime object's
132 `ranges` attribute, and wont create multiple downtimes in the future. In case you need
133 all these downtimes planned and visible for the next days, weeks or months, schedule them
134 manually via the [REST API](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) using
135 a script or cron job.
139 > If ScheduledDowntime objects are synced in a distributed high-availability setup,
140 > both will create the next possible downtime on their own. These runtime generated
141 > downtimes are synced among both zone instances, and you may see sort-of duplicate downtimes
145 ## Comments <a id="comments-intro"></a>
147 Comments can be added at runtime and are persistent over restarts. You can
148 add useful information for others on repeating incidents (for example
149 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
150 is primarily accessible using web interfaces.
152 You can add a comment either by using the Icinga 2 API action
153 [add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
154 by sending an [external command](14-features.md#external-commands).
156 ## Acknowledgements <a id="acknowledgements"></a>
158 If a problem persists and notifications have been sent, you can
159 acknowledge the problem. That way other users will get
160 a notification that you're aware of the issue and probably are
161 already working on a fix.
163 Note: Acknowledgements also add a new [comment](08-advanced-topics.md#comments-intro)
164 which contains the author and text fields.
166 You can send an acknowledgement either by using the Icinga 2 API action
167 [acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
168 by sending an [external command](14-features.md#external-commands).
171 ### Sticky Acknowledgements <a id="sticky-acknowledgements"></a>
173 The acknowledgement is removed if a state change occurs or if the host/service
174 recovers (OK/Up state).
176 If you acknowledge a problem once you've received a `Critical` notification,
177 the acknowledgement will be removed if there is a state transition to `Warning`.
179 OK -> WARNING -> CRITICAL -> WARNING -> OK
182 If you prefer to keep the acknowledgement until the problem is resolved (`OK`
183 recovery) you need to enable the `sticky` parameter.
186 ### Expiring Acknowledgements <a id="expiring-acknowledgements"></a>
188 Once a problem is acknowledged it may disappear from your `handled problems`
189 dashboard and no-one ever looks at it again since it will suppress
192 This `fire-and-forget` action is quite common. If you're sure that a
193 current problem should be resolved in the future at a defined time,
194 you can define an expiration time when acknowledging the problem.
196 Icinga 2 will clear the acknowledgement when expired and start to
197 re-notify, if the problem persists.
200 ## Time Periods <a id="timeperiods"></a>
202 [Time Periods](09-object-types.md#objecttype-timeperiod) define
203 time ranges in Icinga where event actions are triggered, for
204 example whether a service check is executed or not within
205 the `check_period` attribute. Or a notification should be sent to
206 users or not, filtered by the `period` and `notification_period`
207 configuration attributes for `Notification` and `User` objects.
211 > If you are familiar with Icinga 1.x, these time period definitions
212 > are called `legacy timeperiods` in Icinga 2.
214 > An Icinga 2 legacy timeperiod requires the `ITL` provided template
215 >`legacy-timeperiod`.
217 The `TimePeriod` attribute `ranges` may contain multiple directives,
218 including weekdays, days of the month, and calendar dates.
219 These types may overlap/override other types in your ranges dictionary.
221 The descending order of precedence is as follows:
223 * Calendar date (2008-01-01)
224 * Specific month date (January 1st)
225 * Generic month date (Day 15)
226 * Offset weekday of specific month (2nd Tuesday in December)
227 * Offset weekday (3rd Monday)
228 * Normal weekday (Tuesday)
230 If you don't set any `check_period` or `notification_period` attribute
231 on your configuration objects, Icinga 2 assumes `24x7` as time period
234 object TimePeriod "24x7" {
235 import "legacy-timeperiod"
237 display_name = "Icinga 2 24x7 TimePeriod"
239 "monday" = "00:00-24:00"
240 "tuesday" = "00:00-24:00"
241 "wednesday" = "00:00-24:00"
242 "thursday" = "00:00-24:00"
243 "friday" = "00:00-24:00"
244 "saturday" = "00:00-24:00"
245 "sunday" = "00:00-24:00"
249 If your operation staff should only be notified during workhours,
250 create a new timeperiod named `workhours` defining a work day from
253 object TimePeriod "workhours" {
254 import "legacy-timeperiod"
256 display_name = "Icinga 2 8x5 TimePeriod"
258 "monday" = "09:00-17:00"
259 "tuesday" = "09:00-17:00"
260 "wednesday" = "09:00-17:00"
261 "thursday" = "09:00-17:00"
262 "friday" = "09:00-17:00"
266 Furthermore if you wish to specify a notification period across midnight,
267 you can define it the following way:
269 object Timeperiod "across-midnight" {
270 import "legacy-timeperiod"
272 display_name = "Nightly Notification"
274 "saturday" = "22:00-24:00"
275 "sunday" = "00:00-03:00"
279 Below you can see another example for configuring timeperiods across several
280 days, weeks or months. This can be useful when taking components offline
281 for a distinct period of time.
283 object Timeperiod "standby" {
284 import "legacy-timeperiod"
286 display_name = "Standby"
288 "2016-09-30 - 2016-10-30" = "00:00-24:00"
292 Please note that the spaces before and after the dash are mandatory.
294 Once your time period is configured you can Use the `period` attribute
295 to assign time periods to `Notification` and `Dependency` objects:
297 object Notification "mail" {
298 import "generic-notification"
300 host_name = "localhost"
302 command = "mail-notification"
303 users = [ "icingaadmin" ]
307 ### Time Periods Inclusion and Exclusion <a id="timeperiods-includes-excludes"></a>
309 Sometimes it is necessary to exclude certain time ranges from
310 your default time period definitions, for example, if you don't
311 want to send out any notification during the holiday season,
312 or if you only want to allow small time windows for executed checks.
314 The [TimePeriod object](09-object-types.md#objecttype-timeperiod)
315 provides the `includes` and `excludes` attributes to solve this issue.
316 `prefer_includes` defines whether included or excluded time periods are
319 The following example defines a time period called `holidays` where
320 notifications should be suppressed:
322 object TimePeriod "holidays" {
323 import "legacy-timeperiod"
326 "january 1" = "00:00-24:00" //new year's day
327 "july 4" = "00:00-24:00" //independence day
328 "december 25" = "00:00-24:00" //christmas
329 "december 31" = "18:00-24:00" //new year's eve (6pm+)
330 "2017-04-16" = "00:00-24:00" //easter 2017
331 "monday -1 may" = "00:00-24:00" //memorial day (last monday in may)
332 "monday 1 september" = "00:00-24:00" //labor day (1st monday in september)
333 "thursday 4 november" = "00:00-24:00" //thanksgiving (4th thursday in november)
337 In addition to that the time period `weekends` defines an additional
338 time window which should be excluded from notifications:
340 object TimePeriod "weekends-excluded" {
341 import "legacy-timeperiod"
344 "saturday" = "00:00-09:00,18:00-24:00"
345 "sunday" = "00:00-09:00,18:00-24:00"
349 The time period `prod-notification` defines the default time ranges
350 and adds the excluded time period names as an array.
352 object TimePeriod "prod-notification" {
353 import "legacy-timeperiod"
355 excludes = [ "holidays", "weekends-excluded" ]
358 "monday" = "00:00-24:00"
359 "tuesday" = "00:00-24:00"
360 "wednesday" = "00:00-24:00"
361 "thursday" = "00:00-24:00"
362 "friday" = "00:00-24:00"
363 "saturday" = "00:00-24:00"
364 "sunday" = "00:00-24:00"
368 ## External Check Results <a id="external-check-results"></a>
370 Hosts or services which do not actively execute a check plugin to receive
371 the state and output are called "passive checks" or "external check results".
372 In this scenario an external client or script is sending in check results.
374 You can feed check results into Icinga 2 with the following transport methods:
376 * [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
377 * External command sent via command pipe (local only)
379 Each time a new check result is received, the next expected check time
380 is updated. This means that if there are no check result received from
381 the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).
385 > The REST API action allows to specify the `check_source` attribute
386 > which helps identifying the external sender. This is also visible
387 > in Icinga Web 2 and the REST API queries.
389 ## Check Result Freshness <a id="check-result-freshness"></a>
391 In Icinga 2 active check freshness is enabled by default. It is determined by the
392 `check_interval` attribute and no incoming check results in that period of time.
394 The threshold is calculated based on the last check execution time for actively executed checks:
396 (last check execution time + check interval) > current time
398 If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
399 the threshold is based on the last time a check result was received:
401 (last check result time + check interval) > current time
405 > The [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) REST API
406 > action allows to overrule the pre-defined check interval with a specified TTL in Icinga 2 v2.9+.
408 If the freshness checks fail, Icinga 2 will execute the defined check command.
410 Best practice is to define a [dummy](10-icinga-template-library.md#itl-dummy) `check_command` which gets
411 executed when freshness checks fail.
414 apply Service "external-check" {
415 check_command = "dummy"
418 /* Set the state to UNKNOWN (3) if freshness checks fail. */
421 /* Use a runtime function to retrieve the last check time and more details. */
423 var service = get_service(macro("$host.name$"), macro("$service.name$"))
424 var lastCheck = DateTime(service.last_check).to_string()
426 return "No check results received. Last result time: " + lastCheck
429 assign where "external" in host.vars.services
433 References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).
435 Example output in Icinga Web 2:
437 ![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)
440 ## Check Flapping <a id="check-flapping"></a>
442 Icinga 2 supports optional detection of hosts and services that are "flapping".
444 Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
445 recovery notifications. With flapping detection enabled a flapping notification will be sent while other notifications are
446 suppresed until it calms down after receiving the same status from checks a few times. Flapping detection can help detect
448 configuration problems (wrong thresholds), troublesome services, or network problems.
450 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
451 The `flapping_threshold_high` and `flapping_threshold_low` attributes allows to specify the thresholds that control
452 when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to be flapping.
454 The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold a
455 host or service is considered flapping until it drops below the low flapping threshold.
457 `FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
458 [notifications](alert-notifications) for details
460 > Note: There is no distinctions between hard and soft states with flapping. All state changes count and notifications
461 > will be sent out regardless of the objects state.
463 ### How it works <a id="check-flapping-how-it-works"></a>
465 Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
467 ![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
469 All the states ware weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
470 states in between are fairly distributed. The final flapping value are the weighted state changes divided by the total
473 In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
474 This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
477 If the next seven check results then would not be state changes, the flapping percentage would fall below the lower threshold
478 of 25% and therefore the host or service would recover from flapping.
480 ## Volatile Services and Hosts <a id="volatile-services-hosts"></a>
482 The `volatile` option, if enabled for a host or service, makes it treat every [state change](03-monitoring-basics.md#hard-soft-states)
483 as a `HARD` state change. It is comparable to `max_check_attempts = 1`. With this any `NOT-OK` result will
484 ignore `max_check_attempts` and trigger notifications etc. It will further cause any additional `NOT-OK`
485 result to re-send notifications.
487 It may be reasonable to have a volatile service which stays in a `HARD` state if the service stays in a `NOT-OK`
488 state. That way each service recheck will automatically trigger a notification unless the service is acknowledged or
489 in a scheduled downtime.
491 A common example are security checks where each `NOT-OK` check result should immediately trigger a notification.
493 The default for this option is `false` and should only be enabled when required.
496 ## Monitoring Icinga 2 <a id="monitoring-icinga"></a>
498 Why should you do that? Icinga and its components run like any other
499 service application on your server. There are predictable issues
500 such as "disk space is running low" and your monitoring suffers from just
503 You would also like to ensure that features and backends are running
504 and storing required data. Be it the database backend where Icinga Web 2
505 presents fancy dashboards, forwarded metrics to Graphite or InfluxDB or
506 the entire distributed setup.
508 This list isn't complete but should help with your own setup.
509 Windows client specific checks are highlighted.
511 Type | Description | Plugins and CheckCommands
512 ----------------|-------------------------------|-----------------------------------------------------
513 System | Filesystem | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
514 System | Memory, Swap | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
515 System | Hardware | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
516 System | Virtualization | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
517 System | Processes | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
518 System | System Activity Reports | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
519 System | I/O | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
520 System | Network interfaces | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
521 System | Users | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
522 System | Logs | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
523 System | NTP | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
524 System | Updates | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
525 Icinga | Status & Stats | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
526 Icinga | Cluster & Clients | [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks)
527 Database | MySQL | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
528 Database | PostgreSQL | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
529 Database | Housekeeping | Check the database size and growth and analyse metrics to examine trends.
530 Database | DB IDO | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
531 Webserver | Apache2, Nginx, etc. | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
532 Webserver | Certificates | [http](10-icinga-template-library.md#plugin-check-command-http)
533 Webserver | Authorization | [http](10-icinga-template-library.md#plugin-check-command-http)
534 Notifications | Mail (queue) | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
535 Notifications | SMS (GSM modem) | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
536 Notifications | Messengers, Cloud services | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
537 Metrics | PNP, RRDTool | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
538 Metrics | Graphite | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
539 Metrics | InfluxDB | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
540 Metrics | Elastic Stack | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
541 Metrics | Graylog | [Graylog integration](14-features.md#graylog-integration)
544 The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
545 Icinga 2. You can forward them to your preferred graphing solution.
546 If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
547 your own custom check plugin. Or you keep using the built-in [object accessor functions](08-advanced-topics.md#access-object-attributes-at-runtime)
548 to calculate stats in-memory.
550 There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
551 which provides additional metrics for the IDO database.
554 apply Service "ido-mysql" {
555 check_command = "ido"
557 vars.ido_type = "IdoMysqlConnection"
558 vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf
560 assign where match("master*.localdomain", host.name)
564 More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.
566 Distributed setups should include specific [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks).
567 You might also want to add additional checks for SSL certificate expiration.
570 ## Advanced Configuration Hints <a id="advanced-configuration-hints"></a>
572 ### Advanced Use of Apply Rules <a id="advanced-use-of-apply-rules"></a>
574 [Apply rules](03-monitoring-basics.md#using-apply) can be used to create a rule set which is
575 entirely based on host objects and their attributes.
576 In addition to that [apply for and custom attribute override](03-monitoring-basics.md#using-apply-for)
577 extend the possibilities.
579 The following example defines a dictionary on the host object which contains
580 configuration attributes for multiple web servers. This then used to add three checks:
582 * A `ping4` check using the local IP `address` of the web server.
583 * A `tcp` check querying the TCP port where the HTTP service is running on.
584 * If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
585 In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.
589 object Host "webserver01" {
590 import "generic-host"
591 address = "192.168.56.200"
595 instance["status"] = {
596 address = "192.168.56.201"
600 instance["tomcat"] = {
601 address = "192.168.56.202"
604 instance["icingaweb2"] = {
605 address = "192.168.56.210"
613 Service apply for definitions:
615 apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
616 display_name = "webserver_" + instance
617 check_command = "ping4"
619 vars.ping_address = config.address
621 assign where host.vars.webserver.instance
624 apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
625 display_name = "webserver_" + instance + "_" + config.port
626 check_command = "tcp"
628 vars.tcp_address = config.address
629 vars.tcp_port = config.port
631 assign where host.vars.webserver.instance
634 apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
635 display_name = "webserver_" + instance + "_" + config.url
636 check_command = "http"
638 vars.http_address = config.address
639 vars.http_port = config.port
640 vars.http_uri = config.url
643 vars.http_ssl = config.ssl
646 assign where config.url != ""
649 The variables defined in the host dictionary are not using the typical custom attribute
650 prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
651 service checks in this example.
652 In addition to defining check parameters this way, you can also enrich the `display_name`
653 attribute with more details. This will be shown in in Icinga Web 2 for example.
655 ### Use Functions in Object Configuration <a id="use-functions-object-config"></a>
657 There is a limited scope where functions can be used as object attributes such as:
659 * As value for [Custom Attributes](03-monitoring-basics.md#custom-attributes-functions)
660 * Returning boolean expressions for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
661 * Returning a [command](08-advanced-topics.md#use-functions-command-attribute) array inside command objects
663 The other way around you can create objects dynamically using your own global functions.
667 > Functions called inside command objects share the same global scope as runtime macros.
668 > Therefore you can access host custom attributes like `host.vars.os`, or any other
669 > object attribute from inside the function definition used for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) or [command](08-advanced-topics.md#use-functions-command-attribute).
671 Tips when implementing functions:
673 * Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
674 inside the `icinga2.log` file depending in your log severity
675 * Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary)
676 * Build them step-by-step. You can always refactor your code later on.
678 #### Register and Use Global Functions <a id="use-functions-global-register"></a>
680 [Functions](17-language-reference.md#functions) can be registered into the global scope. This allows custom functions being available
681 in objects and other functions. Keep in mind that these functions are not marked
682 as side-effect-free and as such are not available via the REST API.
684 Add a new configuration file `functions.conf` and include it into the [icinga2.conf](04-configuring-icinga-2.md#icinga2-conf)
685 configuration file in the very beginning, e.g. after `constants.conf`. You can also manage global
686 functions inside `constants.conf` if you prefer.
688 The following function converts a given state parameter into a returned string value. The important
689 bits for registering it into the global scope are:
691 * `globals.<unique_function_name>` adds a new globals entry.
692 * `function()` specifies that a call to `state_to_string()` executes a function.
693 * Function parameters are defined inside the `function()` definition.
696 globals.state_to_string = function(state) {
699 } else if (state == 1) {
701 } else if (state == 0) {
703 } else if (state == 3) {
706 log(LogWarning, "state_to_string", "Unknown state " + state + " provided.")
711 The else-condition allows for better error handling. This warning will be shown in the Icinga 2
712 log file once the function is called.
716 > If these functions are used in a distributed environment, you must ensure to deploy them
719 In order to test-drive the newly created function, restart Icinga 2 and use the [debug console](11-cli-commands.md#cli-command-console)
720 to connect to the REST API.
723 $ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/'
724 Icinga 2 (version: v2.8.1-373-g4bea6d25c)
725 <1> => globals.state_to_string(1)
727 <2> => state_to_string(2)
731 You can see that this function is now registered into the [global scope](17-language-reference.md#variable-scopes). The function call
732 `state_to_string()` can be used in any object at static config compile time or inside runtime
735 The following service object example uses the service state and converts it to string output.
736 The function definition is not optimized and is enrolled for better readability including a log message.
739 object Service "state-test" {
740 check_command = "dummy"
746 var h = macro("$host.name$")
747 var s = macro("$service.name$")
749 var state = get_service(h, s).state
751 log(LogInformation, "dummy_state", "Host: " + h + " Service: " + s + " State: " + state)
753 return state_to_string(state)
759 #### Use Custom Functions as Attribute <a id="custom-functions-as-attribute"></a>
761 To use custom functions as attributes, the function must be defined in a
762 slightly unexpected way. The following example shows how to assign values
763 depending on group membership. All hosts in the `slow-lan` host group use 300
764 as value for `ping_wrta`, all other hosts use 100.
766 globals.group_specific_value = function(group, group_value, non_group_value) {
767 return function() use (group, group_value, non_group_value) {
768 if (group in host.groups) {
771 return non_group_value
776 apply Service "ping4" {
777 import "generic-service"
778 check_command = "ping4"
780 vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
781 vars.ping_crta = group_specific_value("slow-lan", 500, 200)
786 #### Use Functions in Assign Where Expressions <a id="use-functions-assign-where"></a>
788 If a simple expression for matching a name or checking if an item
789 exists in an array or dictionary does not fit, you should consider
790 writing your own global [functions](17-language-reference.md#functions).
791 You can call them inside `assign where` and `ignore where` expressions
792 for [apply rules](03-monitoring-basics.md#using-apply-expressions) or
793 [group assignments](03-monitoring-basics.md#group-assign-intro) just like
794 any other global functions for example [match](18-library-reference.md#global-functions-match).
796 The following example requires the host `myprinter` being added
797 to the host group `printers-lexmark` but only if the host uses
798 a template matching the name `lexmark*`.
800 template Host "lexmark-printer-host" {
801 vars.printer_type = "Lexmark"
804 object Host "myprinter" {
805 import "generic-host"
806 import "lexmark-printer-host"
808 address = "192.168.1.1"
811 /* register a global function for the assign where call */
812 globals.check_host_templates = function(host, search) {
813 /* iterate over all host templates and check if the search matches */
814 for (tmpl in host.templates) {
815 if (match(search, tmpl)) {
820 /* nothing matched */
824 object HostGroup "printers-lexmark" {
825 display_name = "Lexmark Printers"
826 /* call the global function and pass the arguments */
827 assign where check_host_templates(host, "lexmark*")
831 Take a different more complex example: All hosts with the
832 custom attribute `vars_app` as nested dictionary should be
833 added to the host group `ABAP-app-server`. But only if the
834 `app_type` for all entries is set to `ABAP`.
836 It could read as wildcard match for nested dictionaries:
838 where host.vars.vars_app["*"].app_type == "ABAP"
840 The solution for this problem is to register a global
841 function which checks the `app_type` for all hosts
842 with the `vars_app` dictionary.
844 object Host "appserver01" {
845 check_command = "dummy"
846 vars.vars_app["ABC"] = { app_type = "ABAP" }
848 object Host "appserver02" {
849 check_command = "dummy"
850 vars.vars_app["DEF"] = { app_type = "ABAP" }
853 globals.check_app_type = function(host, type) {
854 /* ensure that other hosts without the custom attribute do not match */
855 if (typeof(host.vars.vars_app) != Dictionary) {
859 /* iterate over the vars_app dictionary */
860 for (key => val in host.vars.vars_app) {
861 /* if the value is a dictionary and if contains the app_type being the requested type */
862 if (typeof(val) == Dictionary && val.app_type == type) {
867 /* nothing matched */
871 object HostGroup "ABAP-app-server" {
872 assign where check_app_type(host, "ABAP")
876 #### Use Functions in Command Arguments set_if <a id="use-functions-command-arguments-setif"></a>
878 The `set_if` attribute inside the command arguments definition in the
879 [CheckCommand object definition](09-object-types.md#objecttype-checkcommand) is primarily used to
880 evaluate whether the command parameter should be set or not.
882 By default you can evaluate runtime macros for their existence. If the result is not an empty
883 string, the command parameter is passed. This becomes fairly complicated when want to evaluate
884 multiple conditions and attributes.
886 The following example was found on the community support channels. The user had defined a host
887 dictionary named `compellent` with the key `disks`. This was then used inside service apply for rules.
889 object Host "dict-host" {
890 check_command = "check_compellent"
891 vars.compellent["disks"] = {
892 file = "/var/lib/check_compellent/san_disks.0.json",
897 The more significant problem was to only add the command parameter `--disk` to the plugin call
898 when the dictionary `compellent` contains the key `disks`, and omit it if not found.
900 By defining `set_if` as [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
901 and evaluating the host custom attribute `compellent` containing the `disks` this problem was
904 object CheckCommand "check_compellent" {
905 command = [ "/usr/bin/check_compellent" ]
909 var host_vars = host.vars
911 var compel = host_vars.compellent
913 compel.contains("disks")
919 This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
920 and will fail if `host.vars.compellent` is not of the type `Dictionary`.
921 Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.
923 You can test the types using the `icinga2 console`:
926 Icinga (version: v2.3.0-193-g3eb55ad)
927 <1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
929 <2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
931 <3> => typeof(srv_vars.compellent)
935 The more programmatic approach for `set_if` could look like this:
939 var srv_vars = service.vars
940 if(len(srv_vars) > 0) {
941 if (typeof(srv_vars.compellent) == Dictionary) {
942 return srv_vars.compellent.contains("disks")
944 log(LogInformationen, "checkcommand set_if", "custom attribute compellent_checks is not a dictionary, ignoring it.")
948 log(LogWarning, "checkcommand set_if", "empty custom attributes")
955 #### Use Functions as Command Attribute <a id="use-functions-command-attribute"></a>
957 This comes in handy for [NotificationCommands](09-object-types.md#objecttype-notificationcommand)
958 or [EventCommands](09-object-types.md#objecttype-eventcommand) which does not require
959 a returned checkresult including state/output.
961 The following example was taken from the community support channels. The requirement was to
962 specify a custom attribute inside the notification apply rule and decide which notification
963 script to call based on that.
965 object User "short-dummy" {
968 object UserGroup "short-dummy-group" {
969 assign where user.name == "short-dummy"
972 apply Notification "mail-admins-short" to Host {
973 import "mail-host-notification"
974 command = "mail-host-notification-test"
975 user_groups = [ "short-dummy-group" ]
977 assign where host.vars.notification.mail
980 The solution is fairly simple: The `command` attribute is implemented as function returning
981 an array required by the caller Icinga 2.
982 The local variable `mailscript` sets the default value for the notification scrip location.
983 If the notification custom attribute `short` is set, it will override the local variable `mailscript`
985 The `mailscript` variable is then used to compute the final notification command array being
988 You can omit the `log()` calls, they only help debugging.
990 object NotificationCommand "mail-host-notification-test" {
992 log("command as function")
993 var mailscript = "mail-host-notification-long.sh"
994 if (notification.vars.short) {
995 mailscript = "mail-host-notification-short.sh"
997 log("Running command")
1000 var cmd = [ SysconfDir + "/icinga2/scripts/" + mailscript ]
1001 log(LogCritical, "me", cmd)
1010 ### Access Object Attributes at Runtime <a id="access-object-attributes-at-runtime"></a>
1012 The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
1013 can be used to retrieve references to other objects by name.
1015 This allows you to access configuration and runtime object attributes. A detailed
1016 list can be found [here](09-object-types.md#object-types).
1018 #### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>
1020 This is a simple cluster example for accessing two host object states and calculating a virtual
1021 cluster state and output:
1024 object Host "cluster-host-01" {
1025 check_command = "dummy"
1026 vars.dummy_state = 2
1027 vars.dummy_text = "This host is down."
1030 object Host "cluster-host-02" {
1031 check_command = "dummy"
1032 vars.dummy_state = 0
1033 vars.dummy_text = "This host is up."
1036 object Host "cluster" {
1037 check_command = "dummy"
1038 vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]
1040 vars.dummy_state = {{
1043 var cluster_nodes = macro("$cluster_nodes$")
1045 for (node in cluster_nodes) {
1046 if (get_host(node).state > 0) {
1053 if (up_count >= down_count) {
1054 return 0 //same up as down -> UP
1056 return 2 //something is broken
1060 vars.dummy_text = {{
1061 var output = "Cluster hosts:\n"
1062 var cluster_nodes = macro("$cluster_nodes$")
1064 for (node in cluster_nodes) {
1065 output += node + ": " + get_host(node).last_check_result.output + "\n"
1073 #### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>
1075 The following example sets time dependent thresholds for the load check based on the current
1076 time of the day compared to the defined time period.
1079 object TimePeriod "backup" {
1080 import "legacy-timeperiod"
1083 monday = "02:00-03:00"
1084 tuesday = "02:00-03:00"
1085 wednesday = "02:00-03:00"
1086 thursday = "02:00-03:00"
1087 friday = "02:00-03:00"
1088 saturday = "02:00-03:00"
1089 sunday = "02:00-03:00"
1093 object Host "webserver-with-backup" {
1094 check_command = "hostalive"
1095 address = "127.0.0.1"
1098 object Service "webserver-backup-load" {
1099 check_command = "load"
1100 host_name = "webserver-with-backup"
1102 vars.load_wload1 = {{
1103 if (get_time_period("backup").is_inside) {
1109 vars.load_cload1 = {{
1110 if (get_time_period("backup").is_inside) {
1120 ## Advanced Value Types <a id="advanced-value-types"></a>
1122 In addition to the default value types Icinga 2 also uses a few other types
1123 to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).
1125 ### CheckResult <a id="advanced-value-types-checkresult"></a>
1127 Name | Type | Description
1128 --------------------------|-----------------------|----------------------------------
1129 exit\_status | Number | The exit status returned by the check execution.
1130 output | String | The check output.
1131 performance\_data | Array | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
1132 check\_source | String | Name of the node executing the check.
1133 state | Number | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
1134 command | Value | Array of command with shell-escaped arguments or command line string.
1135 execution\_start | Timestamp | Check execution start time (as a UNIX timestamp).
1136 execution\_end | Timestamp | Check execution end time (as a UNIX timestamp).
1137 schedule\_start | Timestamp | Scheduled check execution start time (as a UNIX timestamp).
1138 schedule\_end | Timestamp | Scheduled check execution end time (as a UNIX timestamp).
1139 active | Boolean | Whether the result is from an active or passive check.
1140 vars\_before | Dictionary | Internal attribute used for calculations.
1141 vars\_after | Dictionary | Internal attribute used for calculations.
1142 ttl | Number | Time-to-live duration in seconds for this check result. The next expected check result is `now + ttl` where freshness checks are executed.
1144 ### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>
1146 Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).
1148 Name | Type | Description
1149 --------------------------|-----------------------|----------------------------------
1150 label | String | Performance data label.
1151 value | Number | Normalized performance data value without unit.
1152 counter | Boolean | Enabled if the original value contains `c` as unit. Defaults to `false`.
1153 unit | String | Unit of measurement (`seconds`, `bytes`. `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
1154 crit | Value | Critical threshold value.
1155 warn | Value | Warning threshold value.
1156 min | Value | Minimum value returned by the check.
1157 max | Value | Maximum value returned by the check.