   1 # Advanced Topics <a id="advanced-topics"></a>
   2
   3 This chapter covers a number of advanced topics. If you're new to Icinga, you
   4 can safely skip over things you're not interested in.
   5
   6 ## Downtimes <a id="downtimes"></a>
   7
   8 Downtimes can be scheduled for planned server maintenance or
   9 any other targeted service outage you are aware of in advance.
  10
Downtimes suppress notifications and can trigger other
downtimes too. If the downtime was set by accident, or the duration
exceeds the maintenance window, you can manually cancel the downtime.
  14
  15 ### Scheduling a downtime <a id="scheduling-downtime"></a>
  16
  17 The most convenient way to schedule planned downtimes is to create
  18 them in Icinga Web 2 inside the host/service detail view. Select
  19 multiple hosts/services from the listing with the shift key to
  20 schedule multiple downtimes.
  21
  22 ![Downtime in Icinga Web 2](images/advanced-topics/icingaweb2_downtime_handled.png)
  23
  24 In addition to that you can schedule a downtime by using the Icinga 2 API action
  25 [schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime).
  26 This is especially useful to schedule a downtime on-demand inside a (remote) backup
  27 script, or create maintenance downtimes from a cron job for specific dates and intervals.
  28
Multiple downtimes for a single object may overlap. This is useful
when a maintenance window takes longer than expected and you need to extend it.
If multiple downtimes are triggered for one object, the overall downtime depth
will be greater than `1`.
  33
  34 If the downtime was scheduled after the problem changed to a critical hard
  35 state triggering a problem notification, and the service recovers during
  36 the downtime window, the recovery notification won't be suppressed.
  37
  38 Planned downtimes are also taken into account for SLA reporting
  39 tools calculating the SLAs based on the state and downtime history.
  40
  41 ### Fixed and Flexible Downtimes <a id="fixed-flexible-downtimes"></a>
  42
A `fixed` downtime is activated at the defined start time and
removed at the end time. If the service state changes to `NOT-OK`
during this time window, the downtime is actually triggered:
notifications are suppressed and the downtime depth is incremented.
  47
Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
  53
  54 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
  55 by the state change in the time span defined by start and end time,
  56 and then last for the specified duration in minutes.
  57
Imagine the following scenario: Your service is frequently polled
by users trying to grab free deleted domains for immediate registration.
Between 07:30 and 08:00 the impact lasts for 15 minutes and shows up
as a network outage in the monitoring. The service is still alive,
but answers Icinga 2 service checks too slowly.
For that reason, you may want to schedule a downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime will then last from
its trigger time until the duration is over. After that, the downtime
is removed (this may happen before or after the configured end time).
  67
  68 #### Fixed Downtime <a id="fixed-downtime"></a>
  69
If the host/service changes into a NOT-OK state between the start and
end time, the downtime is marked as `in effect` and
the downtime depth counter is increased.
  73
```
   |       |         |
start      |        end
       trigger time
```
  79
  80 #### Flexible Downtime <a id="flexible-downtime"></a>
  81
A flexible downtime defines a time window in which the downtime may be
triggered by a host/service NOT-OK state change. It then lasts
for the specified duration. As a result, the configured end time may
already have passed while the downtime is still active: the downtime ends
at `trigger time + duration`. For example, a downtime triggered at 07:50
with a duration of 15 minutes ends at 08:05, even if its end time is 08:00.
  87
  88
```
   |       |         |
start      |        end               actual end time
           |--------------duration--------|
       trigger time
```
  95
  96
  97 ### Triggered Downtimes <a id="triggered-downtimes"></a>
  98
Triggering is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime which should be triggered by the parent
downtime on `NOT-OK` state change.
 104
 105 ### Recurring Downtimes <a id="recurring-downtimes"></a>
 106
 107 [ScheduledDowntime objects](09-object-types.md#objecttype-scheduleddowntime) can be used to set up
 108 recurring downtimes for services.
 109
 110 Example:
 111
```
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
    saturday = "02:00-03:00"
    sunday = "02:00-03:00"
  }

  assign where "backup" in service.groups
}
```
 130
Icinga 2 attempts to find the next possible segment from a ScheduledDowntime object's
`ranges` attribute, and won't create multiple downtimes in the future. In case you need
all these downtimes planned and visible for the next days, weeks or months, schedule them
manually via the [REST API](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) using
a script or cron job.
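
The flexible downtime scenario described earlier (a window between 07:30 and 08:00 with a duration of 15 minutes) could also be expressed as a recurring downtime. The following is a minimal sketch rather than a tested configuration. It relies on the `fixed` and `duration` attributes of the ScheduledDowntime object type; the object name and the service group `domain-registration` are hypothetical placeholders.

```
apply ScheduledDowntime "registration-rush" to Service {
  author = "icingaadmin"
  comment = "Flexible downtime for the morning registration rush"

  /* Flexible instead of fixed: the downtime is triggered by a
   * NOT-OK state change inside the time range. */
  fixed = false

  /* Only evaluated for flexible downtimes: the downtime lasts
   * 15 minutes, counted from its trigger time. */
  duration = 15m

  ranges = {
    monday = "07:30-08:00"
    tuesday = "07:30-08:00"
    wednesday = "07:30-08:00"
    thursday = "07:30-08:00"
    friday = "07:30-08:00"
  }

  /* "domain-registration" is a hypothetical service group name. */
  assign where "domain-registration" in service.groups
}
```

With this sketch, a service that turns NOT-OK at 07:50 would stay in downtime until 08:05.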
 136
 137 > **Note**
 138 >
> If ScheduledDowntime objects are synced in a distributed high-availability setup,
> both instances will create the next possible downtime on their own. These runtime generated
> downtimes are synced between both zone instances, and you may see what look like duplicate downtimes
> in Icinga Web 2.
 143
 144
 145 ## Comments <a id="comments-intro"></a>
 146
 147 Comments can be added at runtime and are persistent over restarts. You can
 148 add useful information for others on repeating incidents (for example
 149 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
 150 is primarily accessible using web interfaces.
 151
 152 You can add a comment either by using the Icinga 2 API action
 153 [add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
 154 by sending an [external command](14-features.md#external-commands).
 155
 156 ## Acknowledgements <a id="acknowledgements"></a>
 157
If a problem persists and notifications have been sent, you can
acknowledge the problem. That way other users will get
a notification that you're aware of the issue and are probably
already working on a fix.
 162
 163 Note: Acknowledgements also add a new [comment](08-advanced-topics.md#comments-intro)
 164 which contains the author and text fields.
 165
 166 You can send an acknowledgement either by using the Icinga 2 API action
 167 [acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
 168 by sending an [external command](14-features.md#external-commands).
 169
 170
 171 ### Sticky Acknowledgements <a id="sticky-acknowledgements"></a>
 172
 173 The acknowledgement is removed if a state change occurs or if the host/service
 174 recovers (OK/Up state).
 175
If you acknowledge a problem once you've received a `Critical` notification,
the acknowledgement will be removed if there is a state transition to `Warning`.

```
OK -> WARNING -> CRITICAL -> WARNING -> OK
```
 181
 182 If you prefer to keep the acknowledgement until the problem is resolved (`OK`
 183 recovery) you need to enable the `sticky` parameter.
 184
 185
 186 ### Expiring Acknowledgements <a id="expiring-acknowledgements"></a>
 187
Once a problem is acknowledged it may disappear from your `handled problems`
dashboard and no one ever looks at it again, since the acknowledgement suppresses
notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved in the future at a defined time,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement once it has expired and start to
re-notify if the problem persists.
 198
 199
 200 ## Time Periods <a id="timeperiods"></a>
 201
[Time Periods](09-object-types.md#objecttype-timeperiod) define
time ranges in Icinga where event actions are triggered, for
example whether a service check is executed within the time range
set by the `check_period` attribute, or whether a notification is sent to
users, filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects.
 208
 209 > **Note**
 210 >
 211 > If you are familiar with Icinga 1.x, these time period definitions
 212 > are called `legacy timeperiods` in Icinga 2.
 213 >
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.
 216
 217 The `TimePeriod` attribute `ranges` may contain multiple directives,
 218 including weekdays, days of the month, and calendar dates.
 219 These types may overlap/override other types in your ranges dictionary.
 220
 221 The descending order of precedence is as follows:
 222
 223 * Calendar date (2008-01-01)
 224 * Specific month date (January 1st)
 225 * Generic month date (Day 15)
 226 * Offset weekday of specific month (2nd Tuesday in December)
 227 * Offset weekday (3rd Monday)
 228 * Normal weekday (Tuesday)
 229
 230 If you don't set any `check_period` or `notification_period` attribute
 231 on your configuration objects, Icinga 2 assumes `24x7` as time period
 232 as shown below.
 233
    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }
 248
 249 If your operation staff should only be notified during workhours,
 250 create a new timeperiod named `workhours` defining a work day from
 251 09:00 to 17:00.
 252
    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday"    = "09:00-17:00"
        "tuesday"   = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday"  = "09:00-17:00"
        "friday"    = "09:00-17:00"
      }
    }
 265
 266 Furthermore if you wish to specify a notification period across midnight,
 267 you can define it the following way:
 268
    object TimePeriod "across-midnight" {
      import "legacy-timeperiod"

      display_name = "Nightly Notification"
      ranges = {
        "saturday" = "22:00-24:00"
        "sunday" = "00:00-03:00"
      }
    }
 278
 279 Below you can see another example for configuring timeperiods across several
 280 days, weeks or months. This can be useful when taking components offline
 281 for a distinct period of time.
 282
    object TimePeriod "standby" {
      import "legacy-timeperiod"

      display_name = "Standby"
      ranges = {
        "2016-09-30 - 2016-10-30" = "00:00-24:00"
      }
    }
 291
 292 Please note that the spaces before and after the dash are mandatory.
 293
Once your time period is configured you can use the `period` attribute
to assign time periods to `Notification` and `Dependency` objects:
 296
    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }
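
Active checks can be limited in the same way through the `check_period` attribute mentioned at the beginning of this section. A minimal sketch, assuming the `generic-service` template from the sample configuration and the `ssh` CheckCommand from the ITL; the service name and the assign rule are illustrative only:

```
apply Service "ssh-workhours" {
  import "generic-service"

  check_command = "ssh"

  /* Only execute active checks inside the "workhours" time period
   * defined above. */
  check_period = "workhours"

  /* Illustrative assign rule based on a host custom attribute. */
  assign where host.vars.os == "Linux"
}
```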
 306
 307 ### Time Periods Inclusion and Exclusion <a id="timeperiods-includes-excludes"></a>
 308
 309 Sometimes it is necessary to exclude certain time ranges from
 310 your default time period definitions, for example, if you don't
 311 want to send out any notification during the holiday season,
 312 or if you only want to allow small time windows for executed checks.
 313
 314 The [TimePeriod object](09-object-types.md#objecttype-timeperiod)
 315 provides the `includes` and `excludes` attributes to solve this issue.
 316 `prefer_includes` defines whether included or excluded time periods are
 317 preferred.
 318
 319 The following example defines a time period called `holidays` where
 320 notifications should be suppressed:
 321
    object TimePeriod "holidays" {
      import "legacy-timeperiod"

      ranges = {
        "january 1" = "00:00-24:00"                 //new year's day
        "july 4" = "00:00-24:00"                    //independence day
        "december 25" = "00:00-24:00"               //christmas
        "december 31" = "18:00-24:00"               //new year's eve (6pm+)
        "2017-04-16" = "00:00-24:00"                //easter 2017
        "monday -1 may" = "00:00-24:00"             //memorial day (last monday in may)
        "monday 1 september" = "00:00-24:00"        //labor day (1st monday in september)
        "thursday 4 november" = "00:00-24:00"       //thanksgiving (4th thursday in november)
      }
    }
 336
In addition to that the time period `weekends-excluded` defines an additional
time window which should be excluded from notifications:
 339
    object TimePeriod "weekends-excluded" {
      import "legacy-timeperiod"

      ranges = {
        "saturday"  = "00:00-09:00,18:00-24:00"
        "sunday"    = "00:00-09:00,18:00-24:00"
      }
    }
 348
 349 The time period `prod-notification` defines the default time ranges
 350 and adds the excluded time period names as an array.
 351
    object TimePeriod "prod-notification" {
      import "legacy-timeperiod"

      excludes = [ "holidays", "weekends-excluded" ]

      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }
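
The `includes` and `prefer_includes` attributes work along the same lines. The sketch below is an illustration under stated assumptions, not a recommended setup: the time period names `oncall-evenings` and `prod-notification-oncall` are hypothetical, and the comments describe the behaviour as documented for `prefer_includes`, which defaults to `true`.

```
/* Hypothetical additional time window for weekend on-call evenings. */
object TimePeriod "oncall-evenings" {
  import "legacy-timeperiod"

  ranges = {
    "saturday"  = "18:00-22:00"
    "sunday"    = "18:00-22:00"
  }
}

object TimePeriod "prod-notification-oncall" {
  import "legacy-timeperiod"

  /* Add the on-call evenings, remove the holidays defined above. */
  includes = [ "oncall-evenings" ]
  excludes = [ "holidays" ]

  /* Where an included and an excluded period overlap (a holiday on a
   * weekend evening), false lets the excluded period win. With the
   * default (true) the included period would be preferred. */
  prefer_includes = false

  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
}
```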
 367
 368 ## External Check Results <a id="external-check-results"></a>
 369
 370 Hosts or services which do not actively execute a check plugin to receive
 371 the state and output are called "passive checks" or "external check results".
 372 In this scenario an external client or script is sending in check results.
 373
 374 You can feed check results into Icinga 2 with the following transport methods:
 375
 376 * [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
 377 * External command sent via command pipe (local only)
 378
Each time a new check result is received, the next expected check time
is updated. This means that if no check results are received from
the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).
 382
 383 > **Note**
 384 >
> The REST API action allows you to specify the `check_source` attribute
> which helps to identify the external sender. This is also visible
> in Icinga Web 2 and the REST API queries.
 388
 389 ## Check Result Freshness <a id="check-result-freshness"></a>
 390
In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute: a check result is no longer considered fresh if no new
check result arrives within that period of time.

The threshold is calculated based on the last check execution time for actively executed checks:

    (last check execution time + check interval) > current time

If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
the threshold is based on the last time a check result was received:

    (last check result time + check interval) > current time
 402
 403 > **Tip**
 404 >
> The [process-check-result](12-icinga2-api.md#icinga2-api-actions-process-check-result) REST API
> action allows you to overrule the pre-defined check interval with a specified TTL in Icinga 2 v2.9+.
 407
 408 If the freshness checks fail, Icinga 2 will execute the defined check command.
 409
 410 Best practice is to define a [dummy](10-icinga-template-library.md#itl-dummy) `check_command` which gets
 411 executed when freshness checks fail.
 412
```
apply Service "external-check" {
  check_command = "dummy"
  check_interval = 1m

  /* Set the state to UNKNOWN (3) if freshness checks fail. */
  vars.dummy_state = 3

  /* Use a runtime function to retrieve the last check time and more details. */
  vars.dummy_text = {{
    var service = get_service(macro("$host.name$"), macro("$service.name$"))
    var lastCheck = DateTime(service.last_check).to_string()

    return "No check results received. Last result time: " + lastCheck
  }}

  assign where "external" in host.vars.services
}
```
 432
 433 References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).
 434
 435 Example output in Icinga Web 2:
 436
 437 ![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)
 438
 439
 440 ## Check Flapping <a id="check-flapping"></a>
 441
 442 Icinga 2 supports optional detection of hosts and services that are "flapping".
 443
Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
recovery notifications. With flapping detection enabled, a flapping notification is sent while other notifications are
suppressed until the object calms down after returning the same status from checks a few times. Flapping detection can help detect
configuration problems (wrong thresholds), troublesome services, or network problems.
 449
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
The `flapping_threshold_high` and `flapping_threshold_low` attributes allow you to specify the thresholds that control
when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.
 453
 454 The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold a
 455 host or service is considered flapping until it drops below the low flapping threshold.
 456
`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
[notifications](03-monitoring-basics.md#alert-notifications) for details.
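
A minimal sketch of how these attributes could be combined. It assumes the `generic-service` template from the sample configuration; the object names are illustrative, and the `mail-notification` command and the `icingaadmin` user are placeholders borrowed from the notification example earlier in this chapter. The threshold values simply repeat the defaults.

```
apply Service "ping4" {
  import "generic-service"

  check_command = "ping4"

  /* Turn on flapping detection for this service. */
  enable_flapping = true

  /* Percent values; these repeat the defaults described above. */
  flapping_threshold_high = 30
  flapping_threshold_low = 25

  assign where host.address
}

apply Notification "flapping-mail" to Service {
  command = "mail-notification"
  users = [ "icingaadmin" ]

  /* Only notify when flapping starts or ends. */
  types = [ FlappingStart, FlappingEnd ]

  assign where service.enable_flapping
}
```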
 459
> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
> will be sent out regardless of the object's state.
 462
 463 ### How it works <a id="check-flapping-how-it-works"></a>
 464
 465 Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
 466
 467 ![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
 468
All the states are weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
states in between are evenly distributed. The final flapping value is the sum of the weighted state changes divided by the total
count of 20.
 472
 473 In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
 474 This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
 475 considered flapping.
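
The arithmetic of this example can be reproduced in the `icinga2 console`. This is only a sketch of the calculation described above, not the code Icinga 2 runs internally; the weights are copied from the example and the variable names are made up.

```
/* Weights of the eight state changes in the example above. */
var weights = [ 0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12, 1.18 ]

/* Sum up the weighted state changes. */
var total = 0
for (weight in weights) {
  total += weight
}

/* Divide by the 20 stored state changes and convert to percent. */
var flapping_value = total / 20 * 100

/* Compare against the default high threshold of 30 percent. */
flapping_value > 30
```

The computed value is roughly 39.1, so the comparison against the high threshold holds.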
 476
If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold
of 25% and the host or service would therefore recover from flapping.
 479
 480 ## Volatile Services and Hosts <a id="volatile-services-hosts"></a>
 481
 482 The `volatile` option, if enabled for a host or service, makes it treat every [state change](03-monitoring-basics.md#hard-soft-states)
 483 as a `HARD` state change. It is comparable to `max_check_attempts = 1`. With this any `NOT-OK` result will
 484 ignore `max_check_attempts` and trigger notifications etc. It will further cause any additional `NOT-OK`
 485 result to re-send notifications.
 486
 487 It may be reasonable to have a volatile service which stays in a `HARD` state if the service stays in a `NOT-OK`
 488 state. That way each service recheck will automatically trigger a notification unless the service is acknowledged or
 489 in a scheduled downtime.
 490
Security checks are a common example: each `NOT-OK` check result should immediately trigger a notification.
 492
 493 The default for this option is `false` and should only be enabled when required.
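
A minimal sketch of a volatile service, assuming the `generic-service` template from the sample configuration. The CheckCommand name `my-security-check` and the custom attribute `security_audit` are hypothetical and only serve as placeholders.

```
apply Service "security-audit" {
  import "generic-service"

  /* Hypothetical CheckCommand; replace it with your own security check. */
  check_command = "my-security-check"

  /* Treat every state change as a HARD state change. */
  volatile = true

  /* Hypothetical custom attribute used to select the hosts. */
  assign where host.vars.security_audit
}
```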
 494
 495
 496 ## Monitoring Icinga 2 <a id="monitoring-icinga"></a>
 497
 498 Why should you do that? Icinga and its components run like any other
 499 service application on your server. There are predictable issues
 500 such as "disk space is running low" and your monitoring suffers from just
 501 that.
 502
You would also like to ensure that features and backends are running
and storing required data, be it the database backend from which Icinga Web 2
presents fancy dashboards, metrics forwarded to Graphite or InfluxDB, or
the entire distributed setup.
 507
 508 This list isn't complete but should help with your own setup.
 509 Windows client specific checks are highlighted.
 510
 511 Type            | Description                   | Plugins and CheckCommands
 512 ----------------|-------------------------------|-----------------------------------------------------
 513 System          | Filesystem                    | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 514 System          | Memory, Swap                  | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
 515 System          | Hardware                      | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
 516 System          | Virtualization                | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
 517 System          | Processes                     | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 518 System          | System Activity Reports       | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
 519 System          | I/O                           | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
 520 System          | Network interfaces            | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
 521 System          | Users                         | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 522 System          | Logs                          | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
 523 System          | NTP                           | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
 524 System          | Updates                       | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
 525 Icinga          | Status & Stats                | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
 526 Icinga          | Cluster & Clients             | [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks)
 527 Database        | MySQL                         | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
 528 Database        | PostgreSQL                    | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
 529 Database        | Housekeeping                  | Check the database size and growth and analyse metrics to examine trends.
 530 Database        | DB IDO                        | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
 531 Webserver       | Apache2, Nginx, etc.          | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
 532 Webserver       | Certificates                  | [http](10-icinga-template-library.md#plugin-check-command-http)
 533 Webserver       | Authorization                 | [http](10-icinga-template-library.md#plugin-check-command-http)
 534 Notifications   | Mail (queue)                  | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
 535 Notifications   | SMS (GSM modem)               | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
 536 Notifications   | Messengers, Cloud services    | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
 537 Metrics         | PNP, RRDTool                  | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
 538 Metrics         | Graphite                      | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
 539 Metrics         | InfluxDB                      | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
 540 Metrics         | Elastic Stack                 | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
 541 Metrics         | Graylog                       | [Graylog integration](14-features.md#graylog-integration)
 542
 543
The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
Icinga 2. You can forward them to your preferred graphing solution.
If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
your own custom check plugin. Alternatively, you can use the built-in [object accessor functions](08-advanced-topics.md#access-object-attributes-at-runtime)
to calculate stats in-memory.
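
A sketch of a service using the `icinga` CheckCommand. It assumes the `generic-service` template from the sample configuration and restricts the service to the local node through the `NodeName` constant:

```
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  /* Only apply to the host object representing the local Icinga 2 node. */
  assign where host.name == NodeName
}
```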
 549
 550 There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
 551 which provides additional metrics for the IDO database.
 552
```
apply Service "ido-mysql" {
  check_command = "ido"

  vars.ido_type = "IdoMysqlConnection"
  vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf

  assign where match("master*.localdomain", host.name)
}
```
 563
 564 More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.
 565
 566 Distributed setups should include specific [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks).
 567 You might also want to add additional checks for SSL certificate expiration.
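
One possible way to check certificate expiration is the `http` CheckCommand listed in the table above. A minimal sketch, assuming its `http_ssl`, `http_vhost` and `http_certificate` custom attributes from the ITL; the custom attribute `https_vhost` and the 30-day threshold are illustrative choices:

```
apply Service "https-certificate" {
  import "generic-service"

  check_command = "http"

  vars.http_ssl = true
  vars.http_vhost = host.vars.https_vhost

  /* Raise a problem if the certificate is valid for less than 30 days. */
  vars.http_certificate = 30

  /* Hypothetical custom attribute holding the HTTPS virtual host name. */
  assign where host.vars.https_vhost
}
```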
 568
 569
 570 ## Advanced Configuration Hints <a id="advanced-configuration-hints"></a>
 571
 572 ### Advanced Use of Apply Rules <a id="advanced-use-of-apply-rules"></a>
 573
 574 [Apply rules](03-monitoring-basics.md#using-apply) can be used to create a rule set which is
 575 entirely based on host objects and their attributes.
 576 In addition to that [apply for and custom attribute override](03-monitoring-basics.md#using-apply-for)
 577 extend the possibilities.
 578
The following example defines a dictionary on the host object which contains
configuration attributes for multiple web servers. This is then used to add three checks:

* A `ping4` check using the local IP `address` of the web server.
* A `tcp` check querying the TCP port on which the HTTP service is running.
* If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.
 586
 587 Host definition:
 588
    object Host "webserver01" {
      import "generic-host"
      address = "192.168.56.200"
      vars.os = "Linux"

      vars.webserver = {
        instance["status"] = {
          address = "192.168.56.201"
          port = "80"
          url = "/status"
        }
        instance["tomcat"] = {
          address = "192.168.56.202"
          port = "8080"
        }
        instance["icingaweb2"] = {
          address = "192.168.56.210"
          port = "443"
          url = "/icingaweb2"
          ssl = true
        }
      }
    }
 612
 613 Service apply for definitions:
 614
    apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance
      check_command = "ping4"

      vars.ping_address = config.address

      assign where host.vars.webserver.instance
    }

    apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance + "_" + config.port
      check_command = "tcp"

      vars.tcp_address = config.address
      vars.tcp_port = config.port

      assign where host.vars.webserver.instance
    }

    apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
      display_name = "webserver_" + instance + "_" + config.url
      check_command = "http"

      vars.http_address = config.address
      vars.http_port = config.port
      vars.http_uri = config.url

      if (config.ssl) {
        vars.http_ssl = config.ssl
      }

      assign where config.url != ""
    }
 648
 649 The variables defined in the host dictionary are not using the typical custom attribute
 650 prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
 651 service checks in this example.
 652 In addition to defining check parameters this way, you can also enrich the `display_name`
 653 attribute with more details. This is shown in Icinga Web 2, for example.
 654
 655 ### Use Functions in Object Configuration <a id="use-functions-object-config"></a>
 656
 657 There is a limited scope where functions can be used as object attributes such as:
 658
 659 * As value for [Custom Attributes](03-monitoring-basics.md#custom-attributes-functions)
 660 * Returning boolean expressions for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
 661 * Returning a [command](08-advanced-topics.md#use-functions-command-attribute) array inside command objects
 662
 663 The other way around you can create objects dynamically using your own global functions.
 664
 665 > **Note**
 666 >
 667 > Functions called inside command objects share the same global scope as runtime macros.
 668 > Therefore you can access host custom attributes like `host.vars.os`, or any other
 669 > object attribute from inside the function definition used for [set_if](08-advanced-topics.md#use-functions-command-arguments-setif) or [command](08-advanced-topics.md#use-functions-command-attribute).
 670
 671 Tips when implementing functions:
 672
 673 * Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
 674 inside the `icinga2.log` file, depending on your log severity.
 675 * Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary)
 676 * Build them step-by-step. You can always refactor your code later on.
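
As an illustration of the tips above, the following sketch can be pasted into the `icinga2 console` to iterate over a dictionary and dump each entry with `log()`. The variable name `instances`, its contents and the `config-test` facility string are made up for this example.

```
/* hypothetical test data, loosely modelled on the webserver example */
var instances = {
  status = { address = "192.168.56.201", port = "80" }
  tomcat = { address = "192.168.56.202", port = "8080" }
}

/* iterate over key/value pairs and log one line per entry */
for (name => config in instances) {
  log(LogInformation, "config-test", name + " listens on " + config.address + ":" + config.port)
}
```

Once the loop behaves as expected in the console, the same code can be moved into a function or lambda.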
 677
 678 #### Register and Use Global Functions <a id="use-functions-global-register"></a>
 679
 680 [Functions](17-language-reference.md#functions) can be registered into the global scope. This makes custom functions available
 681 in objects and other functions. Keep in mind that these functions are not marked
 682 as side-effect-free and as such are not available via the REST API.
 683
 684 Add a new configuration file `functions.conf` and include it in the [icinga2.conf](04-configuring-icinga-2.md#icinga2-conf)
 685 configuration file at the very beginning, e.g. after `constants.conf`. You can also manage global
 686 functions inside `constants.conf` if you prefer.
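
Assuming `functions.conf` is stored next to `icinga2.conf` and `constants.conf` is already included first, the include order could look like this sketch. Only the second line is new.

```
include "constants.conf"
/* global functions must be known before any object uses them */
include "functions.conf"
```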
 687
 688 The following function converts a given state parameter into a returned string value. The important
 689 bits for registering it into the global scope are:
 690
 691 * `globals.<unique_function_name>` adds a new globals entry.
 692 * `function()` specifies that a call to `state_to_string()` executes a function.
 693 * Function parameters are defined inside the `function()` definition.
 694
 695 ```
 696 globals.state_to_string = function(state) {
 697   if (state == 2) {
 698     return "Critical"
 699   } else if (state == 1) {
 700     return "Warning"
 701   } else if (state == 0) {
 702     return "OK"
 703   } else if (state == 3) {
 704     return "Unknown"
 705   } else {
 706     log(LogWarning, "state_to_string", "Unknown state " + state + " provided.")
 707   }
 708 }
 709 ```
 710
 711 The else-condition allows for better error handling. This warning will be shown in the Icinga 2
 712 log file once the function is called.
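
Note that the else branch above only logs and does not return a value. If callers should always receive a string, a variant along the following lines may help. This is only a sketch: the name `state_to_string_safe`, the lookup array and the `"Unknown"` fallback are assumptions, not part of the original example.

```
globals.state_to_string_safe = function(state) {
  /* array index corresponds to the numeric state */
  var names = [ "OK", "Warning", "Critical", "Unknown" ]

  if (typeof(state) == Number && state >= 0 && state < len(names)) {
    return names[state]
  }

  /* log the problem, but still return a usable string */
  log(LogWarning, "state_to_string_safe", "Unknown state " + state + " provided.")
  return "Unknown"
}
```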
 713
 714 > **Note**
 715 >
 716 > If these functions are used in a distributed environment, you must ensure that they are deployed
 717 > everywhere they are needed.
 718
 719 In order to test-drive the newly created function, restart Icinga 2 and use the [debug console](11-cli-commands.md#cli-command-console)
 720 to connect to the REST API.
 721
 722 ```
 723 $ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/'
 724 Icinga 2 (version: v2.8.1-373-g4bea6d25c)
 725 <1> => globals.state_to_string(1)
 726 "Warning"
 727 <2> => state_to_string(2)
 728 "Critical"
 729 ```
 730
 731 You can see that this function is now registered into the [global scope](17-language-reference.md#variable-scopes). The function call
 732 `state_to_string()` can be used in any object at static config compile time or inside runtime
 733 lambda functions.
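
To make the difference between the two contexts concrete, here is a minimal sketch. The host name `state-name-demo` and the custom attribute `expected_state` are hypothetical, and `functions.conf` is assumed to be included before this object is compiled.

```
object Host "state-name-demo" {
  check_command = "dummy"

  /* called once while the configuration is compiled; stores the string "OK" */
  vars.expected_state = state_to_string(0)

  /* lambda function: called at runtime each time the attribute is resolved */
  vars.dummy_text = {{
    return "Expected state: " + state_to_string(0)
  }}
}
```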
 734
 735 The following service object example uses the service state and converts it to string output.
 736 The function definition is not optimized and is unrolled for better readability, including a log message.
 737
 738 ```
 739 object Service "state-test" {
 740   check_command = "dummy"
 741   host_name = NodeName
 742
 743   vars.dummy_state = 2
 744
 745   vars.dummy_text = {{
 746     var h = macro("$host.name$")
 747     var s = macro("$service.name$")
 748
 749     var state = get_service(h, s).state
 750
 751     log(LogInformation, "dummy_state", "Host: " + h + " Service: " + s + " State: " + state)
 752
 753     return state_to_string(state)
 754   }}
 755 }
 756 ```
 757
 758
 759 #### Use Custom Functions as Attribute <a id="custom-functions-as-attribute"></a>
 760
 761 To use custom functions as attributes, the function must be defined in a
 762 slightly unexpected way. The following example shows how to assign values
 763 depending on group membership. All hosts in the `slow-lan` host group use 300
 764 as value for `ping_wrta`, all other hosts use 100.
 765
 766     globals.group_specific_value = function(group, group_value, non_group_value) {
 767         return function() use (group, group_value, non_group_value) {
 768             if (group in host.groups) {
 769                 return group_value
 770             } else {
 771                 return non_group_value
 772             }
 773         }
 774     }
 775
 776     apply Service "ping4" {
 777         import "generic-service"
 778         check_command = "ping4"
 779
 780         vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
 781         vars.ping_crta = group_specific_value("slow-lan", 500, 200)
 782
 783         assign where true
 784     }
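
In the example above, the outer function is called while the configuration is compiled and returns an inner function which captures its arguments with `use`. That inner function is stored as the attribute value and evaluated later, when `host` can be accessed. Under that reading, the same helper can be reused for other thresholds. The following sketch assumes the `ssh` CheckCommand with its `ssh_timeout` custom attribute from the template library; the timeout values are invented.

```
apply Service "ssh" {
  import "generic-service"
  check_command = "ssh"

  /* hosts in "slow-lan" get 30 seconds, all other hosts 10 seconds */
  vars.ssh_timeout = group_specific_value("slow-lan", 30, 10)

  assign where host.address
}
```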
 785
 786 #### Use Functions in Assign Where Expressions <a id="use-functions-assign-where"></a>
 787
 788 If a simple expression for matching a name or checking if an item
 789 exists in an array or dictionary does not fit, you should consider
 790 writing your own global [functions](17-language-reference.md#functions).
 791 You can call them inside `assign where` and `ignore where` expressions
 792 for [apply rules](03-monitoring-basics.md#using-apply-expressions) or
 793 [group assignments](03-monitoring-basics.md#group-assign-intro) just like
 794 any other global function, for example [match](18-library-reference.md#global-functions-match).
 795
 796 The following example requires the host `myprinter` to be added
 797 to the host group `printers-lexmark` but only if the host uses
 798 a template matching the name `lexmark*`.
 799
 800     template Host "lexmark-printer-host" {
 801       vars.printer_type = "Lexmark"
 802     }
 803
 804     object Host "myprinter" {
 805       import "generic-host"
 806       import "lexmark-printer-host"
 807
 808       address = "192.168.1.1"
 809     }
 810
 811     /* register a global function for the assign where call */
 812     globals.check_host_templates = function(host, search) {
 813       /* iterate over all host templates and check if the search matches */
 814       for (tmpl in host.templates) {
 815         if (match(search, tmpl)) {
 816           return true
 817         }
 818       }
 819
 820       /* nothing matched */
 821       return false
 822     }
 823
 824     object HostGroup "printers-lexmark" {
 825       display_name = "Lexmark Printers"
 826       /* call the global function and pass the arguments */
 827       assign where check_host_templates(host, "lexmark*")
 828     }
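
The same global function can also be called from an apply rule. The following is a minimal sketch; the service name `printer-toner` and the `dummy` check command are placeholders chosen for illustration.

```
apply Service "printer-toner" {
  import "generic-service"
  /* placeholder check, replace with a real printer check */
  check_command = "dummy"

  /* reuse the global function registered for the host group */
  assign where check_host_templates(host, "lexmark*")
}
```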
 829
 830
 831 Take a different, more complex example: All hosts with the
 832 custom attribute `vars_app` as nested dictionary should be
 833 added to the host group `ABAP-app-server`. But only if the
 834 `app_type` for all entries is set to `ABAP`.
 835
 836 It could read as a wildcard match for nested dictionaries:
 837
 838     where host.vars.vars_app["*"].app_type == "ABAP"
 839
 840 The solution for this problem is to register a global
 841 function which checks the `app_type` for all hosts
 842 with the `vars_app` dictionary.
 843
 844     object Host "appserver01" {
 845       check_command = "dummy"
 846       vars.vars_app["ABC"] = { app_type = "ABAP" }
 847     }
 848     object Host "appserver02" {
 849       check_command = "dummy"
 850       vars.vars_app["DEF"] = { app_type = "ABAP" }
 851     }
 852
 853     globals.check_app_type = function(host, type) {
 854       /* ensure that other hosts without the custom attribute do not match */
 855       if (typeof(host.vars.vars_app) != Dictionary) {
 856         return false
 857       }
 858
 859       /* iterate over the vars_app dictionary */
 860       for (key => val in host.vars.vars_app) {
 861         /* check whether the value is a dictionary and its app_type is the requested type */
 862         if (typeof(val) == Dictionary && val.app_type == type) {
 863           return true
 864         }
 865       }
 866
 867       /* nothing matched */
 868       return false
 869     }
 870
 871     object HostGroup "ABAP-app-server" {
 872       assign where check_app_type(host, "ABAP")
 873     }
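
Note that `check_app_type` returns `true` as soon as one entry matches. If every entry in `vars_app` must have the requested `app_type`, a stricter variant could look like the following sketch. The function name `check_all_app_types` is hypothetical, and treating an empty dictionary as "no match" is an assumption.

```
globals.check_all_app_types = function(host, type) {
  /* hosts without the custom attribute, or with an empty one, do not match */
  if (typeof(host.vars.vars_app) != Dictionary || len(host.vars.vars_app) == 0) {
    return false
  }

  /* a single entry of the wrong kind is enough to reject the host */
  for (key => val in host.vars.vars_app) {
    if (typeof(val) != Dictionary || val.app_type != type) {
      return false
    }
  }

  /* all entries matched */
  return true
}
```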
 874
 875
 876 #### Use Functions in Command Arguments set_if <a id="use-functions-command-arguments-setif"></a>
 877
 878 The `set_if` attribute inside the command arguments definition in the
 879 [CheckCommand object definition](09-object-types.md#objecttype-checkcommand) is primarily used to
 880 evaluate whether the command parameter should be set or not.
 881
 882 By default you can evaluate runtime macros for their existence. If the result is not an empty
 883 string, the command parameter is passed. This becomes fairly complicated when you want to evaluate
 884 multiple conditions and attributes.
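
For comparison, the simple macro-based form might look like this sketch. The command name `my-plugin`, its path, the `--verbose` flag and the custom attribute `my_plugin_verbose` are all hypothetical.

```
object CheckCommand "my-plugin" {
  command = [ "/usr/bin/my_plugin" ]

  arguments = {
    "--verbose" = {
      /* the flag is only added if the macro resolves to a non-empty value */
      set_if = "$my_plugin_verbose$"
    }
  }
}
```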
 885
 886 The following example was found on the community support channels. The user had defined a host
 887 dictionary named `compellent` with the key `disks`. This was then used inside service apply for rules.
 888
 889     object Host "dict-host" {
 890       check_command = "check_compellent"
 891       vars.compellent["disks"] = {
 892         file = "/var/lib/check_compellent/san_disks.0.json",
 893         checks = ["disks"]
 894       }
 895     }
 896
 897 The more significant problem was to only add the command parameter `--disks` to the plugin call
 898 when the dictionary `compellent` contains the key `disks`, and omit it if not found.
 899
 900 By defining `set_if` as an [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
 901 and evaluating whether the host custom attribute `compellent` contains the `disks` key, this problem was
 902 solved like this:
 903
 904     object CheckCommand "check_compellent" {
 905       command   = [ "/usr/bin/check_compellent" ]
 906       arguments   = {
 907         "--disks"  = {
 908           set_if = {{
 909             var host_vars = host.vars
 910             log(host_vars)
 911             var compel = host_vars.compellent
 912             log(compel)
 913             compel.contains("disks")
 914           }}
 915         }
 916       }
 917     }
 918
 919 This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
 920 and will fail if `host.vars.compellent` is not of the type `Dictionary`.
 921 Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.
 922
 923 You can test the types using the `icinga2 console`:
 924
 925     # icinga2 console
 926     Icinga (version: v2.3.0-193-g3eb55ad)
 927     <1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
 928     null
 929     <2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
 930     null
 931     <3> => typeof(srv_vars.compellent)
 932     type 'Dictionary'
 933     <4> =>
 934
 935 The more programmatic approach for `set_if` could look like this:
 936
 937         "--disks" = {
 938           set_if = {{
 939             var srv_vars = service.vars
 940             if(len(srv_vars) > 0) {
 941               if (typeof(srv_vars.compellent) == Dictionary) {
 942                 return srv_vars.compellent.contains("disks")
 943               } else {
 944                 log(LogInformation, "checkcommand set_if", "custom attribute compellent is not a dictionary, ignoring it.")
 945                 return false
 946               }
 947             } else {
 948               log(LogWarning, "checkcommand set_if", "empty custom attributes")
 949               return false
 950             }
 951           }}
 952         }
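
If the logging is not needed, the type check and the key lookup can be combined into one expression. The following fragment is a sketch that reads the dictionary from the host, as in the first example, rather than from the service. It relies on `&&` skipping the right-hand side when the left-hand side is false.

```
"--disks" = {
  set_if = {{
    var compel = host.vars.compellent

    /* contains() is only called if compel really is a dictionary */
    return typeof(compel) == Dictionary && compel.contains("disks")
  }}
}
```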
 953
 954
 955 #### Use Functions as Command Attribute <a id="use-functions-command-attribute"></a>
 956
 957 This comes in handy for [NotificationCommands](09-object-types.md#objecttype-notificationcommand)
 958 or [EventCommands](09-object-types.md#objecttype-eventcommand) which do not require
 959 a returned check result including state/output.
 960
 961 The following example was taken from the community support channels. The requirement was to
 962 specify a custom attribute inside the notification apply rule and decide which notification
 963 script to call based on that.
 964
 965     object User "short-dummy" {
 966     }
 967
 968     object UserGroup "short-dummy-group" {
 969       assign where user.name == "short-dummy"
 970     }
 971
 972     apply Notification "mail-admins-short" to Host {
 973        import "mail-host-notification"
 974        command = "mail-host-notification-test"
 975        user_groups = [ "short-dummy-group" ]
 976        vars.short = true
 977        assign where host.vars.notification.mail
 978     }
 979
 980 The solution is fairly simple: The `command` attribute is implemented as a function returning
 981 an array, as required by the caller, Icinga 2.
 982 The local variable `mailscript` sets the default value for the notification script location.
 983 If the notification custom attribute `short` is set, it will override the local variable `mailscript`
 984 with a new value.
 985 The `mailscript` variable is then used to compute the final notification command array being
 986 returned.
 987
 988 You can omit the `log()` calls; they only help with debugging.
 989
 990     object NotificationCommand "mail-host-notification-test" {
 991       command = {{
 992         log("command as function")
 993         var mailscript = "mail-host-notification-long.sh"
 994         if (notification.vars.short) {
 995            mailscript = "mail-host-notification-short.sh"
 996         }
 997         log("Running command")
 998         log(mailscript)
 999
1000         var cmd = [ SysconfDir + "/icinga2/scripts/" + mailscript ]
1001         log(LogCritical, "me", cmd)
1002         return cmd
1003       }}
1004
1005       env = {
1006       }
1007     }
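
To see the other branch of the function, consider a second notification apply rule that uses the same command without setting `vars.short`. This is a sketch; the rule name `mail-admins-long` is made up.

```
apply Notification "mail-admins-long" to Host {
  import "mail-host-notification"
  command = "mail-host-notification-test"
  user_groups = [ "short-dummy-group" ]

  /* vars.short is not set here, so the command function keeps the long script */
  assign where host.vars.notification.mail
}
```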
1008
1009
1010 ### Access Object Attributes at Runtime <a id="access-object-attributes-at-runtime"></a>
1011
1012 The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
1013 can be used to retrieve references to other objects by name.
1014
1015 This allows you to access configuration and runtime object attributes. A detailed
1016 list can be found [here](09-object-types.md#object-types).
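
A minimal sketch of such a lookup is shown below. It assumes a host named `cluster-host-01` exists and that the code runs where runtime objects are available, for example inside a lambda function or a console connected to the API. The facility string `accessor-demo` is arbitrary.

```
var h = get_host("cluster-host-01")

/* get_host() returns null if no host with that name exists */
if (h) {
  log(LogInformation, "accessor-demo", "Host " + h.name + " has state " + h.state)
} else {
  log(LogWarning, "accessor-demo", "Host not found.")
}
```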
1017
1018 #### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>
1019
1020 This is a simple cluster example for accessing two host object states and calculating a virtual
1021 cluster state and output:
1022
1023 ```
1024 object Host "cluster-host-01" {
1025   check_command = "dummy"
1026   vars.dummy_state = 2
1027   vars.dummy_text = "This host is down."
1028 }
1029
1030 object Host "cluster-host-02" {
1031   check_command = "dummy"
1032   vars.dummy_state = 0
1033   vars.dummy_text = "This host is up."
1034 }
1035
1036 object Host "cluster" {
1037   check_command = "dummy"
1038   vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]
1039
1040   vars.dummy_state = {{
1041     var up_count = 0
1042     var down_count = 0
1043     var cluster_nodes = macro("$cluster_nodes$")
1044
1045     for (node in cluster_nodes) {
1046       if (get_host(node).state > 0) {
1047         down_count += 1
1048       } else {
1049         up_count += 1
1050       }
1051     }
1052
1053     if (up_count >= down_count) {
1054       return 0 //same up as down -> UP
1055     } else {
1056       return 2 //something is broken
1057     }
1058   }}
1059
1060   vars.dummy_text = {{
1061     var output = "Cluster hosts:\n"
1062     var cluster_nodes = macro("$cluster_nodes$")
1063
1064     for (node in cluster_nodes) {
1065       output += node + ": " + get_host(node).last_check_result.output + "\n"
1066     }
1067
1068     return output
1069   }}
1070 }
1071 ```
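
The `dummy_text` function above assumes that every node exists and has already been checked. A more defensive variant of that attribute could look like the following fragment, meant to replace `vars.dummy_text` inside the `cluster` host. This is a sketch, and the fallback text is invented.

```
  vars.dummy_text = {{
    var output = "Cluster hosts:\n"

    for (node in macro("$cluster_nodes$")) {
      var h = get_host(node)

      if (h && h.last_check_result) {
        output += node + ": " + h.last_check_result.output + "\n"
      } else {
        /* host is missing or has not been checked yet */
        output += node + ": no check result available\n"
      }
    }

    return output
  }}
```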
1072
1073 #### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>
1074
1075 The following example sets time dependent thresholds for the load check based on the current
1076 time of the day compared to the defined time period.
1077
1078 ```
1079 object TimePeriod "backup" {
1080   import "legacy-timeperiod"
1081
1082   ranges = {
1083     monday = "02:00-03:00"
1084     tuesday = "02:00-03:00"
1085     wednesday = "02:00-03:00"
1086     thursday = "02:00-03:00"
1087     friday = "02:00-03:00"
1088     saturday = "02:00-03:00"
1089     sunday = "02:00-03:00"
1090   }
1091 }
1092
1093 object Host "webserver-with-backup" {
1094   check_command = "hostalive"
1095   address = "127.0.0.1"
1096 }
1097
1098 object Service "webserver-backup-load" {
1099   check_command = "load"
1100   host_name = "webserver-with-backup"
1101
1102   vars.load_wload1 = {{
1103     if (get_time_period("backup").is_inside) {
1104       return 20
1105     } else {
1106       return 5
1107     }
1108   }}
1109   vars.load_cload1 = {{
1110     if (get_time_period("backup").is_inside) {
1111       return 40
1112     } else {
1113       return 10
1114     }
1115   }}
1116 }
1117 ```
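
The two lambda functions above differ only in their values. Combining this with the closure pattern from the custom functions section, the logic could be moved into one global helper. The following is a sketch under that assumption; the function name `period_threshold` is hypothetical.

```
globals.period_threshold = function(period, inside_value, outside_value) {
  /* return a function so that the time period is evaluated at runtime */
  return function() use (period, inside_value, outside_value) {
    var tp = get_time_period(period)

    if (tp && tp.is_inside) {
      return inside_value
    }

    return outside_value
  }
}
```

Inside the service object, the thresholds would then read `vars.load_wload1 = period_threshold("backup", 20, 5)` and `vars.load_cload1 = period_threshold("backup", 40, 10)`.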
1118
1119
1120 ## Advanced Value Types <a id="advanced-value-types"></a>
1121
1122 In addition to the default value types Icinga 2 also uses a few other types
1123 to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).
1124
1125 ### CheckResult <a id="advanced-value-types-checkresult"></a>
1126
1127   Name                      | Type                  | Description
1128   --------------------------|-----------------------|----------------------------------
1129   exit\_status              | Number                | The exit status returned by the check execution.
1130   output                    | String                | The check output.
1131   performance\_data         | Array                 | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
1132   check\_source             | String                | Name of the node executing the check.
1133   state                     | Number                | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
1134   command                   | Value                 | Array of command with shell-escaped arguments or command line string.
1135   execution\_start          | Timestamp             | Check execution start time (as a UNIX timestamp).
1136   execution\_end            | Timestamp             | Check execution end time (as a UNIX timestamp).
1137   schedule\_start           | Timestamp             | Scheduled check execution start time (as a UNIX timestamp).
1138   schedule\_end             | Timestamp             | Scheduled check execution end time (as a UNIX timestamp).
1139   active                    | Boolean               | Whether the result is from an active or passive check.
1140   vars\_before              | Dictionary            | Internal attribute used for calculations.
1141   vars\_after               | Dictionary            | Internal attribute used for calculations.
1142   ttl                       | Number                | Time-to-live duration in seconds for this check result. The next check result is expected by `now + ttl`, at which point freshness checks are executed.
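
As a sketch of how these attributes can be read from the DSL, the following snippet fetches the last check result of a service and logs a few fields. It assumes a service `load` exists on the host named by `NodeName` and that the code runs where runtime objects are available, for example in a console connected to the API. The facility string is arbitrary.

```
var svc = get_service(NodeName, "load")

/* last_check_result is empty until the first check has been executed */
if (svc && svc.last_check_result) {
  var cr = svc.last_check_result
  var duration = cr.execution_end - cr.execution_start

  log(LogInformation, "checkresult-demo", "exit_status=" + cr.exit_status + " state=" + cr.state + " duration=" + duration + "s")
}
```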
1143
1144 ### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>
1145
1146 Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).
1147
1148   Name                      | Type                  | Description
1149   --------------------------|-----------------------|----------------------------------
1150   label                     | String                | Performance data label.
1151   value                     | Number                | Normalized performance data value without unit.
1152   counter                   | Boolean               | Enabled if the original value contains `c` as unit. Defaults to `false`.
1153   unit                      | String                | Unit of measurement (`seconds`, `bytes`, `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
1154   crit                      | Value                 | Critical threshold value.
1155   warn                      | Value                 | Warning threshold value.
1156   min                       | Value                 | Minimum value returned by the check.
1157   max                       | Value                 | Maximum value returned by the check.
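
To inspect these attributes interactively, a raw performance data string can be parsed in the `icinga2 console`. The sketch below assumes the global function `parse_performance_data()` from the library reference is available in your version; the sample string and the facility name are made up.

```
/* format: label=value;warn;crit;min */
var pdv = parse_performance_data("load1=0.42;5;10;0")

log(LogInformation, "perfdata-demo", "label=" + pdv.label + " value=" + pdv.value + " warn=" + pdv.warn + " crit=" + pdv.crit)
```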