granicus.if.org Git - icinga2/blob - doc/8-advanced-topics.md

   1 # <a id="advanced-topics"></a> Advanced Topics
   2
   3 This chapter covers a number of advanced topics. If you're new to Icinga, you
   4 can safely skip over things you're not interested in.
   5
   6 ## <a id="downtimes"></a> Downtimes
   7
   8 Downtimes can be scheduled for planned server maintenance or
   9 any other targeted service outage you are aware of in advance.
  10
  11 Downtimes will suppress any notifications, and may trigger other
  12 downtimes too. If the downtime was set by accident, or the duration
  13 exceeds the maintenance, you can manually cancel the downtime.
  14 Planned downtimes will also be taken into account for SLA reporting
  15 tools calculating the SLAs based on the state and downtime history.
  16
  17 Multiple downtimes for a single object may overlap. This is useful
  18 when you need to extend a maintenance window that is taking longer than expected.
  19 If there are multiple downtimes triggered for one object, the overall downtime depth
  20 will be greater than `1`.
  21
  22
  23 If the downtime was scheduled after the problem changed to a critical hard
  24 state triggering a problem notification, and the service recovers during
  25 the downtime window, the recovery notification won't be suppressed.
  26
  27 ### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
  28
  29 A `fixed` downtime will be activated at the defined start time, and
  30 removed at the end time. During this time window the service state
  31 will change to `NOT-OK` and then actually trigger the downtime.
  32 Notifications are suppressed and the downtime depth is incremented.
  33
  34 Common scenarios are a planned distribution upgrade on your Linux
  35 servers, or database updates in your warehouse. The customer knows
  36 about a fixed downtime window between 23:00 and 24:00. After 24:00
  37 all problems should be alerted again. The solution is simple:
  38 schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
  39
  40 Unlike a `fixed` downtime, a `flexible` downtime will be triggered
  41 by the state change in the time span defined by start and end time,
  42 and then last for the specified duration in minutes.
  43
  44 Imagine the following scenario: Your service is frequently polled
  45 by users trying to grab free deleted domains for immediate registration.
  46 Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
  47 a network outage visible to the monitoring. The service is still alive,
  48 but answers Icinga 2 service checks too slowly.
  49 For that reason, you may want to schedule a downtime between 07:30 and
  50 08:00 with a duration of 15 minutes. The downtime will then last from
  51 its trigger time until the duration is over. After that, the downtime
  52 is removed (may happen before or after the actual end time!).
  53
  54 ### <a id="scheduling-downtime"></a> Scheduling a downtime
  55
  56 You can schedule a downtime either by using the Icinga 2 API action
  57 [schedule-downtime](12-icinga2-api.md#icinga2-api-actions-schedule-downtime) or
  58 by sending an [external command](14-features.md#external-commands).
  59
  60
  61 #### <a id="fixed-downtime"></a> Fixed Downtime
  62
  63 If the host/service changes into a NOT-OK state between the start and
  64 end time window, the downtime will be marked as `in effect` and
  65 increases the downtime depth counter.
  66
  67 ```
  68    |       |         |
  69 start      |        end
  70        trigger time
  71 ```
  72
  73 #### <a id="flexible-downtime"></a> Flexible Downtime
  74
  75 A flexible downtime defines a time window where the downtime may be
  76 triggered from a host/service NOT-OK state change. It will then last
  77 until the specified time duration is reached. That way it can happen
  78 that the downtime end time is already gone, but the downtime ends
  79 at `trigger time + duration`.
  80
  81
  82 ```
  83    |       |         |
  84 start      |        end               actual end time
  85            |--------------duration--------|
  86        trigger time
  87 ```
  88
  89
  90 ### <a id="triggered-downtimes"></a> Triggered Downtimes
  91
  92 This is optional when scheduling a downtime. If there is already a downtime
  93 scheduled for a future maintenance, the current downtime can be triggered by
  94 that downtime. This is useful if you have scheduled a host downtime and
  95 are now scheduling a child host's downtime getting triggered by the parent
  96 downtime on `NOT-OK` state change.
  97
  98 ### <a id="recurring-downtimes"></a> Recurring Downtimes
  99
 100 [ScheduledDowntime objects](9-object-types.md#objecttype-scheduleddowntime) can be used to set up
 101 recurring downtimes for services.
 102
 103 Example:
 104
 105     apply ScheduledDowntime "backup-downtime" to Service {
 106       author = "icingaadmin"
 107       comment = "Scheduled downtime for backup"
 108
 109       ranges = {
 110         monday = "02:00-03:00"
 111         tuesday = "02:00-03:00"
 112         wednesday = "02:00-03:00"
 113         thursday = "02:00-03:00"
 114         friday = "02:00-03:00"
 115         saturday = "02:00-03:00"
 116         sunday = "02:00-03:00"
 117       }
 118
 119       assign where "backup" in service.groups
 120     }
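
The same object type can also describe a recurring flexible downtime. The following is a minimal sketch, assuming the `fixed` and `duration` attributes of `ScheduledDowntime` behave as described in the object type reference. The object name `registration-rush` and the service group `registration` are made up for illustration and mirror the 07:30 to 08:00 scenario from the flexible downtime section.

```
apply ScheduledDowntime "registration-rush" to Service {
  author = "icingaadmin"
  comment = "Flexible downtime for the morning registration rush"

  // Flexible downtime: triggered by a NOT-OK state change inside the
  // window defined in `ranges`, then lasts for `duration`.
  fixed = false
  duration = 15m

  ranges = {
    monday = "07:30-08:00"
    tuesday = "07:30-08:00"
    wednesday = "07:30-08:00"
    thursday = "07:30-08:00"
    friday = "07:30-08:00"
  }

  // Hypothetical service group, adjust to your environment.
  assign where "registration" in service.groups
}
```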
 121
 122
 123 ## <a id="comments-intro"></a> Comments
 124
 125 Comments can be added at runtime and are persistent over restarts. You can
 126 add useful information for others on repeating incidents (for example
 127 "last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
 128 is primarily accessible using web interfaces.
 129
 130 You can add a comment either by using the Icinga 2 API action
 131 [add-comment](12-icinga2-api.md#icinga2-api-actions-add-comment) or
 132 by sending an [external command](14-features.md#external-commands).
 133
 134 ## <a id="acknowledgements"></a> Acknowledgements
 135
 136 If a problem persists and notifications have been sent, you can
 137 acknowledge the problem. That way other users will get
 138 a notification that you're aware of the issue and probably are
 139 already working on a fix.
 140
 141 Note: Acknowledgements also add a new [comment](8-advanced-topics.md#comments-intro)
 142 which contains the author and text fields.
 143
 144 You can send an acknowledgement either by using the Icinga 2 API action
 145 [acknowledge-problem](12-icinga2-api.md#icinga2-api-actions-acknowledge-problem) or
 146 by sending an [external command](14-features.md#external-commands).
 147
 148
 149 ### <a id="sticky-acknowledgements"></a> Sticky Acknowledgements
 150
 151 The acknowledgement is removed if a state change occurs or if the host/service
 152 recovers (OK/Up state).
 153
 154 If you acknowledge a problem once you've received a `Critical` notification,
 155 the acknowledgement will be removed if there is a state transition to `Warning`.
 156 ```
 157 OK -> WARNING -> CRITICAL -> WARNING -> OK
 158 ```
 159
 160 If you prefer to keep the acknowledgement until the problem is resolved (`OK`
 161 recovery) you need to enable the `sticky` parameter.
 162
 163
 164 ### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements
 165
 166 Once a problem is acknowledged it may disappear from your `handled problems`
 167 dashboard and no-one ever looks at it again since it will suppress
 168 notifications too.
 169
 170 This `fire-and-forget` action is quite common. If you're sure that a
 171 current problem should be resolved in the future at a defined time,
 172 you can define an expiration time when acknowledging the problem.
 173
 174 Icinga 2 will clear the acknowledgement when expired and start to
 175 re-notify, if the problem persists.
 176
 177
 178 ## <a id="timeperiods"></a> Time Periods
 179
 180 [Time Periods](9-object-types.md#objecttype-timeperiod) define
 181 time ranges in Icinga where event actions are triggered, for
 182 example whether a service check is executed or not within
 183 the `check_period` attribute, or whether a notification is sent to
 184 users or not, filtered by the `period` and `notification_period`
 185 configuration attributes for `Notification` and `User` objects.
 186
 187 > **Note**
 188 >
 189 > If you are familiar with Icinga 1.x, these time period definitions
 190 > are called `legacy timeperiods` in Icinga 2.
 191 >
 192 > An Icinga 2 legacy timeperiod requires the `ITL` provided template
 193 > `legacy-timeperiod`.
 194
 195 The `TimePeriod` attribute `ranges` may contain multiple directives,
 196 including weekdays, days of the month, and calendar dates.
 197 These types may overlap/override other types in your ranges dictionary.
 198
 199 The descending order of precedence is as follows:
 200
 201 * Calendar date (2008-01-01)
 202 * Specific month date (January 1st)
 203 * Generic month date (Day 15)
 204 * Offset weekday of specific month (2nd Tuesday in December)
 205 * Offset weekday (3rd Monday)
 206 * Normal weekday (Tuesday)
 207
 208 If you don't set any `check_period` or `notification_period` attribute
 209 on your configuration objects, Icinga 2 assumes `24x7` as time period
 210 as shown below.
 211
 212     object TimePeriod "24x7" {
 213       import "legacy-timeperiod"
 214
 215       display_name = "Icinga 2 24x7 TimePeriod"
 216       ranges = {
 217         "monday"    = "00:00-24:00"
 218         "tuesday"   = "00:00-24:00"
 219         "wednesday" = "00:00-24:00"
 220         "thursday"  = "00:00-24:00"
 221         "friday"    = "00:00-24:00"
 222         "saturday"  = "00:00-24:00"
 223         "sunday"    = "00:00-24:00"
 224       }
 225     }
 226
 227 If your operation staff should only be notified during workhours,
 228 create a new timeperiod named `workhours` defining a work day from
 229 09:00 to 17:00.
 230
 231     object TimePeriod "workhours" {
 232       import "legacy-timeperiod"
 233
 234       display_name = "Icinga 2 8x5 TimePeriod"
 235       ranges = {
 236         "monday"    = "09:00-17:00"
 237         "tuesday"   = "09:00-17:00"
 238         "wednesday" = "09:00-17:00"
 239         "thursday"  = "09:00-17:00"
 240         "friday"    = "09:00-17:00"
 241       }
 242     }
 243
 244 Furthermore if you wish to specify a notification period across midnight,
 245 you can define it the following way:
 246
 247     object TimePeriod "across-midnight" {
 248       import "legacy-timeperiod"
 249
 250       display_name = "Nightly Notification"
 251       ranges = {
 252         "saturday" = "22:00-24:00"
 253         "sunday" = "00:00-03:00"
 254       }
 255     }
 256
 257 Below you can see another example for configuring timeperiods across several
 258 days, weeks or months. This can be useful when taking components offline
 259 for a distinct period of time.
 260
 261     object TimePeriod "standby" {
 262       import "legacy-timeperiod"
 263
 264       display_name = "Standby"
 265       ranges = {
 266         "2016-09-30 - 2016-10-30" = "00:00-24:00"
 267       }
 268     }
 269
 270 Please note that the spaces before and after the dash are mandatory.
 271
 272 Once your time period is configured you can use the `period` attribute
 273 to assign time periods to `Notification` and `Dependency` objects:
 274
 275     object Notification "mail" {
 276       import "generic-notification"
 277
 278       host_name = "localhost"
 279
 280       command = "mail-notification"
 281       users = [ "icingaadmin" ]
 282       period = "workhours"
 283     }
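
Active checks can be restricted in a similar way with the `check_period` attribute mentioned at the beginning of this section. This is a minimal sketch, assuming the `workhours` time period defined above, the `generic-service` template from the sample configuration and the `ssh` CheckCommand from the ITL. The service name `ssh-workhours` is illustrative only.

```
apply Service "ssh-workhours" {
  import "generic-service"

  check_command = "ssh"

  // Only execute checks inside the "workhours" time period.
  check_period = "workhours"

  assign where host.address
}
```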
 284
 285 ### <a id="timeperiods-includes-excludes"></a> Time Periods Inclusion and Exclusion
 286
 287 Sometimes it is necessary to exclude certain time ranges from
 288 your default time period definitions, for example, if you don't
 289 want to send out any notification during the holiday season,
 290 or if you only want to allow small time windows for executed checks.
 291
 292 The [TimePeriod object](9-object-types.md#objecttype-timeperiod)
 293 provides the `includes` and `excludes` attributes to solve this issue.
 294 `prefer_includes` defines whether included or excluded time periods are
 295 preferred.
 296
 297 The following example defines a time period called `holidays` where
 298 notifications should be suppressed:
 299
 300     object TimePeriod "holidays" {
 301       import "legacy-timeperiod"
 302
 303       ranges = {
 304         "january 1" = "00:00-24:00"                 //new year's day
 305         "july 4" = "00:00-24:00"                    //independence day
 306         "december 25" = "00:00-24:00"               //christmas
 307         "december 31" = "18:00-24:00"               //new year's eve (6pm+)
 308         "2017-04-16" = "00:00-24:00"                //easter 2017
 309         "monday -1 may" = "00:00-24:00"             //memorial day (last monday in may)
 310         "monday 1 september" = "00:00-24:00"        //labor day (1st monday in september)
 311         "thursday 4 november" = "00:00-24:00"       //thanksgiving (4th thursday in november)
 312       }
 313     }
 314
 315 In addition to that the time period `weekends-excluded` defines an additional
 316 time window which should be excluded from notifications:
 317
 318     object TimePeriod "weekends-excluded" {
 319       import "legacy-timeperiod"
 320
 321       ranges = {
 322         "saturday"  = "00:00-09:00,18:00-24:00"
 323         "sunday"    = "00:00-09:00,18:00-24:00"
 324       }
 325     }
 326
 327 The time period `prod-notification` defines the default time ranges
 328 and adds the excluded time period names as an array.
 329
 330     object TimePeriod "prod-notification" {
 331       import "legacy-timeperiod"
 332
 333       excludes = [ "holidays", "weekends-excluded" ]
 334
 335       ranges = {
 336         "monday"    = "00:00-24:00"
 337         "tuesday"   = "00:00-24:00"
 338         "wednesday" = "00:00-24:00"
 339         "thursday"  = "00:00-24:00"
 340         "friday"    = "00:00-24:00"
 341         "saturday"  = "00:00-24:00"
 342         "sunday"    = "00:00-24:00"
 343       }
 344     }
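
The `includes` attribute works in the opposite direction. The sketch below is an assumption about how the attributes combine rather than an excerpt from a tested setup: it pulls in the `workhours` time period defined earlier, removes `holidays`, and sets `prefer_includes` to `false` so that the exclusion should win where both overlap. The object name `prod-notification-oncall` and the Saturday range are made up for illustration.

```
object TimePeriod "prod-notification-oncall" {
  import "legacy-timeperiod"

  includes = [ "workhours" ]
  excludes = [ "holidays" ]

  // If a point in time is both included and excluded,
  // prefer the excluded time period.
  prefer_includes = false

  ranges = {
    "saturday" = "10:00-14:00"
  }
}
```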
 345
 346 ## <a id="check-result-freshness"></a> Check Result Freshness
 347
 348 In Icinga 2 active check freshness is enabled by default. A check result is considered
 349 stale if no new check result arrives within the period defined by the `check_interval` attribute.
 350
 351     threshold = last check execution time + check interval
 352
 353 Passive check freshness is calculated from the `check_interval` attribute if set.
 354
 355     threshold = last check result time + check interval
 356
 357 If the threshold passes without a new check result, a new check defined by the
 358 `check_command` attribute is executed.
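
A common way to make use of this for passively fed services is to set `check_command` to the ITL `dummy` command, so that a missing result becomes visible as a state of its own. The following is a sketch under that assumption. The service name, the one hour interval and the `host.vars.services` custom attribute are illustrative and not taken from this document.

```
apply Service "external-check" {
  check_command = "dummy"

  // Results are expected at least once per hour.
  check_interval = 1h

  // Only reported when the dummy command is executed,
  // i.e. when no check result arrived in time.
  vars.dummy_state = 3 // UNKNOWN
  vars.dummy_text = "No check results received within the check interval."

  // Hypothetical custom attribute used for assignment.
  assign where "external" in host.vars.services
}
```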
 359
 360
 361 ## <a id="check-flapping"></a> Check Flapping
 362
 363 Icinga 2 supports optional detection of hosts and services that are "flapping".
 364
 365 Flapping occurs when a service or host changes state too frequently, resulting
 366 in a storm of problem and recovery notifications. Flapping can indicate
 367 configuration problems (e.g. thresholds set too low), troublesome services,
 368 or real network problems.
 369
 370 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
 371 The `flapping_threshold` attribute specifies the percentage of state changes
 372 at which a [host](9-object-types.md#objecttype-host) or [service](9-object-types.md#objecttype-service) is considered to flap.
 373
 374 Note: There are known issues with flapping detection. Please refrain from enabling
 375 flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed.
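
For reference, the two attributes could be combined in a template as sketched below. Given the known issue mentioned above, treat this purely as an illustration of the syntax. The template name and the threshold value are assumptions, not recommendations.

```
template Service "flapping-detection" {
  // Turn on flapping detection for services importing this template.
  enable_flapping = true

  // Percentage of state changes at which the service is considered
  // to flap (illustrative value).
  flapping_threshold = 30
}
```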
 376
 377 ## <a id="volatile-services"></a> Volatile Services
 378
 379 By default all services remain in a non-volatile state. When a problem
 380 occurs, the `SOFT` state applies and once the check counter reaches the
 381 `max_check_attempts` attribute value, a `HARD` state transition happens.
 382 Notifications are only triggered by `HARD` state changes and are then
 383 re-sent at the interval defined by the `interval` attribute.
 384
 385 It may be reasonable to have a volatile service which stays in a `HARD`
 386 state type if the service stays in a `NOT-OK` state. That way each
 387 service recheck will automatically trigger a notification unless the
 388 service is acknowledged or in a scheduled downtime.
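
A minimal sketch of such a service, assuming the `volatile` attribute of the `Service` object type and the `generic-service` template from the sample configuration. The CheckCommand `logfile-errors` is hypothetical and stands for any check whose every NOT-OK result deserves a notification.

```
apply Service "logfile-errors" {
  import "generic-service"

  // Hypothetical CheckCommand, replace it with your own definition.
  check_command = "logfile-errors"

  // Stay in a HARD state type while the service is NOT-OK.
  volatile = true

  assign where host.vars.os == "Linux"
}
```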
 389
 390 ## <a id="monitoring-icinga"></a> Monitoring Icinga 2
 391
 392 Why should you do that? Icinga and its components run like any other
 393 service application on your server. There are predictable issues
 394 such as "disk space is running low" and your monitoring suffers from just
 395 that.
 396
 397 You would also like to ensure that features and backends are running
 398 and storing required data. Be it the database backend where Icinga Web 2
 399 presents fancy dashboards, forwarded metrics to Graphite or InfluxDB or
 400 the entire distributed setup.
 401
 402 This list isn't complete but should help with your own setup.
 403 Windows client specific checks are highlighted.
 404
 405 Type            | Description                   | Plugins and CheckCommands
 406 ----------------|-------------------------------|-----------------------------------------------------
 407 System          | Filesystem                    | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 408 System          | Memory, Swap                  | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
 409 System          | Hardware                      | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
 410 System          | Virtualization                | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
 411 System          | Processes                     | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 412 System          | System Activity Reports       | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
 413 System          | I/O                           | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
 414 System          | Network interfaces            | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
 415 System          | Users                         | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
 416 System          | Logs                          | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
 417 System          | NTP                           | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
 418 System          | Updates                       | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
 419 Icinga          | Status & Stats                | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
 420 Icinga          | Cluster & Clients             | [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks)
 421 Database        | MySQL                         | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
 422 Database        | PostgreSQL                    | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
 423 Database        | Housekeeping                  | Check the database size and growth and analyse metrics to examine trends.
 424 Database        | DB IDO                        | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
 425 Webserver       | Apache2, Nginx, etc.          | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
 426 Webserver       | Certificates                  | [http](10-icinga-template-library.md#plugin-check-command-http)
 427 Webserver       | Authorization                 | [http](10-icinga-template-library.md#plugin-check-command-http)
 428 Notifications   | Mail (queue)                  | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
 429 Notifications   | SMS (GSM modem)               | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
 430 Notifications   | Messengers, Cloud services    | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
 431 Metrics         | PNP, RRDTool                  | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
 432 Metrics         | Graphite                      | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
 433 Metrics         | InfluxDB                      | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
 434 Metrics         | Elastic Stack                 | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
 435 Metrics         | Graylog                       | [Graylog integration](14-features.md#graylog-integration)
 436
 437
 438 The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
 439 Icinga 2. You can forward them to your preferred graphing solution.
 440 If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
 441 your own custom check plugin. Alternatively, keep using the built-in [object accessor functions](8-advanced-topics.md#access-object-attributes-at-runtime)
 442 to calculate stats in-memory.
 443
 444 There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
 445 which provides additional metrics for the IDO database.
 446
 447 ```
 448 apply Service "ido-mysql" {
 449   check_command = "ido"
 450
 451   vars.ido_type = "IdoMysqlConnection"
 452   vars.ido_name = "ido-mysql" //the name defined in /etc/icinga2/features-enabled/ido-mysql.conf
 453
 454   assign where match("master*.localdomain", host.name)
 455 }
 456 ```
 457
 458 More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.
 459
 460 Distributed setups should include specific [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks).
 461 You might also want to add additional checks for SSL certificate expiration.
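
One possible way to check certificate expiration is the `http` CheckCommand listed in the table above. The sketch below assumes its `http_ssl` and `http_certificate` custom attributes from the ITL. The service name and the 30 day limit are illustrative, and the assign rule simply reuses the pattern from the `ido-mysql` example.

```
apply Service "https-certificate" {
  check_command = "http"

  vars.http_ssl = true

  // Minimum number of days the certificate must remain valid.
  vars.http_certificate = 30

  assign where match("master*.localdomain", host.name)
}
```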
 462
 463
 464 ## <a id="advanced-configuration-hints"></a> Advanced Configuration Hints
 465
 466 ### <a id="advanced-use-of-apply-rules"></a> Advanced Use of Apply Rules
 467
 468 [Apply rules](3-monitoring-basics.md#using-apply) can be used to create a rule set which is
 469 entirely based on host objects and their attributes.
 470 In addition to that [apply for and custom attribute override](3-monitoring-basics.md#using-apply-for)
 471 extend the possibilities.
 472
 473 The following example defines a dictionary on the host object which contains
 474 configuration attributes for multiple web servers. This is then used to add three checks:
 475
 476 * A `ping4` check using the local IP `address` of the web server.
 477 * A `tcp` check querying the TCP port on which the HTTP service is running.
 478 * If the `url` key is defined, the third apply for rule will create service objects using the `http` CheckCommand.
 479 In addition to that you can optionally define the `ssl` attribute which enables HTTPS checks.
 480
 481 Host definition:
 482
 483     object Host "webserver01" {
 484       import "generic-host"
 485       address = "192.168.56.200"
 486       vars.os = "Linux"
 487
 488       vars.webserver = {
 489         instance["status"] = {
 490           address = "192.168.56.201"
 491           port = "80"
 492           url = "/status"
 493         }
 494         instance["tomcat"] = {
 495           address = "192.168.56.202"
 496           port = "8080"
 497         }
 498         instance["icingaweb2"] = {
 499           address = "192.168.56.210"
 500           port = "443"
 501           url = "/icingaweb2"
 502           ssl = true
 503         }
 504       }
 505     }
 506
 507 Service apply for definitions:
 508
 509     apply Service "webserver_ping" for (instance => config in host.vars.webserver.instance) {
 510       display_name = "webserver_" + instance
 511       check_command = "ping4"
 512
 513       vars.ping_address = config.address
 514
 515       assign where host.vars.webserver.instance
 516     }
 517
 518     apply Service "webserver_port" for (instance => config in host.vars.webserver.instance) {
 519       display_name = "webserver_" + instance + "_" + config.port
 520       check_command = "tcp"
 521
 522       vars.tcp_address = config.address
 523       vars.tcp_port = config.port
 524
 525       assign where host.vars.webserver.instance
 526     }
 527
 528     apply Service "webserver_url" for (instance => config in host.vars.webserver.instance) {
 529       display_name = "webserver_" + instance + "_" + config.url
 530       check_command = "http"
 531
 532       vars.http_address = config.address
 533       vars.http_port = config.port
 534       vars.http_uri = config.url
 535
 536       if (config.ssl) {
 537         vars.http_ssl = config.ssl
 538       }
 539
 540       assign where config.url != ""
 541     }
 542
 543 The variables defined in the host dictionary are not using the typical custom attribute
 544 prefix recommended for CheckCommand parameters. Instead they are re-used for multiple
 545 service checks in this example.
 546 In addition to defining check parameters this way, you can also enrich the `display_name`
 547 attribute with more details. This will be shown in Icinga Web 2 for example.
 548
 549 ### <a id="use-functions-object-config"></a> Use Functions in Object Configuration
 550
 551 There is a limited scope where functions can be used as object attributes such as:
 552
 553 * As value for [Custom Attributes](3-monitoring-basics.md#custom-attributes-functions)
 554 * Returning boolean expressions for [set_if](8-advanced-topics.md#use-functions-command-arguments-setif) inside command arguments
 555 * Returning a [command](8-advanced-topics.md#use-functions-command-attribute) array inside command objects
 556
 557 The other way around you can create objects dynamically using your own global functions.
 558
 559 > **Note**
 560 >
 561 > Functions called inside command objects share the same global scope as runtime macros.
 562 > Therefore you can access host custom attributes like `host.vars.os`, or any other
 563 > object attribute from inside the function definition used for [set_if](8-advanced-topics.md#use-functions-command-arguments-setif) or [command](8-advanced-topics.md#use-functions-command-attribute).
 564
 565 Tips when implementing functions:
 566
 567 * Use [log()](18-library-reference.md#global-functions-log) to dump variables. You can see the output
 568 inside the `icinga2.log` file depending on your log severity
 569 * Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary)
 570 * Build them step-by-step. You can always refactor your code later on.
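
As an illustration of the console tip, the following snippet can be pasted into `icinga2 console` to try out a dictionary iteration before moving the logic into a function. The dictionary and its keys are made up for this sketch and only loosely follow the `webserver` example above. Whether the `log()` output is shown depends on your log severity.

```
// Illustrative dictionary, not part of any real configuration.
var webserver = {
  status = { port = "80" }
  tomcat = { port = "8080" }
}

// Iterate over all key/value pairs and log one line per entry.
for (name => config in webserver) {
  log(name + " listens on port " + config.port)
}
```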
 571
 572 #### <a id="use-functions-command-arguments-setif"></a> Use Functions in Command Arguments set_if
 573
 574 The `set_if` attribute inside the command arguments definition in the
 575 [CheckCommand object definition](9-object-types.md#objecttype-checkcommand) is primarily used to
 576 evaluate whether the command parameter should be set or not.
 577
 578 By default you can evaluate runtime macros for their existence. If the result is not an empty
 579 string, the command parameter is passed. This becomes fairly complicated when you want to evaluate
 580 multiple conditions and attributes.
 581
 582 The following example was found on the community support channels. The user had defined a host
 583 dictionary named `compellent` with the key `disks`. This was then used inside service apply for rules.
 584
 585     object Host "dict-host" {
 586       check_command = "check_compellent"
 587       vars.compellent["disks"] = {
 588         file = "/var/lib/check_compellent/san_disks.0.json",
 589         checks = ["disks"]
 590       }
 591     }
 592
 593 The more significant problem was to only add the command parameter `--disks` to the plugin call
 594 when the dictionary `compellent` contains the key `disks`, and omit it if not found.
 595
 596 By defining `set_if` as an [abbreviated lambda function](17-language-reference.md#nullary-lambdas)
 597 and evaluating the host custom attribute `compellent` containing the `disks` key, this problem was
 598 solved like this:
 599
 600     object CheckCommand "check_compellent" {
 601       command   = [ "/usr/bin/check_compellent" ]
 602       arguments   = {
 603         "--disks"  = {
 604           set_if = {{
 605             var host_vars = host.vars
 606             log(host_vars)
 607             var compel = host_vars.compellent
 608             log(compel)
 609             compel.contains("disks")
 610           }}
 611         }
 612       }
 613     }
 614
 615 This implementation uses the dictionary type method [contains](18-library-reference.md#dictionary-contains)
 616 and will fail if `host.vars.compellent` is not of the type `Dictionary`.
 617 Therefore you can extend the checks using the [typeof](17-language-reference.md#types) function.
 618
 619 You can test the types using the `icinga2 console`:
 620
 621     # icinga2 console
 622     Icinga (version: v2.3.0-193-g3eb55ad)
 623     <1> => srv_vars.compellent["check_a"] = { file="outfile_a.json", checks = [ "disks", "fans" ] }
 624     null
 625     <2> => srv_vars.compellent["check_b"] = { file="outfile_b.json", checks = [ "power", "voltages" ] }
 626     null
 627     <3> => typeof(srv_vars.compellent)
 628     type 'Dictionary'
 629     <4> =>
 630
 631 The more programmatic approach for `set_if` could look like this:
 632
 633         "--disks" = {
 634           set_if = {{
 635             var srv_vars = service.vars
 636             if(len(srv_vars) > 0) {
 637               if (typeof(srv_vars.compellent) == Dictionary) {
 638                 return srv_vars.compellent.contains("disks")
 639               } else {
 640                 log(LogInformation, "checkcommand set_if", "custom attribute compellent is not a dictionary, ignoring it.")
 641                 return false
 642               }
 643             } else {
 644               log(LogWarning, "checkcommand set_if", "empty custom attributes")
 645               return false
 646             }
 647           }}
 648         }
 649
 650
 651 #### <a id="use-functions-command-attribute"></a> Use Functions as Command Attribute
 652
 653 This comes in handy for [NotificationCommands](9-object-types.md#objecttype-notificationcommand)
 654 or [EventCommands](9-object-types.md#objecttype-eventcommand) which do not require
 655 a returned checkresult including state/output.
 656
 657 The following example was taken from the community support channels. The requirement was to
 658 specify a custom attribute inside the notification apply rule and decide which notification
 659 script to call based on that.
 660
 661     object User "short-dummy" {
 662     }
 663
 664     object UserGroup "short-dummy-group" {
 665       assign where user.name == "short-dummy"
 666     }
 667
 668     apply Notification "mail-admins-short" to Host {
 669        import "mail-host-notification"
 670        command = "mail-host-notification-test"
 671        user_groups = [ "short-dummy-group" ]
 672        vars.short = true
 673        assign where host.vars.notification.mail
 674     }
 675
The solution is fairly simple: the `command` attribute is implemented as a function that returns
the command array Icinga 2 expects.
The local variable `mailscript` holds the default notification script name.
If the notification custom attribute `short` is set, the function overrides `mailscript`
with a new value.
The `mailscript` variable is then used to build the final notification command array that is
returned.

You can omit the `log()` calls; they only help with debugging.

    object NotificationCommand "mail-host-notification-test" {
      command = {{
        log("command as function")
        var mailscript = "mail-host-notification-long.sh"
        if (notification.vars.short) {
          mailscript = "mail-host-notification-short.sh"
        }
        log("Running command")
        log(mailscript)

        var cmd = [ SysconfDir + "/icinga2/scripts/" + mailscript ]
        log(LogCritical, "me", cmd)
        return cmd
      }}

      env = {
      }
    }

#### <a id="custom-functions-as-attribute"></a> Use Custom Functions as Attribute

To use custom functions as attributes, the function must be defined in a
slightly unexpected way: it has to return another function, which captures
its arguments with `use`. The following example shows how to assign values
depending on group membership. All hosts in the `slow-lan` host group use 300
as the value for `ping_wrta`; all other hosts use 100.

    globals.group_specific_value = function(group, group_value, non_group_value) {
        return function() use (group, group_value, non_group_value) {
            if (group in host.groups) {
                return group_value
            } else {
                return non_group_value
            }
        }
    }

    apply Service "ping4" {
        import "generic-service"
        check_command = "ping4"

        vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
        vars.ping_crta = group_specific_value("slow-lan", 500, 200)

        assign where true
    }

#### <a id="use-functions-assign-where"></a> Use Functions in Assign Where Expressions

If a simple expression for matching a name or checking whether an item
exists in an array or dictionary does not fit, you should consider
writing your own global [functions](17-language-reference.md#functions).
You can call them inside `assign where` and `ignore where` expressions
for [apply rules](3-monitoring-basics.md#using-apply-expressions) or
[group assignments](3-monitoring-basics.md#group-assign-intro) just like
any other global function, for example [match](18-library-reference.md#global-functions-match).

The following example requires that the host `myprinter` is added
to the host group `printers-lexmark`, but only if the host uses
a template matching the name `lexmark*`.

    template Host "lexmark-printer-host" {
      vars.printer_type = "Lexmark"
    }

    object Host "myprinter" {
      import "generic-host"
      import "lexmark-printer-host"

      address = "192.168.1.1"
    }

    /* register a global function for the assign where call */
    globals.check_host_templates = function(host, search) {
      /* iterate over all host templates and check if the search matches */
      for (tmpl in host.templates) {
        if (match(search, tmpl)) {
          return true
        }
      }

      /* nothing matched */
      return false
    }

    object HostGroup "printers-lexmark" {
      display_name = "Lexmark Printers"
      /* call the global function and pass the arguments */
      assign where check_host_templates(host, "lexmark*")
    }
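
The same global function can also be called from an `ignore where` expression.
The following is a minimal sketch, assuming the `check_host_templates` function
from the example above is registered. The service name `ping-non-lexmark` is
hypothetical, and `generic-service` and `ping4` are only used for illustration.

```
apply Service "ping-non-lexmark" {
  import "generic-service"
  check_command = "ping4"

  /* candidate hosts: everything with an address */
  assign where host.address
  /* skip hosts importing a template matching "lexmark*" */
  ignore where check_host_templates(host, "lexmark*")
}
```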

Take a different, more complex example: all hosts with the
custom attribute `vars_app` as a nested dictionary should be
added to the host group `ABAP-app-server`, but only if an entry
in that dictionary has its `app_type` set to `ABAP`.

Written as a wildcard match on nested dictionaries, the condition could read like this:

    where host.vars.vars_app["*"].app_type == "ABAP"

Such a wildcard lookup is not available, so the solution for this problem
is to register a global function which checks the `app_type` for all hosts
with the `vars_app` dictionary.

    object Host "appserver01" {
      check_command = "dummy"
      vars.vars_app["ABC"] = { app_type = "ABAP" }
    }
    object Host "appserver02" {
      check_command = "dummy"
      vars.vars_app["DEF"] = { app_type = "ABAP" }
    }

    globals.check_app_type = function(host, type) {
      /* ensure that other hosts without the custom attribute do not match */
      if (typeof(host.vars.vars_app) != Dictionary) {
        return false
      }

      /* iterate over the vars_app dictionary */
      for (key => val in host.vars.vars_app) {
        /* match as soon as one value is a dictionary whose app_type equals the requested type */
        if (typeof(val) == Dictionary && val.app_type == type) {
          return true
        }
      }

      /* nothing matched */
      return false
    }

    object HostGroup "ABAP-app-server" {
      assign where check_app_type(host, "ABAP")
    }
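
Note that `check_app_type` returns `true` as soon as one entry matches. If every
entry in `vars_app` has to carry the requested type, a stricter variant could look
like the following sketch. The function name `check_all_app_types` and the host
group name `ABAP-only-app-server` are hypothetical and not part of the original
example. Treating an empty dictionary as a non-match is an assumption as well.

```
globals.check_all_app_types = function(host, type) {
  /* hosts without a non-empty vars_app dictionary do not match (assumption) */
  if (typeof(host.vars.vars_app) != Dictionary || len(host.vars.vars_app) == 0) {
    return false
  }

  /* fail on the first entry that is not a dictionary with the requested app_type */
  for (key => val in host.vars.vars_app) {
    if (typeof(val) != Dictionary || val.app_type != type) {
      return false
    }
  }

  /* every entry matched */
  return true
}

object HostGroup "ABAP-only-app-server" {
  assign where check_all_app_types(host, "ABAP")
}
```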

### <a id="access-object-attributes-at-runtime"></a> Access Object Attributes at Runtime

The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
can be used to retrieve references to other objects by name.

This allows you to access configuration and runtime object attributes. A detailed
list of object attributes can be found [here](9-object-types.md#object-types).

The following simple cluster example accesses the states of two host objects and
calculates a virtual cluster state and output:

    object Host "cluster-host-01" {
      check_command = "dummy"
      vars.dummy_state = 2
      vars.dummy_text = "This host is down."
    }

    object Host "cluster-host-02" {
      check_command = "dummy"
      vars.dummy_state = 0
      vars.dummy_text = "This host is up."
    }

    object Host "cluster" {
      check_command = "dummy"
      vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]

      vars.dummy_state = {{
        var up_count = 0
        var down_count = 0
        var cluster_nodes = macro("$cluster_nodes$")

        for (node in cluster_nodes) {
          if (get_host(node).state > 0) {
            down_count += 1
          } else {
            up_count += 1
          }
        }

        if (up_count >= down_count) {
          return 0 // at least as many nodes up as down -> UP
        } else {
          return 2 // more nodes down than up -> something is broken
        }
      }}

      vars.dummy_text = {{
        var output = "Cluster hosts:\n"
        var cluster_nodes = macro("$cluster_nodes$")

        for (node in cluster_nodes) {
          output += node + ": " + get_host(node).last_check_result.output + "\n"
        }

        return output
      }}
    }
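
As a hedged sketch, the `dummy_text` function could also label nodes that do not
exist or have not been checked yet, instead of reading their output directly.
The fragment below is meant to replace `vars.dummy_text` inside the `cluster`
host object. The variable name `node_host` and the fallback texts are illustrative
assumptions.

```
  vars.dummy_text = {{
    var output = "Cluster hosts:\n"
    var cluster_nodes = macro("$cluster_nodes$")

    for (node in cluster_nodes) {
      var node_host = get_host(node)

      if (node_host == null) {
        /* the name in cluster_nodes does not refer to a host object */
        output += node + ": host object not found\n"
      } else {
        if (node_host.last_check_result == null) {
          /* the host exists but has not been checked yet */
          output += node + ": no check result yet\n"
        } else {
          output += node + ": " + node_host.last_check_result.output + "\n"
        }
      }
    }

    return output
  }}
```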

The following example sets time-dependent thresholds for the load check, depending on
whether the current time of day lies inside the defined time period.

    object TimePeriod "backup" {
      import "legacy-timeperiod"

      ranges = {
        monday = "02:00-03:00"
        tuesday = "02:00-03:00"
        wednesday = "02:00-03:00"
        thursday = "02:00-03:00"
        friday = "02:00-03:00"
        saturday = "02:00-03:00"
        sunday = "02:00-03:00"
      }
    }

    object Host "webserver-with-backup" {
      check_command = "hostalive"
      address = "127.0.0.1"
    }

    object Service "webserver-backup-load" {
      check_command = "load"
      host_name = "webserver-with-backup"

      vars.load_wload1 = {{
        if (get_time_period("backup").is_inside) {
          return 20
        } else {
          return 5
        }
      }}
      vars.load_cload1 = {{
        if (get_time_period("backup").is_inside) {
          return 40
        } else {
          return 10
        }
      }}
    }
