# <a id="monitoring-basics"></a> Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.

## <a id="hosts-services"></a> Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Hosts
and services can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / Routers
* Temperature Sensors
* Other local or network-accessible services

Host objects provide a mechanism to group services that are running
on the same physical device.

Here is an example of a host object and two services which belong to it:

    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http_ip"
    }

The example creates two services `ping4` and `http` which belong to the
host `my-server1`.

It also specifies that the host should perform its own check using the `hostalive`
check command.

The `address` attribute is used by check commands to determine which network
address is associated with the host object.

### <a id="host-states"></a> Host States

Hosts can be in any of the following states:

  Name        | Description
  ------------|--------------
  UP          | The host is available.
  DOWN        | The host is unavailable.

### <a id="service-states"></a> Service States

Services can be in any of the following states:

  Name        | Description
  ------------|--------------
  OK          | The service is working properly.
  WARNING     | The service is experiencing some problems but is still considered to be in working condition.
  CRITICAL    | The service is in a critical state.
  UNKNOWN     | The check could not determine the service's state.
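
Check plugins report these states through their exit code, following the Monitoring Plugins convention: `0` means OK, `1` WARNING, `2` CRITICAL and `3` UNKNOWN. The following is a minimal sketch of the logic of a hypothetical threshold plugin; the function name `check_value` and its arguments are illustrative assumptions, not part of Icinga 2 or the Monitoring Plugins package.

```shell
#!/usr/bin/env bash
# Hypothetical threshold plugin logic: compare an integer VALUE against
# WARN and CRIT thresholds (higher is worse), print one line of output with
# performance data, and return the exit code that maps to a service state.
check_value() {
  local value=$1 warn=$2 crit=$3
  local num='^(0|[1-9][0-9]*)$'
  # Input that cannot be evaluated is reported as UNKNOWN (exit code 3).
  if ! [[ $value =~ $num && $warn =~ $num && $crit =~ $num ]]; then
    echo "UNKNOWN - invalid arguments"
    return 3
  fi
  local perfdata="value=$value;$warn;$crit"
  if (( value >= crit )); then
    echo "CRITICAL - value is $value | $perfdata"
    return 2
  elif (( value >= warn )); then
    echo "WARNING - value is $value | $perfdata"
    return 1
  fi
  echo "OK - value is $value | $perfdata"
  return 0
}
```

Called as `check_value 85 80 90`, the sketch prints `WARNING - value is 85 | value=85;80;90` and returns `1`, which corresponds to the `WARNING` state. The part after `|` is performance data in the `label=value;warn;crit` format.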

### <a id="hard-soft-states"></a> Hard and Soft States

When detecting a problem with a host/service, Icinga re-checks the object a number of
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
notifications. This ensures that no unnecessary notifications are sent for
transient failures. During this time the object is in a `SOFT` state.

After all re-checks have been executed and the object is still in a non-OK
state, the host/service switches to a `HARD` state and notifications are sent.

  Name        | Description
  ------------|--------------
  HARD        | The host/service's state hasn't recently changed.
  SOFT        | The host/service has recently changed state and is being re-checked.
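
To illustrate how `max_check_attempts` drives this transition, the following is a simplified model, not Icinga 2's implementation: it ignores the `retry_interval` timing and soft recoveries, and treats every OK result as `HARD`. The function name `state_types` is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified SOFT/HARD model (not Icinga 2's actual code).
# Arguments: MAX_CHECK_ATTEMPTS, then a sequence of plugin exit codes
# (0 = OK, anything else = problem). Prints one state type per result.
state_types() {
  local max_attempts=$1 attempt=0 rc
  shift
  local -a types=()
  for rc in "$@"; do
    if (( rc == 0 )); then
      attempt=0          # an OK result resets the attempt counter
      types+=("HARD")    # simplification: soft recoveries are ignored
    else
      if (( attempt < max_attempts )); then
        attempt=$(( attempt + 1 ))
      fi
      # The problem only becomes HARD once all check attempts are used up.
      if (( attempt >= max_attempts )); then
        types+=("HARD")
      else
        types+=("SOFT")
      fi
    fi
  done
  echo "${types[*]}"
}
```

With `max_check_attempts = 3`, `state_types 3 0 2 2 2 0` prints `HARD SOFT SOFT HARD HARD`: the first two CRITICAL results are soft re-check attempts, and the third one makes the problem hard, which is the point at which notifications would be sent.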

## <a id="using-templates"></a> Using Templates

Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }

    object Service "ping4" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping4"
    }

    object Service "ping6" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping6"
    }

In this example the `ping4` and `ping6` services inherit properties from the
template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of
templates. Attributes inherited from a template can be overridden in the
object if necessary.

## <a id="using-apply"></a> Apply objects based on rules

Instead of assigning each object (`Service`, `Notification`, `Dependency`, `ScheduledDowntime`)
individually via attribute identifiers such as `host_name`, you can [apply](#apply) objects
based on rules:

    apply Service "load" {
      import "generic-service"

      check_command = "load"

      assign where "linux-server" in host.groups
      ignore where host.vars.no_load_check
    }

In this example the `load` service is created as an object for all hosts in the `linux-server`
host group. Hosts with the `no_load_check` custom attribute set are ignored.

Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"
      command = "mail-service-notification"
      user_groups = [ "noc" ]

      assign where service.vars.sla == "24x7"
    }

In this example the `mail-noc` notification is created as an object for all services that have
the `sla` custom attribute set to `24x7`. The notification command is set to `mail-service-notification`
and all members of the user group `noc` will get notified.

`Dependency` and `ScheduledDowntime` objects can be applied in a similar fashion.

## <a id="groups"></a> Groups

Groups are used for combining hosts, services, and users into
accessible configuration attributes and views in external (web)
interfaces.

Group membership is defined at the respective object itself. If
you have a hostgroup named `windows`, for example, and want to assign
specific hosts to this group so that you can later view the group on your
alert dashboard, first create the hostgroup:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this hostgroup:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

Service and user groups work the same way. Additionally, user groups can be
referenced as attributes in `Notification` objects.

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

### <a id="group-assign"></a> Group Membership Assign

If a number of hosts, services or users match a common pattern, it's
reasonable to assign them to the group from within the group object itself.
Details on the `assign where` syntax can be found [here](#apply).

    object HostGroup "mssql" {
      display_name = "MSSQL Servers"
      assign where host.vars.mssql_port
    }

Building on the example above, all hosts with the custom attribute `vars.mssql_port`
set will be added as members to the host group `mssql`.

## <a id="notifications"></a> Notifications

Notifications for service and host problems are an integral part of your
monitoring setup.

When a host or service is in a downtime, a problem has been acknowledged or
the dependency logic determined that the host/service is unreachable, no
notifications are sent. You can configure additional type and state filters
to refine which notifications are actually sent.

There are many ways of sending notifications, e.g. by e-mail, XMPP,
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
Instead it relies on external mechanisms such as shell scripts to notify users.

A notification specification requires one or more users (and/or user groups)
who will be notified in case of problems. These users must have all custom
attributes defined which will be used in the `NotificationCommand` on execution.

The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and
`CRITICAL` states and `problem` and `recovery` notification types.

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }

If you don't set the `states` and `types`
configuration attributes for the `User` object, notifications for all states and types
will be sent.
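
The filter logic can be sketched as follows, assuming the lists are passed as space-separated strings. The function `passes_filter` is a hypothetical illustration, not an Icinga 2 API; an empty list stands for an unset attribute and lets everything pass.

```shell
#!/usr/bin/env bash
# Sketch of state/type filtering: a notification passes if its state is in
# STATES and its type is in TYPES. An empty list means "no filter".
# Usage: passes_filter STATE TYPE "STATES..." "TYPES..."
passes_filter() {
  local state=$1 type=$2 states=$3 types=$4
  # Reject if a state filter exists and does not contain the state.
  if [[ -n $states && " $states " != *" $state "* ]]; then
    return 1
  fi
  # Reject if a type filter exists and does not contain the type.
  if [[ -n $types && " $types " != *" $type "* ]]; then
    return 1
  fi
  return 0
}
```

For the `icingaadmin` user above, `passes_filter Unknown Problem "OK Warning Critical" "Problem Recovery"` fails because `Unknown` is not part of the state filter, while a `Critical` problem passes.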

Decide which information you (and your notified users) need in case of an
emergency, and which information does not provide any value to you and
your environment.

An example notification command is explained [here](#notification-commands).

You can add all shared attributes to a `Notification` template which is then inherited
by the defined notifications. That way you avoid duplicating attributes in each
`Notification` object. Attributes can be overridden locally.

    template Notification "generic-notification" {
      interval = 15m

      command = "mail-service-notification"

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }

The time period `24x7` is shipped as example configuration with Icinga 2.

Use the `apply` keyword to create `Notification` objects for your services:

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "mysql"
    }

Instead of assigning users to notifications, you can also add the `user_groups`
attribute with a list of user groups to the `Notification` object. Icinga 2 will
send notifications to all group members.

### <a id="notification-escalations"></a> Notification Escalations

When a problem notification is sent and the problem still exists after re-notification,
you may want to escalate the problem to the next support level. A different approach
is to configure the default notification by email, and escalate the problem via SMS
if it is not already solved.

You can define notification start and end times as additional configuration
attributes, making the `Notification` object a so-called `notification escalation`.
Using templates you can share the basic notification attributes such as users or the
`interval` (and override them for the escalation then).

Using the example from above, you can define additional users who are notified by SMS
between the start and end times.

    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"

      vars.mobile = "+1 555 424642"
    }

    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"

      vars.mobile = "+1 555 424642"
    }

Define an additional `NotificationCommand` for SMS notifications.

> **Note**
>
> The example is not complete as there are many different SMS providers.
> Please note that sending SMS notifications will require an SMS provider
> or local hardware with an active SIM card.

    object NotificationCommand "sms-notification" {
      import "plugin-notification-command"

      command = [
        PluginDir + "/send_sms_notification",
        "$mobile$",
        "..."
      ]
    }

The two new notification escalations are added onto the host `localhost`
and its service `ping4` using the `generic-notification` template.
The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
command) after `30m` until `1h`.

> **Note**
>
> The `interval` was set to 15m in the `generic-notification`
> template example. Lower that value in your escalations by using a secondary
> template or by overriding the attribute directly in the
> `escalation-sms-2nd-level` notification.

If the problem is neither resolved nor acknowledged (which would prevent further
notifications), the `escalation-sms-1st-level` notification notifies the user
`icinga-oncall-1st-level` starting `1h` after the initial problem notification, but
only for one hour (`2h` as `end` key in the `times` dictionary).

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]

      times = {
        begin = 30m
        end = 1h
      }

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]

      times = {
        begin = 1h
        end = 2h
      }

      assign where service.name == "ping4"
    }
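
The `times` windows can be illustrated with a small sketch that checks whether a notification is due, given the time elapsed since the first problem notification. It is a simplified model under stated assumptions: `begin` is treated as inclusive, `end` as exclusive, an empty `end` leaves the window open, and durations use the `s`, `m`, `h` and `d` suffixes. The helpers `to_seconds` and `in_window` are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a duration such as 30m, 1h or 2d into seconds.
# A value without a suffix is treated as seconds.
to_seconds() {
  local value=${1%[smhd]} unit=${1: -1}
  case $unit in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    *) echo "$1" ;;
  esac
}

# Usage: in_window ELAPSED BEGIN [END]
# Succeeds if the escalation window [BEGIN, END) contains ELAPSED.
in_window() {
  local elapsed begin end
  elapsed=$(to_seconds "$1")
  begin=$(to_seconds "$2")
  # Before the window opens: not due yet.
  if (( elapsed < begin )); then
    return 1
  fi
  # An empty END means the window never closes.
  if [[ -n ${3:-} ]]; then
    end=$(to_seconds "$3")
    if (( elapsed >= end )); then
      return 1
    fi
  fi
  return 0
}
```

At `45m` after the first notification, `in_window 45m 30m 1h` succeeds (the 2nd-level escalation is active) while `in_window 45m 1h 2h` fails (the 1st-level escalation has not started yet).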

### <a id="first-notification-delay"></a> First Notification Delay

Sometimes a problem should not be notified immediately, but only after a defined
time duration has passed. In Icinga 2 you can use the `times` dictionary and set
`begin = 15m` if you want to suppress notifications in the first 15 minutes. Leave
out the `end` key: if it is not set, Icinga 2 will not check against any end time
for this notification.

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      times.begin = 15m // delay first notification

      assign where service.name == "ping4"
    }

### <a id="notification-filters-state-type"></a> Notification Filters by State and Type

If there are no notification state and type filter attributes defined at the `Notification`
or `User` object, Icinga 2 assumes that all states and types are being notified.

Available state and type filters for notifications are:

    template Notification "generic-notification" {

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }

If you are familiar with the Icinga 1.x `notification_options`, please note that they have been
split into type and state filters, allowing more fine-grained filtering, for example on downtimes
and flapping. You can filter for acknowledgements and custom notifications too.

## <a id="timeperiods"></a> Time Periods

Time periods define time ranges in Icinga during which event actions are
triggered: for example, whether a service check is executed or not (the
`check_period` attribute), or whether a notification should be sent to
users or not (filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects).

> **Note**
>
> If you are familiar with Icinga 1.x, these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.

The `TimePeriod` attribute `ranges` may contain multiple directives,
including weekdays, days of the month, and calendar dates.
These types may overlap/override other types in your ranges dictionary.

The descending order of precedence is as follows:

* Calendar date (2008-01-01)
* Specific month date (January 1st)
* Generic month date (Day 15)
* Offset weekday of specific month (2nd Tuesday in December)
* Offset weekday (3rd Monday)
* Normal weekday (Tuesday)

If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as time period
as shown below.

    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }

If your operations staff should only be notified during working hours,
create a new timeperiod named `workhours` defining a work day from
09:00 to 17:00.

    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday"    = "09:00-17:00"
        "tuesday"   = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday"  = "09:00-17:00"
        "friday"    = "09:00-17:00"
      }
    }
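
How a time of day is matched against a single range such as `"09:00-17:00"` can be sketched like this. It is only a model of one range, assuming the start is inclusive and the end exclusive; it does not implement the precedence rules above or the full legacy timeperiod syntax. The helpers `to_minutes` and `in_range` are hypothetical.

```shell
#!/usr/bin/env bash
# Convert HH:MM into minutes since midnight ("24:00" becomes 1440).
# 10# forces base 10 so that values like "09" are not parsed as octal.
to_minutes() {
  local hours=${1%%:*} minutes=${1##*:}
  echo $(( 10#$hours * 60 + 10#$minutes ))
}

# Usage: in_range HH:MM START-END
# Succeeds if the time falls into [START, END).
in_range() {
  local now start end
  now=$(to_minutes "$1")
  start=$(to_minutes "${2%-*}")
  end=$(to_minutes "${2#*-}")
  (( now >= start && now < end ))
}
```

With the `workhours` range, `in_range 16:59 09:00-17:00` succeeds and `in_range 17:00 09:00-17:00` fails.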

Use the `period` attribute to assign time periods to
`Notification` and `Dependency` objects:

    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }

## <a id="commands"></a> Commands

Icinga 2 uses three different command object types to specify how
checks should be performed, notifications should be sent and
events should be handled.

### <a id="command-environment-variables"></a> Environment Variables for Commands

Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars).

### <a id="check-commands"></a> Check Commands

`CheckCommand` objects define the command line used to call a check.

#### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition

`CheckCommand` objects require the [ITL template](#itl-plugin-check-command)
`plugin-check-command` to support native plugin-based check methods.

Unless you have done so already, download your check plugin and put it
into the `PluginDir` directory. The following example uses the
`check_disk` plugin shipped with the Monitoring Plugins package.

The plugin path and all command arguments are specified as a list of
double-quoted string arguments for proper shell escaping.

Call the `check_disk` plugin with the `--help` parameter to see
all available options. Our example defines warning (`-w`) and
critical (`-c`) thresholds for the disk usage. Without any
partition defined (`-p`) it will check all local partitions.

    icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
    ...
    This plugin checks the amount of used disk space on a mounted file system
    and generates an alert if free space is less than one of the threshold values


    Usage:
     check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
    [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
    [-t timeout] [-u unit] [-v] [-X type] [-N type]
    ...

The next step is to understand how command parameters are passed from
a host or service object, and to add a `CheckCommand` definition based on these
required parameters and/or default values.

#### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service

Unlike in Icinga 1.x, check command parameters are defined as custom attributes
which can be accessed as runtime macros by the executed check command.

Define the default check command custom attributes `disk_wfree` and `disk_cfree`
(freely definable naming schema) and their default threshold values. You can
then use these custom attributes as runtime macros on the command line.

The default custom attributes can be overridden by the custom attributes
defined in the service using the check command `disk`. The custom attributes
can also be inherited from a parent template using additive inheritance (`+=`).

    object CheckCommand "disk" {
      import "plugin-check-command"

      command = [
        PluginDir + "/check_disk",
        "-w", "$disk_wfree$%",
        "-c", "$disk_cfree$%"
      ]

      vars.disk_wfree = 20
      vars.disk_cfree = 10
    }

The host `localhost` with the service `disk` checks all disks with modified
custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).

    object Host "localhost" {
      import "generic-host"

      address = "127.0.0.1"
      address6 = "::1"
    }

    object Service "disk" {
      import "generic-service"

      host_name = "localhost"
      check_command = "disk"

      vars.disk_wfree = 10
      vars.disk_cfree = 5
    }
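
The override behavior can be modeled as a lookup that checks the service's custom attributes first, then the host's, then the defaults of the `CheckCommand`. The following sketch (bash 4 or later, for associative arrays) applies that order to the `disk` example and assembles the resulting command line. The lookup order is an assumption for this sketch; the array and function names and the `/usr/lib/nagios/plugins` default for the plugin directory are illustrative.

```shell
#!/usr/bin/env bash
# Simplified model of custom attribute resolution for the "disk" example:
# service vars win over host vars, which win over the CheckCommand defaults
# (lookup order assumed for this sketch). Requires bash 4+.
declare -A command_vars=([disk_wfree]=20 [disk_cfree]=10)
declare -A host_vars=()
declare -A service_vars=([disk_wfree]=10 [disk_cfree]=5)

# Print the value of custom attribute $1 from the most specific scope.
resolve() {
  local name=$1
  if [[ -n ${service_vars[$name]+set} ]]; then
    echo "${service_vars[$name]}"
  elif [[ -n ${host_vars[$name]+set} ]]; then
    echo "${host_vars[$name]}"
  else
    echo "${command_vars[$name]:-}"
  fi
}

# Assemble the check_disk command line like the "disk" CheckCommand does.
build_disk_command() {
  local -a argv=(
    "${PLUGIN_DIR:-/usr/lib/nagios/plugins}/check_disk"
    -w "$(resolve disk_wfree)%"
    -c "$(resolve disk_cfree)%"
  )
  echo "${argv[*]}"
}
```

With the service overrides from above, `build_disk_command` prints `/usr/lib/nagios/plugins/check_disk -w 10% -c 5%`; without them, the command defaults `20` and `10` would be used.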

### <a id="notification-commands"></a> Notification Commands

`NotificationCommand` objects define how notifications are delivered to external
interfaces (e-mail, XMPP, IRC, Twitter, etc.).

`NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command)
`plugin-notification-command` to support native plugin-based notifications.

Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
the current check output) sending an email to the user(s) associated with the
notification itself (`$user.email$`).

If you want to specify default values for some of the custom attribute definitions,
you can add a `vars` dictionary as shown for the `CheckCommand` object.

    object NotificationCommand "mail-service-notification" {
      import "plugin-notification-command"

      command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]

      env = {
        "NOTIFICATIONTYPE" = "$notification.type$",
        "SERVICEDESC" = "$service.name$",
        "HOSTALIAS" = "$host.display_name$",
        "HOSTADDRESS" = "$address$",
        "SERVICESTATE" = "$service.state$",
        "LONGDATETIME" = "$icinga.long_date_time$",
        "SERVICEOUTPUT" = "$service.output$",
        "NOTIFICATIONAUTHORNAME" = "$notification.author$",
        "NOTIFICATIONCOMMENT" = "$notification.comment$",
        "HOSTDISPLAYNAME" = "$host.display_name$",
        "SERVICEDISPLAYNAME" = "$service.display_name$",
        "USEREMAIL" = "$user.email$"
      }
    }

The `command` attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:

    #!/usr/bin/env bash
    template=$(cat <<TEMPLATE
    ***** Icinga  *****

    Notification Type: $NOTIFICATIONTYPE

    Service: $SERVICEDESC
    Host: $HOSTALIAS
    Address: $HOSTADDRESS
    State: $SERVICESTATE

    Date/Time: $LONGDATETIME

    Additional Info: $SERVICEOUTPUT

    Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
    TEMPLATE
    )

    # Quote $template so printf keeps the line breaks of the message body.
    /usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"

While it's possible to specify the entire notification command right
in the `NotificationCommand` object, it is generally advisable to create a
shell script in the `/etc/icinga2/scripts` directory and have the
`NotificationCommand` object refer to that.
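
Since the script builds the mail from the exported environment variables, an empty or missing variable (for example for a user without an `email` attribute) leads to an incomplete mail without any error. The following sketch shows one way to validate the variables before sending; the helper names `require_env` and `mail_subject` are hypothetical and not part of the shipped script.

```shell
#!/usr/bin/env bash
# Fail with a message on stderr if any of the named environment variables
# is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    if [[ -z ${!name:-} ]]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Build the subject line used by mail-notification.sh, validating its inputs.
mail_subject() {
  require_env NOTIFICATIONTYPE HOSTDISPLAYNAME SERVICEDISPLAYNAME SERVICESTATE || return 1
  printf '%s - %s - %s is %s\n' \
    "$NOTIFICATIONTYPE" "$HOSTDISPLAYNAME" "$SERVICEDISPLAYNAME" "$SERVICESTATE"
}
```

Inside the notification script, `subject=$(mail_subject) || exit 1` would abort before `mail` is called with an incomplete subject.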

### <a id="event-commands"></a> Event Commands

Unlike notifications, event commands are called on every service state change
if defined. Therefore the `EventCommand` object should define a command line
evaluating the current service state and other service runtime attributes
available through runtime macros. Runtime macros such as `$service.state_type$`
and `$service.state$` are processed by Icinga 2, which helps to trigger
fine-grained events.

Common use cases are a failing HTTP check which requires an immediate
restart via an event command, or an application which is locked and requires
a restart upon detection.

`EventCommand` objects require the ITL template `plugin-event-command`
to support native plugin-based event handlers.

When the event command below is triggered on a service state change, it
sends a check result using the `process_check_result` script, forcibly
changing the service state back to `OK` (`-r 0`) and providing some debug
information in the check output (`-o`).

    object EventCommand "plugin-event-process-check-result" {
      import "plugin-event-command"

      command = [
        PluginDir + "/process_check_result",
        "-H", "$host.name$",
        "-S", "$service.name$",
        "-c", LocalStateDir + "/run/icinga2/cmd/icinga2.cmd",
        "-r", "0",
        "-o", "Event Handler triggered in state '$service.state$' with output '$service.output$'."
      ]
    }
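
An event handler script usually checks the state and state type before acting, so that it does not restart a service on every soft state change. A minimal dry-run sketch, assuming the `EventCommand` passes the values of `$service.state$` and `$service.state_type$` as the first two arguments; the function `handle_event` and the default service name `httpd` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical event handler logic (dry run): only act once a problem is
# confirmed, i.e. the service is CRITICAL in a HARD state.
#   $1: service state (e.g. from $service.state$)
#   $2: state type (e.g. from $service.state_type$)
#   $3: local service to restart (defaults to httpd)
handle_event() {
  local state=$1 state_type=$2 target=${3:-httpd}
  if [[ $state == CRITICAL && $state_type == HARD ]]; then
    # A real handler would run the restart command here instead.
    echo "would restart $target"
  fi
  return 0
}
```

`handle_event CRITICAL HARD` prints `would restart httpd`, while `handle_event CRITICAL SOFT` prints nothing, so soft re-check attempts do not trigger a restart.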

### <a id="commands-arguments"></a> Command Arguments

When a check command line is defined using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes it is
required to extend the argument list based on a condition evaluated
at command execution, or to make arguments optional so that they are only
set if Icinga 2 can resolve the macro value.

    object CheckCommand "check_http" {
      import "plugin-check-command"

      command = PluginDir + "/check_http"

      arguments = {
        "-H" = "$http_vhost$"
        "-I" = "$http_address$"
        "-u" = "$http_uri$"
        "-p" = "$http_port$"
        "-S" = {
          set_if = "$http_ssl$"
        }
        "-w" = "$http_warn_time$"
        "-c" = "$http_critical_time$"
      }

      vars.http_address = "$address$"
      vars.http_ssl = false
    }

The example shows the `check_http` check command defining the most common
arguments. Each of them is optional by default and will be omitted if
the value is not set. For example, if the service calling the check command
does not have `vars.http_port` set, `-p` won't get added to the command
line.
If the `vars.http_ssl` custom attribute is set to `true` in the service, host or command
object definition, Icinga 2 will add the `-S` argument to the command line based on the
`set_if` option.
That way you can use the same `check_http` command definition for checks both
with and without SSL, saving you duplicated command definitions.
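
The effect of optional arguments and `set_if` can be sketched in shell: a key/value argument is only added when its value is non-empty, and `-S` is only added when `http_ssl` is `true`. In this sketch the custom attributes are passed as shell variables of the same name and the plugin directory defaults to `/usr/lib/nagios/plugins`; both are illustrative assumptions, and the argument order is not meant to match the order Icinga 2 uses.

```shell
#!/usr/bin/env bash
# Sketch of how an `arguments` dictionary becomes a command line.
# Custom attributes are read from shell variables with the same names.
build_http_args() {
  local -a argv=("${PLUGIN_DIR:-/usr/lib/nagios/plugins}/check_http")
  local pair flag name
  # "flag:variable" pairs taken from the arguments dictionary above.
  for pair in -H:http_vhost -I:http_address -u:http_uri -p:http_port \
              -w:http_warn_time -c:http_critical_time; do
    flag=${pair%%:*}
    name=${pair#*:}
    # Optional by default: skip arguments whose value is unset or empty.
    if [[ -n ${!name:-} ]]; then
      argv+=("$flag" "${!name}")
    fi
  done
  # set_if: add the bare flag only if the condition evaluates to true.
  if [[ ${http_ssl:-false} == true ]]; then
    argv+=(-S)
  fi
  echo "${argv[*]}"
}
```

`http_address=10.0.0.1 http_port=8443 http_ssl=true build_http_args` prints `/usr/lib/nagios/plugins/check_http -I 10.0.0.1 -p 8443 -S`, while dropping `http_port` and `http_ssl` leaves only `-I 10.0.0.1`.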

Details on all available options can be found in the
[CheckCommand object definition](#objecttype-checkcommand).

## <a id="dependencies"></a> Dependencies

Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects
for determining their network reachability.
The `parent_host_name` and `parent_service_name` attributes are mandatory for
service dependencies; `parent_host_name` is required for host dependencies.

A service can depend on a host, and vice versa. A service has an implicit
dependency (parent) on its host. A host-to-host dependency implicitly acts
as a host parent relation.
When dependencies are calculated, not only the immediate parent is taken into
account, but all parents are inherited.

Notifications are suppressed if a host or service becomes unreachable.
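
Because all parents are taken into account, reachability amounts to a walk up the parent chain. The following simplified model (bash 4 or later) marks a host as unreachable if any host on its parent chain is `DOWN`, and only allows a problem notification for reachable hosts. The hosts `server`, `router` and `uplink` as well as the array and function names are hypothetical, and the sketch has no cycle detection.

```shell
#!/usr/bin/env bash
# Simplified reachability model (not Icinga 2's implementation): a host is
# unreachable if any host on its parent chain is DOWN. Requires bash 4+.
declare -A parent=([server]=router [router]=uplink [uplink]=)
declare -A state=([server]=DOWN [router]=UP [uplink]=DOWN)

# Succeeds if no parent, grandparent, ... of host $1 is DOWN.
is_reachable() {
  local host=${parent[$1]:-}
  while [[ -n $host ]]; do
    if [[ ${state[$host]:-} == DOWN ]]; then
      return 1
    fi
    host=${parent[$host]:-}
  done
  return 0
}

# A problem notification is only sent for a reachable host that is not UP.
should_notify() {
  [[ ${state[$1]:-} != UP ]] && is_reachable "$1"
}
```

Here `server` is unreachable although its immediate parent `router` is `UP`, because the grandparent `uplink` is `DOWN`, so no notification would be sent for `server`.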

A common scenario is the Icinga 2 server behind a router. Checking internet
access by pinging the Google DNS server `google-dns` is a common method, but
will fail in case the `dsl-router` host is down. Therefore the example below
defines a host dependency which implicitly acts as a parent relation too.

Furthermore the host may be reachable but ping probes are dropped by the
router's firewall. In case the `ping4` service check on `dsl-router` fails, all
further checks for the `ping4` service on host `google-dns` should be
suppressed. This is achieved by a service dependency on the `ping4` service of
`dsl-router` with the `disable_checks` attribute set to `true`.

    object Host "dsl-router" {
      address = "192.168.1.1"
    }

    object Host "google-dns" {
      address = "8.8.8.8"
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Dependency "internet" to Host {
      parent_host_name = "dsl-router"
      disable_checks = true

      assign where host.name != "dsl-router"
    }

    apply Dependency "internet" to Service {
      parent_host_name = "dsl-router"
      parent_service_name = "ping4"
      disable_checks = true

      assign where host.name != "dsl-router"
    }

 793 Another classic example are agent based checks. You would define a health check
 794 for the agent daemon responding to your requests, and make all other services
 795 querying that daemon depend on that health check.
 796
 797 The following configuration defines two nrpe based service checks `nrpe-load`
 798 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
 799 `nrpe-health` service.
 800
    apply Service "nrpe-health" {
      import "generic-service"
      check_command = "nrpe"
      assign where match("nrpe-*", host.name)
    }

    apply Service "nrpe-load" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_load"
      assign where match("nrpe-*", host.name)
    }

    apply Service "nrpe-disk" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_disk"
      assign where match("nrpe-*", host.name)
    }

    object Host "nrpe-server" {
      import "generic-host"
      address = "192.168.1.5"
    }

    apply Dependency "disable-nrpe-checks" to Service {
      parent_service_name = "nrpe-health"

      // `states` lists the parent states in which the dependency is OK:
      // checks and notifications are disabled while nrpe-health is not OK
      states = [ OK ]
      disable_checks = true
      disable_notifications = true
      assign where match("nrpe-*", host.name)
      ignore where service.name == "nrpe-health"
    }

The `disable-nrpe-checks` dependency is applied to all services
on the `nrpe-server` host except the `nrpe-health` service itself.
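
The rule above matches every service on hosts named `nrpe-*`. This includes services that do not query the agent at all, for example a `ping4` service. The following variant matches on the check command instead. It is a sketch that should replace the `disable-nrpe-checks` rule above rather than sit next to it, because two apply rules with the same name would create conflicting dependency objects. It assumes that every agent-based service uses the `nrpe` check command, and that `states` lists the parent states in which the dependency is considered OK:

    // Sketch: replaces the "disable-nrpe-checks" rule above
    apply Dependency "disable-nrpe-checks" to Service {
      parent_service_name = "nrpe-health"

      states = [ OK ]
      disable_checks = true
      disable_notifications = true

      // only services checked through NRPE depend on the agent health check
      assign where service.check_command == "nrpe"
      ignore where service.name == "nrpe-health"
    }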


## <a id="downtimes"></a> Downtimes

Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.

Downtimes suppress notifications and may trigger other
downtimes too. If a downtime was set by accident, or its duration
exceeds the maintenance, you can cancel the downtime manually.
Planned downtimes are also taken into account by SLA reporting
tools which calculate SLAs based on the state and downtime history.

Downtimes may overlap in their start and end times. If
multiple downtimes are triggered for one object, the overall downtime depth
is greater than `1`. This is useful when you want to extend
a maintenance window that takes longer than expected.

If the downtime was scheduled after the problem changed to a critical hard
state and triggered a problem notification, and the service recovers during
the downtime window, the recovery notification won't be suppressed.

### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes

A `fixed` downtime is activated at the defined start time and
removed at the end time. Within this time window, a change of the service
state to `NOT-OK` actually triggers the downtime: notifications
are suppressed and the downtime depth is incremented.

Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
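
If such a window recurs, as nightly database updates do, you do not have to schedule the fixed downtime by hand each time. The following sketch uses a [ScheduledDowntime](#objecttype-scheduleddowntime) apply rule with `fixed = true`. The rule name, the comment, the `database` service group and the chosen weekdays are hypothetical:

    // Sketch: recurring fixed downtime from 23:00 to 24:00
    apply ScheduledDowntime "db-update-window" to Service {
      author = "icingaadmin"
      comment = "Nightly database updates"

      fixed = true  // activated at the start and removed at the end of each range

      ranges = {
        monday = "23:00-24:00"
        thursday = "23:00-24:00"
      }

      assign where "database" in service.groups
    }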

Unlike a `fixed` downtime, a `flexible` downtime does not necessarily end
at the provided end time. Instead, the downtime is triggered by a state
change within the time span defined by the start and end time, and then
lasts for the defined duration.

Imagine the following scenario: your service is frequently polled
by users trying to grab deleted domains for immediate registration.
Sometime between 07:30 and 08:00 the load hits for 15 minutes and causes
a network outage visible to the monitoring. The service is still alive,
but answers Icinga 2 service checks too slowly.
For that reason, you may want to schedule a flexible downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime then lasts from
its trigger time until the duration is over. After that, the downtime
is removed (this may happen before or after the actual end time!).
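
If this happens regularly, you can configure the same flexible downtime as a [ScheduledDowntime](#objecttype-scheduleddowntime) with `fixed = false` and a `duration`. This is a sketch. The rule name and the `domain-registry` service group are hypothetical, and it assumes that only weekdays are affected:

    // Sketch: flexible downtime, triggered between 07:30 and 08:00
    apply ScheduledDowntime "domain-rush" to Service {
      author = "icingaadmin"
      comment = "Morning domain registration rush"

      fixed = false   // triggered by a state change within the range
      duration = 15m  // then lasts 15 minutes from its trigger time

      ranges = {
        monday = "07:30-08:00"
        tuesday = "07:30-08:00"
        wednesday = "07:30-08:00"
        thursday = "07:30-08:00"
        friday = "07:30-08:00"
      }

      assign where "domain-registry" in service.groups
    }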

### <a id="scheduling-downtime"></a> Scheduling a downtime

A downtime can be scheduled either through a web interface or by sending an [external command](#external-commands)
to the external command pipe provided by the `ExternalCommandListener` configuration.

Fixed downtimes require a start and end time (a duration will be ignored).
Flexible downtimes need a start and end time for the time span, and a duration
independent of that time span.

### <a id="triggered-downtimes"></a> Triggered Downtimes

Triggering is optional when scheduling a downtime. If a downtime is already
scheduled for a future maintenance, the new downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime which should be triggered by the parent
downtime on a NOT-OK state change.

### <a id="recurring-downtimes"></a> Recurring Downtimes

[ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up
recurring downtimes for services.

Example:

    apply ScheduledDowntime "backup-downtime" to Service {
      author = "icingaadmin"
      comment = "Scheduled downtime for backup"

      ranges = {
        monday = "02:00-03:00"
        tuesday = "02:00-03:00"
        wednesday = "02:00-03:00"
        thursday = "02:00-03:00"
        friday = "02:00-03:00"
        saturday = "02:00-03:00"
        sunday = "02:00-03:00"
      }

      assign where "backup" in service.groups
    }
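
A similar rule can cover the hosts themselves. The following sketch assumes that `ScheduledDowntime` apply rules can also target `Host` objects. The rule name, the `backup-servers` host group and the time ranges are hypothetical. The window crosses midnight, so it is split across two days:

    // Sketch: recurring downtime for hosts instead of services
    apply ScheduledDowntime "backup-host-downtime" to Host {
      author = "icingaadmin"
      comment = "Scheduled downtime for backup"

      ranges = {
        saturday = "22:00-24:00"
        sunday = "00:00-04:00"
      }

      assign where "backup-servers" in host.groups
    }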


## <a id="comments"></a> Comments

Comments can be added at runtime and persist across restarts. You can
add useful information for others on repeating incidents (for example
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount"). Comments
are primarily accessed through web interfaces.

Comments can be added and deleted through the external command pipe
provided by the `ExternalCommandListener` configuration. The caller must
pass the comment id when manipulating an existing comment.


## <a id="acknowledgements"></a> Acknowledgements

If a problem is alerted and notified, you may signal the other notification
recipients that you are aware of the problem and will handle it.

By sending an acknowledgement to Icinga 2 (using the external command pipe
provided by the `ExternalCommandListener` configuration) all future notifications
are suppressed, a new comment is added with the provided description and
a notification with the type `NotificationFilterAcknowledgement` is sent
to all notified users.

### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements

Once a problem is acknowledged, it may disappear from your `unhandled problems`
dashboard, and no one may ever look at it again, since the acknowledgement
suppresses notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved at a defined time in the future,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement when it expires and start to
re-notify if the problem persists.


## <a id="custom-attributes"></a> Custom Attributes

### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime

Custom attributes may be used in command definitions to dynamically change how the command
is executed.

Additionally, some Icinga 2 features, such as the `PerfdataWriter` type,
use custom attributes to format their output.

> **Tip**
>
> Custom attributes are stored in the `vars` dictionary attribute.
> Individual keys can be accessed using the `.` accessor, e.g. `vars.ping_wrta`.

Custom attributes in command definitions or performance data templates are evaluated at
runtime when executing a command. These custom attributes cannot be used elsewhere
(e.g. in other configuration attributes).

Here is an example of a command definition which uses user-defined custom attributes:

    object CheckCommand "my-ping" {
      import "plugin-check-command"

      command = [
        PluginDir + "/check_ping",
        "-4",
        "-H", "$address$",
        "-w", "$ping_wrta$,$ping_wpl$%",
        "-c", "$ping_crta$,$ping_cpl$%",
        "-p", "$ping_packets$",
        "-t", "$ping_timeout$"
      ]

      vars.ping_wrta = 100
      vars.ping_wpl = 5
      vars.ping_crta = 200
      vars.ping_cpl = 15
      vars.ping_packets = 5
      vars.ping_timeout = 0
    }

Custom attribute names used at runtime must be enclosed in two `$` signs, e.g.
`$address$`. To use the `$` sign as a literal single character, you need to escape
it with an additional dollar sign (`$$`).
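
Here is a minimal sketch of the escaping rule. The hypothetical `dollar-sign-test` command passes a literal `$` next to a regular runtime macro to `/bin/echo`, which is assumed to exist on the system. The array form of `command` avoids any further interpretation of the `$` by a shell:

    object CheckCommand "dollar-sign-test" {
      import "plugin-check-command"

      // "$$" becomes a single literal "$", "$host.name$" is resolved as a macro
      command = [ "/bin/echo", "price: 5$$ on $host.name$" ]
    }

For a host named `localhost`, the plugin output should read `price: 5$ on localhost`.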

### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order

When executing commands, Icinga 2 checks the following objects in this order to look
up custom attributes and their respective values:

1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global custom attributes in the Vars constant

This evaluation order allows you to define default values for custom attributes
in your command objects. The `my-ping` command shown above uses this to set
default values for some of the latency thresholds and timeouts.

When using the `my-ping` command, you can override all or some of the custom
attributes in the service definition like this:

    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"

      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }

If a custom attribute isn't defined anywhere, an empty value is used and a warning is
emitted to the Icinga 2 log.
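
The following sketch shows the evaluation order across three object levels. It assumes the `my-ping` command from above and a hypothetical host `remote-host` with a documentation-range address. A value set on the service wins over the host, and a value set on the host wins over the command default:

    object Host "remote-host" {
      import "generic-host"
      address = "192.0.2.20"

      vars.ping_packets = 20  // overrides the command default of 5
      vars.ping_crta = 500    // overrides the command default of 200
    }

    object Service "ping" {
      host_name = "remote-host"
      check_command = "my-ping"

      vars.ping_crta = 300    // the service value wins over the host value
    }

Following the order listed above, the check should run with `-p 20` (from the host), `-c 300,15%` (service value plus command default) and `-w 100,5%` (command defaults only).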

> **Best Practice**
>
> By convention every host should have an `address` attribute. Hosts
> which have an IPv6 address should also have an `address6` attribute.
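
A minimal sketch of this convention uses a hypothetical host name and documentation-range addresses. The `ping6` service is applied only to hosts which define `address6`, in the same way the `ping4` service earlier in this chapter is applied with `assign where host.address`:

    object Host "dual-stack-server" {
      import "generic-host"
      address = "192.0.2.10"
      address6 = "2001:db8::10"
    }

    apply Service "ping6" {
      import "generic-service"
      check_command = "ping6"

      assign where host.address6
    }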

### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables

The `env` command object attribute specifies a list of environment variables which
are exported prior to executing the command. Their values are calculated from
either runtime macros or custom attributes.

This is useful, for example, for hiding sensitive information from the command line
when passing credentials to database checks:

    object CheckCommand "mysql-health" {
      import "plugin-check-command"

      command = PluginDir + "/check_mysql -H $address$ -d $db$"

      vars.mysql_user = "icinga_check"
      vars.mysql_pass = "password"

      env.MYSQLUSER = "$mysql_user$"
      env.MYSQLPASS = "$mysql_pass$"
    }
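
A service using this command only needs to provide the remaining values. The sketch below assumes a hypothetical host `db-server` and a database named `icinga`. Like the example above, it assumes that the plugin reads the credentials from the exported `MYSQLUSER` and `MYSQLPASS` variables, so the password does not appear on the command line:

    object Service "mysql-health" {
      import "generic-service"
      host_name = "db-server"
      check_command = "mysql-health"

      vars.db = "icinga"          // resolves $db$ on the command line
      vars.mysql_pass = "s3cr3t"  // overrides the command default, exported as MYSQLPASS
    }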

### <a id="modified-attributes"></a> Modified Attributes

Icinga 2 allows you to modify object attributes at runtime, so that they
differ from the local configuration. These modified attributes are
tracked as a bitmask value and made available in backends. Icinga 2 stores
modified attributes in its state file and restores them on restart.

Modified attributes can be reset using external commands.


## <a id="runtime-macros"></a> Runtime Macros

In addition to custom attributes, Icinga 2 provides runtime macros.
These runtime macros reflect the current object state and may change over time, while
custom attributes are configured statically (but can be modified at runtime using
external commands).

### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order

Custom attributes can be accessed at [runtime](#runtime-custom-attributes) using their
identifier without the `vars.` prefix.
In special cases, when those custom attributes are not set, Icinga 2 falls back
to existing object attributes such as `host.address`.

In the following example the `$address$` macro will be resolved with the value of `vars.address`.

    object Host "localhost" {
      import "generic-host"
      check_command = "my-host-macro-test"
      address = "127.0.0.1"
      vars.address = "127.2.2.2"
    }

    object CheckCommand "my-host-macro-test" {
      import "plugin-check-command"
      command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
    }

The check command output will look like

    "address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"

If you alter the host object and remove the `vars.address` line, Icinga 2 will fail to look up `$address$` in the
custom attributes dictionary and then look for the host object's attribute.

The check command output will change to

    "address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "


The same example can be defined for services, overriding the `address` field based on a specific host custom attribute.

    object Host "localhost" {
      import "generic-host"
      address = "127.0.0.1"
      vars.macro_address = "127.3.3.3"
    }

    apply Service "my-macro-test" to Host {
      import "generic-service"
      check_command = "my-service-macro-test"
      vars.address = "$host.vars.macro_address$"

      assign where host.address
    }

    object CheckCommand "my-service-macro-test" {
      import "plugin-check-command"
      command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
    }

When the service check is executed, the output looks like

    "address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"

That way you can easily override existing macros accessed by their short name, such as `$address$`, and avoid
defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).


### <a id="host-runtime-macros"></a> Host Runtime Macros

The following host runtime macros are available in all commands that are executed for
hosts or services:

  Name                         | Description
  -----------------------------|--------------
  host.name                    | The name of the host object.
  host.display_name            | The value of the `display_name` attribute.
  host.state                   | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
  host.state_id                | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
  host.state_type              | The host's current state type. Can be one of `SOFT` and `HARD`.
  host.check_attempt           | The current check attempt number.
  host.max_check_attempts      | The maximum number of checks which are executed before changing to a hard state.
  host.last_state              | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
  host.last_state_id           | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
  host.last_state_type         | The host's previous state type. Can be one of `SOFT` and `HARD`.
  host.last_state_change       | The last state change's timestamp.
  host.duration_sec            | The time since the last state change, in seconds.
  host.latency                 | The host's check latency.
  host.execution_time          | The host's check execution time.
  host.output                  | The last check's output.
  host.perfdata                | The last check's performance data.
  host.last_check              | The timestamp when the last check was executed.
  host.num_services            | Number of services associated with the host.
  host.num_services_ok         | Number of services associated with the host which are in an `OK` state.
  host.num_services_warning    | Number of services associated with the host which are in a `WARNING` state.
  host.num_services_unknown    | Number of services associated with the host which are in an `UNKNOWN` state.
  host.num_services_critical   | Number of services associated with the host which are in a `CRITICAL` state.

### <a id="service-runtime-macros"></a> Service Runtime Macros

The following service runtime macros are available in all commands that are executed for
services:

  Name                       | Description
  ---------------------------|--------------
  service.name               | The short name of the service object.
  service.display_name       | The value of the `display_name` attribute.
  service.check_command      | The short name of the command along with any arguments to be used for the check.
  service.state              | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
  service.state_id           | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
  service.state_type         | The service's current state type. Can be one of `SOFT` and `HARD`.
  service.check_attempt      | The current check attempt number.
  service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
  service.last_state         | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
  service.last_state_id      | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
  service.last_state_type    | The service's previous state type. Can be one of `SOFT` and `HARD`.
  service.last_state_change  | The last state change's timestamp.
  service.duration_sec       | The time since the last state change, in seconds.
  service.latency            | The service's check latency.
  service.execution_time     | The service's check execution time.
  service.output             | The last check's output.
  service.perfdata           | The last check's performance data.
  service.last_check         | The timestamp when the last check was executed.

### <a id="command-runtime-macros"></a> Command Runtime Macros

The following runtime macros are available in all commands:

  Name                   | Description
  -----------------------|--------------
  command.name           | The name of the command object.

### <a id="user-runtime-macros"></a> User Runtime Macros

The following runtime macros are available in all commands that are executed for
users:

  Name                   | Description
  -----------------------|--------------
  user.name              | The name of the user object.
  user.display_name      | The value of the `display_name` attribute.

### <a id="notification-runtime-macros"></a> Notification Runtime Macros

The following runtime macros are available in all notification commands:

  Name                   | Description
  -----------------------|--------------
  notification.type      | The type of the notification.
  notification.author    | The author of the notification comment, if any.
  notification.comment   | The comment of the notification, if any.
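
As a sketch of how these macros can be combined, the following notification command passes notification, user, host and service details to a script. The command name and the script `/etc/icinga2/scripts/notify-sketch.sh` are hypothetical. The script would be responsible for formatting and delivering the message:

    object NotificationCommand "notify-sketch" {
      import "plugin-notification-command"

      command = [
        "/etc/icinga2/scripts/notify-sketch.sh",  // hypothetical script
        "$notification.type$",
        "$user.name$",
        "$host.name$",
        "$service.name$",
        "$service.state$",
        "$service.output$",
        "$notification.author$",
        "$notification.comment$"
      ]
    }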

### <a id="global-runtime-macros"></a> Global Runtime Macros

The following macros are available in all executed commands:

  Name                   | Description
  -----------------------|--------------
  icinga.timet           | Current UNIX timestamp.
  icinga.long_date_time  | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
  icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
  icinga.date            | Current date. Example: `2014-01-03`
  icinga.time            | Current time including timezone information. Example: `11:23:08 +0000`
  icinga.uptime          | Current uptime of the Icinga 2 process.

The following macros provide global statistics:

  Name                              | Description
  ----------------------------------|--------------
  icinga.num_services_ok            | Current number of services in state 'OK'.
  icinga.num_services_warning       | Current number of services in state 'Warning'.
  icinga.num_services_critical      | Current number of services in state 'Critical'.
  icinga.num_services_unknown       | Current number of services in state 'Unknown'.
  icinga.num_services_pending       | Current number of pending services.
  icinga.num_services_unreachable   | Current number of unreachable services.
  icinga.num_services_flapping      | Current number of flapping services.
  icinga.num_services_in_downtime   | Current number of services in downtime.
  icinga.num_services_acknowledged  | Current number of acknowledged service problems.
  icinga.num_hosts_up               | Current number of hosts in state 'Up'.
  icinga.num_hosts_down             | Current number of hosts in state 'Down'.
  icinga.num_hosts_unreachable      | Current number of unreachable hosts.
  icinga.num_hosts_flapping         | Current number of flapping hosts.
  icinga.num_hosts_in_downtime      | Current number of hosts in downtime.
  icinga.num_hosts_acknowledged     | Current number of acknowledged host problems.
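
Here is a minimal sketch that uses global macros in a check command. The command name is hypothetical, and `/bin/echo` is assumed to exist. Because `echo` exits with `0`, the resulting state is always `OK`, and only the output carries the statistics:

    object CheckCommand "icinga-stats-sketch" {
      import "plugin-check-command"

      command = [
        "/bin/echo",
        "$icinga.short_date_time$: $icinga.num_services_critical$ critical services, $icinga.num_hosts_down$ hosts down"
      ]
    }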


## <a id="check-result-freshness"></a> Check Result Freshness

In Icinga 2, active check freshness is enabled by default. It is determined by the
`check_interval` attribute: a check result is no longer fresh if no new check result
arrives within that period of time.

    threshold = last check execution time + check interval

Passive check freshness is calculated from the `check_interval` attribute if set.

    threshold = last check result time + check interval

If a check result is no longer fresh, a new check is executed as defined by the
`check_command` attribute.
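
As a sketch for passive checks, the following hypothetical service expects a passive check result at least every 30 minutes. If none arrives, its check command runs instead. The command uses the `check_dummy` plugin, assumed to be installed in `PluginDir`, to report `UNKNOWN`. The host, service and command names are assumptions:

    object CheckCommand "passive-stale-sketch" {
      import "plugin-check-command"

      // check_dummy <state> [text]: state 3 means UNKNOWN
      command = [ PluginDir + "/check_dummy", "3", "No passive check result received" ]
    }

    object Service "backup-status" {
      import "generic-service"
      host_name = "backup-server"
      check_command = "passive-stale-sketch"

      enable_passive_checks = true
      check_interval = 30m  // threshold = last check result time + 30 minutes
    }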


## <a id="check-flapping"></a> Check Flapping

The flapping algorithm used in Icinga 2 does not store past states. Instead, it
calculates a single flapping value based on counters and half-life values.
Icinga 2 compares this value with a single flapping threshold
configuration attribute named `flapping_threshold`.

Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
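
Here is a minimal sketch that enables flapping detection through a service template. The template name and the threshold value of `25` are assumptions chosen for illustration. Services pick up the settings with `import "flapping-sensitive"`:

    template Service "flapping-sensitive" {
      enable_flapping = true

      // compared against the value calculated from counters and half-life values
      flapping_threshold = 25
    }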


## <a id="volatile-services"></a> Volatile Services

By default all services are non-volatile. When a problem
occurs, the `SOFT` state applies. Once the check counter reaches the
`max_check_attempts` attribute, a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent as defined by the notification's `interval` attribute.

It may be reasonable to have a volatile service which stays in a `HARD`
state type if the service stays in a `NOT-OK` state. That way each
service recheck will automatically trigger a notification unless the
service is acknowledged or in a scheduled downtime.
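
Here is a minimal sketch of a volatile service. The check command `security-log-sketch`, its plugin path and the `log-*` host name pattern are hypothetical. With `volatile = true`, every recheck which returns a `NOT-OK` state should trigger a notification:

    object CheckCommand "security-log-sketch" {
      import "plugin-check-command"
      command = [ "/usr/local/lib/nagios/plugins/check_security_log" ]  // hypothetical plugin
    }

    apply Service "security-log" {
      import "generic-service"
      check_command = "security-log-sketch"

      volatile = true  // stay HARD and notify on every NOT-OK recheck

      assign where match("log-*", host.name)
    }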


## <a id="external-commands"></a> External Commands

Icinga 2 provides an external command pipe for processing commands
that trigger specific actions (for example rescheduling a service check
through the web interface).

To enable the `ExternalCommandListener` configuration, use the
following command and restart Icinga 2 afterwards:

    # icinga2-enable-feature command

Using the default configuration, Icinga 2 creates the command pipe file as
`/var/run/icinga2/cmd/icinga2.cmd`.
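
The `command` feature configuration defines an `ExternalCommandListener` object. As a sketch, assuming the object is named `command` and only the pipe path is set explicitly, it boils down to:

    object ExternalCommandListener "command" {
      command_path = "/var/run/icinga2/cmd/icinga2.cmd"
    }

If you change `command_path`, web interfaces and addons must write to the new path.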

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:

    # /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd

    # tail -f /var/log/messages

    Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
    Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'

By default the command pipe file is owned by the group `icingacmd` with read/write
permissions. Add your webserver's user to the group `icingacmd` to
enable sending commands to Icinga 2 through your web interface:

    # usermod -a -G icingacmd www-data

Debian packages use `nagios` as the default user and group name. Therefore, change `icingacmd` to
`nagios`.

### <a id="external-command-list"></a> External Command List

A list of currently supported external commands can be found [here](#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).


## <a id="event-handlers"></a> Event Handlers

Event handlers are defined as `EventCommand` objects in Icinga 2.

Unlike notifications, event commands are called on every host/service check execution
if defined. Therefore, the `EventCommand` object should define a command line
which evaluates the current service state and other service runtime attributes
available through runtime macros. Icinga 2 processes runtime macros such as
`$service.state_type$` and `$service.state$`, which helps to trigger
fine-grained events.

Common use case scenarios are a failing HTTP check requiring an immediate
restart via event command, or an application which is locked and requires
a restart upon detection.
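
As a sketch of the HTTP restart scenario, the following event command passes the state information to a script. The script `/etc/icinga2/scripts/restart-httpd.sh`, the command name and the host name are hypothetical. The script would have to decide whether to act, for example only on a `CRITICAL` state:

    object EventCommand "restart-httpd-sketch" {
      import "plugin-event-command"

      command = [
        "/etc/icinga2/scripts/restart-httpd.sh",  // hypothetical script
        "$service.state$",
        "$service.state_type$",
        "$service.check_attempt$"
      ]
    }

    object Service "http" {
      import "generic-service"
      host_name = "web-server"  // hypothetical host
      check_command = "http"
      event_command = "restart-httpd-sketch"
    }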


## <a id="logging"></a> Logging

Icinga 2 supports three different types of logging:

* File logging
* Syslog (on *NIX-based operating systems)
* Console logging (`STDOUT` on tty)

You can enable and disable additional loggers using the `icinga2-enable-feature`
and `icinga2-disable-feature` commands:

Feature  | Description
---------|------------
debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
mainlog  | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
syslog   | Syslog (severity: `warning` or higher)

By default, the `mainlog` feature is enabled. When Icinga 2 runs
on a terminal, log messages with severity `information` or higher are
written to the console.
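
These features are regular logger objects. As a sketch, a configuration roughly equivalent to the `debuglog` and `syslog` features could look like this. The object names are assumptions:

    object FileLogger "debug-file" {
      severity = "debug"
      path = "/var/log/icinga2/debug.log"
    }

    object SyslogLogger "syslog" {
      severity = "warning"
    }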


## <a id="performance-data"></a> Performance Data

When a host or service check is executed, plugins should provide so-called
`performance data`. In addition, Icinga 2 runtime macros provide further check
performance data, such as the check latency or the current service state
(or additional custom attributes).

The performance data can be passed to external applications which aggregate and
store it in their backends. These tools usually generate graphs for historical
reporting and trending.

Well-known addons processing Icinga performance data are PNP4Nagios,
inGraph and Graphite.

### <a id="writing-performance-data-files"></a> Writing Performance Data Files

PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.

For this purpose, the Icinga 2 `PerfdataWriter` object allows you to define
output templates for hosts and services, filled in with Icinga 2
runtime macros.

    host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
    service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"

The default templates are already provided with the Icinga 2 feature configuration,
which can be enabled using

    # icinga2-enable-feature perfdata

By default, all performance data files are rotated every 15 seconds into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.
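
You can tune the output paths and the rotation in the `PerfdataWriter` object. The following sketch assumes the object name `perfdata` and sets the paths and interval described above explicitly. The format templates shown earlier would go into `host_format_template` and `service_format_template`:

    object PerfdataWriter "perfdata" {
      host_perfdata_path = "/var/spool/icinga2/perfdata/host-perfdata"
      service_perfdata_path = "/var/spool/icinga2/perfdata/service-perfdata"

      rotation_interval = 15s  // rotate the files every 15 seconds
    }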

### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer

While some Graphite collector scripts and daemons such as Graphios are available for
Icinga 1.x, it's more reasonable to process the check and plugin performance data directly
in memory in Icinga 2. Once new metrics are available, Icinga 2 writes them
directly to the defined Graphite Carbon daemon TCP socket.

You can enable the feature using

    # icinga2-enable-feature graphite

By default, the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
`127.0.0.1` on port `2003`.
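
If Carbon runs on a different machine, you can set the connection details in the `GraphiteWriter` object. Here is a sketch that assumes the object name `graphite` and uses `graphite.example.com` as a placeholder host name:

    object GraphiteWriter "graphite" {
      host = "graphite.example.com"  // placeholder
      port = 2003
    }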

The current naming schema is

    icinga.<hostname>.<metricname>
    icinga.<hostname>.<servicename>.<metricname>



## <a id="status-data"></a> Status Data

Icinga 1.x periodically writes object configuration data and status data
to its `objects.cache` and `status.dat` files. Icinga 2 provides
the `StatusDataWriter` object, which dumps all configuration objects and
status updates at a regular interval.

    # icinga2-enable-feature statusdata

The Icinga 1.x Classic UI requires this data set as part of its backend.
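
As a sketch, the enabled feature corresponds to a `StatusDataWriter` object like the following. The object name, the paths and the interval are assumptions. Check the feature configuration shipped with your installation for the actual values:

    object StatusDataWriter "status" {
      status_path = "/var/cache/icinga2/status.dat"
      objects_path = "/var/cache/icinga2/objects.cache"
      update_interval = 30s
    }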

> **Note**
>
> If you are not using any web interface or addon which uses these files,
> you can safely disable this feature.



## <a id="compat-logging"></a> Compat Logging

In Icinga 2, the Icinga 1.x log format is called the `Compat Log`
and is provided by the `CompatLogger` object.

External web interfaces parse these logs for informational display. The
Icinga 1.x Classic UI also uses them to generate
SLA reports and trends. Furthermore, the
`Livestatus` feature uses these logs to answer queries on
historical tables.

The `CompatLogger` object can be enabled with

    # icinga2-enable-feature compatlog

By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.
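
The log directory and the rotation interval are attributes of the `CompatLogger` object. Here is a sketch that assumes the object name `compatlog`. It uses the directory mentioned above and the hourly rotation visible in the log excerpt below:

    object CompatLogger "compatlog" {
      log_dir = "/var/log/icinga2/compat"
      rotation_method = "HOURLY"
    }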
1474
1475 The format cannot be changed without breaking compatibility to
1476 existing log parsers.
1477
1478     # tail -f /var/log/icinga2/compat/icinga.log
1479
1480     [1382115688] LOG ROTATION: HOURLY
1481     [1382115688] LOG VERSION: 2.0
1482     [1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
1483     [1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
1484     [1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
1485     [1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
1486     [1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
1487     [1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
1488     [1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
1489     [1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
1490     [1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
1491     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
1492     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
1493     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
1494     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
1495     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
1496     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
1497     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
1498     [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
1499     [1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
1500     [1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test
1501
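For ad-hoc analysis outside a web interface, each line can be split into its timestamp, entry type and `;`-separated details. The following Python sketch assumes every relevant line has the `[<epoch>] <ENTRY TYPE>: <details>` shape seen in the excerpt above; `parse_compat_line` and `CompatLogEntry` are hypothetical names, not part of Icinga 2.

```python
import re
from typing import List, NamedTuple, Optional


class CompatLogEntry(NamedTuple):
    timestamp: int     # UNIX epoch seconds from the leading "[...]"
    entry_type: str    # e.g. "SERVICE STATE" or "EXTERNAL COMMAND"
    fields: List[str]  # the details split on ';'


# Assumed line shape: "[<epoch>] <ENTRY TYPE>: <details>"
_LINE_RE = re.compile(r"^\[(\d+)\] ([^:]+): (.*)$")


def parse_compat_line(line: str) -> Optional[CompatLogEntry]:
    """Parse one compat log line, or return None if it has another shape."""
    match = _LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    epoch, entry_type, details = match.groups()
    return CompatLogEntry(int(epoch), entry_type, details.split(";"))


# Example: report service alerts from a few lines of the excerpt.
LINES = [
    "[1382115688] LOG ROTATION: HOURLY",
    "[1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;",
    "[1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test",
]
for raw in LINES:
    entry = parse_compat_line(raw)
    if entry is not None and entry.entry_type == "SERVICE ALERT":
        host, service, state, state_type = entry.fields[:4]
        print(entry.timestamp, host, service, state, state_type)
        # prints: 1382115731 localhost ping6 CRITICAL SOFT
```

Lines ending in `;`, such as the `SERVICE STATE` entries, produce an empty last field; the sketch keeps it so that field positions stay stable across entries of the same type.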


## <a id="check-result-files"></a> Check Result Files

Icinga 1.x writes its check result files to a temporary spool directory
and reads them back from there at a regular interval. While this is
extremely inefficient in terms of performance, it has proven useful for
passing passive check results directly into Icinga 1.x, bypassing the
external command pipe.

Several clustered/distributed environments and check-aggregation addons
use this method. To support step-by-step migration of these
environments, Icinga 2 ships the `CheckResultReader` object.

There is no feature configuration available for it; instead, it must be
defined on demand in your Icinga 2 object configuration:

    object CheckResultReader "reader" {
      spool_dir = "/data/check-results"
    }

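An external tool can feed passive results to this reader by writing files into `spool_dir`. The following Python sketch assumes the Icinga 1.x spool convention: a result file named `c` plus six random characters containing `key=value` lines (`host_name`, `service_description`, `return_code`, `output`, `start_time`, `finish_time`), and an empty companion file with an `.ok` suffix that is created last to mark the result as complete. The function `write_check_result` and the comment header it writes are hypothetical; check which keys your Icinga version actually evaluates before relying on it.

```python
import os
import secrets
import string
import tempfile
import time

# Assumed Icinga 1.x spool convention: a result file named "c" plus six
# random characters, and an empty "<name>.ok" marker created afterwards
# to signal that the result file is complete and may be read.
_NAME_CHARS = string.ascii_letters + string.digits


def write_check_result(spool_dir, host_name, service_description,
                       return_code, output, now=None):
    """Drop one passive service check result into spool_dir; return its path."""
    if "\n" in output:
        raise ValueError("this sketch only handles single-line output")
    now = time.time() if now is None else now
    body = "\n".join([
        "# passive check result (hypothetical writer)",
        f"host_name={host_name}",
        f"service_description={service_description}",
        "check_type=1",  # 1 = passive check in Icinga 1.x terms
        f"start_time={now:.6f}",
        f"finish_time={now:.6f}",
        f"return_code={return_code}",  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        f"output={output}",
    ]) + "\n"
    while True:
        name = "c" + "".join(secrets.choice(_NAME_CHARS) for _ in range(6))
        path = os.path.join(spool_dir, name)
        try:
            # O_EXCL: never overwrite a result another writer just created.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            continue  # name collision, pick another one
        break
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # Create the marker last so a reader never sees a half-written file.
    open(path + ".ok", "w").close()
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real spool_dir.
    with tempfile.TemporaryDirectory() as spool:
        write_check_result(spool, "localhost", "ping6", 2, "critical test")
        print(sorted(os.listdir(spool)))
```

Creating the `.ok` marker only after the result file is closed keeps the reader from picking up a partially written file, and `O_EXCL` guards against two writers choosing the same file name.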