The acknowledgement is removed if a state change occurs or if the host/service
recovers (OK/Up state).
-If you acknowlege a problem once you've received a `Critical` notification,
+If you acknowledge a problem once you've received a `Critical` notification,
the acknowledgement will be removed if there is a state transition to `Warning`.
```
OK -> WARNING -> CRITICAL -> WARNING -> OK
preferred.
The following example defines a time period called `holidays` where
-notifications should be supressed:
+notifications should be suppressed:
object TimePeriod "holidays" {
import "legacy-timeperiod"
-
+
ranges = {
"january 1" = "00:00-24:00" //new year's day
"july 4" = "00:00-24:00" //independence day
object TimePeriod "weekends-excluded" {
import "legacy-timeperiod"
-
+
ranges = {
"saturday" = "00:00-09:00,18:00-24:00"
"sunday" = "00:00-09:00,18:00-24:00"
object TimePeriod "prod-notification" {
import "legacy-timeperiod"
-
+
excludes = [ "holidays", "weekends-excluded" ]
-
+
ranges = {
"monday" = "00:00-24:00"
"tuesday" = "00:00-24:00"
}
}
+## External Check Results <a id="external-check-results"></a>
+
+Hosts or services which do not actively execute a check plugin to receive
+the state and output are called "passive checks" or "external check results".
+In this scenario an external client or script is sending in check results.
+
+You can feed check results into Icinga 2 with the following transport methods:
+
+* [process-check-result action](12-icinga2-api.md#icinga2-api-actions-process-check-result) available with the [REST API](12-icinga2-api.md#icinga2-api) (remote and local)
+* External command sent via command pipe (local only)
+
+Each time a new check result is received, the next expected check time
+is updated. This means that if no check result is received from
+the external source, Icinga 2 will execute [freshness checks](08-advanced-topics.md#check-result-freshness).
+
+> **Note**
+>
+> The REST API action allows you to specify the `check_source` attribute
+> which helps identify the external sender. This is also visible
+> in Icinga Web 2 and the REST API queries.
+
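As an illustration of the REST API transport, the following minimal Python sketch builds a request body for the `process-check-result` action. It only constructs and prints the JSON; sending it (typically as a `POST` to `/v1/actions/process-check-result` with API credentials) is left out. The host name, service name, and check source are hypothetical, and the parameter names follow the API chapter linked above, which remains the authoritative reference.

```python
import json


def build_check_result(host, service, exit_status, output, check_source):
    """Build a JSON body for a passive service check result.

    All argument values are illustrative; adjust them to your setup.
    """
    if exit_status not in (0, 1, 2, 3):
        # Service states: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
        raise ValueError("service exit_status must be between 0 and 3")
    return {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,
        "plugin_output": output,
        # Identifies the external sender, see the note above.
        "check_source": check_source,
    }


payload = build_check_result(
    "example-host", "external-check", 2, "Backup failed", "backup-script"
)
print(json.dumps(payload, indent=2))
```

Setting `check_source` explicitly makes it easier to tell later which script or client submitted a result.
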
## Check Result Freshness <a id="check-result-freshness"></a>
In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute and no incoming check results in that period of time.
- threshold = last check execution time + check interval
+The threshold is calculated based on the last check execution time for actively executed checks:
+
+ (last check execution time + check interval) > current time
+
+If this host/service receives check results from an [external source](08-advanced-topics.md#external-check-results),
+the threshold is based on the last time a check result was received:
-Passive check freshness is calculated from the `check_interval` attribute if set.
+ (last check result time + check interval) > current time
- threshold = last check result time + check interval
+If the freshness checks fail, Icinga 2 will execute the defined check command.
+
+Best practice is to define a [dummy](10-icinga-template-library.md#plugin-check-command-dummy) `check_command` which gets
+executed when freshness checks fail.
+
+```
+apply Service "external-check" {
+ check_command = "dummy"
+ check_interval = 1m
-If the freshness checks are invalid, a new check is executed defined by the
-`check_command` attribute.
+ /* Set the state to UNKNOWN (3) if freshness checks fail. */
+ vars.dummy_state = 3
+
+ /* Use a runtime function to retrieve the last check time and more details. */
+ vars.dummy_text = {{
+ var service = get_service(macro("$host.name$"), macro("$service.name$"))
+ var lastCheck = DateTime(service.last_check).to_string()
+
+ return "No check results received. Last result time: " + lastCheck
+ }}
+
+ assign where "external" in host.vars.services
+}
+```
+
+References: [get_service](18-library-reference.md#objref-get_service), [macro](18-library-reference.md#scoped-functions-macro), [DateTime](18-library-reference.md#datetime-type).
+
+Example output in Icinga Web 2:
+
+![Icinga 2 Freshness Checks](images/advanced-topics/icinga2_external_checks_freshness_icingaweb2.png)
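
The freshness comparison described in this section can be sketched in a few lines of Python. This is an illustration of the formula only, not Icinga 2's implementation; the function name and the use of plain UNIX timestamps in seconds are assumptions.

```python
def is_fresh(last_time, check_interval, now):
    """Return True while the last result is still considered fresh.

    last_time is the last check execution time (active checks) or the
    last check result time (external check results).
    """
    return (last_time + check_interval) > now


# Hypothetical timestamps: last result at t=1000 with a 60 second interval.
print(is_fresh(1000, 60, 1030))  # True: still inside the interval
print(is_fresh(1000, 60, 1100))  # False: stale, the check command would run
```
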
## Check Flapping <a id="check-flapping"></a>
Icinga 2 supports optional detection of hosts and services that are "flapping".
-Flapping occurs when a service or host changes state too frequently, resulting
-in a storm of problem and recovery notifications. Flapping can be the source of
-configuration problems (i.e. thresholds set too low), troublesome services,
-or real network problems.
+Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
+recovery notifications. With flapping detection enabled, a flapping notification is sent while other notifications are
+suppressed until the object calms down, that is, after checks have returned the same status a few times. Flapping
+detection can help detect configuration problems (wrong thresholds), troublesome services, or network problems.
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
-The `flapping_threshold` attributes allows to specify the percentage of state changes
-when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to flap.
+The `flapping_threshold_high` and `flapping_threshold_low` attributes allow you to specify the thresholds that control
+when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.
+
+The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold, a
+host or service is considered flapping until it drops below the low flapping threshold.
+
+`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
+[notifications](alert-notifications) for details.
+
+> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
+> will be sent out regardless of the object's state.
+
+### How it works <a id="check-flapping-how-it-works"></a>
+
+Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
+
+![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
+
+All the states are weighted, with the most recent one being worth the most (1.15) and the 20th the least (0.8). The
+states in between are fairly distributed. The final flapping value is the sum of the weighted state changes divided by
+the total count of 20.
-Note: There are known issues with flapping detection. Please refrain from enabling
-flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed.
+In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
+This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
+considered flapping.
+
+If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold
+of 25% and therefore the host or service would recover from flapping.
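
The arithmetic and the high/low threshold behaviour described above can be reproduced with a short Python sketch. It is not Icinga 2's implementation: the function names are hypothetical, the weights are copied from the example, and the thresholds are the documented defaults.

```python
HISTORY_SIZE = 20  # number of stored state changes, as described above


def flapping_percentage(change_weights):
    """Sum the weights of the slots that were state changes, as a percentage."""
    return sum(change_weights) / HISTORY_SIZE * 100


def update_flapping(is_flapping, value, high=30.0, low=25.0):
    """Apply the two thresholds with hysteresis.

    An object starts flapping once the value exceeds `high` and keeps
    flapping until the value drops below `low`.
    """
    if is_flapping:
        return value >= low
    return value > high


# Weights of the eight state changes from the example above.
weights = [0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12, 1.18]
value = flapping_percentage(weights)

print(round(value, 1))                # 39.1
print(update_flapping(False, value))  # True: above the high threshold
print(update_flapping(True, 27.0))    # True: between low and high, still flapping
print(update_flapping(True, 24.0))    # False: below the low threshold, recovered
```

Using two thresholds instead of one keeps an object from toggling between flapping and not flapping when its value hovers around a single limit.
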
## Volatile Services <a id="volatile-services"></a>
object User "short-dummy" {
}
-
+
object UserGroup "short-dummy-group" {
assign where user.name == "short-dummy"
}
-
+
apply Notification "mail-admins-short" to Host {
import "mail-host-notification"
command = "mail-host-notification-test"
}
log("Running command")
log(mailscript)
-
+
var cmd = [ SysconfDir + "/icinga2/scripts/" + mailscript ]
log(LogCritical, "me", cmd)
return cmd
}}
-
+
env = {
}
}
}
}
}
-
+
apply Service "ping4" {
import "generic-service"
check_command = "ping4"
-
+
vars.ping_wrta = group_specific_value("slow-lan", 300, 100)
vars.ping_crta = group_specific_value("slow-lan", 500, 200)
-
+
assign where true
}
This allows you to access configuration and runtime object attributes. A detailed
list can be found [here](09-object-types.md#object-types).
-Simple cluster example for accessing two host object states and calculating a virtual
+#### Access Object Attributes at Runtime: Cluster Check <a id="access-object-attributes-at-runtime-cluster-check"></a>
+
+This is a simple cluster example for accessing two host object states and calculating a virtual
cluster state and output:
- object Host "cluster-host-01" {
- check_command = "dummy"
- vars.dummy_state = 2
- vars.dummy_text = "This host is down."
- }
+```
+object Host "cluster-host-01" {
+ check_command = "dummy"
+ vars.dummy_state = 2
+ vars.dummy_text = "This host is down."
+}
- object Host "cluster-host-02" {
- check_command = "dummy"
- vars.dummy_state = 0
- vars.dummy_text = "This host is up."
- }
+object Host "cluster-host-02" {
+ check_command = "dummy"
+ vars.dummy_state = 0
+ vars.dummy_text = "This host is up."
+}
- object Host "cluster" {
- check_command = "dummy"
- vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]
-
- vars.dummy_state = {{
- var up_count = 0
- var down_count = 0
- var cluster_nodes = macro("$cluster_nodes$")
-
- for (node in cluster_nodes) {
- if (get_host(node).state > 0) {
- down_count += 1
- } else {
- up_count += 1
- }
- }
+object Host "cluster" {
+ check_command = "dummy"
+ vars.cluster_nodes = [ "cluster-host-01", "cluster-host-02" ]
- if (up_count >= down_count) {
- return 0 //same up as down -> UP
- } else {
- return 2 //something is broken
- }
- }}
+ vars.dummy_state = {{
+ var up_count = 0
+ var down_count = 0
+ var cluster_nodes = macro("$cluster_nodes$")
- vars.dummy_text = {{
- var output = "Cluster hosts:\n"
- var cluster_nodes = macro("$cluster_nodes$")
+ for (node in cluster_nodes) {
+ if (get_host(node).state > 0) {
+ down_count += 1
+ } else {
+ up_count += 1
+ }
+ }
- for (node in cluster_nodes) {
- output += node + ": " + get_host(node).last_check_result.output + "\n"
- }
+ if (up_count >= down_count) {
+ return 0 //same up as down -> UP
+ } else {
+ return 2 //something is broken
+ }
+ }}
- return output
- }}
+ vars.dummy_text = {{
+ var output = "Cluster hosts:\n"
+ var cluster_nodes = macro("$cluster_nodes$")
+
+ for (node in cluster_nodes) {
+ output += node + ": " + get_host(node).last_check_result.output + "\n"
}
+ return output
+ }}
+}
+```
+
+#### Time Dependent Thresholds <a id="access-object-attributes-at-runtime-time-dependent-thresholds"></a>
The following example sets time dependent thresholds for the load check based on the current
time of the day compared to the defined time period.
- object TimePeriod "backup" {
- import "legacy-timeperiod"
-
- ranges = {
- monday = "02:00-03:00"
- tuesday = "02:00-03:00"
- wednesday = "02:00-03:00"
- thursday = "02:00-03:00"
- friday = "02:00-03:00"
- saturday = "02:00-03:00"
- sunday = "02:00-03:00"
- }
- }
+```
+object TimePeriod "backup" {
+ import "legacy-timeperiod"
+
+ ranges = {
+ monday = "02:00-03:00"
+ tuesday = "02:00-03:00"
+ wednesday = "02:00-03:00"
+ thursday = "02:00-03:00"
+ friday = "02:00-03:00"
+ saturday = "02:00-03:00"
+ sunday = "02:00-03:00"
+ }
+}
- object Host "webserver-with-backup" {
- check_command = "hostalive"
- address = "127.0.0.1"
- }
+object Host "webserver-with-backup" {
+ check_command = "hostalive"
+ address = "127.0.0.1"
+}
- object Service "webserver-backup-load" {
- check_command = "load"
- host_name = "webserver-with-backup"
+object Service "webserver-backup-load" {
+ check_command = "load"
+ host_name = "webserver-with-backup"
- vars.load_wload1 = {{
- if (get_time_period("backup").is_inside) {
- return 20
- } else {
- return 5
- }
- }}
- vars.load_cload1 = {{
- if (get_time_period("backup").is_inside) {
- return 40
- } else {
- return 10
- }
- }}
+ vars.load_wload1 = {{
+ if (get_time_period("backup").is_inside) {
+ return 20
+ } else {
+ return 5
+ }
+ }}
+ vars.load_cload1 = {{
+ if (get_time_period("backup").is_inside) {
+ return 40
+ } else {
+ return 10
}
+ }}
+}
+```
+## Advanced Value Types <a id="advanced-value-types"></a>
+
+In addition to the default value types Icinga 2 also uses a few other types
+to represent its internal state. The following types are exposed via the [API](12-icinga2-api.md#icinga2-api).
+
+### CheckResult <a id="advanced-value-types-checkresult"></a>
+
+ Name | Type | Description
+ --------------------------|-----------------------|----------------------------------
+ exit\_status | Number | The exit status returned by the check execution.
+ output | String | The check output.
+ performance\_data | Array | Array of [performance data values](08-advanced-topics.md#advanced-value-types-perfdatavalue).
+ check\_source | String | Name of the node executing the check.
+ state | Number | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
+ command | Value | Array of command with shell-escaped arguments or command line string.
+ execution\_start | Timestamp | Check execution start time (as a UNIX timestamp).
+ execution\_end | Timestamp | Check execution end time (as a UNIX timestamp).
+ schedule\_start | Timestamp | Scheduled check execution start time (as a UNIX timestamp).
+ schedule\_end | Timestamp | Scheduled check execution end time (as a UNIX timestamp).
+ active | Boolean | Whether the result is from an active or passive check.
+ vars\_before | Dictionary | Internal attribute used for calculations.
+ vars\_after | Dictionary | Internal attribute used for calculations.
+
+### PerfdataValue <a id="advanced-value-types-perfdatavalue"></a>
+
+Icinga 2 parses performance data strings returned by check plugins and makes the information available to external interfaces (e.g. [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) or the [Icinga 2 API](12-icinga2-api.md#icinga2-api)).
+
+ Name | Type | Description
+ --------------------------|-----------------------|----------------------------------
+ label | String | Performance data label.
+ value | Number | Normalized performance data value without unit.
+ counter | Boolean | Enabled if the original value contains `c` as unit. Defaults to `false`.
+ unit | String | Unit of measurement (`seconds`, `bytes`, `percent`) according to the [plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
+ crit | Value | Critical threshold value.
+ warn | Value | Warning threshold value.
+ min | Value | Minimum value returned by the check.
+ max | Value | Maximum value returned by the check.