X-Git-Url: https://granicus.if.org/sourcecode?a=blobdiff_plain;f=doc%2F3-monitoring-basics.md;h=86c2137f3767c01f8659ebc0b91f9f0ea49f7e5e;hb=a93b56586822e37ff08b8882c62c431666bbd4bb;hp=fbba85f24c61b9af8a4c3f394aff01ed52dee308;hpb=c4d448efe59620fb359acff8cde426ecd4f16718;p=icinga2 diff --git a/doc/3-monitoring-basics.md b/doc/3-monitoring-basics.md index fbba85f24..86c2137f3 100644 --- a/doc/3-monitoring-basics.md +++ b/doc/3-monitoring-basics.md @@ -43,7 +43,7 @@ check command. The `address` attribute is used by check commands to determine which network address is associated with the host object. -Details on troubleshooting check problems can be found [here](#troubleshooting). +Details on troubleshooting check problems can be found [here](16-troubleshooting.md#troubleshooting). ### Host States @@ -82,77 +82,23 @@ state the host/service switches to a `HARD` state and notifications are sent. ### Host and Service Checks -Hosts and Services determine their state from a check result returned from a check -execution to the Icinga 2 application. By default the `generic-host` example template -will define `hostalive` as host check. If your host is unreachable for ping, you should -consider using a different check command, for instance the `http` check command, or if -there is no check available, the `dummy` check command. +Hosts and services determine their state by running checks in a regular interval. - object Host "uncheckable-host" { - check_command = "dummy" - vars.dummy_state = 1 - vars.dummy_text = "Pretending to be OK." + object Host "router" { + check_command = "hostalive" + address = "10.0.0.1" } -Service checks could also use a `dummy` check, but the common strategy is to -[integrate an existing plugin](#command-plugin-integration) as -[check command](#check-commands) and [reference](#command-passing-parameters) -that in your [Service](#objecttype-service) object definition. 
- -## Configuration Best Practice - -The [Getting Started](#getting-started) chapter already introduced various aspects -of the Icinga 2 configuration language. If you are ready to configure additional -hosts, services, notifications, dependencies, etc, you should think about the -requirements first and then decide for a possible strategy. - -There are many ways of creating Icinga 2 configuration objects: - -* Manually with your preferred editor, for example vi(m), nano, notepad, etc. -* Generated by a configuration management tool such as Puppet, Chef, Ansible, etc. -* A configuration addon for Icinga 2 -* A custom exporter script from your CMDB or inventory tool -* your own. - -In order to find the best strategy for your own configuration, ask yourself the following questions: - -* Do your hosts share a common group of services (for example linux hosts with disk, load, etc checks)? -* Only a small set of users receives notifications and escalations for all hosts/services? +The `hostalive` command is one of several built-in check commands. It sends ICMP +echo requests to the IP address specified in the `address` attribute to determine +whether a host is online. -If you can at least answer one of these questions with yes, look for the [apply rules](#using-apply) logic -instead of defining objects on a per host and service basis. +A number of other [built-in check commands](7-icinga-template-library.md#plugin-check-commands) are also +available. In addition to these commands the next few chapters will explain in +detail how to set up your own check commands. -* You are required to define specific configuration for each host/service? -* Does your configuration generation tool already know about the host-service-relationship? -Then you should look for the object specific configuration setting `host_name` etc accordingly. - -Finding the best files and directory tree for your configuration is up to you. 
Make sure that -the [icinga2.conf](#icinga2-conf) configuration file includes them, and then think about: - -* tree-based on locations, hostgroups, specific host attributes with sub levels of directories. -* flat `hosts.conf`, `services.conf`, etc files for rule based configuration. -* generated configuration with one file per host and a global configuration for groups, users, etc. -* one big file generated from an external application (probably a bad idea for maintaining changes). -* your own. - -In either way of choosing the right strategy you should additionally check the following: - -* Are there any specific attributes describing the host/service you could set as `vars` custom attributes? -You can later use them for applying assign/ignore rules, or export them into external interfaces. -* Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules. -* Use templates to store generic attributes for your objects and apply rules making your configuration more readable. -Details can be found in the [using templates](#using-templates) chapter. -* Apply rules may overlap. Keep a central place (for example, `services.conf` or `notifications.conf`) storing -the configuration instead of defining apply rules deep in your configuration tree. -* Every plugin used as check, notification or event command requires a `Command` definition. -Further details can be looked up in the [check commands](#check-commands) chapter. - -If you happen to have further questions, do not hesitate to join the [community support channels](https://support.icinga.org) -and ask community members for their experience and best practices. 
- - -### Object Inheritance Using Templates +## Templates Templates may be used to apply a set of identical attributes to more than one object: @@ -164,1869 +110,1512 @@ object: enable_perfdata = true } - object Service "ping4" { + apply Service "ping4" { import "generic-service" - host_name = "localhost" check_command = "ping4" + + assign where host.address } - object Service "ping6" { + apply Service "ping6" { import "generic-service" - host_name = "localhost" check_command = "ping6" + + assign where host.address6 } + In this example the `ping4` and `ping6` services inherit properties from the template `generic-service`. Objects as well as templates themselves can import an arbitrary number of -templates. Attributes inherited from a template can be overridden in the +other templates. Attributes inherited from a template can be overridden in the object if necessary. -### Apply objects based on rules +You can also import existing non-template objects. Note that templates +and objects share the same namespace, i.e. you can't define a template +that has the same name as an object. -Instead of assigning each object (`Service`, `Notification`, `Dependency`, `ScheduledDowntime`) -based on attribute identifiers for example `host_name` objects can be [applied](#apply). -Detailed scenario examples are used in their respective chapters, for example -[apply services with custom command arguments](#using-apply-services-command-arguments). 
+## Custom Attributes -#### Apply Services to Hosts +In addition to built-in attributes you can define your own attributes: - apply Service "load" { - import "generic-service" + object Host "localhost" { + vars.ssh_port = 2222 + } - check_command = "load" +Valid values for custom attributes include: - assign where "linux-server" in host.groups - ignore where host.vars.no_load_check - } +* Strings and numbers +* Arrays and dictionaries +* Functions -In this example the `load` service will be created as object for all hosts in the `linux-server` -host group. If the `no_load_check` custom attribute is set, the host will be -ignored. +### Functions as Custom Attributes -#### Apply Notifications to Hosts and Services +Icinga lets you specify functions for custom attributes. The special case here +is that whenever Icinga needs the value for such a custom attribute it runs +the function and uses whatever value the function returns: -Notifications are applied to specific targets (`Host` or `Service`) and work in a similar -manner: + object CheckCommand "random-value" { + import "plugin-check-command" - apply Notification "mail-noc" to Service { - import "mail-service-notification" - command = "mail-service-notification" - user_groups = [ "noc" ] + command = [ PluginDir + "/check_dummy", "0", "$text$" ] - assign where service.vars.sla == "24x7" + vars.text = {{ Math.random() * 100 }} } -In this example the `mail-noc` notification will be created as object for all services having the -`sla` custom attribute set to `24x7`. The notification command is set to `mail-service-notification` -and all members of the user group `noc` will get notified. +This example uses the [abbreviated lambda syntax](19-language-reference.md#nullary-lambdas). -#### Apply Dependencies to Hosts and Services +These functions have access to a number of variables: -Detailed examples can be found in the [dependencies](#dependencies) chapter. 
- -### Apply Recurring Downtimes to Hosts and Services + Variable | Description + -------------|--------------- + user | The User object (for notifications). + service | The Service object (for service checks/notifications/event handlers). + host | The Host object. + command | The command object (e.g. a CheckCommand object for checks). -Detailed examples can be found in the [recurring downtimes](#recurring-downtimes) chapter. +Here's an example: + vars.text = {{ host.check_interval }} -### Groups +In addition to these variables the `macro` function can be used to retrieve the +value of arbitrary macro expressions: -Groups are used for combining hosts, services, and users into -accessible configuration attributes and views in external (web) -interfaces. + vars.text = {{ + if (macro("$address$") == "127.0.0.1") { + log("Running a check for localhost!") + } -Group membership is defined at the respective object itself. If -you have a hostgroup name `windows` for example, and want to assign -specific hosts to this group for later viewing the group on your -alert dashboard, first create the hostgroup: + return "Some text" + }} - object HostGroup "windows" { - display_name = "Windows Servers" - } +The [Object Accessor Functions](20-library-reference.md#object-accessor-functions) can be used to retrieve references +to other objects by name. -Then add your hosts to this hostgroup +## Runtime Macros - template Host "windows-server" { - groups += [ "windows" ] - } +Macros can be used to access other objects' attributes at runtime. 
For example they +are used in command definitions to figure out which IP address a check should be +run against: - object Host "mssql-srv1" { - import "windows-server" + object CheckCommand "my-ping" { + import "plugin-check-command" - vars.mssql_port = 1433 - } + command = [ PluginDir + "/check_ping", "-H", "$ping_address$" ] - object Host "mssql-srv2" { - import "windows-server" + arguments = { + "-w" = "$ping_wrta$,$ping_wpl$%" + "-c" = "$ping_crta$,$ping_cpl$%" + "-p" = "$ping_packets$" + } - vars.mssql_port = 1433 - } + vars.ping_wrta = 100 + vars.ping_wpl = 5 -This can be done for service and user groups the same way. Additionally -the user groups are associated as attributes in `Notification` objects. + vars.ping_crta = 250 + vars.ping_cpl = 10 - object UserGroup "windows-mssql-admins" { - display_name = "Windows MSSQL Admins" + vars.ping_packets = 5 } - template User "generic-windows-mssql-users" { - groups += [ "windows-mssql-admins" ] + object Host "router" { + check_command = "my-ping" + address = "10.0.0.1" } - object User "win-mssql-noc" { - import "generic-windows-mssql-users" +In this example we are using the `$address$` macro to refer to the host's `address` +attribute. - email = "noc@example.com" - } +We can also directly refer to custom attributes, e.g. by using `$ping_wrta$`. Icinga +automatically tries to find the closest match for the attribute you specified. The +exact rules for this are explained in the next section. - object User "win-mssql-ops" { - import "generic-windows-mssql-users" - email = "ops@example.com" - } +### Evaluation Order + +When executing commands Icinga 2 checks the following objects in this order to look +up macros and their respective values: + +1. User object (only for notifications) +2. Service object +3. Host object +4. Command object +5. Global custom attributes in the `Vars` constant + +This execution order allows you to define default values for custom attributes +in your command objects. 
-#### Group Membership Assign +Here's how you can override the custom attribute `ping_packets` from the previous +example: -If there is a certain number of hosts, services, or users matching a pattern -it's reasonable to assign the group object to these members. -Details on the `assign where` syntax can be found [here](#apply) + object Service "ping" { + host_name = "localhost" + check_command = "my-ping" - object HostGroup "mssql" { - display_name = "MSSQL Servers" - assign where host.vars.mssql_port + vars.ping_packets = 10 // Overrides the default value of 5 given in the command } -In this inherited example from above all hosts with the `vars` attribute `mssql_port` -set will be added as members to the host group `mssql`. +If a custom attribute isn't defined anywhere an empty value is used and a warning is +written to the Icinga 2 log. -## Notifications +You can also directly refer to a specific attribute - thereby ignoring these evaluation +rules - by specifying the full attribute name: -Notifications for service and host problems are an integral part of your -monitoring setup. + $service.vars.ping_wrta$ -When a host or service is in a downtime, a problem has been acknowledged or -the dependency logic determined that the host/service is unreachable, no -notifications are sent. You can configure additional type and state filters -refining the notifications being actually sent. +This retrieves the value of the `ping_wrta` custom attribute for the service. This +returns an empty value if the service does not have such a custom attribute no matter +whether another object such as the host has this attribute. -There are many ways of sending notifications, e.g. by e-mail, XMPP, -IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications. -Instead it relies on external mechanisms such as shell scripts to notify users. -A notification specification requires one or more users (and/or user groups) -who will be notified in case of problems. 
These users must have all custom -attributes defined which will be used in the `NotificationCommand` on execution. +### Host Runtime Macros -The user `icingaadmin` in the example below will get notified only on `WARNING` and -`CRITICAL` states and `problem` and `recovery` notification types. +The following host custom attributes are available in all commands that are executed for +hosts or services: - object User "icingaadmin" { - display_name = "Icinga 2 Admin" - enable_notifications = true - states = [ OK, Warning, Critical ] - types = [ Problem, Recovery ] - email = "icinga@localhost" - } + Name | Description + -----------------------------|-------------- + host.name | The name of the host object. + host.display_name | The value of the `display_name` attribute. + host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. + host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable). + host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`. + host.check_attempt | The current check attempt number. + host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state. + host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`. + host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable). + host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`. + host.last_state_change | The last state change's timestamp. + host.downtime_depth | The number of active downtimes. + host.duration_sec | The time since the last state change. + host.latency | The host's check latency. + host.execution_time | The host's check execution time. + host.output | The last check's output. + host.perfdata | The last check's performance data. + host.last_check | The timestamp when the last check was executed. 
+ host.check_source | The monitoring instance that performed the last check. + host.num_services | Number of services associated with the host. + host.num_services_ok | Number of services associated with the host which are in an `OK` state. + host.num_services_warning | Number of services associated with the host which are in a `WARNING` state. + host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state. + host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state. -If you don't set the `states` and `types` configuration attributes for the `User` -object, notifications for all states and types will be sent. +### Service Runtime Macros -Details on troubleshooting notification problems can be found [here](#troubleshooting). +The following service macros are available in all commands that are executed for +services: -> **Note** -> -> Make sure that the [notification](#features) feature is enabled on your master instance -> in order to execute notification commands. + Name | Description + ---------------------------|-------------- + service.name | The short name of the service object. + service.display_name | The value of the `display_name` attribute. + service.check_command | The short name of the command along with any arguments to be used for the check. + service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. + service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). + service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`. + service.check_attempt | The current check attempt number. + service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state. + service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`. 
+ service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown). + service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`. + service.last_state_change | The last state change's timestamp. + service.downtime_depth | The number of active downtimes. + service.duration_sec | The time since the last state change. + service.latency | The service's check latency. + service.execution_time | The service's check execution time. + service.output | The last check's output. + service.perfdata | The last check's performance data. + service.last_check | The timestamp when the last check was executed. + service.check_source | The monitoring instance that performed the last check. -You should choose which information you (and your notified users) are interested in -case of emergency, and also which information does not provide any value to you and -your environment. +### Command Runtime Macros -An example notification command is explained [here](#notification-commands). +The following custom attributes are available in all commands: -You can add all shared attributes to a `Notification` template which is inherited -to the defined notifications. That way you'll save duplicated attributes in each -`Notification` object. Attributes can be overridden locally. + Name | Description + -----------------------|-------------- + command.name | The name of the command object. - template Notification "generic-notification" { - interval = 15m +### User Runtime Macros - command = "mail-service-notification" +The following custom attributes are available in all commands that are executed for +users: - states = [ Warning, Critical, Unknown ] - types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, - FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ] + Name | Description + -----------------------|-------------- + user.name | The name of the user object. 
+ user.display_name | The value of the display_name attribute. - period = "24x7" - } +### Notification Runtime Macros -The time period `24x7` is shipped as example configuration with Icinga 2. + Name | Description + -----------------------|-------------- + notification.type | The type of the notification. + notification.author | The author of the notification comment, if existing. + notification.comment | The comment of the notification, if existing. -Use the `apply` keyword to create `Notification` objects for your services: +### Global Runtime Macros - apply Notification "mail" to Service { - import "generic-notification" +The following macros are available in all executed commands: - command = "mail-notification" - users = [ "icingaadmin" ] + Name | Description + -----------------------|-------------- + icinga.timet | Current UNIX timestamp. + icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000` + icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08` + icinga.date | Current date. Example: `2014-01-03` + icinga.time | Current time including timezone information. Example: `11:23:08 +0000` + icinga.uptime | Current uptime of the Icinga 2 process. - assign where service.name == "mysql" - } +The following macros provide global statistics: -Instead of assigning users to notifications, you can also add the `user_groups` -attribute with a list of user groups to the `Notification` object. Icinga 2 will -send notifications to all group members. + Name | Description + ----------------------------------|-------------- + icinga.num_services_ok | Current number of services in state 'OK'. + icinga.num_services_warning | Current number of services in state 'Warning'. + icinga.num_services_critical | Current number of services in state 'Critical'. + icinga.num_services_unknown | Current number of services in state 'Unknown'. + icinga.num_services_pending | Current number of pending services. 
+ icinga.num_services_unreachable | Current number of unreachable services. + icinga.num_services_flapping | Current number of flapping services. + icinga.num_services_in_downtime | Current number of services in downtime. + icinga.num_services_acknowledged | Current number of acknowledged service problems. + icinga.num_hosts_up | Current number of hosts in state 'Up'. + icinga.num_hosts_down | Current number of hosts in state 'Down'. + icinga.num_hosts_unreachable | Current number of unreachable hosts. + icinga.num_hosts_flapping | Current number of flapping hosts. + icinga.num_hosts_in_downtime | Current number of hosts in downtime. + icinga.num_hosts_acknowledged | Current number of acknowledged host problems. -### Notification Escalations -When a problem notification is sent and a problem still exists at the time of re-notification -you may want to escalate the problem to the next support level. A different approach -is to configure the default notification by email, and escalate the problem via SMS -if not already solved. -You can define notification start and end times as additional configuration -attributes making the `Notification` object a so-called `notification escalation`. -Using templates you can share the basic notification attributes such as users or the -`interval` (and override them for the escalation then). -Using the example from above, you can define additional users being escalated for SMS -notifications between start and end time. +## Apply Rules - object User "icinga-oncall-2nd-level" { - display_name = "Icinga 2nd Level" +Instead of assigning each object ([Service](6-object-types.md#objecttype-service), +[Notification](6-object-types.md#objecttype-notification), [Dependency](6-object-types.md#objecttype-dependency), +[ScheduledDowntime](6-object-types.md#objecttype-scheduleddowntime)) +based on attribute identifiers for example `host_name` objects can be [applied](19-language-reference.md#apply). 
- vars.mobile = "+1 555 424642" - } - - object User "icinga-oncall-1st-level" { - display_name = "Icinga 1st Level" - - vars.mobile = "+1 555 424642" - } +Before you start using the apply rules keep the following in mind: -Define an additional `NotificationCommand` for SMS notifications. +* Define the best match. + * A set of unique [custom attributes](#custom-attributes-apply) for these hosts/services? + * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it? + * A generic pattern [match](19-language-reference.md#function-calls) on the host/service name? + * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](19-language-reference.md#expression-operators) +* All expressions must return a boolean value (for example, an empty string evaluates to `false`) > **Note** > -> The example is not complete as there are many different SMS providers. -> Please note that sending SMS notifications will require an SMS provider -> or local hardware with a SIM card active. +> You can set/override object attributes in apply rules using the respectively available +> objects in that scope (host and/or service objects). - object NotificationCommand "sms-notification" { - command = [ - PluginDir + "/send_sms_notification", - "$mobile$", - "..." - } +[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them +not only for matching on their existence or values in apply expressions, but also to assign +("inherit") their values into the objects generated from apply rules. -The two new notification escalations are added onto the host `localhost` -and its service `ping4` using the `generic-notification` template. -The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification` -command) after `30m` until `1h`. 
+* [Apply services to hosts](3-monitoring-basics.md#using-apply-services) +* [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications) +* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes) +* [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes) -> **Note** -> -> The `interval` was set to 15m in the `generic-notification` -> template example. Lower that value in your escalations by using a secondary -> template or by overriding the attribute directly in the `notifications` array -> position for `escalation-sms-2nd-level`. +A more advanced example is using [apply with for loops on arrays or +dictionaries](#using-apply-for) for example provided by +[custom attributes](#custom-attributes-apply) or groups. -If the problem does not get resolved nor acknowledged preventing further notifications -the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was -notified, but only for one hour (`2h` as `end` key for the `times` dictionary). +> **Tip** +> +> Building configuration in that dynamic way requires detailed information +> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object) +> after successful [configuration validation](8-cli-commands.md#config-validation). - apply Notification "mail" to Service { - import "generic-notification" - command = "mail-notification" - users = [ "icingaadmin" ] +### Apply Rules Expressions - assign where service.name == "ping4" - } +You can use simple or advanced combinations of apply rule expressions. Each +expression must evaluate to the boolean `true` value. An empty string +will, for instance, be interpreted as `false`. In a similar fashion undefined +attributes will return `false`. 
- apply Notification "escalation-sms-2nd-level" to Service { - import "generic-notification" +Returns `false`: - command = "sms-notification" - users = [ "icinga-oncall-2nd-level" ] + assign where host.vars.attribute_does_not_exist - times = { - begin = 30m - end = 1h - } +Multiple `assign where` condition rows are evaluated as `OR` condition. - assign where service.name == "ping4" - } +You can combine multiple expressions for matching only a subset of objects. In some cases, +you want to be able to add more than one assign/ignore where expression which matches +a specific condition. To achieve this you can use the logical `and` and `or` operators. - apply Notification "escalation-sms-1st-level" to Service { - import "generic-notification" - command = "sms-notification" - users = [ "icinga-oncall-1st-level" ] +Match all `*mysql*` patterns in the host name and (`&&`) custom attribute `prod_mysql_db` +matches the `db-*` pattern. All hosts with the custom attribute `test_server` set to `true` +should be ignored, or any host name ending with `*internal` pattern. - times = { - begin = 1h - end = 2h - } + object HostGroup "mysql-server" { + display_name = "MySQL Server" - assign where service.name == "ping4" + assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db) + ignore where host.vars.test_server == true + ignore where match("*internal", host.name) } -### Notification Delay - -Sometimes the problem in question should not be notified when the notification is due -(the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2 -you can use the `times` dictionary and set `begin = 15m` as key and value if you want to -postpone the first notification for 15 minutes. Leave out the `end` key - if not set, -Icinga 2 will not check against any end time for this notification. 
+A similar example for advanced notification apply rule filters: the notification applies if the service +attribute `notes` contains the `has gold support 24x7` string `AND` one of the +two conditions passes: either the `customer` host custom attribute is set to `customer-xy` +`OR` the host custom attribute `always_notify` is set to `true`. - apply Notification "mail" to Service { - import "generic-notification" +The notification is ignored for services whose host name ends with `*internal` +`OR` whose `priority` custom attribute is [less than](19-language-reference.md#expression-operators) `2` `AND` whose host has the `is_clustered` custom attribute set to `true`. - command = "mail-notification" - users = [ "icingaadmin" ] + template Notification "cust-xy-notification" { + users = [ "noc-xy", "mgmt-xy" ] + command = "mail-service-notification" + } - times.begin = 15m // delay first notification + apply Notification "notify-cust-xy-mysql" to Service { + import "cust-xy-notification" - assign where service.name == "ping4" + assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true) + ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true) } -### Notification Filters by State and Type - -If there are no notification state and type filter attributes defined at the `Notification` -or `User` object Icinga 2 assumes that all states and types are being notified. -Available state and type filters for notifications are: - template Notification "generic-notification" { - states = [ Warning, Critical, Unknown ] - types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart, - FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ] - } -If you are familiar with Icinga 1.x `notification_options` please note that they have been split -into type and state to allow more fine granular filtering for example on downtimes and flapping. -You can filter for acknowledgements and custom notifications too. 
+The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf) +and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case. +The example for `ssh` applies a service object to all hosts with the `address` +attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`. -## Time Periods + apply Service "ssh" { + import "generic-service" -Time Periods define time ranges in Icinga where event actions are -triggered, for example whether a service check is executed or not within -the `check_period` attribute. Or a notification should be sent to -users or not, filtered by the `period` and `notification_period` -configuration attributes for `Notification` and `User` objects. + check_command = "ssh" -> **Note** -> -> If you are familar with Icinga 1.x - these time period definitions -> are called `legacy timeperiods` in Icinga 2. -> -> An Icinga 2 legacy timeperiod requires the `ITL` provided template ->`legacy-timeperiod`. - -The `TimePeriod` attribute `ranges` may contain multiple directives, -including weekdays, days of the month, and calendar dates. -These types may overlap/override other types in your ranges dictionary. - -The descending order of precedence is as follows: - -* Calendar date (2008-01-01) -* Specific month date (January 1st) -* Generic month date (Day 15) -* Offset weekday of specific month (2nd Tuesday in December) -* Offset weekday (3rd Monday) -* Normal weekday (Tuesday) - -If you don't set any `check_period` or `notification_period` attribute -on your configuration objects Icinga 2 assumes `24x7` as time period -as shown below. 
- - object TimePeriod "24x7" { - import "legacy-timeperiod" - - display_name = "Icinga 2 24x7 TimePeriod" - ranges = { - "monday" = "00:00-24:00" - "tuesday" = "00:00-24:00" - "wednesday" = "00:00-24:00" - "thursday" = "00:00-24:00" - "friday" = "00:00-24:00" - "saturday" = "00:00-24:00" - "sunday" = "00:00-24:00" - } + assign where host.address && host.vars.os == "Linux" } -If your operation staff should only be notified during workhours -create a new timeperiod named `workhours` defining a work day from -09:00 to 17:00. - object TimePeriod "workhours" { - import "legacy-timeperiod" +Other detailed scenario examples are used in their respective chapters, for example +[apply services with custom command arguments](#using-apply-services-command-arguments). - display_name = "Icinga 2 8x5 TimePeriod" - ranges = { - "monday" = "09:00-17:00" - "tuesday" = "09:00-17:00" - "wednesday" = "09:00-17:00" - "thursday" = "09:00-17:00" - "friday" = "09:00-17:00" - } - } +### Apply Notifications to Hosts and Services -Use the `period` attribute to assign time periods to -`Notification` and `Dependency` objects: +Notifications are applied to specific targets (`Host` or `Service`) and work in a similar +manner: - object Notification "mail" { - import "generic-notification" - host_name = "localhost" + apply Notification "mail-noc" to Service { + import "mail-service-notification" - command = "mail-notification" - users = [ "icingaadmin" ] - period = "workhours" + user_groups = [ "noc" ] + + assign where host.vars.notification.mail } -## Commands +In this example the `mail-noc` notification will be created as object for all services having the +`notification.mail` custom attribute defined. The notification command is set to `mail-service-notification` +and all members of the user group `noc` will get notified. -Icinga 2 uses three different command object types to specify how -checks should be performed, notifications should be sent, and -events should be handled. 
+### Apply Dependencies to Hosts and Services -### Environment Variables for Commands +Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter. -Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars). +### Apply Recurring Downtimes to Hosts and Services +The sample configuration includes an example in [downtimes.conf](5-configuring-icinga-2.md#downtimes-conf). -### Check Commands +Detailed examples can be found in the [recurring downtimes](4-advanced-topics.md#recurring-downtimes) chapter. -`CheckCommand` objects define the command line how a check is called. -> **Note** -> -> Make sure that the [checker](#features) feature is enabled in order to -> execute checks. +### Using Apply For Rules -#### Integrate the Plugin with a CheckCommand Definition +Next to the standard way of using apply rules there is the requirement of generating +apply rule objects based on a set (array or dictionary). That way you'll save quite +a lot of duplicated apply rules by combining them into one generic rule that generates +the object name with or without a prefix. -`CheckCommand` objects require the [ITL template](#itl-plugin-check-command) -`plugin-check-command` to support native plugin based check methods. +The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf) +and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case. -Unless you have done so already, download your check plugin and put it -into the `PluginDir` directory. The following example uses the -`check_disk` plugin shipped with the Monitoring Plugins package. +Imagine a different example: You are monitoring your switch (hosts) with many +interfaces (services). The following requirements/problems apply: -The plugin path and all command arguments are made a list of -double-quoted string arguments for proper shell escaping.
+* Each interface service check should be named with a prefix and a running number +* Each interface has its own vlan tag +* Some interfaces have QoS enabled +* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be +dynamically generated -Call the `check_disk` plugin with the `--help` parameter to see -all available options. Our example defines warning (`-w`) and -critical (`-c`) thresholds for the disk usage. Without any -partition defined (`-p`) it will check all local partitions. +By defining the `interfaces` dictionary with three example interfaces on the `core-switch` +host object, you'll make sure to pass the storage required by the for loop in the service apply +rule. - icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help - ... - This plugin checks the amount of used disk space on a mounted file system - and generates an alert if free space is less than one of the threshold values + object Host "core-switch" { + import "generic-host" + address = "127.0.0.1" - Usage: - check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device} - [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ] - [-t timeout] [-u unit] [-v] [-X type] [-N type] - ... + vars.interfaces["0"] = { + port = 1 + vlan = "internal" + address = "127.0.0.2" + qos = "enabled" + } + vars.interfaces["1"] = { + port = 2 + vlan = "mgmt" + address = "127.0.1.2" + } + vars.interfaces["2"] = { + port = 3 + vlan = "remote" + address = "127.0.2.2" + } + } -> **Note** -> -> Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us. +You can also omit the `"if-"` string, then all generated service names are directly +taken from the `if_name` variable value. -Next step is to understand how command parameters are being passed from -a host or service object, and add a `CheckCommand` definition based on these -required parameters and/or default values.
+The config dictionary contains all key-value pairs for the specific interface in one +loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface. -#### Passing Check Command Parameters from Host or Service +By defining a default value for the custom attribute `qos` in the `vars` dictionary +before adding the `config` dictionary we'll ensure that this attribute is always defined. -Unlike Icinga 1.x check command parameters are defined as custom attributes -which can be accessed as runtime macros by the executed check command. +After `vars` is fully populated, all object attributes can be set. For strings, you can use +string concatenation with the `+` operator. -Define the default check command custom attribute `disk_wfree` and `disk_cfree` -(freely definable naming schema) and their default threshold values. You can -then use these custom attributes as runtime macros for [command arguments](#command-arguments) -on the command line. +You can also specify the check command that way. -The default custom attributes can be overridden by the custom attributes -defined in the service using the check command `my-disk`. The custom attributes -can also be inherited from a parent template using additive inheritance (`+=`).
+ apply Service "if-" for (if_name => config in host.vars.interfaces) { + import "generic-service" + check_command = "ping4" + vars.qos = "disabled" + vars += config - object CheckCommand "my-disk" { - import "plugin-check-command" + display_name = "if-" + if_name + "-" + vars.vlan - command = [ PluginDir + "/check_disk" ] + notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos + notes_url = "http://foreman.company.com/hosts/" + host.name + action_url = "http://snmp.checker.company.com/" + host.name + "if-" + if_name + } - arguments = { - "-w" = "$disk_wfree$%" - "-c" = "$disk_cfree$%" - } +Note that numbers must be explicitly cast to strings when concatenating them with strings. +This can be achieved by wrapping them in the [string()](19-language-reference.md#function-calls) function. - vars.disk_wfree = 20 - vars.disk_cfree = 10 - } +> **Tip** +> +> Building configuration in that dynamic way requires detailed information +> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object) +> after successful [configuration validation](8-cli-commands.md#config-validation). -The host `localhost` with the service `my-disk` checks all disks with modified -custom attributes (warning thresholds at `10%`, critical thresholds at `5%` -free disk space). +### Use Object Attributes in Apply Rules - object Host "localhost" { +Since apply rules are evaluated after the generic objects, you +can reference existing host and/or service object attributes as +values for any object attribute specified in that apply rule.
+ + object Host "opennebula-host" { import "generic-host" + address = "10.1.1.2" - address = "127.0.0.1" - address6 = "::1" + vars.hosting["xyz"] = { + http_uri = "/shop" + customer_name = "Customer xyz" + customer_id = "7568" + support_contract = "gold" + } + vars.hosting["abc"] = { + http_uri = "/shop" + customer_name = "Customer xyz" + customer_id = "7568" + support_contract = "silver" + } } - object Service "my-disk" { + apply Service for (customer => config in host.vars.hosting) { import "generic-service" + check_command = "ping4" - host_name = "localhost" - check_command = "my-disk" - - vars.disk_wfree = 10 - vars.disk_cfree = 5 - } - -#### Command Arguments + vars.qos = "disabled" -By defining a check command line using the `command` attribute Icinga 2 -will resolve all macros in the static string or array. Sometimes it is -required to extend the arguments list based on a met condition evaluated -at command execution. Or making arguments optional - only set if the -macro value can be resolved by Icinga 2. + vars += config - object CheckCommand "check_http" { - import "plugin-check-command" + vars.http_uri = "/" + vars.customer + "/" + config.http_uri - command = [ PluginDir + "/check_http" ] + display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id - arguments = { - "-H" = "$http_vhost$" - "-I" = "$http_address$" - "-u" = "$http_uri$" - "-p" = "$http_port$" - "-S" = { - set_if = "$http_ssl$" - } - "--sni" = { - set_if = "$http_sni$" - } - "-a" = { - value = "$http_auth_pair$" - description = "Username:password on sites with basic authentication" - } - "--no-body" = { - set_if = "$http_ignore_body$" - } - "-r" = "$http_expect_body_regex$" - "-w" = "$http_warn_time$" - "-c" = "$http_critical_time$" - "-e" = "$http_expect$" - } + notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")." 
- vars.http_address = "$address$" - vars.http_ssl = false - vars.http_sni = false + notes_url = "http://foreman.company.com/hosts/" + host.name + action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id } -The example shows the `check_http` check command defining the most common -arguments. Each of them is optional by default and will be omitted if -the value is not set. For example if the service calling the check command -does not have `vars.http_port` set, it won't get added to the command -line. -If the `vars.http_ssl` custom attribute is set in the service, host or command -object definition, Icinga 2 will add the `-S` argument based on the `set_if` -option to the command line. -That way you can use the `check_http` command definition for both, with and -without SSL enabled checks saving you duplicated command definitions. - -Details on all available options can be found in the -[CheckCommand object definition](#objecttype-checkcommand). +## Groups -### Apply Services with custom Command Arguments +A group is a collection of similar objects. Groups are primarily used as a +visualization aid in web interfaces. -Imagine the following scenario: The `my-host1` host is reachable using the default port 22, while -the `my-host2` host requires a different port on 2222. Both hosts are in the hostgroup `my-linux-servers`. +Group membership is defined at the respective object itself. 
If +you have a hostgroup named `windows` for example, and want to assign +specific hosts to this group for later viewing the group on your +alert dashboard, first create a HostGroup object: - object HostGroup "my-linux-servers" { - display_name = "Linux Servers" - assign where host.vars.os == "Linux" + object HostGroup "windows" { + display_name = "Windows Servers" } - /* this one has port 22 opened */ - object Host "my-host1" { - import "generic-host" - address = "129.168.1.50" - vars.os = "Linux" - } +Then add your hosts to this group: - /* this one listens on a different ssh port */ - object Host "my-host2" { - import "generic-host" - address = "129.168.2.50" - vars.os = "Linux" - vars.custom_ssh_port = 2222 + template Host "windows-server" { + groups += [ "windows" ] } -All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an -[apply rule](#apply). The optional `ssh_port` command argument should be inherited from the host -the service is applied to. If not set, the check command `my-ssh` will omit the argument.
- - object CheckCommand "my-ssh" { - import "plugin-check-command" + object Host "mssql-srv1" { + import "windows-server" - command = [ PluginDir + "/check_ssh" ] + vars.mssql_port = 1433 + } - arguments = { - "-p" = "$ssh_port$" - "host" = { - value = "$ssh_address$" - skip_key = true - order = -1 - } - } + object Host "mssql-srv2" { + import "windows-server" - vars.ssh_address = "$address$" + vars.mssql_port = 1433 } - /* apply ssh service */ - apply Service "my-ssh" { - import "generic-service" - check_command = "my-ssh" - - //set the command argument for ssh port with a custom host attribute, if set - vars.ssh_port = "$host.vars.custom_ssh_port$" +This can be done for service and user groups the same way: - assign where "my-linux-servers" in host.groups + object UserGroup "windows-mssql-admins" { + display_name = "Windows MSSQL Admins" } -The `my-host1` will get the `my-ssh` service checking on the default port: + template User "generic-windows-mssql-users" { + groups += [ "windows-mssql-admins" ] + } - [2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281 + object User "win-mssql-noc" { + import "generic-windows-mssql-users" -The `my-host2` will inherit the `custom_ssh_port` variable to the service and execute a different command: + email = "noc@example.com" + } - [2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956 + object User "win-mssql-ops" { + import "generic-windows-mssql-users" - -### Notification Commands - -`NotificationCommand` objects define how notifications are delivered to external -interfaces (E-Mail, XMPP, IRC, Twitter, etc). - -`NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command) -`plugin-notification-command` to support native plugin-based notifications. 
- -> **Note** -> -> Make sure that the [notification](#features) feature is enabled on your master instance -> in order to execute notification commands. - -Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for -the current check output) sending an email to the user(s) associated with the -notification itself (`$user.email$`). - -If you want to specify default values for some of the custom attribute definitions, -you can add a `vars` dictionary as shown for the `CheckCommand` object. - - object NotificationCommand "mail-service-notification" { - import "plugin-notification-command" - - command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ] - - env = { - NOTIFICATIONTYPE = "$notification.type$" - SERVICEDESC = "$service.name$" - HOSTALIAS = "$host.display_name$" - HOSTADDRESS = "$address$" - SERVICESTATE = "$service.state$" - LONGDATETIME = "$icinga.long_date_time$" - SERVICEOUTPUT = "$service.output$" - NOTIFICATIONAUTHORNAME = "$notification.author$" - NOTIFICATIONCOMMENT = "$notification.comment$" - HOSTDISPLAYNAME = "$host.display_name$" - SERVICEDISPLAYNAME = "$service.display_name$" - USEREMAIL = "$user.email$" - } - } - -The command attribute in the `mail-service-notification` command refers to the following -shell script. The macros specified in the `env` array are exported -as environment variables and can be used in the notification script: - - #!/usr/bin/env bash - template=$(cat <