From a910e45f5100d25cc2633f8b98ecc4bde615538b Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Tue, 18 Mar 2014 11:18:17 +0100 Subject: [PATCH] Documentation: Split 'advanced' into multiple sections. Fixes #5791 --- doc/6-advanced-topics.md | 493 ----------------------------- doc/6.01-downtimes.md | 102 ++++++ doc/6.02-comments.md | 10 + doc/6.03-acknowledgements.md | 27 ++ doc/6.04-cluster.md | 200 ++++++++++++ doc/6.05-domains.md | 31 ++ doc/6.06-dependencies.md | 53 ++++ doc/6.07-check-result-freshness.md | 13 + doc/6.08-check-flapping.md | 11 + doc/6.09-volatile-services.md | 12 + doc/6.10-modified-attributes.md | 8 + doc/6.11-plugin-api.md | 10 + 12 files changed, 477 insertions(+), 493 deletions(-) create mode 100644 doc/6.01-downtimes.md create mode 100644 doc/6.02-comments.md create mode 100644 doc/6.03-acknowledgements.md create mode 100644 doc/6.04-cluster.md create mode 100644 doc/6.05-domains.md create mode 100644 doc/6.06-dependencies.md create mode 100644 doc/6.07-check-result-freshness.md create mode 100644 doc/6.08-check-flapping.md create mode 100644 doc/6.09-volatile-services.md create mode 100644 doc/6.10-modified-attributes.md create mode 100644 doc/6.11-plugin-api.md diff --git a/doc/6-advanced-topics.md b/doc/6-advanced-topics.md index 70c3752a4..7ca11490e 100644 --- a/doc/6-advanced-topics.md +++ b/doc/6-advanced-topics.md @@ -1,494 +1 @@ # Advanced Topics - -## Downtimes - -Downtimes can be scheduled for planned server maintenance or -any other targetted service outage you are aware of in advance. - -Downtimes will suppress any notifications, and may trigger other -downtimes too. If the downtime was set by accident, or the duration -exceeds the maintenance, you can manually cancel the downtime. -Planned downtimes will also be taken into account for SLA reporting -tools calculating the SLAs based on the state and downtime history. - -> **Note** -> -> Downtimes may overlap with their start and end times. If there -> are multiple downtimes triggered for one object, the overall downtime depth -> will be more than `1`. This is useful when you want to extend -> your maintenance window taking longer than expected. - -### Fixed and Flexible Downtimes - -A `fixed` downtime will be activated at the defined start time, and -removed at the end time. During this time window the service state -will change to `NOT-OK` and then actually trigger the downtime. -Notifications are suppressed and the downtime depth is incremented. - -Common scenarios are a planned distribution upgrade on your linux -servers, or database updates in your warehouse. The customer knows -about a fixed downtime window between 23:00 and 24:00. After 24:00 -all problems should be alerted again. Solution is simple - -schedule a `fixed` downtime starting at 23:00 and ending at 24:00. - -Unlike a `fixed` downtime, a `flexible` downtime end does not necessarily -happen at the provided end time. Instead the downtime will be triggered -by the state change in the time span defined by start and end time, but -then last a defined duration in minutes. - -Imagine the following scenario: Your service is frequently polled -by users trying to grab free deleted domains for immediate registration. -Between 07:30 and 08:00 the impact will hit for 15 minutes and generate -a network outage visible to the monitoring. The service is still alive, -but answering too slow to Icinga 2 service checks. -For that reason, you may want to schedule a downtime between 07:30 and -08:00 with a duration of 15 minutes. 
The downtime will then last from -its trigger time until the duration is over. After that, the downtime -is removed (may happen before or after the actual end time!). - -### Scheduling a downtime - -This can either happen through a web interface (Icinga 1.x Classic UI or Web) -or by using the external command pipe provided by the `ExternalCommandListener` -configuration. - -Fixed downtimes require a start and end time (a duration will be ignored). -Flexible downtimes need a start and end time for the time span, and a duration -independent from that time span. - -> **Note** -> -> Modern web interfaces treat services in a downtime as `handled`. - -### Triggered Downtimes - -This is optional when scheduling a downtime. If there is already a downtime -scheduled for a future maintenance, the current downtime can be triggered by -that downtime. This renders useful if you have scheduled a host downtime and -are now scheduling a child host's downtime getting triggered by the parent -downtime on NOT-OK state change. - -### Recurring Downtimes - -[ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up -recurring downtimes for services. - -Example: - - template ScheduledDowntime "backup-downtime" { - author = "icingaadmin", - comment = "Scheduled downtime for backup", - - ranges = { - monday = "02:00-03:00", - tuesday = "02:00-03:00", - wednesday = "02:00-03:00", - thursday = "02:00-03:00", - friday = "02:00-03:00", - saturday = "02:00-03:00", - sunday = "02:00-03:00" - } - } - - object Host "localhost" inherits "generic-host" { - ... - services["load"] = { - templates = [ "generic-service" ], - - check_command = "load", - - scheduled_downtimes["backup"] = { - templates = [ "backup-downtime" ] - } - }, - } - - -## Comments - -Comments can be added at runtime and are persistent over restarts. You can -add useful information for others on repeating incidents (for example -"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which -is primarly accessible using web interfaces. - -Adding and deleting comment actions are possible through the external command pipe -provided with the `ExternalCommandListener` configuration. The caller must -pass the comment id in case of manipulating an existing comment. - -## Acknowledgements - -If a problem is alerted and notified you may signal the other notification -receipients that you are aware of the problem and will handle it. - -By sending an acknowledgement to Icinga 2 (using the external command pipe -provided with `ExternalCommandListener` configuration) all future notifications -are suppressed, a new comment is added with the provided description and -a notification with the type `NotificationFilterAcknowledgement` is sent -to all notified users. - -> **Note** -> -> Modern web interfaces treat acknowledged problems as `handled`. - -### Expiring Acknowledgements - -Once a problem is acknowledged it may disappear from your `handled problems` -dashboard and no-one ever looks at it again since it will suppress -notifications too. - -This `fire-and-forget` action is quite common. If you're sure that a -current problem should be resolved in the future at a defined time, -you can define an expiration time when acknowledging the problem. - -Icinga 2 will clear the acknowledgement when expired and start to -re-notify if the problem persists. - -## Cluster - -An Icinga 2 cluster consists of two or more nodes and can reside on multiple -architectures. 
The base concept of Icinga 2 is the possibility to add additional -features using components. In case of a cluster setup you have to add the -cluster feature to all nodes. Before you start configuring the diffent nodes -it's necessary to setup the underlying communication layer based on SSL. - -### Certificate Authority and Certificates - -Icinga 2 comes with two scripts helping you to create CA and node certificates -for you Icinga 2 Cluster. - -The first step is the creation of CA using the following command: - - icinga2-build-ca - -Please make sure to export a variable containing an empty folder for the created -CA files: - - export ICINGA_CA="/root/icinga-ca" - -In the next step you have to create a certificate and a key file for every node -using the following command: - - icinga2-build-key icinga-node-1 - -Please create a certificate and a key file for every node in the Icinga 2 -Cluster and save the CA key in case you want to set up certificates for -additional nodes at a later date. - -### Enable the Cluster Configuration - -Until the cluster-component is moved into an independent feature you have to -enable the required libraries in the icinga2.conf configuration file: - - library "cluster" - -### Configure the ClusterListener Object - -The ClusterListener needs to be configured on every node in the cluster with the -following settings: - - Configuration Setting |Value - -------------------------|------------------------------------ - ca_path | path to ca.crt file - cert_path | path to server certificate - key_path | path to server key - bind_port | port for incoming and outgoing conns - peers | array of all reachable nodes - ------------------------- ------------------------------------ - -A sample config part can look like this: - - /** - * Load cluster library and configure ClusterListener using certificate files - */ - library "cluster" - - object ClusterListener "cluster" { - ca_path = "/etc/icinga2/ca/ca.crt", - cert_path = "/etc/icinga2/ca/icinga-node-1.crt", - key_path = "/etc/icinga2/ca/icinga-node-1.key", - - bind_port = 8888, - - peers = [ "icinga-node-2" ] - } - -> **Note** -> -> The certificate files must be readable by the user Icinga 2 is running as. Also, -> the private key file should not be world-readable. - -Peers configures the direction used to connect multiple nodes together. If have -a three node cluster consisting of - -* node-1 -* node-2 -* node-3 - -and `node-3` is only reachable from `node-2`, you have to consider this in your -peer configuration. - -### Configure Cluster Endpoints - -In addition to the configured port and hostname every endpoint can have specific -abilities to send configuration files to other nodes and limit the hosts allowed -to send configuration files. - - Configuration Setting |Value - -------------------------|------------------------------------ - host | hostname - port | port - accept_config | defines all nodes allowed to send configs - config_files | defines all files to be send to that node - MUST BE AN ABSOLUTE PATH - ------------------------- ------------------------------------ - -A sample config part can look like this: - - /** - * Configure config master endpoint - */ - - object Endpoint "icinga-node-1" { - host = "icinga-node-1.localdomain", - port = 8888, - config_files = ["/etc/icinga2/conf.d/*.conf"] - } - -If you update the configuration files on the configured file sender, it will -force a restart on all receiving nodes after validating the new config. 
- -A sample config part for a config receiver endpoint can look like this: - - /** - * Configure config receiver endpoint - */ - - object Endpoint "icinga-node-2" { - host = "icinga-node-2.localdomain", - port = 8888, - accept_config = [ "icinga-node-1" ] - } - -By default these configuration files are saved in /var/lib/icinga2/cluster/config. - -In order to load configuration files which were received from a remote Icinga 2 -instance you will have to add the following include directive to your -`icinga2.conf` configuration file: - - include (IcingaLocalStateDir + "/lib/icinga2/cluster/config/*/*") - -### Initial Cluster Sync - -In order to make sure that all of your cluster nodes have the same state you will -have to pick one of the nodes as your initial "master" and copy its state file -to all the other nodes. - -You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying -the state file you should make sure that all your cluster nodes are properly shut -down. - - -### Assign Services to Cluster Nodes - -By default all services are distributed among the cluster nodes with the `Checker` -feature enabled. -If you require specific services to be only executed by one or more checker nodes -within the cluster, you must define `authorities` as additional service object -attribute. Required Endpoints must be defined as array. - - object Host "dmz-host1" inherits "generic-host" { - services["dmz-oracledb"] = { - templates = [ "generic-service" ], - authorities = [ "icinga-node-1" ], - } - } - -> **Tip** -> -> Most common usecase is building a classic Master-Slave-Setup. The master node -> does not have the `Checker` feature enabled, and the slave nodes are checking -> services based on their location, inheriting from a global service template -> defining the authorities. - -### Cluster Health Check - -The Icinga 2 [ITL](#itl) ships an internal check command checking all configured -`EndPoints` in the cluster setup. The check result will become critical if -one or more configured nodes are not connected. - -Example: - - object Host "icinga2a" inherits "generic-host" { - services["cluster"] = { - templates = [ "generic-service" ], - check_interval = 1m, - check_command = "cluster", - authorities = [ "icinga2a" ] - }, - } - -> **Note** -> -> Each cluster node should execute its own local cluster health check to -> get an idea about network related connection problems from different -> point of views. Use the `authorities` attribute to assign the service -> check to the configured node. - -### Host With Multiple Cluster Nodes - -Special scenarios might require multiple cluster nodes running on a single host. -By default Icinga 2 and its features will drop their runtime data below the prefix -`IcingaLocalStateDir`. By default packages will set that path to `/var`. -You can either set that variable as constant configuration -definition in [icinga2.conf](#icinga2-conf) or pass it as runtime variable to -the Icinga 2 daemon. - - # icinga2 -c /etc/icinga2/node1/icinga2.conf -DIcingaLocalStateDir=/opt/node1/var - -## Domains - -A [Service](#objecttype-service) object can be restricted using the `domains` attribute -array specifying endpoint privileges. -A Domain object specifices the ACLs applied for each [Endpoint](#objecttype-endpoint). - -The following example assigns the domain `dmz-db` to the service `dmz-oracledb`. Endpoint -`icinga-node-dmz-1` does not allow any object modification (no commands, check results) and only -relays local messages to the remote node(s). 
The endpoint `icinga-node-dmz-2` processes all -messages read and write (accept check results, commands and also relay messages to remote -nodes). - -That way the service `dmz-oracledb` on endpoint `icinga-node-dmz-1` will not be modified -by any cluster event message, and could be checked by the local authority too presenting -a different state history. `icinga-node-dmz-2` still receives all cluster message updates -from the `icinga-node-dmz-1` endpoint. - - object Host "dmz-host1" inherits "generic-host" { - services["dmz-oracledb"] = { - templates = [ "generic-service" ], - domains = [ "dmz-db" ], - authorities = [ "icinga-node-dmz-1", "icinga-node-dmz-2"], - } - } - - object Domain "dmz-db" { - acl = { - icinga-node-dmz-1 = (DomainPrivReadOnly), - icinga-node-dmz-2 = (DomainPrivReadWrite) - } - } - -## Dependencies - -Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects either directly -defined or as inline definition as `dependencies` dictionary. The `parent_host` and `parent_service` -attributes are mandatory, `child_host` and `child_service` attributes are obsolete within -inline definitions in an existing service object or service inline definition. - -A service can depend on a host, and vice versa. A service has an implicit dependency (parent) -to its host. A host to host dependency acts implicit as host parent relation. -When dependencies are calculated, not only the immediate parent is taken into -account but all parents are inherited. - -A common scenario is the Icinga 2 server behind a router. Checking internet -access by pinging the Google DNS server `google-dns` is a common method, but -will fail in case the `dsl-router` host is down. Therefore the example below -defines a host dependency which acts implicit as parent relation too. - -Furthermore the host may be reachable but ping samples are dropped by the -router's firewall. In case the `dsl-router``ping4` service check fails, all -further checks for the `google-dns` `ping4` service should be suppressed. -This is achieved by setting the `disable_checks` attribute to `true`. - - object Host "dsl-router" { - services["ping4"] = { - templates = "generic-service", - check_command = "ping4" - } - - macros = { - address = "192.168.1.1", - }, - } - - object Host "google-dns" { - services["ping4"] = { - templates = "generic-service", - check_command = "ping4", - dependencies["dsl-router-ping4"] = { - parent_host = "dsl-router", - parent_service = "ping4", - disable_checks = true - } - } - - macros = { - address = "8.8.8.8", - }, - - dependencies["dsl-router"] = { - parent_host = "dsl-router" - }, - - } - -## Check Result Freshness - -In Icinga 2 active check freshness is enabled by default. It is determined by the -`check_interval` attribute and no incoming check results in that period of time. - - threshold = last check execution time + check interval - -Passive check freshness is calculated from the `check_interval` attribute if set. - - threshold = last check result time + check interval - -If the freshness checks are invalid, a new check is executed defined by the -`check_command` attribute. - -## Check Flapping - -The flapping algorithm used in Icinga 2 does not store the past states but -calculcates the flapping threshold from a single value based on counters and -half-life values. Icinga 2 compares the value with a single flapping threshold -configuration attribute named `flapping_threshold`. 
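A minimal sketch of a service with flapping detection enabled, assuming that both attributes can be set directly in the inline service definition and using a hypothetical host `web-server`, the `http` check command and the threshold value purely as placeholders (see the note below for the required `enable_flapping` attribute):

    object Host "web-server" inherits "generic-host" {
      services["http"] = {
        templates = [ "generic-service" ],

        check_command = "http",

        /* explicitly enable flapping detection and set a custom threshold (placeholder value) */
        enable_flapping = 1,
        flapping_threshold = 30
      },
    }
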
- -> **Note** -> -> Flapping must be explicitely enabled setting the `Service` object attribute -> `enable_flapping = 1`. - -## Volatile Services - -By default all services remain in a non-volatile state. When a problem -occurs, the `SOFT` state applies and once `max_check_attempts` attribute -is reached with the check counter, a `HARD` state transition happens. -Notifications are only triggered by `HARD` state changes and are then -re-sent defined by the `notification_interval` attribute. - -It may be reasonable to have a volatile service which stays in a `HARD` -state type if the service stays in a `NOT-OK` state. That way each -service recheck will automatically trigger a notification unless the -service is acknowledged or in a scheduled downtime. - -## Modified Attributes - -Icinga 2 allows you to modify defined object attributes at runtime different to -the local configuration object attributes. These modified attributes are -stored as bit-shifted-value and made available in backends. Icinga 2 stores -modified attributes in its state file and restores them on restart. - -Modified Attributes can be reset using external commands. - - -## Plugin API - -Currently the native plugin api inherited from the `Monitoring Plugins` (former -`Nagios Plugins`) project is available. -Future specifications will be documented here. - -### Monitoring Plugin API - -The `Monitoring Plugin API` (former `Nagios Plugin API`) is defined in the -[Monitoring Plugins Development Guidelines](https://www.monitoring-plugins.org/doc/guidelines.html). - - - diff --git a/doc/6.01-downtimes.md b/doc/6.01-downtimes.md new file mode 100644 index 000000000..6d2ab0b19 --- /dev/null +++ b/doc/6.01-downtimes.md @@ -0,0 +1,102 @@ +## Downtimes + +Downtimes can be scheduled for planned server maintenance or +any other targetted service outage you are aware of in advance. + +Downtimes will suppress any notifications, and may trigger other +downtimes too. If the downtime was set by accident, or the duration +exceeds the maintenance, you can manually cancel the downtime. +Planned downtimes will also be taken into account for SLA reporting +tools calculating the SLAs based on the state and downtime history. + +> **Note** +> +> Downtimes may overlap with their start and end times. If there +> are multiple downtimes triggered for one object, the overall downtime depth +> will be more than `1`. This is useful when you want to extend +> your maintenance window taking longer than expected. + +### Fixed and Flexible Downtimes + +A `fixed` downtime will be activated at the defined start time, and +removed at the end time. During this time window the service state +will change to `NOT-OK` and then actually trigger the downtime. +Notifications are suppressed and the downtime depth is incremented. + +Common scenarios are a planned distribution upgrade on your linux +servers, or database updates in your warehouse. The customer knows +about a fixed downtime window between 23:00 and 24:00. After 24:00 +all problems should be alerted again. Solution is simple - +schedule a `fixed` downtime starting at 23:00 and ending at 24:00. + +Unlike a `fixed` downtime, a `flexible` downtime end does not necessarily +happen at the provided end time. Instead the downtime will be triggered +by the state change in the time span defined by start and end time, but +then last a defined duration in minutes. + +Imagine the following scenario: Your service is frequently polled +by users trying to grab free deleted domains for immediate registration. 
+Between 07:30 and 08:00 the impact will hit for 15 minutes and generate +a network outage visible to the monitoring. The service is still alive, +but answering too slow to Icinga 2 service checks. +For that reason, you may want to schedule a downtime between 07:30 and +08:00 with a duration of 15 minutes. The downtime will then last from +its trigger time until the duration is over. After that, the downtime +is removed (may happen before or after the actual end time!). + +### Scheduling a downtime + +This can either happen through a web interface (Icinga 1.x Classic UI or Web) +or by using the external command pipe provided by the `ExternalCommandListener` +configuration. + +Fixed downtimes require a start and end time (a duration will be ignored). +Flexible downtimes need a start and end time for the time span, and a duration +independent from that time span. + +> **Note** +> +> Modern web interfaces treat services in a downtime as `handled`. + +### Triggered Downtimes + +This is optional when scheduling a downtime. If there is already a downtime +scheduled for a future maintenance, the current downtime can be triggered by +that downtime. This renders useful if you have scheduled a host downtime and +are now scheduling a child host's downtime getting triggered by the parent +downtime on NOT-OK state change. + +### Recurring Downtimes + +[ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up +recurring downtimes for services. + +Example: + + template ScheduledDowntime "backup-downtime" { + author = "icingaadmin", + comment = "Scheduled downtime for backup", + + ranges = { + monday = "02:00-03:00", + tuesday = "02:00-03:00", + wednesday = "02:00-03:00", + thursday = "02:00-03:00", + friday = "02:00-03:00", + saturday = "02:00-03:00", + sunday = "02:00-03:00" + } + } + + object Host "localhost" inherits "generic-host" { + ... + services["load"] = { + templates = [ "generic-service" ], + + check_command = "load", + + scheduled_downtimes["backup"] = { + templates = [ "backup-downtime" ] + } + }, + } \ No newline at end of file diff --git a/doc/6.02-comments.md b/doc/6.02-comments.md new file mode 100644 index 000000000..10a796e6f --- /dev/null +++ b/doc/6.02-comments.md @@ -0,0 +1,10 @@ +## Comments + +Comments can be added at runtime and are persistent over restarts. You can +add useful information for others on repeating incidents (for example +"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which +is primarly accessible using web interfaces. + +Adding and deleting comment actions are possible through the external command pipe +provided with the `ExternalCommandListener` configuration. The caller must +pass the comment id in case of manipulating an existing comment. \ No newline at end of file diff --git a/doc/6.03-acknowledgements.md b/doc/6.03-acknowledgements.md new file mode 100644 index 000000000..0a0398636 --- /dev/null +++ b/doc/6.03-acknowledgements.md @@ -0,0 +1,27 @@ +## Acknowledgements + +If a problem is alerted and notified you may signal the other notification +receipients that you are aware of the problem and will handle it. + +By sending an acknowledgement to Icinga 2 (using the external command pipe +provided with `ExternalCommandListener` configuration) all future notifications +are suppressed, a new comment is added with the provided description and +a notification with the type `NotificationFilterAcknowledgement` is sent +to all notified users. 
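As a sketch, an acknowledgement for a service problem could be submitted through the external command pipe using the Icinga 1.x compatible `ACKNOWLEDGE_SVC_PROBLEM` command; the host name, service name and command pipe path below are placeholders and depend on your setup:

    # echo "[`date +%s`] ACKNOWLEDGE_SVC_PROBLEM;localhost;load;2;1;1;icingaadmin;working on it" \
      >> /var/run/icinga2/cmd/icinga2.cmd

The fields after the timestamp follow the usual external command syntax: host, service, sticky, notify, persistent, author and comment.
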
+ +> **Note** +> +> Modern web interfaces treat acknowledged problems as `handled`. + +### Expiring Acknowledgements + +Once a problem is acknowledged it may disappear from your `handled problems` +dashboard and no-one ever looks at it again since it will suppress +notifications too. + +This `fire-and-forget` action is quite common. If you're sure that a +current problem should be resolved in the future at a defined time, +you can define an expiration time when acknowledging the problem. + +Icinga 2 will clear the acknowledgement when expired and start to +re-notify if the problem persists. \ No newline at end of file diff --git a/doc/6.04-cluster.md b/doc/6.04-cluster.md new file mode 100644 index 000000000..9284d79ec --- /dev/null +++ b/doc/6.04-cluster.md @@ -0,0 +1,200 @@ +## Cluster + +An Icinga 2 cluster consists of two or more nodes and can reside on multiple +architectures. The base concept of Icinga 2 is the possibility to add additional +features using components. In case of a cluster setup you have to add the +cluster feature to all nodes. Before you start configuring the diffent nodes +it's necessary to setup the underlying communication layer based on SSL. + +### Certificate Authority and Certificates + +Icinga 2 comes with two scripts helping you to create CA and node certificates +for you Icinga 2 Cluster. + +The first step is the creation of CA using the following command: + + icinga2-build-ca + +Please make sure to export a variable containing an empty folder for the created +CA files: + + export ICINGA_CA="/root/icinga-ca" + +In the next step you have to create a certificate and a key file for every node +using the following command: + + icinga2-build-key icinga-node-1 + +Please create a certificate and a key file for every node in the Icinga 2 +Cluster and save the CA key in case you want to set up certificates for +additional nodes at a later date. + +### Enable the Cluster Configuration + +Until the cluster-component is moved into an independent feature you have to +enable the required libraries in the icinga2.conf configuration file: + + library "cluster" + +### Configure the ClusterListener Object + +The ClusterListener needs to be configured on every node in the cluster with the +following settings: + + Configuration Setting |Value + -------------------------|------------------------------------ + ca_path | path to ca.crt file + cert_path | path to server certificate + key_path | path to server key + bind_port | port for incoming and outgoing conns + peers | array of all reachable nodes + ------------------------- ------------------------------------ + +A sample config part can look like this: + + /** + * Load cluster library and configure ClusterListener using certificate files + */ + library "cluster" + + object ClusterListener "cluster" { + ca_path = "/etc/icinga2/ca/ca.crt", + cert_path = "/etc/icinga2/ca/icinga-node-1.crt", + key_path = "/etc/icinga2/ca/icinga-node-1.key", + + bind_port = 8888, + + peers = [ "icinga-node-2" ] + } + +> **Note** +> +> The certificate files must be readable by the user Icinga 2 is running as. Also, +> the private key file should not be world-readable. + +Peers configures the direction used to connect multiple nodes together. If have +a three node cluster consisting of + +* node-1 +* node-2 +* node-3 + +and `node-3` is only reachable from `node-2`, you have to consider this in your +peer configuration. 
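For the three node example above, one possible layout is to let `node-1` connect to `node-2` and `node-2` connect to `node-3`. This is a sketch only: the certificate paths, port and node names are placeholders, and matching `Endpoint` objects are assumed to exist.

    /* ClusterListener on node-1 */
    object ClusterListener "cluster" {
      ca_path = "/etc/icinga2/ca/ca.crt",
      cert_path = "/etc/icinga2/ca/node-1.crt",
      key_path = "/etc/icinga2/ca/node-1.key",

      bind_port = 8888,

      peers = [ "node-2" ]
    }

    /* ClusterListener on node-2, the only node able to reach node-3 */
    object ClusterListener "cluster" {
      ca_path = "/etc/icinga2/ca/ca.crt",
      cert_path = "/etc/icinga2/ca/node-2.crt",
      key_path = "/etc/icinga2/ca/node-2.key",

      bind_port = 8888,

      peers = [ "node-3" ]
    }

In this sketch `node-3` does not list any peers itself, since the connections are initiated by `node-1` and `node-2`.
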
+ +### Configure Cluster Endpoints + +In addition to the configured port and hostname every endpoint can have specific +abilities to send configuration files to other nodes and limit the hosts allowed +to send configuration files. + + Configuration Setting |Value + -------------------------|------------------------------------ + host | hostname + port | port + accept_config | defines all nodes allowed to send configs + config_files | defines all files to be send to that node - MUST BE AN ABSOLUTE PATH + ------------------------- ------------------------------------ + +A sample config part can look like this: + + /** + * Configure config master endpoint + */ + + object Endpoint "icinga-node-1" { + host = "icinga-node-1.localdomain", + port = 8888, + config_files = ["/etc/icinga2/conf.d/*.conf"] + } + +If you update the configuration files on the configured file sender, it will +force a restart on all receiving nodes after validating the new config. + +A sample config part for a config receiver endpoint can look like this: + + /** + * Configure config receiver endpoint + */ + + object Endpoint "icinga-node-2" { + host = "icinga-node-2.localdomain", + port = 8888, + accept_config = [ "icinga-node-1" ] + } + +By default these configuration files are saved in /var/lib/icinga2/cluster/config. + +In order to load configuration files which were received from a remote Icinga 2 +instance you will have to add the following include directive to your +`icinga2.conf` configuration file: + + include (IcingaLocalStateDir + "/lib/icinga2/cluster/config/*/*") + +### Initial Cluster Sync + +In order to make sure that all of your cluster nodes have the same state you will +have to pick one of the nodes as your initial "master" and copy its state file +to all the other nodes. + +You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying +the state file you should make sure that all your cluster nodes are properly shut +down. + + +### Assign Services to Cluster Nodes + +By default all services are distributed among the cluster nodes with the `Checker` +feature enabled. +If you require specific services to be only executed by one or more checker nodes +within the cluster, you must define `authorities` as additional service object +attribute. Required Endpoints must be defined as array. + + object Host "dmz-host1" inherits "generic-host" { + services["dmz-oracledb"] = { + templates = [ "generic-service" ], + authorities = [ "icinga-node-1" ], + } + } + +> **Tip** +> +> Most common usecase is building a classic Master-Slave-Setup. The master node +> does not have the `Checker` feature enabled, and the slave nodes are checking +> services based on their location, inheriting from a global service template +> defining the authorities. + +### Cluster Health Check + +The Icinga 2 [ITL](#itl) ships an internal check command checking all configured +`EndPoints` in the cluster setup. The check result will become critical if +one or more configured nodes are not connected. + +Example: + + object Host "icinga2a" inherits "generic-host" { + services["cluster"] = { + templates = [ "generic-service" ], + check_interval = 1m, + check_command = "cluster", + authorities = [ "icinga2a" ] + }, + } + +> **Note** +> +> Each cluster node should execute its own local cluster health check to +> get an idea about network related connection problems from different +> point of views. Use the `authorities` attribute to assign the service +> check to the configured node. 
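Following that note, a second cluster node could define its own local health check in the same way; this sketch simply mirrors the `icinga2a` example above for a hypothetical `icinga2b` node:

    object Host "icinga2b" inherits "generic-host" {
      services["cluster"] = {
        templates = [ "generic-service" ],
        check_interval = 1m,
        check_command = "cluster",

        /* run this health check on the local node only */
        authorities = [ "icinga2b" ]
      },
    }
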
+ +### Host With Multiple Cluster Nodes + +Special scenarios might require multiple cluster nodes running on a single host. +By default Icinga 2 and its features will drop their runtime data below the prefix +`IcingaLocalStateDir`. By default packages will set that path to `/var`. +You can either set that variable as constant configuration +definition in [icinga2.conf](#icinga2-conf) or pass it as runtime variable to +the Icinga 2 daemon. + + # icinga2 -c /etc/icinga2/node1/icinga2.conf -DIcingaLocalStateDir=/opt/node1/var \ No newline at end of file diff --git a/doc/6.05-domains.md b/doc/6.05-domains.md new file mode 100644 index 000000000..3e79ae35b --- /dev/null +++ b/doc/6.05-domains.md @@ -0,0 +1,31 @@ +## Domains + +A [Service](#objecttype-service) object can be restricted using the `domains` attribute +array specifying endpoint privileges. +A Domain object specifices the ACLs applied for each [Endpoint](#objecttype-endpoint). + +The following example assigns the domain `dmz-db` to the service `dmz-oracledb`. Endpoint +`icinga-node-dmz-1` does not allow any object modification (no commands, check results) and only +relays local messages to the remote node(s). The endpoint `icinga-node-dmz-2` processes all +messages read and write (accept check results, commands and also relay messages to remote +nodes). + +That way the service `dmz-oracledb` on endpoint `icinga-node-dmz-1` will not be modified +by any cluster event message, and could be checked by the local authority too presenting +a different state history. `icinga-node-dmz-2` still receives all cluster message updates +from the `icinga-node-dmz-1` endpoint. + + object Host "dmz-host1" inherits "generic-host" { + services["dmz-oracledb"] = { + templates = [ "generic-service" ], + domains = [ "dmz-db" ], + authorities = [ "icinga-node-dmz-1", "icinga-node-dmz-2"], + } + } + + object Domain "dmz-db" { + acl = { + icinga-node-dmz-1 = (DomainPrivReadOnly), + icinga-node-dmz-2 = (DomainPrivReadWrite) + } + } \ No newline at end of file diff --git a/doc/6.06-dependencies.md b/doc/6.06-dependencies.md new file mode 100644 index 000000000..bbbdd94af --- /dev/null +++ b/doc/6.06-dependencies.md @@ -0,0 +1,53 @@ +## Dependencies + +Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects either directly +defined or as inline definition as `dependencies` dictionary. The `parent_host` and `parent_service` +attributes are mandatory, `child_host` and `child_service` attributes are obsolete within +inline definitions in an existing service object or service inline definition. + +A service can depend on a host, and vice versa. A service has an implicit dependency (parent) +to its host. A host to host dependency acts implicit as host parent relation. +When dependencies are calculated, not only the immediate parent is taken into +account but all parents are inherited. + +A common scenario is the Icinga 2 server behind a router. Checking internet +access by pinging the Google DNS server `google-dns` is a common method, but +will fail in case the `dsl-router` host is down. Therefore the example below +defines a host dependency which acts implicit as parent relation too. + +Furthermore the host may be reachable but ping samples are dropped by the +router's firewall. In case the `dsl-router``ping4` service check fails, all +further checks for the `google-dns` `ping4` service should be suppressed. +This is achieved by setting the `disable_checks` attribute to `true`. 
+ + object Host "dsl-router" { + services["ping4"] = { + templates = "generic-service", + check_command = "ping4" + } + + macros = { + address = "192.168.1.1", + }, + } + + object Host "google-dns" { + services["ping4"] = { + templates = "generic-service", + check_command = "ping4", + dependencies["dsl-router-ping4"] = { + parent_host = "dsl-router", + parent_service = "ping4", + disable_checks = true + } + } + + macros = { + address = "8.8.8.8", + }, + + dependencies["dsl-router"] = { + parent_host = "dsl-router" + }, + + } \ No newline at end of file diff --git a/doc/6.07-check-result-freshness.md b/doc/6.07-check-result-freshness.md new file mode 100644 index 000000000..6dac58738 --- /dev/null +++ b/doc/6.07-check-result-freshness.md @@ -0,0 +1,13 @@ +## Check Result Freshness + +In Icinga 2 active check freshness is enabled by default. It is determined by the +`check_interval` attribute and no incoming check results in that period of time. + + threshold = last check execution time + check interval + +Passive check freshness is calculated from the `check_interval` attribute if set. + + threshold = last check result time + check interval + +If the freshness checks are invalid, a new check is executed defined by the +`check_command` attribute. \ No newline at end of file diff --git a/doc/6.08-check-flapping.md b/doc/6.08-check-flapping.md new file mode 100644 index 000000000..3381b4b62 --- /dev/null +++ b/doc/6.08-check-flapping.md @@ -0,0 +1,11 @@ +## Check Flapping + +The flapping algorithm used in Icinga 2 does not store the past states but +calculcates the flapping threshold from a single value based on counters and +half-life values. Icinga 2 compares the value with a single flapping threshold +configuration attribute named `flapping_threshold`. + +> **Note** +> +> Flapping must be explicitely enabled setting the `Service` object attribute +> `enable_flapping = 1`. \ No newline at end of file diff --git a/doc/6.09-volatile-services.md b/doc/6.09-volatile-services.md new file mode 100644 index 000000000..3cf0b5e2b --- /dev/null +++ b/doc/6.09-volatile-services.md @@ -0,0 +1,12 @@ +## Volatile Services + +By default all services remain in a non-volatile state. When a problem +occurs, the `SOFT` state applies and once `max_check_attempts` attribute +is reached with the check counter, a `HARD` state transition happens. +Notifications are only triggered by `HARD` state changes and are then +re-sent defined by the `notification_interval` attribute. + +It may be reasonable to have a volatile service which stays in a `HARD` +state type if the service stays in a `NOT-OK` state. That way each +service recheck will automatically trigger a notification unless the +service is acknowledged or in a scheduled downtime. \ No newline at end of file diff --git a/doc/6.10-modified-attributes.md b/doc/6.10-modified-attributes.md new file mode 100644 index 000000000..d3229a408 --- /dev/null +++ b/doc/6.10-modified-attributes.md @@ -0,0 +1,8 @@ +## Modified Attributes + +Icinga 2 allows you to modify defined object attributes at runtime different to +the local configuration object attributes. These modified attributes are +stored as bit-shifted-value and made available in backends. Icinga 2 stores +modified attributes in its state file and restores them on restart. + +Modified Attributes can be reset using external commands. 
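For illustration only: a runtime change submitted through the external command pipe, for example disabling active checks for a single service with the Icinga 1.x compatible `DISABLE_SVC_CHECK` command (assuming your Icinga 2 version supports it; host, service and pipe path are placeholders), results in a modified attribute for that object until it is reset:

    # echo "[`date +%s`] DISABLE_SVC_CHECK;localhost;load" \
      >> /var/run/icinga2/cmd/icinga2.cmd
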
\ No newline at end of file diff --git a/doc/6.11-plugin-api.md new file mode 100644 index 000000000..c5b61cc7f --- /dev/null +++ b/doc/6.11-plugin-api.md @@ -0,0 +1,10 @@ +## Plugin API + +Currently, the native plugin API inherited from the `Monitoring Plugins` (formerly +`Nagios Plugins`) project is available. +Future specifications will be documented here. + +### Monitoring Plugin API + +The `Monitoring Plugin API` (formerly the `Nagios Plugin API`) is defined in the +[Monitoring Plugins Development Guidelines](https://www.monitoring-plugins.org/doc/guidelines.html). \ No newline at end of file -- 2.40.0