From 07790e456b6993a272a7e7a91b33b05823492959 Mon Sep 17 00:00:00 2001
From: Michael Friedrich
Date: Wed, 8 May 2019 17:48:13 +0200
Subject: [PATCH] Docs: Improve features chapter and add details on HA setups

refs #4855
---
 doc/13-addons.md   |   2 +-
 doc/14-features.md | 333 ++++++++++++++++++++++++++++++++-------
 2 files changed, 239 insertions(+), 96 deletions(-)

diff --git a/doc/13-addons.md b/doc/13-addons.md
index 760fe4ccd..01bdfee3f 100644
--- a/doc/13-addons.md
+++ b/doc/13-addons.md
@@ -60,7 +60,7 @@ Use your distribution's package manager to install the `pnp4nagios` package.

If you're planning to use it, configure it to use the
[bulk mode with npcd and npcdmod](https://docs.pnp4nagios.org/pnp-0.6/modes#bulk_mode_with_npcd_and_npcdmod)
-in combination with Icinga 2's [PerfdataWriter](14-features.md#performance-data). NPCD collects the performance
+in combination with Icinga 2's [PerfdataWriter](14-features.md#writing-performance-data-files). NPCD collects the performance
data files which Icinga 2 generates.

Enable performance data writer in icinga 2

diff --git a/doc/14-features.md b/doc/14-features.md
index 3bdbd6743..dff90cd9b 100644
--- a/doc/14-features.md
+++ b/doc/14-features.md
@@ -38,7 +38,13 @@ files then:

By default, log files will be rotated daily.

-## DB IDO
+## Core Backends
+
+### REST API
+
+The REST API is a core feature and is documented in [its own chapter](12-icinga2-api.md#icinga2-api).
+
+### IDO Database (DB IDO)

The IDO (Icinga Data Output) feature for Icinga 2 takes care of exporting all
configuration and status information into a database. The IDO database is used
@@ -49,10 +55,8 @@ chapter. Details on the configuration can be found in the
[IdoMysqlConnection](09-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](09-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.
-The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
-the Icinga 2 cluster.

-### DB IDO Health
+#### DB IDO Health

If the monitoring health indicator is critical in Icinga Web 2,
you can use the following queries to manually check whether Icinga 2
@@ -100,7 +104,21 @@ status_update_time

A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](24-appendix.md#schema-db-ido).

-### DB IDO Cleanup
+#### DB IDO in Cluster HA Zones
+
+The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
+the Icinga 2 cluster.
+
+By default, both endpoints in a zone calculate which
+endpoint activates the feature; the other endpoint
+automatically pauses it. If the cluster connection
+breaks at some point, the paused IDO feature automatically
+takes over (failover).
+
+You can disable this behaviour by setting `enable_ha = false`
+in both feature configuration files.
+
+#### DB IDO Cleanup

Objects get deactivated when they are deleted from the configuration.
This is visible with the `is_active` column in the `icinga_objects` table.
@@ -125,7 +143,7 @@ Example if you prefer to keep notification history for 30 days:

The historical tables are populated depending on the data `categories` specified. Some tables are empty by default.

-### DB IDO Tuning
+#### DB IDO Tuning

As with any application database, there are ways to optimize and tune the database performance.

@@ -171,98 +189,30 @@ VACUUM

> Don't use `VACUUM FULL` as this has a severe impact on performance.
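+
+To make this concrete, a minimal SQL sketch (the statements are standard
+PostgreSQL; when to run them is an assumption left to your maintenance window):
+
+```
+-- Plain VACUUM is non-blocking and safe while Icinga 2 keeps writing.
+VACUUM;
+
+-- VACUUM FULL takes exclusive table locks and rewrites tables on disk,
+-- which stalls the IDO feature. Avoid it on a live database:
+-- VACUUM FULL;
+```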
-## External Commands
-
-> **Note**
->
-> Please use the [REST API](12-icinga2-api.md#icinga2-api) as modern and secure alternative
-> for external actions.
-
-> **Note**
->
-> This feature is DEPRECATED and will be removed in future releases.
-> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).
-
-Icinga 2 provides an external command pipe for processing commands
-triggering specific actions (for example rescheduling a service check
-through the web interface).
-
-In order to enable the `ExternalCommandListener` configuration use the
-following command and restart Icinga 2 afterwards:
+## Metrics

-```
-# icinga2 feature enable command
-```
+Whenever a host or service check is executed, or a check result is received
+via the REST API, best practice is to provide performance data.

-Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
-using the default configuration.
+This data is parsed by features which send metrics to time series databases (TSDBs):

-Web interfaces and other Icinga addons are able to send commands to
-Icinga 2 through the external command pipe, for example for rescheduling
-a forced service check:
+* [Graphite](14-features.md#graphite-writer)
+* [InfluxDB](14-features.md#influxdb-writer)
+* [OpenTSDB](14-features.md#opentsdb-writer)

-```
-# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
+Metrics, state changes and notifications can be forwarded to the following integrations:

-# tail -f /var/log/messages
+* [Elastic Stack](14-features.md#elastic-stack-integration)
+* [Graylog](14-features.md#graylog-integration)

-Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
-Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
-```
-
-A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).
-Detailed information on the commands and their required parameters can be found
-on the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).
+### Graphite Writer

-## Performance Data
+[Graphite](13-addons.md#addons-graphing-graphite) is a tool stack for storing
+metrics and needs to be running prior to enabling the `graphite` feature.

-When a host or service check is executed plugins should provide so-called
-`performance data`. Next to that additional check performance data
-can be fetched using Icinga 2 runtime macros such as the check latency
-or the current service state (or additional custom attributes).
-
-The performance data can be passed to external applications which aggregate and
-store them in their backends. These tools usually generate graphs for historical
-reporting and trending.
-
-Well-known addons processing Icinga performance data are [PNP4Nagios](13-addons.md#addons-graphing-pnp),
-[Graphite](13-addons.md#addons-graphing-graphite) or [OpenTSDB](14-features.md#opentsdb-writer).
-
-### Writing Performance Data Files
-
-PNP4Nagios and Graphios use performance data collector daemons to fetch
-the current performance files for their backend updates.
-
-Therefore the Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
-feature allows you to define the output template format for host and services helped
-with Icinga 2 runtime vars.
-
-```
-host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
-service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
-```
-
-The default templates are already provided with the Icinga 2 feature configuration
-which can be enabled using
-
-```
-# icinga2 feature enable perfdata
-```
-
-By default all performance data files are rotated in a 15 seconds interval into
-the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.` and
-`service-perfdata.`.
-External collectors need to parse the rotated performance data files and then
-remove the processed files.
-
-### Graphite Carbon Cache Writer
-
-While there are some [Graphite](13-addons.md#addons-graphing-graphite)
-collector scripts and daemons like Graphios available for Icinga 1.x it's more
-reasonable to directly process the check and plugin performance
-in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
-write them to the defined Graphite Carbon daemon tcp socket.
+Icinga 2 writes parsed metrics directly to Graphite's Carbon Cache
+TCP port, defaulting to `2003`.

You can enable the feature using

@@ -273,7 +223,7 @@

By default the [GraphiteWriter](09-object-types.md#objecttype-graphitewriter)
feature expects the Graphite Carbon Cache to listen at `127.0.0.1` on TCP port `2003`.

-#### Current Graphite Schema
+#### Graphite Schema

The current naming schema is defined as follows. The [Icinga Web 2 Graphite module](https://github.com/icinga/icingaweb2-module-graphite)
depends on this schema.

@@ -308,7 +258,8 @@ Metric values are stored like this:

.perfdata..value

-The following characters are escaped in perfdata labels:
+The following characters are escaped in performance labels
+parsed from plugin output:

Character     | Escaped character
--------------|--------------------------
whitespace    | _
\             | _
/             | _
::            | .

-Note that perfdata labels may contain dots (`.`) allowing to
+Note that labels may contain dots (`.`), allowing you to
add more subsequent levels inside the Graphite tree.
`::` adds support for [multi performance labels](http://my-plugin.de/wiki/projects/check_multi/configuration/performance)
and is therefore replaced by `.`.

@@ -369,6 +320,25 @@ pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y

+#### Graphite in Cluster HA Zones
+
+The Graphite feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing metrics to a Carbon Cache socket. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes metrics, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that metrics are written even if the cluster fails.
+
+The recommended way of running Graphite in this scenario is a dedicated server
+where Carbon Cache/Relay runs as the receiver.
+

### InfluxDB Writer

@@ -447,6 +417,25 @@ object InfluxdbWriter "influxdb" {
}
```

+#### InfluxDB in Cluster HA Zones
+
+The InfluxDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing metrics to the InfluxDB HTTP API. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes metrics, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that metrics are written even if the cluster fails.
+
+The recommended way of running InfluxDB in this scenario is a dedicated server
+where either the InfluxDB HTTP API or Telegraf as a proxy is running.
+
### Elastic Stack Integration

[Icingabeat](https://github.com/icinga/icingabeat) is an Elastic Beat that fetches data

@@ -524,6 +513,26 @@ check_result.perfdata..warn
check_result.perfdata..crit

+#### Elasticsearch in Cluster HA Zones
+
+The Elasticsearch feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing events to the Elasticsearch HTTP API. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes events, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that events are written even if the cluster fails.
+
+The recommended way of running Elasticsearch in this scenario is a dedicated server
+where you either have the Elasticsearch HTTP API listening, a TLS-secured HTTP
+proxy, or Logstash for additional filtering.
+
### Graylog Integration

#### GELF Writer

@@ -550,6 +559,24 @@ Currently these events are processed:

* State changes
* Notifications

+#### Graylog/GELF in Cluster HA Zones
+
+The GELF feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing events to the configured GELF TCP input. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes events, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that events are written even if the cluster fails.
+
+The recommended way of running Graylog in this scenario is a dedicated server
+where a Graylog GELF input is listening.
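+
+For reference, a minimal `GelfWriter` sketch for such an HA zone. The hostname
+is a placeholder and the attributes follow the [GelfWriter](09-object-types.md#objecttype-gelfwriter)
+object type; adjust both to your environment:
+
+```
+object GelfWriter "gelf" {
+  host = "graylog.example.com"  // placeholder: your Graylog node
+  port = 12201                  // default GELF TCP input port
+  enable_send_perfdata = true
+
+  enable_ha = true              // only one endpoint in the zone writes actively
+}
+```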
### OpenTSDB Writer

@@ -625,6 +652,75 @@ with the following tags

> You might want to set the tsd.core.auto_create_metrics setting to `true`
> in your opentsdb.conf configuration file.

+#### OpenTSDB in Cluster HA Zones
+
+The OpenTSDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing metrics to the OpenTSDB listener. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes metrics, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that metrics are written even if the cluster fails.
+
+The recommended way of running OpenTSDB in this scenario is a dedicated server
+where you have OpenTSDB running.
+
+
+### Writing Performance Data Files
+
+PNP and Graphios use performance data collector daemons to fetch
+the current performance data files for their backend updates.
+
+The Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
+feature therefore allows you to define the output template format for hosts
+and services, filled with Icinga 2 runtime macros.
+
+```
+host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
+service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
+```
+
+The default templates are already provided with the Icinga 2 feature configuration,
+which can be enabled using
+
+```
+# icinga2 feature enable perfdata
+```
+
+By default, all performance data files are rotated in a 15 second interval into
+the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.` and
+`service-perfdata.`.
+External collectors need to parse the rotated performance data files and then
+remove the processed files.
+
+#### Perfdata Files in Cluster HA Zones
+
+The Perfdata feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
+in cluster zones since 2.11.
+
+By default, all endpoints in a zone will activate the feature and start
+writing metrics to the local spool directory. In HA-enabled scenarios,
+it is possible to set `enable_ha = true` in all feature configuration
+files. This allows each endpoint to calculate the feature authority:
+only one endpoint actively writes metrics, while the other endpoints
+pause the feature.
+
+When the cluster connection breaks at some point, the remaining endpoint(s)
+in that zone will automatically resume the feature. This built-in failover
+mechanism ensures that metrics are written even if the cluster fails.
+
+The recommended way of running Perfdata is to mount the perfdata spool
+directory via NFS on a central server where PNP with the NPCD collector
+is running.
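+
+A matching `PerfdataWriter` sketch with HA enabled. The paths shown are the
+defaults described above; sharing the spool directory via NFS is an assumption
+of this setup:
+
+```
+object PerfdataWriter "perfdata" {
+  // Point these at the NFS-mounted spool if it lives elsewhere.
+  host_perfdata_path = "/var/spool/icinga2/perfdata/host-perfdata"
+  service_perfdata_path = "/var/spool/icinga2/perfdata/service-perfdata"
+  rotation_interval = 15s
+
+  enable_ha = true  // only one endpoint in the zone writes files actively
+}
+```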
+
+
## Livestatus

@@ -831,7 +927,9 @@ The `commands` table is populated with `CheckCommand`, `EventCommand` and `Notif

A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](24-appendix.md#schema-livestatus).

-## Status Data Files
+## Deprecated Features
+
+### Status Data Files

> **Note**
>

status updates in a regular interval. If you are not using any web interface
or addon which uses these files, you can safely disable this feature.

-## Compat Log Files
+### Compat Log Files

> **Note**
>

By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`var/log/icinga2/compat/archives`.

-## Check Result Files
+### External Command Pipe
+
+> **Note**
+>
+> Please use the [REST API](12-icinga2-api.md#icinga2-api) as a modern and secure alternative
+> for external actions.
+
+> **Note**
+>
+> This feature is DEPRECATED and will be removed in future releases.
+> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).
+
+Icinga 2 provides an external command pipe for processing commands
+triggering specific actions (for example rescheduling a service check
+through the web interface).
+
+To enable the `ExternalCommandListener` feature, use the following
+command and restart Icinga 2 afterwards:
+
+```
+# icinga2 feature enable command
+```
+
+Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
+using the default configuration.
+
+Web interfaces and other Icinga addons are able to send commands to
+Icinga 2 through the external command pipe, for example for rescheduling
+a forced service check:
+
+```
+# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
+
+# tail -f /var/log/messages
+
+Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
+Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
+```
+
+A list of currently supported external commands can be found in the [appendix](24-appendix.md#external-commands-list-detail).
+
+Detailed information on the commands and their required parameters can be found
+in the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).
+
+
+### Check Result Files

> **Note**
>
-- 
2.40.0