# Service Monitoring <a id="service-monitoring"></a>

The power of Icinga 2 lies in its modularity. There are thousands of
community plugins available next to the standard plugins provided by
the [Monitoring Plugins project](https://www.monitoring-plugins.org).

Start your research on [Icinga Exchange](https://exchange.icinga.com)
and check which services are already [covered](05-service-monitoring.md#service-monitoring-overview).

The [requirements chapter](05-service-monitoring.md#service-monitoring-requirements) guides you
through the plugin setup, tests and their integration with an [existing](05-service-monitoring.md#service-monitoring-plugin-checkcommand)
or [new](05-service-monitoring.md#service-monitoring-plugin-checkcommand-new) CheckCommand object
and host/service objects inside the [Director](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-director)
or [Icinga config files](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-config-files).
It also adds hints on [modifying](05-service-monitoring.md#service-monitoring-plugin-checkcommand-modify) existing commands.

Plugins follow the [Plugin API specification](05-service-monitoring.md#service-monitoring-plugin-api)
which is enriched with explanations and code examples to get you started with
[your own plugin](05-service-monitoring.md#service-monitoring-plugin-new).

## Requirements <a id="service-monitoring-requirements"></a>

### Plugins <a id="service-monitoring-plugins"></a>

All existing Icinga or Nagios plugins work with Icinga 2. Community
plugins can be found for example on [Icinga Exchange](https://exchange.icinga.com).

The recommended way of setting up these plugins is to copy them
into the `PluginDir` directory.

If you have plugins with many dependencies, consider creating a
custom RPM/DEB package which handles the required libraries and binaries.

Configuration management tools such as Puppet, Ansible, Chef or Saltstack
also help with automatically installing the plugins on different
operating systems. They can also help with installing the required
dependencies, e.g. Python libraries, Perl modules, etc.

### Plugin Setup <a id="service-monitoring-plugins-setup"></a>

Good plugins provide installation and configuration instructions
in their docs and/or README on GitHub.

Sometimes dependencies are not listed, or your distribution differs from the one
described. Try running the plugin after setup and [ensure it works](05-service-monitoring.md#service-monitoring-plugins-it-works).

#### Ensure it works <a id="service-monitoring-plugins-it-works"></a>

Prior to using the check plugin with Icinga 2 you should ensure that it is working properly
by trying to run it on the console using whichever user Icinga 2 is running as:

RHEL/CentOS/Fedora

```
sudo -u icinga /usr/lib64/nagios/plugins/check_mysql_health --help
```

Debian/Ubuntu

```
sudo -u nagios /usr/lib/nagios/plugins/check_mysql_health --help
```

Additional libraries may be required for some plugins. Please consult the plugin
documentation and/or the included README file for installation instructions.
Sometimes plugins contain hard-coded paths to other components. Instead of changing
the plugin, it might be easier to create a symbolic link, since changes to the plugin
itself would get overwritten during the next update.

Sometimes there are plugins which do not exactly fit your requirements.
In that case you can modify an existing plugin or just write your own.

#### Plugin Dependency Errors <a id="service-monitoring-plugins-setup-dependency-errors"></a>

Plugins can be scripts (Shell, Python, Perl, Ruby, PHP, etc.)
or compiled binaries (C, C++, Go).

These scripts/binaries may require additional libraries
which must be installed on every system where they are executed.

> **Tip**
>
> Don't test the plugins on your master instance. Instead,
> do that on the satellites and clients which execute the
> checks.

There are errors, now what? Typical errors are missing libraries,
binaries or packages.

##### Python Example

Example for a Python plugin which uses the `tinkerforge` module
to query a network service:

```
ImportError: No module named tinkerforge.ip_connection
```

Its [documentation](https://github.com/NETWAYS/check_tinkerforge#installation)
points to installing the `tinkerforge` Python module.
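
From the plugin author's side, a missing dependency can be reported as a regular
plugin result instead of a traceback. The following is a minimal sketch and is not
taken from check_tinkerforge: the helper name `require_module` is hypothetical, and
mapping the failure to exit code 3 (Unknown) is an assumption based on the status
codes of the Plugin API.

```python
import importlib


def require_module(name):
    """Import a module by name, or exit with UNKNOWN (3) and a readable hint."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # One short line tells the admin which dependency is missing.
        print("UNKNOWN: missing Python module '%s' (%s)" % (name, err))
        raise SystemExit(3)


# 'json' ships with Python, so this call succeeds on every system.
json = require_module("json")
```

A plugin would call `require_module("tinkerforge.ip_connection")` in the same way
before using the module.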

##### Perl Example

Example for a Perl plugin which uses SNMP:

```
Can't locate Net/SNMP.pm in @INC (you may need to install the Net::SNMP module)
```

Prior to installing the Perl module via CPAN, look for a distribution
specific package, e.g. `libnet-snmp-perl` on Debian/Ubuntu or `perl-Net-SNMP`
on RHEL/CentOS.

#### Optional: Custom Path <a id="service-monitoring-plugins-custom-path"></a>

If you are not using the default `PluginDir` directory, you
can create a custom plugin directory and constant
and reference this in the created CheckCommand objects.

Create a common directory e.g. `/opt/monitoring/plugins`
and install the plugin there.

```
mkdir -p /opt/monitoring/plugins
cp check_snmp_int.pl /opt/monitoring/plugins
chmod +x /opt/monitoring/plugins/check_snmp_int.pl
```

Next create a new global constant, e.g. `CustomPluginDir`
in your [constants.conf](04-configuration.md#constants-conf)
configuration file:

```
vim /etc/icinga2/constants.conf

const PluginDir = "/usr/lib/nagios/plugins"
const CustomPluginDir = "/opt/monitoring/plugins"
```

### CheckCommand Definition <a id="service-monitoring-plugin-checkcommand"></a>

Each plugin requires a [CheckCommand](09-object-types.md#objecttype-checkcommand) object in your
configuration which can be used in the [Service](09-object-types.md#objecttype-service) or
[Host](09-object-types.md#objecttype-host) object definition.

Please check if the Icinga 2 package already provides an
[existing CheckCommand definition](10-icinga-template-library.md#icinga-template-library).

If that's the case, thoroughly check the required parameters and integrate the check command
into your host and service objects. Best practice is to run the plugin on the CLI
with the required parameters first.

Example for database size checks with [check_mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health):

```
/usr/lib64/nagios/plugins/check_mysql_health --hostname '127.0.0.1' --username root --password icingar0xx --mode sql --name 'select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '\''icinga'\'';' '--name2' 'db_size' --units 'MB' --warning 4096 --critical 8192
```

The parameter names inside the ITL commands follow the
`<command name>_<parameter name>` schema.

#### Icinga Director Integration <a id="service-monitoring-plugin-checkcommand-integration-director"></a>

Navigate into `Commands > External Commands` and search for `mysql_health`.
Select `mysql_health` and navigate into the `Fields` tab.

In order to access the parameters, the Director requires you to first
define the needed custom data fields:

* `mysql_health_hostname`
* `mysql_health_username` and `mysql_health_password`
* `mysql_health_mode`
* `mysql_health_name`, `mysql_health_name2` and `mysql_health_units`
* `mysql_health_warning` and `mysql_health_critical`

Create a new host template and object where you'll put generic
settings like `mysql_health_hostname` (if it differs from the host's
`address` attribute) and `mysql_health_username` and `mysql_health_password`.

Create a new service template for `mysql-health` and set `mysql_health`
as check command. You can also define a default for `mysql_health_mode`.

Next, create a service apply rule or a new service set which gets assigned
to matching host objects.

#### Icinga Config File Integration <a id="service-monitoring-plugin-checkcommand-integration-config-files"></a>

Create or modify a host object which stores
the generic database defaults and prepares details
for a service `apply for` rule.

```
object Host "icinga2-master1.localdomain" {
  check_command = "hostalive"
  address = "..."

  // Database listens locally, not external
  vars.mysql_health_hostname = "127.0.0.1"

  // Basic database size checks for Icinga DBs
  vars.databases["icinga"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
  vars.databases["icingaweb2"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
}
```

The host object already prepares the database details and thresholds
for advanced [apply for](03-monitoring-basics.md#using-apply-for) rules. The rule below uses
conditions to fetch host-specific values, or to set default values.

```
apply Service "db-size-" for (db_name => config in host.vars.databases) {
  check_interval = 1m
  retry_interval = 30s

  check_command = "mysql_health"

  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
  } else {
    vars.mysql_health_username = "root"
  }
  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
  } else {
    vars.mysql_health_password = "icingar0xx"
  }

  vars.mysql_health_mode = "sql"
  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
  vars.mysql_health_name2 = "db_size"
  vars.mysql_health_units = "MB"

  if (config.mysql_health_warning) {
    vars.mysql_health_warning = config.mysql_health_warning
  }
  if (config.mysql_health_critical) {
    vars.mysql_health_critical = config.mysql_health_critical
  }

  vars += config
}
```

#### New CheckCommand <a id="service-monitoring-plugin-checkcommand-new"></a>

This chapter describes how to add a new CheckCommand object for a plugin.

Please make sure to follow these conventions when adding a new command object definition:

* Use [command arguments](03-monitoring-basics.md#command-arguments) whenever possible. The `command` attribute
  must be an array in `[ ... ]` for shell escaping.
* Define a unique `prefix` for the command's specific arguments. Best practice is to follow this schema:

  ```
  <command name>_<parameter name>
  ```

  That way you can safely set them on host/service level and you'll always know which command they control.
* Use command argument default values, e.g. for thresholds.
* Use [advanced conditions](09-object-types.md#objecttype-checkcommand) like `set_if` definitions.

Before starting with the CheckCommand definition, please check
the existing objects available inside the ITL. They follow best
practices and are maintained by developers and our community.

This example picks a new plugin called [check_systemd](https://exchange.icinga.com/joseffriedrich/check_systemd)
uploaded to Icinga Exchange in June 2019.

First, [install](05-service-monitoring.md#service-monitoring-plugins-setup) the plugin and ensure
that [it works](05-service-monitoring.md#service-monitoring-plugins-it-works). Then run it with the
`--help` parameter to see the actual parameters (docs might be outdated).

```
./check_systemd.py --help

usage: check_systemd.py [-h] [-c SECONDS] [-e UNIT | -u UNIT] [-v] [-V]
                        [-w SECONDS]

...

optional arguments:
  -h, --help            show this help message and exit
  -c SECONDS, --critical SECONDS
                        Startup time in seconds to result in critical status.
  -e UNIT, --exclude UNIT
                        Exclude a systemd unit from the checks. This option
                        can be applied multiple times. For example: -e mnt-
                        data.mount -e task.service.
  -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.
  -v, --verbose         Increase output verbosity (use up to 3 times).
  -V, --version         show program's version number and exit
  -w SECONDS, --warning SECONDS
                        Startup time in seconds to result in warning status.
```

The argument description is important: you need to create the
command arguments based on it.

> **Tip**
>
> When you are using the Director, you can prepare the commands as files
> e.g. inside the `global-templates` zone. Then run the kickstart wizard
> again to import the commands as external reference.
>
> If you prefer to use the Director GUI/CLI, please apply the steps
> in the `Add Command` form.

Start with the basic plugin call without any parameters.

```
object CheckCommand "systemd" { // Plugin name without 'check_' prefix
  command = [ PluginContribDir + "/check_systemd.py" ] // Use the 'PluginContribDir' constant, see the contributed ITL commands
}
```

Run a config validation with `icinga2 daemon -C` to see if that works.

Next, analyse the plugin parameters. Plugins with a good help output show
optional parameters in square brackets. This is the case for all parameters
of this plugin. If there are required parameters, use the `required` key
inside the argument.

The `arguments` attribute is a dictionary which takes the parameters as keys.

```
  arguments = {
    "--unit" = { ... }
  }
```

If long parameter names are available, prefer them. This increases
readability in both the configuration and the executed command line.

The argument value itself is a sub dictionary which has additional keys:

* `value` which references the runtime macro string
* `description` where you copy the plugin parameter help text into
* `required`, `set_if`, etc. for advanced parameters, check the [CheckCommand object](09-object-types.md#objecttype-checkcommand) chapter.

The runtime macro syntax is required to allow value extraction when
the command is executed.

> **Tip**
>
> Inside the Director, store the new command first in order to
> unveil the `Arguments` tab.

Best practice is to use the command name as prefix, in this specific
case e.g. `systemd_unit`.

```
  arguments = {
    "--unit" = {
      value = "$systemd_unit$" // The service parameter would then be defined as 'vars.systemd_unit = "icinga2"'
      description = "Name of the systemd unit that is beeing tested."
    }
    "--warning" = {
      value = "$systemd_warning$"
      description = "Startup time in seconds to result in warning status."
    }
    "--critical" = {
      value = "$systemd_critical$"
      description = "Startup time in seconds to result in critical status."
    }
  }
```
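
To make the relation between `arguments`, runtime macros and the executed command
line more tangible, here is a simplified illustration in Python. It is not how
Icinga 2 is implemented: `build_command_line` is a hypothetical helper, the plugin
path is a placeholder, and the sketch only handles plain `$macro$` values. It
assumes that an optional argument whose macro has no value is left out, and it
ignores `required`, `set_if` and argument ordering.

```python
import re

# Matches a value that consists of exactly one runtime macro, e.g. "$systemd_unit$".
MACRO = re.compile(r"^\$(\w+)\$$")


def build_command_line(command, arguments, custom_vars):
    """Resolve '$name$' argument values against the given custom variables."""
    argv = list(command)
    for parameter, spec in arguments.items():
        match = MACRO.match(spec["value"])
        name = match.group(1) if match else None
        if name is None or name not in custom_vars:
            continue  # assumption: unresolved optional arguments are skipped
        argv += [parameter, str(custom_vars[name])]
    return argv


arguments = {
    "--unit": {"value": "$systemd_unit$"},
    "--warning": {"value": "$systemd_warning$"},
    "--critical": {"value": "$systemd_critical$"},
}

# Only 'systemd_unit' is set on the service, so only '--unit' is passed.
print(build_command_line(["/path/to/check_systemd.py"], arguments,
                         {"systemd_unit": "icinga2"}))
```

With `vars.systemd_unit = "icinga2"` as the only custom variable, the sketch yields
the plugin path followed by `--unit icinga2`.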

This may take a while. Validate the configuration in between until
the CheckCommand definition is complete.

Then test and integrate it into your monitoring configuration.

Remember: Do it once and right, and never touch the CheckCommand again.
Optional arguments allow different use cases and scenarios.

Once you have created your really good CheckCommand, please consider
sharing it with our community by creating a new PR on [GitHub](https://github.com/Icinga/icinga2/blob/master/CONTRIBUTING.md).
_Please also update the documentation for the ITL._

> **Tip**
>
> Inside the Director, you can render the configuration in the Deployment
> section. Extract the static configuration object and use that as a source
> for sending it upstream.

#### Modify Existing CheckCommand <a id="service-monitoring-plugin-checkcommand-modify"></a>

Sometimes an existing CheckCommand inside the ITL is missing a parameter,
or it sets a default parameter value which you don't need.

Instead of copying the entire configuration object, you can import
an object into another new object.

```
object CheckCommand "http-custom" {
  import "http" // Import existing http object

  arguments += { // Use additive assignment to add missing parameters
    "--key" = {
      value = "$http_..." // Keep the parameter name the same as with http
    }
  }

  // Override default parameters
  vars.http_address = "..."
}
```

This CheckCommand can then be referenced in your host/service object
definitions.

### Plugin API <a id="service-monitoring-plugin-api"></a>

Icinga 2 supports the native plugin API specification from the Monitoring Plugins project.
It is defined in the [Monitoring Plugins](https://www.monitoring-plugins.org) guidelines.

The Icinga documentation revamps the specification into our
own guideline enriched with examples and best practices.

#### Output <a id="service-monitoring-plugin-api-output"></a>

The output should be as short as possible while still providing the
details needed to understand the problem. The most common use cases include:

- Viewing a problem list in Icinga Web and dashboards
- Getting paged about a problem
- Receiving the alert on the CLI or forwarding it to external (ticket) systems

Examples:

```
<STATUS>: <A short description what happened>

OK: MySQL connection time is fine (0.0002s)
WARNING: MySQL connection time is slow (0.5s > 0.1s threshold)
CRITICAL: MySQL connection time is causing degraded performance (3s > 0.5s threshold)
```

Icinga supports reading multi-line output where Icinga Web
only shows the first line in the listings and everything in the detail view.

Example for an end2end check with many smaller test cases integrated:

```
OK: Online banking works.
Testcase 1: Site reached.
Testcase 2: Attempted login, JS loads.
Testcase 3: Login succeeded.
Testcase 4: View current state works.
Testcase 5: Transactions fine.
```

If the extended output shouldn't be visible in your monitoring, but only for testing,
it is recommended to implement the `--verbose` plugin parameter to allow
developers and users to debug further. Check [here](05-service-monitoring.md#service-monitoring-plugin-api-verbose)
for more implementation tips.

> **Tip**
>
> More debug output also helps when implementing your plugin.
>
> Best practice is to have the plugin parameter and handling implemented first,
> then add it anywhere you want to see more, e.g. from initial database connections
> to actual query results.

#### Status <a id="service-monitoring-plugin-api-status"></a>

Value | Status    | Description
------|-----------|-------------------------------
0     | OK        | The check went fine and everything is considered working.
1     | Warning   | The check is above the given warning threshold, or anything else is suspicious requiring attention before it breaks.
2     | Critical  | The check exceeded the critical threshold, or something really is broken and will harm the production environment.
3     | Unknown   | Invalid parameters, low level resource errors (IO device busy, no fork resources, TCP sockets, etc.) preventing the actual check. Higher level errors such as DNS resolving, TCP connection timeouts should be treated as `Critical` instead. Whenever the plugin reaches its timeout (best practice) it should also terminate with `Unknown`.

Keep in mind that these are service states. Icinga automatically maps
the [host state](03-monitoring-basics.md#check-result-state-mapping) from the returned plugin states.
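
In a Python plugin, the four states can be kept in one place so that the printed
status and the exit code never drift apart. This is a minimal sketch; the names
`State`, `format_output` and `plugin_exit` are hypothetical and not part of any
Icinga library.

```python
import sys
from enum import IntEnum


class State(IntEnum):
    """Exit codes from the status table above."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def format_output(state, message):
    """Build the first output line in the '<STATUS>: <description>' form."""
    return "%s: %s" % (state.name, message)


def plugin_exit(state, message):
    """Print the output line and terminate with the matching exit code."""
    print(format_output(state, message))
    sys.exit(int(state))
```

A check would end with a call such as
`plugin_exit(State.WARNING, "MySQL connection time is slow (0.5s > 0.1s threshold)")`.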

#### Thresholds <a id="service-monitoring-plugin-api-thresholds"></a>

A plugin calculates specific values and may decide about the exit state on its own.
This is done with thresholds - warning and critical values which are compared with
the actual value. Based on this logic, the exit state is determined.

Imagine the following value and defined thresholds:

```
ptc_value = 57.8

warning = 50
critical = 60
```

Whenever `ptc_value` is higher than warning or critical, the plugin should return
the appropriate [state](05-service-monitoring.md#service-monitoring-plugin-api-status).
Here, 57.8 exceeds the warning threshold but not the critical one, so the result is `Warning`.

The threshold evaluation order is also important:

* Critical thresholds are evaluated first and supersede everything else.
* Warning thresholds are evaluated second.
* If no threshold is matched, return the OK state.

Avoid using hardcoded threshold values in your plugins, always
add them to the argument parser.

Example for Python:

```
import argparse
import sys

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument("-w", "--warning", help="Warning threshold. Single value or range, e.g. '20:50'.")
    parser.add_argument("-c", "--critical", help="Critical threshold. Single value or range, e.g. '25:45'.")

    args = parser.parse_args()
```

Users might call plugins only with the critical threshold parameter,
leaving out the warning parameter. Keep this in mind when evaluating
the thresholds, always check if the parameters have been defined before.

```
    # Simplified: each threshold is treated as a single number, ranges need a parser.
    if args.critical is not None:
        if ptc_value > float(args.critical):
            print("CRITICAL - ...")
            sys.exit(2) # Critical

    if args.warning is not None:
        if ptc_value > float(args.warning):
            print("WARNING - ...")
            sys.exit(1) # Warning

    print("OK - ...")
    sys.exit(0) # OK
```

The above is a simplified example for printing the [output](05-service-monitoring.md#service-monitoring-plugin-api-output)
and using the [state](05-service-monitoring.md#service-monitoring-plugin-api-status)
as exit code.

Before diving into the implementation, learn more about required
[performance data metrics](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics)
and more best practices below.

##### Threshold Ranges <a id="service-monitoring-plugin-api-thresholds-ranges"></a>

Threshold ranges can be used to specify an alert window, e.g. whenever a calculated
value is between a lower and higher critical threshold.

The schema for threshold ranges looks as follows. The `@` character in square brackets
is optional.

```
[@]start:end
```

There are a few requirements for ranges:

* `start <= end`. Add a check in your code and let the user know about problematic values.

```
10:20   # OK

30:10   # Error
```

* `start:` can be omitted if its value is 0. This is the default handling for single threshold values too.

```
10      # < 0 or > 10, outside of 0..10
```

* If `end` is omitted, assume end is infinity.

```
10:     # < 10, outside of 10..∞
```

* In order to specify negative infinity, use the `~` character.

```
~:10    # > 10, outside of -∞..10
```

* Raise alert if value is outside of the defined range.

```
10:20   # < 10 or > 20, outside of 10..20
```

* Start with `@` to raise an alert if the value is **inside** the defined range, inclusive start/end values.

```
@10:20  # >= 10 and <= 20, inside of 10..20
```

Best practice is to either implement single threshold values, or fully support ranges.
This requires parsing the input parameter values, therefore look for existing libraries
already providing this functionality.

[check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py)
implements a simple parser to avoid dependencies.
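
The range rules above can be sketched in a few lines of Python. This is not the
parser from check_tinkerforge; `parse_range` and `is_alert` are hypothetical names,
and treating an empty `start` (as in `:10`) like 0 is an assumption.

```python
import math


def parse_range(spec):
    """Parse '[@]start:end' into (start, end, alert_inside)."""
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if not spec:
        raise ValueError("empty threshold")
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec  # single value: range is 0..value
    if start_text == "~":
        start = -math.inf
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError("start must not be greater than end: %r" % spec)
    return start, end, alert_inside


def is_alert(value, spec):
    """Return True if 'value' should raise an alert for the given range."""
    start, end, alert_inside = parse_range(spec)
    within = start <= value <= end
    return within if alert_inside else not within
```

For the values from the thresholds example, `is_alert(57.8, "50")` is true while
`is_alert(57.8, "60")` is false, and `parse_range("30:10")` raises a `ValueError`
that the plugin can report to the user.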

#### Performance Data Metrics <a id="service-monitoring-plugin-api-performance-data-metrics"></a>

Performance data metrics must be appended to the plugin output with a preceding `|` character.
The schema is as follows:

```
<output> | 'label'=value[UOM];[warn];[crit];[min];[max]
```

The label should be encapsulated with single quotes. Avoid spaces or special characters such
as `%` in there, as this could lead to problems with metric receivers such as Graphite.

Labels must not include `'` and `=` characters. Keep labels as short as possible while still unique.

Example:

```
'load1'=4.7
```

Values must respect the C/POSIX locale and must not use locale-specific formatting,
e.g. the German `,` as decimal separator for floating point numbers.
Icinga sets `LC_NUMERIC=C` to enforce this locale on plugin execution.

##### Unit of Measurement (UOM) <a id="service-monitoring-plugin-api-performance-data-metrics-uom"></a>

Unit     | Description
---------|---------------------------------
None     | Integer or floating point number for any type (processes, users, etc.).
`s`      | Seconds, can be `s`, `ms`, `us`.
`%`      | Percentage.
`B`      | Bytes, can be `KB`, `MB`, `GB`, `TB`. Lowercase is also possible.
`c`      | A continuous counter (e.g. interface traffic counters).

Icinga metric writers normalize these values to the lowest common base, e.g. seconds and bytes.
This ensures that graphs always look the same, even when a badly written plugin changes the UOM
for different sizing, e.g. returning the disk usage in MB and later in GB for the same
performance data label.

```
'rta'=12.445000ms 'pl'=0%
```

##### Thresholds and Min/Max <a id="service-monitoring-plugin-api-performance-data-metrics-thresholds-min-max"></a>

Next to the performance data value, warn, crit, min, max can optionally be provided. They must be separated
with the semi-colon `;` character. They share the same UOM with the performance data value.

```
$ check_ping -4 -H icinga.com -c '200,15%' -w '100,5%'

PING OK - Packet loss = 0%, RTA = 12.44 ms|rta=12.445000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0
```
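
A plugin can build such a metric string with a small helper. The following is a
minimal sketch; `format_perfdata` is a hypothetical name, and dropping trailing
empty fields is a choice made here to mirror the examples in this chapter.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Render one metric as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    if "'" in label or "=" in label:
        raise ValueError("label must not contain ' or =: %r" % label)
    fields = ["" if item is None else str(item)
              for item in (warn, crit, minimum, maximum)]
    while fields and fields[-1] == "":
        fields.pop()  # trailing empty fields are left out
    # Python renders floats with '.', independent of the system locale.
    metric = "'%s'=%s%s" % (label, value, uom)
    return ";".join([metric] + fields)
```

For example, `format_perfdata("rta", 12.445, "ms", 100, 200, 0)` returns
`'rta'=12.445ms;100;200;0`, which is then appended to the plugin output after the
`|` character.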

##### Multiple Performance Data Values <a id="service-monitoring-plugin-api-performance-data-metrics-multiple"></a>

Multiple performance data values must be joined with a space character. The below example
is from the [check_load](10-icinga-template-library.md#plugin-check-command-load) plugin.

```
load1=4.680;1.000;2.000;0; load5=0.000;5.000;10.000;0; load15=0.000;10.000;20.000;0;
```
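
Seen from the consuming side, for example in a wrapper script or a test for your
own plugin, such a string can be split back into its parts. This is a simplified
sketch and not Icinga's parser: `parse_perfdata` is a hypothetical name, values in
scientific notation are not handled, and thresholds are kept as strings because
they may be ranges.

```python
import re
import shlex

# Number followed by an optional unit, e.g. '12.445000ms' or '0%'.
VALUE_WITH_UOM = re.compile(r"^(-?\d+(?:\.\d+)?)(\D*)$")


def parse_perfdata(text):
    """Split a performance data string into a dict keyed by label."""
    metrics = {}
    # shlex honours the single quotes around labels and splits on whitespace.
    for token in shlex.split(text):
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        match = VALUE_WITH_UOM.match(fields[0])
        if match is None:
            raise ValueError("cannot parse metric %r" % token)
        # Pad to exactly four optional fields: warn, crit, min, max.
        extras = (fields[1:5] + [""] * 4)[:4]
        metrics[label] = {
            "value": float(match.group(1)),
            "uom": match.group(2),
            "warn": extras[0] or None,
            "crit": extras[1] or None,
            "min": extras[2] or None,
            "max": extras[3] or None,
        }
    return metrics
```

Applied to the check_load output above, `parse_perfdata(...)["load1"]["value"]`
is `4.68` and the `max` entry is `None` because the last field is empty.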

#### Timeout <a id="service-monitoring-plugin-api-timeout"></a>

Icinga has a safety mechanism where it kills processes running for too
long. The timeout can be specified in [CheckCommand objects](09-object-types.md#objecttype-checkcommand)
or on the host/service object.

Best practice is to control the timeout in the plugin itself
and provide a clear message followed by the Unknown state.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```
import argparse
import signal
import sys
from functools import partial

def output(message, state):
    # Minimal stand-in for the plugin's own output helper: print and exit.
    print(message)
    sys.exit(state)

def handle_sigalrm(signum, frame, timeout=None):
    output('Plugin timed out after %d seconds' % timeout, 3)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # ... add more arguments
    parser.add_argument("-t", "--timeout", help="Timeout in seconds (default 10s)", type=int, default=10)
    args = parser.parse_args()

    signal.signal(signal.SIGALRM, partial(handle_sigalrm, timeout=args.timeout))
    signal.alarm(args.timeout)

    # ... perform the check and generate output/status
```

#### Versions <a id="service-monitoring-plugin-api-versions"></a>

Plugins should provide a version via `-V` or `--version` parameter
which is bumped on releases. This helps to identify problems with
too old or new versions on the community support channels.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```
import argparse
import sys

__version__ = '0.9.1'

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-V', '--version', action='version', version='%(prog)s v' + sys.modules[__name__].__version__)

    # Parsing triggers the 'version' action, which prints the version and exits.
    args = parser.parse_args()
```

#### Verbose <a id="service-monitoring-plugin-api-verbose"></a>

Plugins should provide a verbose mode with `-v` or `--verbose` in order
to show more detailed log messages. This helps to debug and analyse the
flow and execution steps inside the plugin.

Make sure to add the parameter prior to implementing the check logic in
the plugin.

Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-v', '--verbose', action='store_true')

    # Parse the command line before accessing 'args.verbose'.
    args = parser.parse_args()

    if args.verbose:
        print("Verbose debug output")
```

### Create a new Plugin <a id="service-monitoring-plugin-new"></a>

Sometimes an existing plugin does not satisfy your requirements. You
can either kindly contact the original author about plans to add changes
and/or create a patch.

If you just want to format the output and state of an existing plugin
it might also be helpful to write a wrapper script. This script
could pass all configured parameters, call the plugin script, parse
its output/exit code and return your specified output/exit code.
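
Such a wrapper could look like the following Python sketch. The names
`run_wrapped` and `main` are hypothetical, and the clean-up rules (unexpected exit
codes and empty output both become Unknown) are assumptions chosen for
illustration rather than requirements of the Plugin API.

```python
import subprocess
import sys


def run_wrapped(argv, timeout=10):
    """Run a plugin and return (exit_code, output) with sanitized values."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as err:
        return 3, "UNKNOWN: could not run %s: %s" % (argv[0], err)
    output = proc.stdout.strip()
    if not output:
        return 3, "UNKNOWN: plugin returned no output"
    # Anything outside the four documented states is reported as Unknown.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else 3
    return code, output


def main():
    """Entry point: pass all command line parameters on to the wrapped plugin."""
    if len(sys.argv) < 2:
        print("UNKNOWN: usage: wrapper <plugin> [plugin arguments]")
        sys.exit(3)
    code, output = run_wrapped(sys.argv[1:])
    print(output)
    sys.exit(code)
```

Calling `main()` at the end of the script turns it into the wrapper. Any rewriting
of the output or state would happen between `run_wrapped` and the final `print`.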
 773
 774 On the other hand plugins for specific services and hardware might not yet
 775 exist.
 776
 777 > **Tip**
 778 >
 779 > Watch this presentation from Icinga Camp Berlin to learn more
 780 > about [How to write checks that don't suck](https://www.youtube.com/watch?v=Ey_APqSCoFQ).
 781
 782 Common best practices:
 783
 784 * Choose the programming language wisely
 * Scripting languages (Bash, Python, Perl, Ruby, PHP, etc.) are easier to write and set up, but their check execution might take longer (e.g. due to the overhead of invoking the script interpreter).
 * Plugins written in C/C++, Go, etc. improve check execution time but may add overhead for installation and packaging.
 787 * Use a modern VCS such as Git for developing the plugin, e.g. share your plugin on GitHub and let it sync to [Icinga Exchange](https://exchange.icinga.com).
 788 * **Look into existing plugins endorsed by community members.**
 789
 790 Implementation hints:
 791
 792 * Add parameters with key-value pairs to your plugin. They should allow long names (e.g. `--host localhost`) and also short parameters (e.g. `-H localhost`)
 * `-h|--help` should print the version and all details about parameters and runtime invocation. Note: Python's `argparse` module provides this out of the box.
 794  * `--version` should print the plugin [version](05-service-monitoring.md#service-monitoring-plugin-api-versions).
 795 * Add a [verbose/debug output](05-service-monitoring.md#service-monitoring-plugin-api-verbose) functionality for detailed on-demand logging.
 796 * Respect the exit codes required by the [Plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
 797 * Always add [performance data](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics) to your plugin output.
* Allow users to specify [warning/critical thresholds](05-service-monitoring.md#service-monitoring-plugin-api-thresholds) as parameters.
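
The parameter-related hints above could look roughly like this in Python. This is
a minimal sketch: the function name `build_parser`, the version string, the default
values and the simplification of thresholds to plain numbers (instead of full
threshold ranges) are assumptions made for illustration.

```python
import argparse

__version__ = "0.1.0"  # hypothetical version string


def build_parser():
    """Build a command line parser with long and short parameter names."""
    parser = argparse.ArgumentParser(description="Hypothetical example plugin")
    # -h/--help is generated automatically by argparse
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s v" + __version__)
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print detailed debug output")
    parser.add_argument("-H", "--host", required=True,
                        help="host name or address to check")
    # Assumption: thresholds are simple upper limits given as numbers
    parser.add_argument("-w", "--warning", type=float, default=3.0,
                        help="warning threshold")
    parser.add_argument("-c", "--critical", type=float, default=5.0,
                        help="critical threshold")
    return parser
```

Calling `build_parser().parse_args()` at the start of the plugin then yields
`args.host`, `args.warning`, `args.critical` and `args.verbose` for the check logic.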
 799
 800 Example skeleton:
 801
```
# 1. include optional libraries
# 2. global variables
# 3. helper functions and/or classes
# 4. define timeout condition

if (<timeout_reached>) then
  print "UNKNOWN - Timeout (...) reached | 'time'=30.0"
  exit(3)
endif

# 5. main method

<execute and fetch data>

if (<threshold_critical_condition>) then
  print "CRITICAL - ... | 'time'=0.1 'myperfdatavalue'=5.0"
  exit(2)
else if (<threshold_warning_condition>) then
  print "WARNING - ... | 'time'=0.1 'myperfdatavalue'=3.0"
  exit(1)
else
  print "OK - ... | 'time'=0.2 'myperfdatavalue'=1.0"
  exit(0)
endif
```
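
One possible translation of this skeleton into Python is sketched below. The names
`evaluate`, `run_check` and `CheckTimeout`, the "value is" output text and the use of
`signal.alarm` for the timeout (available on Unix-like systems only) are assumptions
for this example. Thresholds are treated as simple upper limits.

```python
import signal
import time

# Exit codes as defined by the Plugin API
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class CheckTimeout(Exception):
    """Raised by the alarm handler when the check runs too long."""


def _on_alarm(signum, frame):
    raise CheckTimeout()


def evaluate(value, warning, critical, elapsed):
    """Compare the value with the thresholds, return (exit_code, output)."""
    perfdata = "'time'=%.1fs 'myperfdatavalue'=%.1f;%.1f;%.1f" % (
        elapsed, value, warning, critical)
    if value >= critical:
        return CRITICAL, "CRITICAL - value is %.1f | %s" % (value, perfdata)
    if value >= warning:
        return WARNING, "WARNING - value is %.1f | %s" % (value, perfdata)
    return OK, "OK - value is %.1f | %s" % (value, perfdata)


def run_check(fetch, warning, critical, timeout=30):
    """Call fetch() under a timeout and evaluate the number it returns."""
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)  # deliver SIGALRM after `timeout` seconds
    start = time.monotonic()
    try:
        value = fetch()
    except CheckTimeout:
        return UNKNOWN, "UNKNOWN - Timeout (%ds) reached | 'time'=%.1fs" % (
            timeout, timeout)
    finally:
        signal.alarm(0)  # always cancel a pending alarm
    return evaluate(value, warning, critical, time.monotonic() - start)
```

A plugin built on this sketch would print the returned output and pass the returned
code to `sys.exit()`. Here `fetch` stands for whatever function retrieves the
monitored value.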
 826
 827 There are various plugin libraries available which will help
 828 with plugin execution and output formatting too, for example
[nagiosplugin for Python](https://pypi.python.org/pypi/nagiosplugin/).
 830
 831 > **Note**
 832 >
> Test your plugin thoroughly, including edge cases, before putting it
> into production!
 835
 836 Once you've finished your plugin please upload/sync it to [Icinga Exchange](https://exchange.icinga.com/new).
 837 Thanks in advance!
 838
 839
 840 ## Service Monitoring Overview <a id="service-monitoring-overview"></a>
 841
 842 The following examples should help you to start implementing your own ideas.
 843 There is a variety of plugins available. This collection is not complete --
 844 if you have any updates, please send a documentation patch upstream.
 845
 846 Please visit our [community forum](https://community.icinga.com) which
 847 may provide an answer to your use case already. If not, do not hesitate
 848 to create a new topic.
 849
 850 ### General Monitoring <a id="service-monitoring-general"></a>
 851
 852 If the remote service is available (via a network protocol and port),
 853 and if a check plugin is also available, you don't necessarily need a local client.
 854 Instead, choose a plugin and configure its parameters and thresholds. The following examples are included in the [Icinga 2 Template Library](10-icinga-template-library.md#icinga-template-library):
 855
 856 * [ping4](10-icinga-template-library.md#plugin-check-command-ping4), [ping6](10-icinga-template-library.md#plugin-check-command-ping6),
 857 [fping4](10-icinga-template-library.md#plugin-check-command-fping4), [fping6](10-icinga-template-library.md#plugin-check-command-fping6), [hostalive](10-icinga-template-library.md#plugin-check-command-hostalive)
 858 * [tcp](10-icinga-template-library.md#plugin-check-command-tcp), [udp](10-icinga-template-library.md#plugin-check-command-udp), [ssl](10-icinga-template-library.md#plugin-check-command-ssl)
 859 * [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
 860
 861 ### Linux Monitoring <a id="service-monitoring-linux"></a>
 862
 863 * [disk](10-icinga-template-library.md#plugin-check-command-disk)
 864 * [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap)
 865 * [procs](10-icinga-template-library.md#plugin-check-command-processes)
 866 * [users](10-icinga-template-library.md#plugin-check-command-users)
 867 * [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel)
 868 * package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc.
 869 * [ssh](10-icinga-template-library.md#plugin-check-command-ssh)
 870 * performance: [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat), [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
 871
 872 ### Windows Monitoring <a id="service-monitoring-windows"></a>
 873
 874 * [check_wmi_plus](http://www.edcint.co.nz/checkwmiplus/)
 875 * [NSClient++](https://www.nsclient.org) (in combination with the Icinga 2 client and either [check_nscp_api](10-icinga-template-library.md#nscp-check-api) or [nscp-local](10-icinga-template-library.md#nscp-plugin-check-commands) check commands)
* [Icinga 2 Windows Plugins](10-icinga-template-library.md#windows-plugins) (disk, load, memory, network, performance counters, ping, procs, service, swap, updates, uptime, users)
* VBS and PowerShell scripts
 878
 879 ### Database Monitoring <a id="service-monitoring-database"></a>
 880
 881 * MySQL/MariaDB: [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health), [mysql](10-icinga-template-library.md#plugin-check-command-mysql), [mysql_query](10-icinga-template-library.md#plugin-check-command-mysql-query)
 882 * PostgreSQL: [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
 883 * Oracle: [oracle_health](10-icinga-template-library.md#plugin-contrib-command-oracle_health)
 884 * MSSQL: [mssql_health](10-icinga-template-library.md#plugin-contrib-command-mssql_health)
 885 * DB2: [db2_health](10-icinga-template-library.md#plugin-contrib-command-db2_health)
 886 * MongoDB: [mongodb](10-icinga-template-library.md#plugin-contrib-command-mongodb)
 887 * Elasticsearch: [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch)
 888 * Redis: [redis](10-icinga-template-library.md#plugin-contrib-command-redis)
 889
 890 ### SNMP Monitoring <a id="service-monitoring-snmp"></a>
 891
 892 * [Manubulon plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands) (interface, storage, load, memory, process)
 893 * [snmp](10-icinga-template-library.md#plugin-check-command-snmp), [snmpv3](10-icinga-template-library.md#plugin-check-command-snmpv3)
 894
 895 ### Network Monitoring <a id="service-monitoring-network"></a>
 896
 897 * [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health)
 898 * [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
 899 * [interfacetable](10-icinga-template-library.md#plugin-contrib-command-interfacetable)
 900 * [iftraffic](10-icinga-template-library.md#plugin-contrib-command-iftraffic), [iftraffic64](10-icinga-template-library.md#plugin-contrib-command-iftraffic64)
 901
 902 ### Web Monitoring <a id="service-monitoring-web"></a>
 903
 904 * [http](10-icinga-template-library.md#plugin-check-command-http)
 905 * [ftp](10-icinga-template-library.md#plugin-check-command-ftp)
 906 * [webinject](10-icinga-template-library.md#plugin-contrib-command-webinject)
 907 * [squid](10-icinga-template-library.md#plugin-contrib-command-squid)
 908 * [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status)
 909 * [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
 910 * [kdc](10-icinga-template-library.md#plugin-contrib-command-kdc)
 911 * [rbl](10-icinga-template-library.md#plugin-contrib-command-rbl)
 912
 913 ### Java Monitoring <a id="service-monitoring-java"></a>
 914
 915 * [jmx4perl](10-icinga-template-library.md#plugin-contrib-command-jmx4perl)
 916
 917 ### DNS Monitoring <a id="service-monitoring-dns"></a>
 918
 919 * [dns](10-icinga-template-library.md#plugin-check-command-dns)
 920 * [dig](10-icinga-template-library.md#plugin-check-command-dig)
 921 * [dhcp](10-icinga-template-library.md#plugin-check-command-dhcp)
 922
 923 ### Backup Monitoring <a id="service-monitoring-backup"></a>
 924
 925 * [check_bareos](https://github.com/widhalmt/check_bareos)
 926
 927 ### Log Monitoring <a id="service-monitoring-log"></a>
 928
 929 * [check_logfiles](https://labs.consol.de/nagios/check_logfiles/)
 930 * [check_logstash](https://github.com/widhalmt/check_logstash)
 931 * [check_graylog2_stream](https://github.com/Graylog2/check-graylog2-stream)
 932
 933 ### Virtualization Monitoring <a id="service-monitoring-virtualization"></a>
 934
#### VMware Monitoring <a id="service-monitoring-virtualization-vmware"></a>
 936
 937 * [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
 938 * [VMware](10-icinga-template-library.md#plugin-contrib-vmware)
 939
**Tip**: If you encounter timeouts using the VMware Perl SDK,
check [this blog entry](https://www.claudiokuenzler.com/blog/650/slow-vmware-perl-sdk-soap-request-error-libwww-version).
Ubuntu 16.04 LTS can run into problems with random entropy in Perl, as discussed [here](https://monitoring-portal.org/t/check-vmware-api-slow-when-run-multiple-times/2868).
In that case, [haveged](http://issihosts.com/haveged/) may help.
 944
 945 ### SAP Monitoring <a id="service-monitoring-sap"></a>
 946
 947 * [check_sap_health](https://labs.consol.de/nagios/check_sap_health/index.html)
 948 * [SAP CCMS](https://sourceforge.net/projects/nagios-sap-ccms/)
 949
 950 ### Mail Monitoring <a id="service-monitoring-mail"></a>
 951
 952 * [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp)
 953 * [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap)
 954 * [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop)
 955 * [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
 956
 957 ### Hardware Monitoring <a id="service-monitoring-hardware"></a>
 958
 959 * [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm)
 960 * [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
 961
 962 ### Metrics Monitoring <a id="service-monitoring-metrics"></a>
 963
 964 * [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)