1 # Service Monitoring <a id="service-monitoring"></a>
3 The power of Icinga 2 lies in its modularity. There are thousands of
4 community plugins available next to the standard plugins provided by
5 the [Monitoring Plugins project](https://www.monitoring-plugins.org).
Start your research on [Icinga Exchange](https://exchange.icinga.com)
and check which services are already [covered](05-service-monitoring.md#service-monitoring-overview).
10 The [requirements chapter](05-service-monitoring.md#service-monitoring-requirements) guides you
11 through the plugin setup, tests and their integration with an [existing](05-service-monitoring.md#service-monitoring-plugin-checkcommand)
12 or [new](05-service-monitoring.md#service-monitoring-plugin-checkcommand-new) CheckCommand object
13 and host/service objects inside the [Director](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-director)
14 or [Icinga config files](05-service-monitoring.md#service-monitoring-plugin-checkcommand-integration-config-files).
15 It also adds hints on [modifying](05-service-monitoring.md#service-monitoring-plugin-checkcommand-modify) existing commands.
Plugins follow the [Plugin API specification](05-service-monitoring.md#service-monitoring-plugin-api),
which is enriched with best practices and code examples to get you started with
[your own plugin](05-service-monitoring.md#service-monitoring-plugin-new).
23 ## Requirements <a id="service-monitoring-requirements"></a>
25 ### Plugins <a id="service-monitoring-plugins"></a>
27 All existing Icinga or Nagios plugins work with Icinga 2. Community
28 plugins can be found for example on [Icinga Exchange](https://exchange.icinga.com).
30 The recommended way of setting up these plugins is to copy them
31 into the `PluginDir` directory.
33 If you have plugins with many dependencies, consider creating a
34 custom RPM/DEB package which handles the required libraries and binaries.
36 Configuration management tools such as Puppet, Ansible, Chef or Saltstack
37 also help with automatically installing the plugins on different
38 operating systems. They can also help with installing the required
39 dependencies, e.g. Python libraries, Perl modules, etc.
41 ### Plugin Setup <a id="service-monitoring-plugins-setup"></a>
Good plugins provide installation and configuration instructions
in their docs and/or README on GitHub.
46 Sometimes dependencies are not listed, or your distribution differs from the one
47 described. Try running the plugin after setup and [ensure it works](05-service-monitoring.md#service-monitoring-plugins-it-works).
49 #### Ensure it works <a id="service-monitoring-plugins-it-works"></a>
Before using a check plugin with Icinga 2, ensure that it works properly
by running it on the console as the user Icinga 2 runs as.
The user name and the plugin path differ between distributions, as the following two variants show:

```
sudo -u icinga /usr/lib64/nagios/plugins/check_mysql_health --help
```

```
sudo -u nagios /usr/lib/nagios/plugins/check_mysql_health --help
```
66 Additional libraries may be required for some plugins. Please consult the plugin
67 documentation and/or the included README file for installation instructions.
Sometimes plugins contain hard-coded paths to other components. Instead of changing
the plugin, it might be easier to create a symbolic link, so that your change does not get
overwritten during the next plugin update.
72 Sometimes there are plugins which do not exactly fit your requirements.
73 In that case you can modify an existing plugin or just write your own.
75 #### Plugin Dependency Errors <a id="service-monitoring-plugins-setup-dependency-errors"></a>
77 Plugins can be scripts (Shell, Python, Perl, Ruby, PHP, etc.)
78 or compiled binaries (C, C++, Go).
These scripts/binaries may require additional libraries
which must be installed on every system where they are executed.

> Don't test the plugins on your master instance. Instead,
> test them on the satellites and clients which execute the checks.
There are errors, now what? Typical errors are caused by missing libraries
or modules.

Example for a Python plugin which uses the `tinkerforge` module
to query a network service:

```
ImportError: No module named tinkerforge.ip_connection
```
101 Its [documentation](https://github.com/NETWAYS/check_tinkerforge#installation)
102 points to installing the `tinkerforge` Python module.
Example for a Perl plugin which uses SNMP:

```
Can't locate Net/SNMP.pm in @INC (you may need to install the Net::SNMP module)
```

Prior to installing the Perl module via CPAN, look for a distribution
specific package, e.g. `libnet-snmp-perl` on Debian/Ubuntu or `perl-Net-SNMP`
on RHEL/CentOS.
117 #### Optional: Custom Path <a id="service-monitoring-plugins-custom-path"></a>
119 If you are not using the default `PluginDir` directory, you
120 can create a custom plugin directory and constant
121 and reference this in the created CheckCommand objects.
123 Create a common directory e.g. `/opt/monitoring/plugins`
124 and install the plugin there.
```
mkdir -p /opt/monitoring/plugins
cp check_snmp_int.pl /opt/monitoring/plugins
chmod +x /opt/monitoring/plugins/check_snmp_int.pl
```
Next, create a new global constant, e.g. `CustomPluginDir`,
in your [constants.conf](04-configuration.md#constants-conf) configuration file:

```
vim /etc/icinga2/constants.conf

const PluginDir = "/usr/lib/nagios/plugins"
const CustomPluginDir = "/opt/monitoring/plugins"
```
143 ### CheckCommand Definition <a id="service-monitoring-plugin-checkcommand"></a>
145 Each plugin requires a [CheckCommand](09-object-types.md#objecttype-checkcommand) object in your
146 configuration which can be used in the [Service](09-object-types.md#objecttype-service) or
147 [Host](09-object-types.md#objecttype-host) object definition.
149 Please check if the Icinga 2 package already provides an
150 [existing CheckCommand definition](10-icinga-template-library.md#icinga-template-library).
152 If that's the case, thoroughly check the required parameters and integrate the check command
153 into your host and service objects. Best practice is to run the plugin on the CLI
154 with the required parameters first.
156 Example for database size checks with [check_mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health).
```
/usr/lib64/nagios/plugins/check_mysql_health --hostname '127.0.0.1' --username root --password icingar0xx --mode sql --name 'select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '\''icinga'\'';' '--name2' 'db_size' --units 'MB' --warning 4096 --critical 8192
```
162 The parameter names inside the ITL commands follow the
163 `<command name>_<parameter name>` schema.
165 #### Icinga Director Integration <a id="service-monitoring-plugin-checkcommand-integration-director"></a>
167 Navigate into `Commands > External Commands` and search for `mysql_health`.
168 Select `mysql_health` and navigate into the `Fields` tab.
170 In order to access the parameters, the Director requires you to first
171 define the needed custom data fields:
173 * `mysql_health_hostname`
174 * `mysql_health_username` and `mysql_health_password`
175 * `mysql_health_mode`
176 * `mysql_health_name`, `mysql_health_name2` and `mysql_health_units`
177 * `mysql_health_warning` and `mysql_health_critical`
Create a new host template and object where you'll set generic
settings like `mysql_health_hostname` (if it differs from the host's
`address` attribute) and `mysql_health_username` and `mysql_health_password`.

Create a new service template for `mysql-health` and set `mysql_health`
as the check command. You can also define a default for `mysql_health_mode`.
186 Next, create a service apply rule or a new service set which gets assigned
187 to matching host objects.
190 #### Icinga Config File Integration <a id="service-monitoring-plugin-checkcommand-integration-config-files"></a>
Create or modify a host object which stores
the generic database defaults and prepares details
for a service `apply for` rule.

```
object Host "icinga2-master1.localdomain" {
  check_command = "hostalive"

  // Database listens locally, not external
  vars.mysql_health_hostname = "127.0.0.1"

  // Basic database size checks for Icinga DBs
  vars.databases["icinga"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
  vars.databases["icingaweb2"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
}
```
The host object already prepares the database details and thresholds
for advanced [apply for](03-monitoring-basics.md#using-apply-for) rules. The apply rule uses
conditions to fetch host-specific values, or to set default values.
```
apply Service "db-size-" for (db_name => config in host.vars.databases) {
  check_command = "mysql_health"

  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
  } else {
    vars.mysql_health_username = "root"
  }
  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
  } else {
    vars.mysql_health_password = "icingar0xx"
  }

  vars.mysql_health_mode = "sql"
  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
  vars.mysql_health_name2 = "db_size"
  vars.mysql_health_units = "MB"

  if (config.mysql_health_warning) {
    vars.mysql_health_warning = config.mysql_health_warning
  }
  if (config.mysql_health_critical) {
    vars.mysql_health_critical = config.mysql_health_critical
  }
}
```
254 #### New CheckCommand <a id="service-monitoring-plugin-checkcommand-new"></a>
256 This chapter describes how to add a new CheckCommand object for a plugin.
258 Please make sure to follow these conventions when adding a new command object definition:
260 * Use [command arguments](03-monitoring-basics.md#command-arguments) whenever possible. The `command` attribute
261 must be an array in `[ ... ]` for shell escaping.
* Define a unique `prefix` for the command's specific arguments. Best practice is to follow this schema:

  ```
  <command name>_<parameter name>
  ```

  That way you can safely set them on host/service level and you'll always know which command they control.
269 * Use command argument default values, e.g. for thresholds.
270 * Use [advanced conditions](09-object-types.md#objecttype-checkcommand) like `set_if` definitions.
272 Before starting with the CheckCommand definition, please check
273 the existing objects available inside the ITL. They follow best
274 practices and are maintained by developers and our community.
276 This example picks a new plugin called [check_systemd](https://exchange.icinga.com/joseffriedrich/check_systemd)
277 uploaded to Icinga Exchange in June 2019.
279 First, [install](05-service-monitoring.md#service-monitoring-plugins-setup) the plugin and ensure
280 that [it works](05-service-monitoring.md#service-monitoring-plugins-it-works). Then run it with the
281 `--help` parameter to see the actual parameters (docs might be outdated).
```
./check_systemd.py --help

usage: check_systemd.py [-h] [-c SECONDS] [-e UNIT | -u UNIT] [-v] [-V]

  -h, --help            show this help message and exit
  -c SECONDS, --critical SECONDS
                        Startup time in seconds to result in critical status.
  -e UNIT, --exclude UNIT
                        Exclude a systemd unit from the checks. This option
                        can be applied multiple times. For example: -e mnt-
                        data.mount -e task.service.
  -u UNIT, --unit UNIT  Name of the systemd unit that is beeing tested.
  -v, --verbose         Increase output verbosity (use up to 3 times).
  -V, --version         show program's version number and exit
  -w SECONDS, --warning SECONDS
                        Startup time in seconds to result in warning status.
```
The argument description is important. Based on it, you need to create the
command arguments.
311 > When you are using the Director, you can prepare the commands as files
312 > e.g. inside the `global-templates` zone. Then run the kickstart wizard
313 > again to import the commands as external reference.
315 > If you prefer to use the Director GUI/CLI, please apply the steps
316 > in the `Add Command` form.
318 Start with the basic plugin call without any parameters.
```
object CheckCommand "systemd" { // Plugin name without 'check_' prefix
  command = [ PluginContribDir + "/check_systemd.py" ] // Use the 'PluginContribDir' constant, see the contributed ITL commands
}
```

Run a config validation with `icinga2 daemon -C` to see if that works.
Next, analyse the plugin parameters. Plugins with a good help output show
optional parameters in square brackets. This is the case for all parameters
of this plugin. If there are required parameters, use the `required` key
inside the argument definition.

The `arguments` attribute is a dictionary which takes the parameters as keys.

If there are long parameter names available, prefer them. This increases
readability in both the configuration and the executed command line.
344 The argument value itself is a sub dictionary which has additional keys:
346 * `value` which references the runtime macro string
347 * `description` where you copy the plugin parameter help text into
348 * `required`, `set_if`, etc. for advanced parameters, check the [CheckCommand object](09-object-types.md#objecttype-checkcommand) chapter.
350 The runtime macro syntax is required to allow value extraction when
351 the command is executed.
355 > Inside the Director, store the new command first in order to
356 > unveil the `Arguments` tab.
358 Best practice is to use the command name as prefix, in this specific
359 case e.g. `systemd_unit`.
```
arguments = {
  "--unit" = {
    value = "$systemd_unit$" // The service parameter would then be defined as 'vars.systemd_unit = "icinga2"'
    description = "Name of the systemd unit that is beeing tested."
  }
  "--warning" = {
    value = "$systemd_warning$"
    description = "Startup time in seconds to result in warning status."
  }
  "--critical" = {
    value = "$systemd_critical$"
    description = "Startup time in seconds to result in critical status."
  }
}
```
This may take a while. Validate the configuration in between, until
the CheckCommand definition is complete.
381 Then test and integrate it into your monitoring configuration.
383 Remember: Do it once and right, and never touch the CheckCommand again.
384 Optional arguments allow different use cases and scenarios.
387 Once you have created your really good CheckCommand, please consider
388 sharing it with our community by creating a new PR on [GitHub](https://github.com/Icinga/icinga2/blob/master/CONTRIBUTING.md).
389 _Please also update the documentation for the ITL._
394 > Inside the Director, you can render the configuration in the Deployment
395 > section. Extract the static configuration object and use that as a source
396 > for sending it upstream.
400 #### Modify Existing CheckCommand <a id="service-monitoring-plugin-checkcommand-modify"></a>
Sometimes an existing CheckCommand inside the ITL is missing a parameter.
Or it sets a default parameter value which you don't need.
405 Instead of copying the entire configuration object, you can import
406 an object into another new object.
```
object CheckCommand "http-custom" {
  import "http" // Import existing http object

  arguments += { // Use additive assignment to add missing parameters
    "--key" = {
      value = "$http_...$" // Keep the parameter name the same as with http
    }
  }

  // Override default parameters
  vars.http_address = "..."
}
```

This CheckCommand can then be referenced in your host/service object
definitions.
427 ### Plugin API <a id="service-monitoring-plugin-api"></a>
429 Icinga 2 supports the native plugin API specification from the Monitoring Plugins project.
430 It is defined in the [Monitoring Plugins](https://www.monitoring-plugins.org) guidelines.
432 The Icinga documentation revamps the specification into our
433 own guideline enriched with examples and best practices.
435 #### Output <a id="service-monitoring-plugin-api-output"></a>
The output should be as short as possible while still providing the
necessary details. The most common use cases include:
440 - Viewing a problem list in Icinga Web and dashboards
441 - Getting paged about a problem
442 - Receiving the alert on the CLI or forwarding it to external (ticket) systems
```
<STATUS>: <A short description what happened>

OK: MySQL connection time is fine (0.0002s)
WARNING: MySQL connection time is slow (0.5s > 0.1s threshold)
CRITICAL: MySQL connection time is causing degraded performance (3s > 0.5s threshold)
```
Icinga supports multi-line output. Icinga Web shows only the first line
in list views and the full output in the detail view.
457 Example for an end2end check with many smaller test cases integrated:
```
OK: Online banking works.
Testcase 1: Site reached.
Testcase 2: Attempted login, JS loads.
Testcase 3: Login succeeded.
Testcase 4: View current state works.
Testcase 5: Transactions fine.
```
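
A plugin can assemble such output from a summary and a list of detail lines. The following Python sketch illustrates one way to do this. The helper name `format_output` is hypothetical and not part of any Icinga or plugin library.

```python
def format_output(status, summary, details=()):
    """Return plugin output: one status line followed by optional detail lines."""
    # The first line is what list views show, so it carries status and summary.
    lines = ["%s: %s" % (status, summary)]
    # All further lines are only visible in the detail view.
    lines.extend(details)
    return "\n".join(lines)
```

Printing `format_output("OK", "Online banking works.", ["Testcase 1: Site reached."])` yields the status line first and the test case details on the following lines.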
468 If the extended output shouldn't be visible in your monitoring, but only for testing,
469 it is recommended to implement the `--verbose` plugin parameter to allow
470 developers and users to debug further. Check [here](05-service-monitoring.md#service-monitoring-plugin-api-verbose)
471 for more implementation tips.
475 > More debug output also helps when implementing your plugin.
477 > Best practice is to have the plugin parameter and handling implemented first,
478 > then add it anywhere you want to see more, e.g. from initial database connections
479 > to actual query results.
482 #### Status <a id="service-monitoring-plugin-api-status"></a>
Value | Status    | Description
------|-----------|-------------------------------
0     | OK        | The check went fine and everything is considered working.
1     | Warning   | The check is above the given warning threshold, or anything else is suspicious requiring attention before it breaks.
2     | Critical  | The check exceeded the critical threshold, or something really is broken and will harm the production environment.
3     | Unknown   | Invalid parameters, low level resource errors (IO device busy, no fork resources, TCP sockets, etc.) preventing the actual check. Higher level errors such as DNS resolution failures or TCP connection timeouts should be treated as `Critical` instead. Whenever the plugin reaches its timeout (best practice) it should also terminate with `Unknown`.
491 Keep in mind that these are service states. Icinga automatically maps
492 the [host state](03-monitoring-basics.md#check-result-state-mapping) from the returned plugin states.
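
Inside a plugin, the state values can be kept in one place. The following is a minimal Python sketch. The names `PluginState` and `host_state` are illustrative and not part of any Icinga API. The host mapping assumes that OK and Warning result in an UP host, while Critical and Unknown result in a DOWN host.

```python
from enum import IntEnum


class PluginState(IntEnum):
    """Exit codes defined by the plugin API."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def host_state(exit_code):
    """Map a plugin exit code to the host state derived from it."""
    # Assumption: 0 and 1 map to UP, everything else maps to DOWN.
    if exit_code in (PluginState.OK, PluginState.WARNING):
        return "UP"
    return "DOWN"
```

A plugin would then terminate with e.g. `sys.exit(PluginState.CRITICAL)` instead of a bare number.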
494 #### Thresholds <a id="service-monitoring-plugin-api-thresholds"></a>
496 A plugin calculates specific values and may decide about the exit state on its own.
This is done with thresholds: warning and critical values which are compared with
the actual value. The exit state is determined by this comparison.
500 Imagine the following value and defined thresholds:
Whenever `ptc_value` is higher than the warning or critical threshold, the plugin should return
the appropriate [state](05-service-monitoring.md#service-monitoring-plugin-api-status).
512 The threshold evaluation order also is important:
* Critical thresholds are evaluated first and supersede everything else.
* Warning thresholds are evaluated second.
* If no threshold is matched, return the OK state.
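
The evaluation order can be sketched in Python as follows. This is a minimal example which assumes simple single-value thresholds where a higher value is worse. The function name `evaluate` is illustrative only. Thresholds which were not specified are passed as `None` and skipped.

```python
def evaluate(value, warning=None, critical=None):
    """Return (exit_code, status) for 'higher is worse' thresholds."""
    # Critical is checked first so that it supersedes a matching warning.
    if critical is not None and value > critical:
        return 2, "CRITICAL"
    # Warning is only considered if critical did not match.
    if warning is not None and value > warning:
        return 1, "WARNING"
    # No threshold matched, or none was given at all.
    return 0, "OK"
```

For example, `evaluate(30, warning=20, critical=25)` returns the Critical state although the warning threshold is exceeded as well.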
518 Avoid using hardcoded threshold values in your plugins, always
519 add them to the argument parser.
```python
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument("-w", "--warning", help="Warning threshold. Single value or range, e.g. '20:50'.")
    parser.add_argument("-c", "--critical", help="Critical threshold. Single value or range, e.g. '25:45'.")

    args = parser.parse_args()
```
537 Users might call plugins only with the critical threshold parameter,
538 leaving out the warning parameter. Keep this in mind when evaluating
539 the thresholds, always check if the parameters have been defined before.
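```python
# Continues the argument parser example and requires 'import sys'.
# Only single values are handled here, ranges need a dedicated parser.
if args.critical is not None and ptc_value > float(args.critical):
    print("CRITICAL - ...")
    sys.exit(2) # Critical

if args.warning is not None and ptc_value > float(args.warning):
    print("WARNING - ...")
    sys.exit(1) # Warning

print("OK - ...")
sys.exit(0) # OK
```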
The above is a simplified example for printing the [output](05-service-monitoring.md#service-monitoring-plugin-api-output)
and using the [state](05-service-monitoring.md#service-monitoring-plugin-api-status)
as exit code.
560 Before diving into the implementation, learn more about required
561 [performance data metrics](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics)
562 and more best practices below.
564 ##### Threshold Ranges <a id="service-monitoring-plugin-api-thresholds-ranges"></a>
566 Threshold ranges can be used to specify an alert window, e.g. whenever a calculated
567 value is between a lower and higher critical threshold.
The schema for threshold ranges is `[@]start:end`. The `@` character in square brackets
is optional.
576 There are a few requirements for ranges:
578 * `start <= end`. Add a check in your code and let the user know about problematic values.
* `start:` can be omitted if its value is 0. This is the default handling for single threshold values too.

  ```
  10      # < 0 or > 10, outside of 0..10
  ```

* If `end` is omitted, assume end is infinity.

  ```
  10:     # < 10, outside of 10..∞
  ```

* In order to specify negative infinity, use the `~` character.

  ```
  ~:10    # > 10, outside of -∞..10
  ```

* Raise alert if value is outside of the defined range.

  ```
  10:20   # < 10 or > 20, outside of 10..20
  ```

* Start with `@` to raise an alert if the value is **inside** the defined range, inclusive start/end values.

  ```
  @10:20  # >= 10 and <= 20, inside of 10..20
  ```
616 Best practice is to either implement single threshold values, or fully support ranges.
617 This requires parsing the input parameter values, therefore look for existing libraries
618 already providing this functionality.
620 [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py)
621 implements a simple parser to avoid dependencies.
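
If you prefer to write the parser yourself, the range rules could be implemented roughly as follows. This is a minimal sketch and not the parser used by check_tinkerforge. The function names `parse_range` and `is_alert` are assumptions made for this example.

```python
import math


def parse_range(spec):
    """Parse '[@]start:end' into a (start, end, invert) tuple."""
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # A single value is the end of the range, start defaults to 0.
        start_text, end_text = "", spec
    if start_text == "~":
        start = -math.inf
    elif start_text == "":
        start = 0.0
    else:
        start = float(start_text)
    # An omitted end means positive infinity.
    end = math.inf if end_text == "" else float(end_text)
    if start > end:
        raise ValueError("range start must not be greater than end: %r" % spec)
    return start, end, invert


def is_alert(value, spec):
    """Return True if the value raises an alert for the given range."""
    start, end, invert = parse_range(spec)
    inside = start <= value <= end
    # Without '@' an alert is raised outside of the range, with '@' inside.
    return inside if invert else not inside
```

Invalid input such as `20:10` or non-numeric values raises a `ValueError`, which a plugin should catch and report with the Unknown state.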
624 #### Performance Data Metrics <a id="service-monitoring-plugin-api-performance-data-metrics"></a>
626 Performance data metrics must be appended to the plugin output with a preceding `|` character.
627 The schema is as follows:
```
<output> | 'label'=value[UOM];[warn];[crit];[min];[max]
```

The label should be enclosed in single quotes. Avoid spaces or special characters such
as `%` in it, as they could lead to problems with metric receivers such as Graphite.

Labels must not include `'` and `=` characters. Keep the label as short and unique as possible.
Values must respect the C/POSIX locale, i.e. use `.` as the decimal separator, and must not use a localized format such as the German `,`.
Icinga sets `LC_NUMERIC=C` to enforce this locale on plugin execution.
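
The schema can be generated with a small helper. The following Python sketch is one possible approach. The function name `format_perfdata` and its parameter names are hypothetical. It relies on Python's `str()` conversion, which always uses `.` as the decimal separator.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Build one 'label'=value[UOM];[warn];[crit];[min];[max] string."""
    if "'" in label or "=" in label:
        raise ValueError("label must not contain ' or =: %r" % label)
    # Unset fields stay empty so that the positions of later fields are kept.
    fields = ["" if field is None else str(field)
              for field in (warn, crit, minimum, maximum)]
    # Trailing empty fields carry no information and are dropped.
    while fields and fields[-1] == "":
        fields.pop()
    return ";".join(["'%s'=%s%s" % (label, value, uom)] + fields)
```

Calling `format_perfdata("rta", 12.445, "ms", 100, 200, 0)` returns `'rta'=12.445ms;100;200;0`.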
647 ##### Unit of Measurement (UOM) <a id="service-monitoring-plugin-api-performance-data-metrics-uom"></a>
Unit     | Description
---------|---------------------------------
None     | Integer or floating point number for any type (processes, users, etc.).
`s`      | Seconds, can be `s`, `ms`, `us`.
`%`      | Percentage.
`B`      | Bytes, can be `KB`, `MB`, `GB`, `TB`. Lowercase is also possible.
`c`      | A continuous counter (e.g. interface traffic counters).
Icinga metric writers normalize these values to the lowest common base, e.g. seconds and bytes.
This ensures that graphs stay consistent, even with bad plugins which change the UOM
depending on the size of the value, e.g. returning the disk usage in MB and later in GB
for the same performance data label.

```
'rta'=12.445000ms 'pl'=0%
```
665 ##### Thresholds and Min/Max <a id="service-monitoring-plugin-api-performance-data-metrics-thresholds-min-max"></a>
667 Next to the performance data value, warn, crit, min, max can optionally be provided. They must be separated
668 with the semi-colon `;` character. They share the same UOM with the performance data value.
```
$ check_ping -4 -H icinga.com -c '200,15%' -w '100,5%'

PING OK - Packet loss = 0%, RTA = 12.44 ms|rta=12.445000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0
```
676 ##### Multiple Performance Data Values <a id="service-monitoring-plugin-api-performance-data-metrics-multiple"></a>
678 Multiple performance data values must be joined with a space character. The below example
679 is from the [check_load](10-icinga-template-library.md#plugin-check-command-load) plugin.
```
load1=4.680;1.000;2.000;0; load5=0.000;5.000;10.000;0; load15=0.000;10.000;20.000;0;
```
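
To illustrate the format from the consumer side, the following Python sketch splits such a string into its metrics. It is a simplified example and not how Icinga itself parses performance data. The name `parse_perfdata` is hypothetical, only plain decimal values are supported, and the input is expected to be the part after the `|` character.

```python
import re

PERFDATA_RE = re.compile(
    r"(?:'(?P<quoted>[^'=]+)'|(?P<plain>[^\s'=]+))="  # label, optionally quoted
    r"(?P<value>-?\d+(?:\.\d+)?)"                      # numeric value
    r"(?P<uom>[a-zA-Z%]*)"                             # optional unit of measurement
    r"(?P<rest>(?:;[^;\s]*)*)"                         # optional ;warn;crit;min;max
)


def parse_perfdata(text):
    """Parse space-separated performance data into a list of dictionaries."""
    metrics = []
    for match in PERFDATA_RE.finditer(text):
        # The first split element is empty because 'rest' starts with ';'.
        extra = match.group("rest").split(";")[1:5]
        extra += [""] * (4 - len(extra))
        warn, crit, minimum, maximum = extra
        metrics.append({
            "label": match.group("quoted") or match.group("plain"),
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": warn, "crit": crit, "min": minimum, "max": maximum,
        })
    return metrics
```

Thresholds are kept as strings on purpose, since they may contain ranges such as `10:20`.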
685 #### Timeout <a id="service-monitoring-plugin-api-timeout"></a>
687 Icinga has a safety mechanism where it kills processes running for too
688 long. The timeout can be specified in [CheckCommand objects](09-object-types.md#objecttype-checkcommand)
689 or on the host/service object.
691 Best practice is to control the timeout in the plugin itself
692 and provide a clear message followed by the Unknown state.
Example in Python, adapted from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):

```python
import argparse
import signal
import sys
from functools import partial

def handle_sigalrm(signum, frame, timeout=None):
    # Print a clear message and terminate with the Unknown state
    print('UNKNOWN: Plugin timed out after %d seconds' % timeout)
    sys.exit(3)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # ... add more arguments
    parser.add_argument("-t", "--timeout", help="Timeout in seconds (default 10s)", type=int, default=10)
    args = parser.parse_args()

    signal.signal(signal.SIGALRM, partial(handle_sigalrm, timeout=args.timeout))
    signal.alarm(args.timeout)

    # ... perform the check and generate output/status
```
716 #### Versions <a id="service-monitoring-plugin-api-versions"></a>
Plugins should provide a version via the `-V` or `--version` parameter
which is bumped on releases. This helps to identify problems caused by
outdated or too recent versions on the community support channels.
722 Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):
```python
import argparse
import sys

__version__ = '0.9.1'

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-V', '--version', action='version', version='%(prog)s v' + sys.modules[__name__].__version__)
```
737 #### Verbose <a id="service-monitoring-plugin-api-verbose"></a>
739 Plugins should provide a verbose mode with `-v` or `--verbose` in order
740 to show more detailed log messages. This helps to debug and analyse the
741 flow and execution steps inside the plugin.
Add the parameter before implementing the check logic in
the plugin.
746 Example in Python taken from [check_tinkerforge](https://github.com/NETWAYS/check_tinkerforge/blob/master/check_tinkerforge.py):
```python
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-v', '--verbose', action='store_true')
    args = parser.parse_args()

    if args.verbose:
        print("Verbose debug output")
```
763 ### Create a new Plugin <a id="service-monitoring-plugin-new"></a>
Sometimes an existing plugin does not satisfy your requirements. You
can contact the original author about plans to add changes,
and/or create a patch yourself.
769 If you just want to format the output and state of an existing plugin
770 it might also be helpful to write a wrapper script. This script
771 could pass all configured parameters, call the plugin script, parse
772 its output/exit code and return your specified output/exit code.
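
Such a wrapper could look like the following Python sketch. It is an illustration under stated assumptions: the function name `run_wrapped`, the `state_map` dictionary for remapping exit codes and the `prefix` parameter are hypothetical and need to be adapted to your use case.

```python
import subprocess


def run_wrapped(command, state_map=None, prefix="", timeout=10):
    """Run a plugin command line and return (exit_code, output)."""
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 3, "UNKNOWN: wrapped plugin timed out after %d seconds" % timeout
    except OSError as err:
        return 3, "UNKNOWN: cannot execute wrapped plugin: %s" % err

    code = result.returncode
    if code not in (0, 1, 2, 3):
        # Anything outside of the plugin API range is reported as Unknown.
        code = 3
    # Optionally translate the state, e.g. {1: 2} turns Warning into Critical.
    code = (state_map or {}).get(code, code)
    return code, prefix + result.stdout.strip()
```

The wrapper script itself would pass its own parameters on as `command`, print the returned output and terminate with `sys.exit()` using the returned exit code.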
On the other hand, plugins for specific services and hardware might not yet
exist.
779 > Watch this presentation from Icinga Camp Berlin to learn more
780 > about [How to write checks that don't suck](https://www.youtube.com/watch?v=Ey_APqSCoFQ).
782 Common best practices:
* Choose the programming language wisely
    * Scripting languages (Bash, Python, Perl, Ruby, PHP, etc.) are easier to write and set up, but their check execution might take longer (e.g. due to the overhead of invoking the script interpreter).
    * Plugins written in C/C++, Go, etc. improve check execution time but may add overhead for installation and packaging.
787 * Use a modern VCS such as Git for developing the plugin, e.g. share your plugin on GitHub and let it sync to [Icinga Exchange](https://exchange.icinga.com).
788 * **Look into existing plugins endorsed by community members.**
790 Implementation hints:
792 * Add parameters with key-value pairs to your plugin. They should allow long names (e.g. `--host localhost`) and also short parameters (e.g. `-H localhost`)
* `-h|--help` should print the version and all details about parameters and runtime invocation. Note: Python's `argparse` module provides this out of the box.
794 * `--version` should print the plugin [version](05-service-monitoring.md#service-monitoring-plugin-api-versions).
795 * Add a [verbose/debug output](05-service-monitoring.md#service-monitoring-plugin-api-verbose) functionality for detailed on-demand logging.
796 * Respect the exit codes required by the [Plugin API](05-service-monitoring.md#service-monitoring-plugin-api).
797 * Always add [performance data](05-service-monitoring.md#service-monitoring-plugin-api-performance-data-metrics) to your plugin output.
* Allow users to specify [warning/critical thresholds](05-service-monitoring.md#service-monitoring-plugin-api-thresholds) as parameters.
```
# 1. include optional libraries
# 2. global variables
# 3. helper functions and/or classes
# 4. define timeout condition

if (<timeout_reached>) then
  print "UNKNOWN - Timeout (...) reached | 'time'=30.0"
  exit(3)
endif

<execute and fetch data>

if (<threshold_critical_condition>) then
  print "CRITICAL - ... | 'time'=0.1 'myperfdatavalue'=5.0"
  exit(2)
else if (<threshold_warning_condition>) then
  print "WARNING - ... | 'time'=0.1 'myperfdatavalue'=3.0"
  exit(1)
else
  print "OK - ... | 'time'=0.2 'myperfdatavalue'=1.0"
  exit(0)
endif
```
There are various plugin libraries available which will help
with plugin execution and output formatting too, for example
[nagiosplugin for Python](https://pypi.python.org/pypi/nagiosplugin/).
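
Without any additional library, the skeleton could be sketched in Python as follows. This is a minimal example and not a reference implementation. The names `measure`, `main` and `measure_func` are hypothetical, thresholds are simple "higher is worse" values, and timeout handling is left out for brevity.

```python
import argparse
import time

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def measure():
    """Hypothetical placeholder for the real check logic."""
    return 1.0


def main(argv=None, measure_func=measure):
    """Parse parameters, run the check, print the result, return the exit code."""
    parser = argparse.ArgumentParser(description="Example check plugin skeleton")
    parser.add_argument("-w", "--warning", type=float, help="Warning threshold")
    parser.add_argument("-c", "--critical", type=float, help="Critical threshold")
    parser.add_argument("-v", "--verbose", action="store_true", help="Print debug output")
    args = parser.parse_args(argv)

    started = time.time()
    try:
        value = measure_func()
    except Exception as err:
        # The check itself could not be performed, report Unknown.
        print("UNKNOWN: check failed: %s" % err)
        return 3
    elapsed = time.time() - started

    # Critical is evaluated first, then warning, otherwise the state is OK.
    if args.critical is not None and value > args.critical:
        code = 2
    elif args.warning is not None and value > args.warning:
        code = 1
    else:
        code = 0

    warn = "" if args.warning is None else args.warning
    crit = "" if args.critical is None else args.critical
    perfdata = "'time'=%fs 'myperfdatavalue'=%s;%s;%s" % (elapsed, value, warn, crit)
    print("%s: measured value is %s | %s" % (STATES[code], value, perfdata))
    if args.verbose:
        print("Check took %f seconds" % elapsed)
    return code
```

In a real plugin, the script would end with `sys.exit(main())` so that the returned state becomes the exit code.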
> Ensure to test your plugin properly with special cases before putting it
> into production.
836 Once you've finished your plugin please upload/sync it to [Icinga Exchange](https://exchange.icinga.com/new).
840 ## Service Monitoring Overview <a id="service-monitoring-overview"></a>
842 The following examples should help you to start implementing your own ideas.
843 There is a variety of plugins available. This collection is not complete --
844 if you have any updates, please send a documentation patch upstream.
846 Please visit our [community forum](https://community.icinga.com) which
847 may provide an answer to your use case already. If not, do not hesitate
848 to create a new topic.
850 ### General Monitoring <a id="service-monitoring-general"></a>
852 If the remote service is available (via a network protocol and port),
853 and if a check plugin is also available, you don't necessarily need a local client.
854 Instead, choose a plugin and configure its parameters and thresholds. The following examples are included in the [Icinga 2 Template Library](10-icinga-template-library.md#icinga-template-library):
856 * [ping4](10-icinga-template-library.md#plugin-check-command-ping4), [ping6](10-icinga-template-library.md#plugin-check-command-ping6),
857 [fping4](10-icinga-template-library.md#plugin-check-command-fping4), [fping6](10-icinga-template-library.md#plugin-check-command-fping6), [hostalive](10-icinga-template-library.md#plugin-check-command-hostalive)
858 * [tcp](10-icinga-template-library.md#plugin-check-command-tcp), [udp](10-icinga-template-library.md#plugin-check-command-udp), [ssl](10-icinga-template-library.md#plugin-check-command-ssl)
859 * [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
861 ### Linux Monitoring <a id="service-monitoring-linux"></a>
863 * [disk](10-icinga-template-library.md#plugin-check-command-disk)
864 * [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap)
865 * [procs](10-icinga-template-library.md#plugin-check-command-processes)
866 * [users](10-icinga-template-library.md#plugin-check-command-users)
867 * [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel)
868 * package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc.
869 * [ssh](10-icinga-template-library.md#plugin-check-command-ssh)
870 * performance: [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat), [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
872 ### Windows Monitoring <a id="service-monitoring-windows"></a>
874 * [check_wmi_plus](http://www.edcint.co.nz/checkwmiplus/)
875 * [NSClient++](https://www.nsclient.org) (in combination with the Icinga 2 client and either [check_nscp_api](10-icinga-template-library.md#nscp-check-api) or [nscp-local](10-icinga-template-library.md#nscp-plugin-check-commands) check commands)
* [Icinga 2 Windows Plugins](10-icinga-template-library.md#windows-plugins) (disk, load, memory, network, performance counters, ping, procs, service, swap, updates, uptime, users)
* VBS and PowerShell scripts
879 ### Database Monitoring <a id="service-monitoring-database"></a>
881 * MySQL/MariaDB: [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health), [mysql](10-icinga-template-library.md#plugin-check-command-mysql), [mysql_query](10-icinga-template-library.md#plugin-check-command-mysql-query)
882 * PostgreSQL: [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
883 * Oracle: [oracle_health](10-icinga-template-library.md#plugin-contrib-command-oracle_health)
884 * MSSQL: [mssql_health](10-icinga-template-library.md#plugin-contrib-command-mssql_health)
885 * DB2: [db2_health](10-icinga-template-library.md#plugin-contrib-command-db2_health)
886 * MongoDB: [mongodb](10-icinga-template-library.md#plugin-contrib-command-mongodb)
887 * Elasticsearch: [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch)
888 * Redis: [redis](10-icinga-template-library.md#plugin-contrib-command-redis)
890 ### SNMP Monitoring <a id="service-monitoring-snmp"></a>
892 * [Manubulon plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands) (interface, storage, load, memory, process)
893 * [snmp](10-icinga-template-library.md#plugin-check-command-snmp), [snmpv3](10-icinga-template-library.md#plugin-check-command-snmpv3)
895 ### Network Monitoring <a id="service-monitoring-network"></a>
897 * [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health)
898 * [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
899 * [interfacetable](10-icinga-template-library.md#plugin-contrib-command-interfacetable)
900 * [iftraffic](10-icinga-template-library.md#plugin-contrib-command-iftraffic), [iftraffic64](10-icinga-template-library.md#plugin-contrib-command-iftraffic64)
902 ### Web Monitoring <a id="service-monitoring-web"></a>
904 * [http](10-icinga-template-library.md#plugin-check-command-http)
905 * [ftp](10-icinga-template-library.md#plugin-check-command-ftp)
906 * [webinject](10-icinga-template-library.md#plugin-contrib-command-webinject)
907 * [squid](10-icinga-template-library.md#plugin-contrib-command-squid)
908 * [apache-status](10-icinga-template-library.md#plugin-contrib-command-apache-status)
909 * [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
910 * [kdc](10-icinga-template-library.md#plugin-contrib-command-kdc)
911 * [rbl](10-icinga-template-library.md#plugin-contrib-command-rbl)
913 ### Java Monitoring <a id="service-monitoring-java"></a>
915 * [jmx4perl](10-icinga-template-library.md#plugin-contrib-command-jmx4perl)
917 ### DNS Monitoring <a id="service-monitoring-dns"></a>
919 * [dns](10-icinga-template-library.md#plugin-check-command-dns)
920 * [dig](10-icinga-template-library.md#plugin-check-command-dig)
921 * [dhcp](10-icinga-template-library.md#plugin-check-command-dhcp)
923 ### Backup Monitoring <a id="service-monitoring-backup"></a>
925 * [check_bareos](https://github.com/widhalmt/check_bareos)
927 ### Log Monitoring <a id="service-monitoring-log"></a>
929 * [check_logfiles](https://labs.consol.de/nagios/check_logfiles/)
930 * [check_logstash](https://github.com/widhalmt/check_logstash)
931 * [check_graylog2_stream](https://github.com/Graylog2/check-graylog2-stream)
933 ### Virtualization Monitoring <a id="service-monitoring-virtualization"></a>
#### VMware Monitoring <a id="service-monitoring-virtualization-vmware"></a>
937 * [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
938 * [VMware](10-icinga-template-library.md#plugin-contrib-vmware)
**Tip**: If you are encountering timeouts using the VMware Perl SDK,
check [this blog entry](https://www.claudiokuenzler.com/blog/650/slow-vmware-perl-sdk-soap-request-error-libwww-version).
Ubuntu 16.04 LTS can have trouble with random entropy in Perl, as discussed in [this forum thread](https://monitoring-portal.org/t/check-vmware-api-slow-when-run-multiple-times/2868).
In that case, [haveged](http://issihosts.com/haveged/) may help.
945 ### SAP Monitoring <a id="service-monitoring-sap"></a>
947 * [check_sap_health](https://labs.consol.de/nagios/check_sap_health/index.html)
948 * [SAP CCMS](https://sourceforge.net/projects/nagios-sap-ccms/)
950 ### Mail Monitoring <a id="service-monitoring-mail"></a>
952 * [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp)
953 * [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap)
954 * [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop)
955 * [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
957 ### Hardware Monitoring <a id="service-monitoring-hardware"></a>
959 * [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm)
960 * [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
962 ### Metrics Monitoring <a id="service-monitoring-metrics"></a>
964 * [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)