From: Jean Flach Date: Thu, 19 Oct 2017 15:32:52 +0000 (+0200) Subject: Fix flapping X-Git-Tag: v2.8.0~28^2~1 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=a21ffd6fe4b83705b9607732cc6b2d8c0ef6d15f;p=icinga2 Fix flapping Re-implement flapping following the 'old way' of just observing the last 20 state changes. refs #4982 --- diff --git a/doc/08-advanced-topics.md b/doc/08-advanced-topics.md index a7d129ae1..b3cd68c48 100644 --- a/doc/08-advanced-topics.md +++ b/doc/08-advanced-topics.md @@ -414,19 +414,42 @@ Example output in Icinga Web 2: Icinga 2 supports optional detection of hosts and services that are "flapping". -Flapping occurs when a service or host changes state too frequently, resulting -in a storm of problem and recovery notifications. Flapping can be the source of -configuration problems (i.e. thresholds set too low), troublesome services, -or real network problems. +Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and +recovery notifications. With flapping detection enabled, a flapping notification is sent and other notifications are +suppressed until the object calms down, i.e. until checks have returned the same status a few times. Flapping detection can help detect +configuration problems (wrong thresholds), troublesome services, or network problems. Flapping detection can be enabled or disabled using the `enable_flapping` attribute. -The `flapping_threshold` attributes allows to specify the percentage of state changes -when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to flap. +The `flapping_threshold_high` and `flapping_threshold_low` attributes specify the thresholds that control +when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping. -Note: There are known issues with flapping detection. 
Please refrain from enabling -flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed. +The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold, a +host or service is considered flapping until the value drops below the low flapping threshold. -## Volatile Services +`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on +[notifications](alert-notifications) for details. + +> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications +> will be sent out regardless of the object's state. + +### How it works + +For every host and service, Icinga 2 records whether each of the last 20 check results was a state change. See the graphic below: + +![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png) + +All the states are weighted, with the most recent one being worth the most (1.18) and the 20th the least (0.8). The +states in between are evenly distributed. The final flapping value is the sum of the weighted state changes divided by the total +count of 20. + +In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`). +This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be +considered flapping. + +If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold +of 25% and therefore the host or service would recover from flapping. + +# Volatile Services By default all services remain in a non-volatile state. 
When a problem occurs, the `SOFT` state applies and once `max_check_attempts` attribute diff --git a/doc/09-object-types.md b/doc/09-object-types.md index 036534344..d09e32327 100644 --- a/doc/09-object-types.md +++ b/doc/09-object-types.md @@ -730,7 +730,8 @@ Configuration Attributes: enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to false. enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to true. event\_command | Object name | **Optional.** The name of an event command that should be executed every time the host's state changes or the host is in a `SOFT` state. - flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a host is considered to be flapping. + flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a host to be considered flapping. Default `30.0` + flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a host to be considered not flapping. Default `25.0` volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to false. zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details. command\_endpoint | Object name | **Optional.** The endpoint where commands are executed on. @@ -767,6 +768,7 @@ Runtime Attributes: downtime\_depth | Number | Whether the host has one or more active downtimes. flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp). flapping | Boolean | Whether the host is flapping between states. + flapping\_current | Number | Current flapping value in percent (see flapping\_thresholds) state | Number | The current state (0 = UP, 1 = DOWN). last\_state | Number | The previous state (0 = UP, 1 = DOWN). 
last\_hard\_state | Number | The last hard state (0 = UP, 1 = DOWN). @@ -1465,9 +1467,10 @@ Configuration Attributes: enable\_passive\_checks | Boolean | **Optional.** Whether passive checks are enabled. Defaults to `true`. enable\_event\_handler | Boolean | **Optional.** Enables event handlers for this host. Defaults to `true`. enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to `false`. + flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a service to be considered flapping. `30.0` + flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a service to be considered not flapping. `25.0` enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to `true`. event\_command | Object name | **Optional.** The name of an event command that should be executed every time the service's state changes or the service is in a `SOFT` state. - flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a service is considered to be flapping. volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to `false`. zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details. name | String | **Required.** The service name. Must be unique on a per-host basis. For advanced usage in [apply rules](03-monitoring-basics.md#using-apply) only. @@ -1502,6 +1505,7 @@ Runtime Attributes: acknowledgement\_expiry | Timestamp | When the acknowledgement expires (as a UNIX timestamp; 0 = no expiry). downtime\_depth | Number | Whether the service has one or more active downtimes. flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp). 
+ flapping\_current | Number | Current flapping value in percent (see flapping\_thresholds) flapping | Boolean | Whether the host is flapping between states. state | Number | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). last\_state | Number | The previous state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). diff --git a/lib/compat/compatlogger.cpp b/lib/compat/compatlogger.cpp index a7d3d91f0..dabefe785 100644 --- a/lib/compat/compatlogger.cpp +++ b/lib/compat/compatlogger.cpp @@ -317,10 +317,10 @@ void CompatLogger::FlappingChangedHandler(const Checkable::Ptr& checkable) String flapping_output; if (checkable->IsFlapping()) { - flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)"; + flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)"; flapping_state_str = "STARTED"; } else { - flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)"; + flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)"; flapping_state_str = "STOPPED"; } diff --git a/lib/compat/statusdatawriter.cpp b/lib/compat/statusdatawriter.cpp index 080be1892..22a22f7df 100644 --- a/lib/compat/statusdatawriter.cpp +++ b/lib/compat/statusdatawriter.cpp @@ -296,8 +296,8 @@ void StatusDataWriter::DumpHostObject(std::ostream& fp, const Host::Ptr& host) fp << "\n"; fp << "\t" << "initial_state" "\t" "o" "\n" - "\t" "low_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n" - 
"\t" "high_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n" + "\t" "low_flap_threshold" "\t" << host->GetFlappingThresholdLow() << "\n" + "\t" "high_flap_threshold" "\t" << host->GetFlappingThresholdHigh() << "\n" "\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(host) << "\n" "\t" "check_freshness" "\t" "1" "\n"; @@ -470,8 +470,8 @@ void StatusDataWriter::DumpServiceObject(std::ostream& fp, const Service::Ptr& s String icon_image_alt = service->GetIconImageAlt(); fp << "\t" "initial_state" "\t" "o" "\n" - "\t" "low_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n" - "\t" "high_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n" + "\t" "low_flap_threshold" "\t" << service->GetFlappingThresholdLow() << "\n" + "\t" "high_flap_threshold" "\t" << service->GetFlappingThresholdHigh() << "\n" "\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(service) << "\n" "\t" "check_freshness" << "\t" "1" "\n"; if (!notes.IsEmpty()) diff --git a/lib/db_ido/dbevents.cpp b/lib/db_ido/dbevents.cpp index 28f6604bb..706e3ddd3 100644 --- a/lib/db_ido/dbevents.cpp +++ b/lib/db_ido/dbevents.cpp @@ -1194,10 +1194,10 @@ void DbEvents::AddFlappingChangedLogHistory(const Checkable::Ptr& checkable) String flapping_output; if (checkable->IsFlapping()) { - flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)"; + flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)"; flapping_state_str = "STARTED"; } else { - flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% 
threshold)"; + flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)"; flapping_state_str = "STOPPED"; } @@ -1323,8 +1323,8 @@ void DbEvents::AddFlappingChangedHistory(const Checkable::Ptr& checkable) fields1->Set("flapping_type", service ? 1 : 0); fields1->Set("object_id", checkable); fields1->Set("percent_state_change", checkable->GetFlappingCurrent()); - fields1->Set("low_threshold", checkable->GetFlappingThreshold()); - fields1->Set("high_threshold", checkable->GetFlappingThreshold()); + fields1->Set("low_threshold", checkable->GetFlappingThresholdLow()); + fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh()); fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */ @@ -1369,8 +1369,8 @@ void DbEvents::AddEnableFlappingChangedHistory(const Checkable::Ptr& checkable) fields1->Set("flapping_type", service ? 
1 : 0); fields1->Set("object_id", checkable); fields1->Set("percent_state_change", checkable->GetFlappingCurrent()); - fields1->Set("low_threshold", checkable->GetFlappingThreshold()); - fields1->Set("high_threshold", checkable->GetFlappingThreshold()); + fields1->Set("low_threshold", checkable->GetFlappingThresholdLow()); + fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh()); fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */ diff --git a/lib/icinga/checkable-check.cpp b/lib/icinga/checkable-check.cpp index 235f3a9dd..a89fa1463 100644 --- a/lib/icinga/checkable-check.cpp +++ b/lib/icinga/checkable-check.cpp @@ -315,14 +315,11 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig olock.Lock(); SetLastCheckResult(cr); - bool was_flapping, is_flapping; + bool was_flapping = IsFlapping(); - was_flapping = IsFlapping(); + UpdateFlappingStatus(old_state != cr->GetState()); - if (GetStateType() == StateTypeHard) - UpdateFlappingStatus(stateChange); - - is_flapping = IsFlapping(); + bool is_flapping = IsFlapping(); if (cr->GetActive()) { UpdateNextCheck(origin); @@ -368,13 +365,13 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig ExecuteEventHandler(); /* Flapping start/end notifications */ - if (send_notification && !was_flapping && is_flapping) { + if (!in_downtime && !was_flapping && is_flapping) { /* FlappingStart notifications happen on state changes, not in downtimes */ if (!IsPaused()) OnNotificationsRequested(this, NotificationFlappingStart, cr, "", "", MessageOrigin::Ptr()); Log(LogNotice, "Checkable") - << "Flapping: Checkable '" << GetName() << "' started flapping (" << GetFlappingThreshold() << "% < " << GetFlappingCurrent() << "%)."; + << "Flapping: Checkable '" << GetName() << "' started flapping (Current flapping value " << GetFlappingCurrent() << "% > threshold " << GetFlappingThresholdHigh() << "%)."; NotifyFlapping(origin); } else if (!in_downtime && 
was_flapping && !is_flapping) { @@ -383,7 +380,7 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig OnNotificationsRequested(this, NotificationFlappingEnd, cr, "", "", MessageOrigin::Ptr()); Log(LogNotice, "Checkable") - << "Flapping: Checkable '" << GetName() << "' stopped flapping (" << GetFlappingThreshold() << "% >= " << GetFlappingCurrent() << "%)."; + << "Flapping: Checkable '" << GetName() << "' stopped flapping (Current flapping value " << GetFlappingCurrent() << "% < threshold " << GetFlappingThresholdLow() << "%)."; NotifyFlapping(origin); } diff --git a/lib/icinga/checkable-flapping.cpp b/lib/icinga/checkable-flapping.cpp index 3a17772a0..af5ced050 100644 --- a/lib/icinga/checkable-flapping.cpp +++ b/lib/icinga/checkable-flapping.cpp @@ -17,58 +17,43 @@ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. * ******************************************************************************/ +#include <bitset> #include "icinga/checkable.hpp" #include "icinga/icingaapplication.hpp" #include "base/utility.hpp" using namespace icinga; -#define FLAPPING_INTERVAL (30 * 60) - -double Checkable::GetFlappingCurrent(void) const -{ - if (GetFlappingPositive() + GetFlappingNegative() <= 0) - return 0; - - return 100 * GetFlappingPositive() / (GetFlappingPositive() + GetFlappingNegative()); -} - void Checkable::UpdateFlappingStatus(bool stateChange) { - double ts, now; - long positive, negative; - - now = Utility::GetTime(); + std::bitset<20> stateChangeBuf = GetFlappingBuffer(); + int oldestIndex = (GetFlappingBuffer() & 0xFF00000) >> 20; - ts = GetFlappingLastChange(); - positive = GetFlappingPositive(); - negative = GetFlappingNegative(); + stateChangeBuf[oldestIndex] = stateChange; + oldestIndex = (oldestIndex + 1) % 20; - double diff = now - ts; + double stateChanges = 0; - if (positive + negative > FLAPPING_INTERVAL) { - double pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL; - positive -= pct * positive; - 
negative -= pct * negative; + for (int i = 0; i < 20; i++) { + if (stateChangeBuf[(oldestIndex + i) % 20]) + stateChanges += 0.8 + (0.02 * i); } - if (stateChange) - positive += diff; - else - negative += diff; + double flappingValue = 100.0 * stateChanges / 20.0; - if (positive < 0) - positive = 0; + bool flapping; - if (negative < 0) - negative = 0; + if (GetFlapping()) + flapping = flappingValue > GetFlappingThresholdLow(); + else + flapping = flappingValue > GetFlappingThresholdHigh(); -// Log(LogDebug, "Checkable") -// << "Flapping counter for '" << GetName() << "' is positive=" << positive << ", negative=" << negative; + if (flapping != GetFlapping()) + SetFlappingLastChange(Utility::GetTime()); - SetFlappingLastChange(now); - SetFlappingPositive(positive); - SetFlappingNegative(negative); + SetFlappingBuffer((stateChangeBuf.to_ulong() | (oldestIndex << 20))); + SetFlappingCurrent(flappingValue); + SetFlapping(flapping); } bool Checkable::IsFlapping(void) const @@ -76,5 +61,5 @@ bool Checkable::IsFlapping(void) const if (!GetEnableFlapping() || !IcingaApplication::GetInstance()->GetEnableFlapping()) return false; else - return GetFlappingCurrent() > GetFlappingThreshold(); + return GetFlapping(); } diff --git a/lib/icinga/checkable.hpp b/lib/icinga/checkable.hpp index 03fd5ae14..0617eca12 100644 --- a/lib/icinga/checkable.hpp +++ b/lib/icinga/checkable.hpp @@ -180,8 +180,6 @@ public: intrusive_ptr GetEventCommand(void) const; /* Flapping Detection */ - double GetFlappingCurrent(void) const; - bool IsFlapping(void) const; void UpdateFlappingStatus(bool stateChange); diff --git a/lib/icinga/checkable.ti b/lib/icinga/checkable.ti index f90a85dbe..ced78b8e9 100644 --- a/lib/icinga/checkable.ti +++ b/lib/icinga/checkable.ti @@ -70,9 +70,7 @@ abstract class Checkable : CustomVarObject }}} }; [config] bool volatile; - [config] double flapping_threshold { - default {{{ return 30; }}} - }; + [config] bool enable_active_checks { default {{{ return true; }}} }; @@ -92,6 
+90,16 @@ abstract class Checkable : CustomVarObject default {{{ return true; }}} }; + [config, deprecated] double flapping_threshold; + + [config] double flapping_threshold_low { + default {{{ return 25; }}} + }; + + [config] double flapping_threshold_high{ + default {{{ return 30; }}} + }; + [config] String notes; [config] String notes_url; [config] String action_url; @@ -139,12 +147,6 @@ abstract class Checkable : CustomVarObject }; [state] Timestamp acknowledgement_expiry; [state] bool force_next_notification; - [state] int flapping_positive; - [state] int flapping_negative; - [state] Timestamp flapping_last_change; - [no_storage, protected] bool flapping { - get {{{ return false; }}} - }; [no_storage] Timestamp last_check { get; }; @@ -152,6 +154,13 @@ abstract class Checkable : CustomVarObject get; }; + [state] double flapping_current { + default {{{ return 0; }}} + }; + [state] Timestamp flapping_last_change; + [state, no_user_view, no_user_modify] int flapping_buffer; + [state, protected] bool flapping; + [config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) { navigate {{{ return Endpoint::GetByName(GetCommandEndpointRaw()); diff --git a/lib/icinga/compatutility.cpp b/lib/icinga/compatutility.cpp index b33674c6b..b66f4b7b6 100644 --- a/lib/icinga/compatutility.cpp +++ b/lib/icinga/compatutility.cpp @@ -300,12 +300,12 @@ int CompatUtility::GetCheckableIsVolatile(const Checkable::Ptr& checkable) double CompatUtility::GetCheckableLowFlapThreshold(const Checkable::Ptr& checkable) { - return checkable->GetFlappingThreshold(); + return checkable->GetFlappingThresholdLow(); } double CompatUtility::GetCheckableHighFlapThreshold(const Checkable::Ptr& checkable) { - return checkable->GetFlappingThreshold(); + return checkable->GetFlappingThresholdHigh(); } int CompatUtility::GetCheckableFreshnessChecksEnabled(const Checkable::Ptr& checkable) diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 85b8b59cb..2a234419d 100644 --- 
a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -99,10 +99,10 @@ add_boost_test(base icinga_checkresult/service_1attempt icinga_checkresult/service_2attempts icinga_checkresult/service_3attempts - icinga_checkresult/host_flapping_notification - icinga_checkresult/service_flapping_notification - icinga_notification/state_filter - icinga_notification/type_filter + icinga_checkresult/host_flapping_notification + icinga_checkresult/service_flapping_notification + icinga_notification/state_filter + icinga_notification/type_filter icinga_macros/simple icinga_perfdata/empty icinga_perfdata/simple @@ -136,3 +136,17 @@ if(ICINGA2_WITH_LIVESTATUS) TESTS livestatus/hosts livestatus/services ) endif() + +set(icinga_checkable_test_SOURCES + icinga-checkable-flapping.cpp +) + +add_boost_test(icinga_checkable + SOURCES icinga-checkable-test.cpp ${icinga_checkable_test_SOURCES} + LIBRARIES base config icinga cli + TESTS icinga_checkable_flapping/host_not_flapping + icinga_checkable_flapping/host_flapping + icinga_checkable_flapping/host_flapping_recover + icinga_checkable_flapping/host_flapping_docs_example +) + diff --git a/test/icinga-checkable-flapping.cpp b/test/icinga-checkable-flapping.cpp new file mode 100644 index 000000000..1618fd3eb --- /dev/null +++ b/test/icinga-checkable-flapping.cpp @@ -0,0 +1,260 @@ +/****************************************************************************** + * Icinga 2 * + * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) * + * * + * This program is free software; you can redistribute it and/or * + * modify it under the terms of the GNU General Public License * + * as published by the Free Software Foundation; either version 2 * + * of the License, or (at your option) any later version. * + * * + * This program is distributed in the hope that it will be useful, * + * but WITHOUT ANY WARRANTY; without even the implied warranty of * + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * + * GNU General Public License for more details. * + * * + * You should have received a copy of the GNU General Public License * + * along with this program; if not, write to the Free Software Foundation * + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. * + ******************************************************************************/ + +#include +#include +#include "icinga/host.hpp" +#include + +using namespace icinga; + +#ifdef I2_DEBUG +static CheckResult::Ptr MakeCheckResult(ServiceState state) +{ + CheckResult::Ptr cr = new CheckResult(); + + cr->SetState(state); + + double now = Utility::GetTime(); + cr->SetScheduleStart(now); + cr->SetScheduleEnd(now); + cr->SetExecutionStart(now); + cr->SetExecutionEnd(now); + + Utility::IncrementTime(60); + + return cr; +} + +static void LogFlapping(const Checkable::Ptr& obj) +{ + std::bitset<20> stateChangeBuf = obj->GetFlappingBuffer(); + int oldestIndex = (obj->GetFlappingBuffer() & 0xFF00000) >> 20; + + std::cout << "Flapping: " << obj->IsFlapping() << "\nHT: " << obj->GetFlappingThresholdHigh() << " LT: " << obj->GetFlappingThresholdLow() + << "\nOur value: " << obj->GetFlappingCurrent() << "\nPtr: " << oldestIndex << " Buf: " << stateChangeBuf << '\n'; +} + + +static void LogHostStatus(const Host::Ptr &host) +{ + std::cout << "Current status: state: " << host->GetState() << " state_type: " << host->GetStateType() + << " check attempt: " << host->GetCheckAttempt() << "/" << host->GetMaxCheckAttempts() << std::endl; +} +#endif /* I2_DEBUG */ + +BOOST_AUTO_TEST_SUITE(icinga_checkable_flapping) + +BOOST_AUTO_TEST_CASE(host_not_flapping) +{ +#ifndef I2_DEBUG + BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!"); +#else /* I2_DEBUG */ + std::cout << "Running test with a non-flapping host...\n"; + + Host::Ptr host = new Host(); + host->SetName("test"); + host->SetEnableFlapping(true); + host->SetMaxCheckAttempts(5); + + // Host otherwise is soft down + 
host->SetState(HostUp); + host->SetStateType(StateTypeHard); + + Utility::SetTime(0); + + BOOST_CHECK(host->GetFlappingCurrent() == 0); + + LogFlapping(host); + LogHostStatus(host); + + // watch the state being stable + int i = 0; + while (i++ < 10) { + // For some reason, elusive to me, the first check is a state change + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + + LogFlapping(host); + LogHostStatus(host); + + BOOST_CHECK(host->GetState() == 0); + BOOST_CHECK(host->GetCheckAttempt() == 1); + BOOST_CHECK(host->GetStateType() == StateTypeHard); + + //Should not be flapping + BOOST_CHECK(!host->IsFlapping()); + BOOST_CHECK(host->GetFlappingCurrent() < 30.0); + } +#endif /* I2_DEBUG */ +} + +BOOST_AUTO_TEST_CASE(host_flapping) +{ +#ifndef I2_DEBUG + BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!"); +#else /* I2_DEBUG */ + std::cout << "Running test with host changing state with every check...\n"; + + Host::Ptr host = new Host(); + host->SetName("test"); + host->SetEnableFlapping(true); + host->SetMaxCheckAttempts(5); + + Utility::SetTime(0); + + int i = 0; + while (i++ < 25) { + if (i % 2) + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + else + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + + LogFlapping(host); + LogHostStatus(host); + + //30 Percent is our high Threshold + if (i >= 6) { + BOOST_CHECK(host->IsFlapping()); + } else { + BOOST_CHECK(!host->IsFlapping()); + } + } +#endif /* I2_DEBUG */ +} + +BOOST_AUTO_TEST_CASE(host_flapping_recover) +{ +#ifndef I2_DEBUG + BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!"); +#else /* I2_DEBUG */ + std::cout << "Running test with flapping recovery...\n"; + + Host::Ptr host = new Host(); + host->SetName("test"); + host->SetEnableFlapping(true); + host->SetMaxCheckAttempts(5); + + // Host otherwise is soft down + host->SetState(HostUp); + host->SetStateType(StateTypeHard); + + Utility::SetTime(0); + + // A few warning + 
host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + + LogFlapping(host); + LogHostStatus(host); + for (int i = 0; i <= 7; i++) { + if (i % 2) + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + else + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + } + + LogFlapping(host); + LogHostStatus(host); + + // We should be flapping now + BOOST_CHECK(host->GetFlappingCurrent() > 30.0); + BOOST_CHECK(host->IsFlapping()); + + // Now recover from flapping + int count = 0; + while (host->IsFlapping()) { + BOOST_CHECK(host->GetFlappingCurrent() > 25.0); + BOOST_CHECK(host->IsFlapping()); + + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + LogFlapping(host); + LogHostStatus(host); + count++; + } + + std::cout << "Recovered from flapping after " << count << " Warning results.\n"; + + BOOST_CHECK(host->GetFlappingCurrent() < 25.0); + BOOST_CHECK(!host->IsFlapping()); +#endif /* I2_DEBUG */ +} + +BOOST_AUTO_TEST_CASE(host_flapping_docs_example) +{ +#ifndef I2_DEBUG + BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!"); +#else /* I2_DEBUG */ + std::cout << "Simulating the documentation example...\n"; + + Host::Ptr host = new Host(); + host->SetName("test"); + host->SetEnableFlapping(true); + host->SetMaxCheckAttempts(5); + + // Host otherwise is soft down + host->SetState(HostUp); + host->SetStateType(StateTypeHard); + + Utility::SetTime(0); + + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + 
host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceWarning)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + + LogFlapping(host); + LogHostStatus(host); + BOOST_CHECK(host->GetFlappingCurrent() == 39.1); + BOOST_CHECK(host->IsFlapping()); + + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + host->ProcessCheckResult(MakeCheckResult(ServiceCritical)); + + LogFlapping(host); + LogHostStatus(host); + BOOST_CHECK(host->GetFlappingCurrent() < 25.0); + BOOST_CHECK(!host->IsFlapping()); +#endif +} + +BOOST_AUTO_TEST_SUITE_END() diff --git a/test/icinga-checkable-test.cpp b/test/icinga-checkable-test.cpp new file mode 100644 index 000000000..dee1adc8a --- /dev/null +++ b/test/icinga-checkable-test.cpp @@ -0,0 +1,66 @@ +/****************************************************************************** + * Icinga 2 * + * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) * + * * + * 
This program is free software; you can redistribute it and/or * + * modify it under the terms of the GNU General Public License * + * as published by the Free Software Foundation; either version 2 * + * of the License, or (at your option) any later version. * + * * + * This program is distributed in the hope that it will be useful, * + * but WITHOUT ANY WARRANTY; without even the implied warranty of * + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * + * GNU General Public License for more details. * + * * + * You should have received a copy of the GNU General Public License * + * along with this program; if not, write to the Free Software Foundation * + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. * + ******************************************************************************/ + +#define BOOST_TEST_MAIN +#define BOOST_TEST_MODULE icinga2_test + +#include "cli/daemonutility.hpp" +#include "base/application.hpp" +#include "base/loader.hpp" +#include +#include + +using namespace icinga; + +struct IcingaCheckableFixture +{ + IcingaCheckableFixture(void) + { + BOOST_TEST_MESSAGE("setup running Icinga 2 core"); + + Application::InitializeBase(); + + /* start the Icinga application and load the configuration */ + Application::DeclareSysconfDir("etc"); + Application::DeclareLocalStateDir("var"); + + ActivationScope ascope; + + Loader::LoadExtensionLibrary("icinga"); + Loader::LoadExtensionLibrary("methods"); //loaded by ITL + + std::vector configs; + std::vector newItems; + + DaemonUtility::LoadConfigFiles(configs, newItems, "icinga2.debug", "icinga2.vars"); + + /* ignore config errors */ + WorkQueue upq; + ConfigItem::ActivateItems(upq, newItems); + } + + ~IcingaCheckableFixture(void) + { + BOOST_TEST_MESSAGE("cleanup Icinga 2 core"); + Application::UninitializeBase(); + } +}; + +BOOST_GLOBAL_FIXTURE(IcingaCheckableFixture); + diff --git a/test/icinga-checkresult.cpp b/test/icinga-checkresult.cpp index a128e9519..f7a2387ac 100644 
--- a/test/icinga-checkresult.cpp +++ b/test/icinga-checkresult.cpp @@ -395,11 +395,9 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification) #else /* I2_DEBUG */ boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2)); - int softStateCount = 20; int timeStepInterval = 60; Host::Ptr host = new Host(); - host->SetMaxCheckAttempts(softStateCount); host->Activate(); host->SetAuthority(true); host->SetStateRaw(ServiceOK); @@ -418,18 +416,25 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification) std::cout << "Inserting flapping check results" << std::endl; - for (int i = 0; i < softStateCount; i++) { + for (int i = 0; i < 10; i++) { ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical); host->ProcessCheckResult(MakeCheckResult(state)); Utility::IncrementTime(timeStepInterval); } - std::cout << "Checking host state (must be flapping in SOFT state)" << std::endl; - BOOST_CHECK(host->GetStateType() == StateTypeSoft); BOOST_CHECK(host->IsFlapping() == true); - std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl; - CheckNotification(host, false, NotificationFlappingStart); + CheckNotification(host, true, NotificationFlappingStart); + + std::cout << "Now calm down..." 
<< std::endl; + + for (int i = 0; i < 20; i++) { + host->ProcessCheckResult(MakeCheckResult(ServiceOK)); + Utility::IncrementTime(timeStepInterval); + } + + CheckNotification(host, true, NotificationFlappingEnd); + c.disconnect(); @@ -443,11 +448,9 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification) #else /* I2_DEBUG */ boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2)); - int softStateCount = 20; int timeStepInterval = 60; Host::Ptr service = new Host(); - service->SetMaxCheckAttempts(softStateCount); service->Activate(); service->SetAuthority(true); service->SetStateRaw(ServiceOK); @@ -466,18 +469,24 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification) std::cout << "Inserting flapping check results" << std::endl; - for (int i = 0; i < softStateCount; i++) { + for (int i = 0; i < 10; i++) { ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical); service->ProcessCheckResult(MakeCheckResult(state)); Utility::IncrementTime(timeStepInterval); } - std::cout << "Checking service state (must be flapping in SOFT state)" << std::endl; - BOOST_CHECK(service->GetStateType() == StateTypeSoft); BOOST_CHECK(service->IsFlapping() == true); - std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl; - CheckNotification(service, false, NotificationFlappingStart); + CheckNotification(service, true, NotificationFlappingStart); + + std::cout << "Now calm down..." << std::endl; + + for (int i = 0; i < 20; i++) { + service->ProcessCheckResult(MakeCheckResult(ServiceOK)); + Utility::IncrementTime(timeStepInterval); + } + + CheckNotification(service, true, NotificationFlappingEnd); c.disconnect();
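
The weighting and hysteresis that this patch adds to `Checkable::UpdateFlappingStatus` can be tried out in isolation. The following is a minimal standalone sketch, not code from the patch. The names `kWindow`, `FlappingValue`, `PushResult`, `NextFlappingState` and `DocsExampleHistory` are hypothetical, and the sketch assumes a plain shifted `std::bitset<20>` (index 0 is the oldest result) in place of the ring buffer with a packed index that the patch stores in `flapping_buffer`. The weights (`0.8 + 0.02 * i`) and the default thresholds (25 and 30) are taken from the diff above.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Number of check results that are remembered (20 in the patch).
constexpr std::size_t kWindow = 20;

// Weighted flapping value in percent.
// history[0] is the oldest result, history[kWindow - 1] the most recent;
// a set bit means "this check result was a state change".
double FlappingValue(const std::bitset<kWindow>& history)
{
    double weighted = 0.0;

    for (std::size_t i = 0; i < kWindow; i++) {
        if (history[i])
            weighted += 0.8 + 0.02 * i; // oldest counts 0.8, newest 1.18
    }

    return 100.0 * weighted / kWindow;
}

// Record a new check result: drop the oldest entry and append the newest.
// (Simplification: the patch overwrites a ring-buffer slot instead.)
void PushResult(std::bitset<kWindow>& history, bool stateChange)
{
    history >>= 1;
    history[kWindow - 1] = stateChange;
}

// Hysteresis: a calm object starts flapping above the high threshold,
// a flapping object keeps flapping while it stays above the low threshold.
bool NextFlappingState(bool wasFlapping, double value,
    double low = 25.0, double high = 30.0)
{
    assert(low <= high);
    return value > (wasFlapping ? low : high);
}

// State changes from the documentation example, i.e. the slots whose
// weights are 0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12 and 1.18.
std::bitset<kWindow> DocsExampleHistory()
{
    const std::size_t changed[] = {2, 3, 4, 5, 9, 13, 16, 19};
    std::bitset<kWindow> history;

    for (std::size_t i : changed)
        history.set(i);

    return history;
}
```

Feeding `DocsExampleHistory()` into `FlappingValue` gives roughly the 39.1% from the documentation hunk, which is above the default high threshold. Calling `PushResult(history, false)` seven times afterwards brings the value below the default low threshold, so `NextFlappingState(true, value)` returns `false`.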