Icinga 2 supports optional detection of hosts and services that are "flapping".
-Flapping occurs when a service or host changes state too frequently, resulting
-in a storm of problem and recovery notifications. Flapping can be the source of
-configuration problems (i.e. thresholds set too low), troublesome services,
-or real network problems.
+Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
+recovery notifications. With flapping detection enabled, a flapping notification is sent while other notifications are
+suppressed until the object calms down, i.e. until enough consecutive checks return the same state. Flapping detection
+can help uncover configuration problems (wrong thresholds), troublesome services, or network problems.
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
-The `flapping_threshold` attributes allows to specify the percentage of state changes
-when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to flap.
+The `flapping_threshold_high` and `flapping_threshold_low` attributes specify the thresholds that control
+when a [host](09-object-types.md#objecttype-host) or [service](09-object-types.md#objecttype-service) is considered to be flapping.
-Note: There are known issues with flapping detection. Please refrain from enabling
-flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed.
+The default thresholds are 30% (high) and 25% (low). If the computed flapping value exceeds the high threshold, a
+host or service is considered flapping until the value drops below the low threshold.
-## Volatile Services <a id="volatile-services"></a>
+`FlappingStart` and `FlappingEnd` notifications are sent accordingly, if configured. See the chapter on
+[notifications](alert-notifications) for details.
+
+> Note: Flapping detection does not distinguish between hard and soft states. All state changes count and notifications
+> will be sent out regardless of the object's state type.
+
+### How it works <a id="how-it-works"></a>
+
+For every host and service, Icinga 2 records the last 20 check results and whether each of them was a state change. See the graphic below:
+
+![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
+
+All state changes are weighted, with the most recent one being worth the most (1.18) and the oldest the least (0.8). The
+weights in between are evenly spaced in steps of 0.02. The final flapping value is the sum of the weighted state changes
+divided by the total count of 20.
+
+In the example above, the weighted state changes add up to 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
+This yields a flapping value of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, the host or
+service is considered flapping.
+
+If the following check results are not state changes, the flapping value decreases with every check. It drops below the
+lower threshold of 25% on the fifth such result (23.7%), at which point the host or service recovers from flapping. After
+seven such results, it is down to 18.9%.
+
+## Volatile Services <a id="volatile-services"></a>
By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies and once `max_check_attempts` attribute
enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to false.
enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to true.
event\_command | Object name | **Optional.** The name of an event command that should be executed every time the host's state changes or the host is in a `SOFT` state.
- flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a host is considered to be flapping.
+ flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a host to be considered flapping. Defaults to `30.0`.
+ flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a host to be considered not flapping. Defaults to `25.0`.
volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to false.
zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
command\_endpoint | Object name | **Optional.** The endpoint where commands are executed on.
downtime\_depth | Number | Whether the host has one or more active downtimes.
flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp).
flapping | Boolean | Whether the host is flapping between states.
+ flapping\_current | Number | Current flapping value in percent (see flapping\_threshold\_high and flapping\_threshold\_low).
state | Number | The current state (0 = UP, 1 = DOWN).
last\_state | Number | The previous state (0 = UP, 1 = DOWN).
last\_hard\_state | Number | The last hard state (0 = UP, 1 = DOWN).
enable\_passive\_checks | Boolean | **Optional.** Whether passive checks are enabled. Defaults to `true`.
enable\_event\_handler | Boolean | **Optional.** Enables event handlers for this host. Defaults to `true`.
enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to `false`.
+ flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a service to be considered flapping. Defaults to `30.0`.
+ flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a service to be considered not flapping. Defaults to `25.0`.
enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to `true`.
event\_command | Object name | **Optional.** The name of an event command that should be executed every time the service's state changes or the service is in a `SOFT` state.
- flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a service is considered to be flapping.
volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to `false`.
zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
name | String | **Required.** The service name. Must be unique on a per-host basis. For advanced usage in [apply rules](03-monitoring-basics.md#using-apply) only.
acknowledgement\_expiry | Timestamp | When the acknowledgement expires (as a UNIX timestamp; 0 = no expiry).
downtime\_depth | Number | Whether the service has one or more active downtimes.
flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp).
+ flapping\_current | Number | Current flapping value in percent (see flapping\_threshold\_high and flapping\_threshold\_low).
flapping | Boolean | Whether the service is flapping between states.
state | Number | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
last\_state | Number | The previous state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
String flapping_output;
if (checkable->IsFlapping()) {
- flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+ flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
flapping_state_str = "STARTED";
} else {
- flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+ flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
flapping_state_str = "STOPPED";
}
fp << "\n";
fp << "\t" << "initial_state" "\t" "o" "\n"
- "\t" "low_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
- "\t" "high_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
+ "\t" "low_flap_threshold" "\t" << host->GetFlappingThresholdLow() << "\n"
+ "\t" "high_flap_threshold" "\t" << host->GetFlappingThresholdHigh() << "\n"
"\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(host) << "\n"
"\t" "check_freshness" "\t" "1" "\n";
String icon_image_alt = service->GetIconImageAlt();
fp << "\t" "initial_state" "\t" "o" "\n"
- "\t" "low_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
- "\t" "high_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
+ "\t" "low_flap_threshold" "\t" << service->GetFlappingThresholdLow() << "\n"
+ "\t" "high_flap_threshold" "\t" << service->GetFlappingThresholdHigh() << "\n"
"\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(service) << "\n"
"\t" "check_freshness" << "\t" "1" "\n";
if (!notes.IsEmpty())
String flapping_output;
if (checkable->IsFlapping()) {
- flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+ flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
flapping_state_str = "STARTED";
} else {
- flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+ flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
flapping_state_str = "STOPPED";
}
fields1->Set("flapping_type", service ? 1 : 0);
fields1->Set("object_id", checkable);
fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
- fields1->Set("low_threshold", checkable->GetFlappingThreshold());
- fields1->Set("high_threshold", checkable->GetFlappingThreshold());
+ fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
+ fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
fields1->Set("flapping_type", service ? 1 : 0);
fields1->Set("object_id", checkable);
fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
- fields1->Set("low_threshold", checkable->GetFlappingThreshold());
- fields1->Set("high_threshold", checkable->GetFlappingThreshold());
+ fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
+ fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
olock.Lock();
SetLastCheckResult(cr);
- bool was_flapping, is_flapping;
+ bool was_flapping = IsFlapping();
- was_flapping = IsFlapping();
+ UpdateFlappingStatus(old_state != cr->GetState());
- if (GetStateType() == StateTypeHard)
- UpdateFlappingStatus(stateChange);
-
- is_flapping = IsFlapping();
+ bool is_flapping = IsFlapping();
if (cr->GetActive()) {
UpdateNextCheck(origin);
ExecuteEventHandler();
/* Flapping start/end notifications */
- if (send_notification && !was_flapping && is_flapping) {
+ if (!in_downtime && !was_flapping && is_flapping) {
/* FlappingStart notifications happen on state changes, not in downtimes */
if (!IsPaused())
OnNotificationsRequested(this, NotificationFlappingStart, cr, "", "", MessageOrigin::Ptr());
Log(LogNotice, "Checkable")
- << "Flapping: Checkable '" << GetName() << "' started flapping (" << GetFlappingThreshold() << "% < " << GetFlappingCurrent() << "%).";
+ << "Flapping: Checkable '" << GetName() << "' started flapping (Current flapping value " << GetFlappingCurrent() << "% > threshold " << GetFlappingThresholdHigh() << "%).";
NotifyFlapping(origin);
} else if (!in_downtime && was_flapping && !is_flapping) {
OnNotificationsRequested(this, NotificationFlappingEnd, cr, "", "", MessageOrigin::Ptr());
Log(LogNotice, "Checkable")
- << "Flapping: Checkable '" << GetName() << "' stopped flapping (" << GetFlappingThreshold() << "% >= " << GetFlappingCurrent() << "%).";
+ << "Flapping: Checkable '" << GetName() << "' stopped flapping (Current flapping value " << GetFlappingCurrent() << "% < threshold " << GetFlappingThresholdLow() << "%).";
NotifyFlapping(origin);
}
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
******************************************************************************/
+#include <bitset>
#include "icinga/checkable.hpp"
#include "icinga/icingaapplication.hpp"
#include "base/utility.hpp"
using namespace icinga;
-#define FLAPPING_INTERVAL (30 * 60)
-
-double Checkable::GetFlappingCurrent(void) const
-{
- if (GetFlappingPositive() + GetFlappingNegative() <= 0)
- return 0;
-
- return 100 * GetFlappingPositive() / (GetFlappingPositive() + GetFlappingNegative());
-}
-
void Checkable::UpdateFlappingStatus(bool stateChange)
{
- double ts, now;
- long positive, negative;
-
- now = Utility::GetTime();
+ std::bitset<20> stateChangeBuf = GetFlappingBuffer();
+ int oldestIndex = (GetFlappingBuffer() & 0xFF00000) >> 20;
- ts = GetFlappingLastChange();
- positive = GetFlappingPositive();
- negative = GetFlappingNegative();
+ stateChangeBuf[oldestIndex] = stateChange;
+ oldestIndex = (oldestIndex + 1) % 20;
- double diff = now - ts;
+ double stateChanges = 0;
- if (positive + negative > FLAPPING_INTERVAL) {
- double pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
- positive -= pct * positive;
- negative -= pct * negative;
+ for (int i = 0; i < 20; i++) {
+ if (stateChangeBuf[(oldestIndex + i) % 20])
+ stateChanges += 0.8 + (0.02 * i);
}
- if (stateChange)
- positive += diff;
- else
- negative += diff;
+ double flappingValue = 100.0 * stateChanges / 20.0;
- if (positive < 0)
- positive = 0;
+ bool flapping;
- if (negative < 0)
- negative = 0;
+ if (GetFlapping())
+ flapping = flappingValue > GetFlappingThresholdLow();
+ else
+ flapping = flappingValue > GetFlappingThresholdHigh();
-// Log(LogDebug, "Checkable")
-// << "Flapping counter for '" << GetName() << "' is positive=" << positive << ", negative=" << negative;
+ if (flapping != GetFlapping())
+ SetFlappingLastChange(Utility::GetTime());
- SetFlappingLastChange(now);
- SetFlappingPositive(positive);
- SetFlappingNegative(negative);
+ SetFlappingBuffer((stateChangeBuf.to_ulong() | (oldestIndex << 20)));
+ SetFlappingCurrent(flappingValue);
+ SetFlapping(flapping);
}
bool Checkable::IsFlapping(void) const
if (!GetEnableFlapping() || !IcingaApplication::GetInstance()->GetEnableFlapping())
return false;
else
- return GetFlappingCurrent() > GetFlappingThreshold();
+ return GetFlapping();
}
intrusive_ptr<EventCommand> GetEventCommand(void) const;
/* Flapping Detection */
- double GetFlappingCurrent(void) const;
-
bool IsFlapping(void) const;
void UpdateFlappingStatus(bool stateChange);
}}}
};
[config] bool volatile;
- [config] double flapping_threshold {
- default {{{ return 30; }}}
- };
+
[config] bool enable_active_checks {
default {{{ return true; }}}
};
default {{{ return true; }}}
};
+ [config, deprecated] double flapping_threshold;
+
+ [config] double flapping_threshold_low {
+ default {{{ return 25; }}}
+ };
+
+ [config] double flapping_threshold_high {
+ default {{{ return 30; }}}
+ };
+
[config] String notes;
[config] String notes_url;
[config] String action_url;
};
[state] Timestamp acknowledgement_expiry;
[state] bool force_next_notification;
- [state] int flapping_positive;
- [state] int flapping_negative;
- [state] Timestamp flapping_last_change;
- [no_storage, protected] bool flapping {
- get {{{ return false; }}}
- };
[no_storage] Timestamp last_check {
get;
};
get;
};
+ [state] double flapping_current {
+ default {{{ return 0; }}}
+ };
+ [state] Timestamp flapping_last_change;
+ [state, no_user_view, no_user_modify] int flapping_buffer;
+ [state, protected] bool flapping;
+
[config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) {
navigate {{{
return Endpoint::GetByName(GetCommandEndpointRaw());
double CompatUtility::GetCheckableLowFlapThreshold(const Checkable::Ptr& checkable)
{
- return checkable->GetFlappingThreshold();
+ return checkable->GetFlappingThresholdLow();
}
double CompatUtility::GetCheckableHighFlapThreshold(const Checkable::Ptr& checkable)
{
- return checkable->GetFlappingThreshold();
+ return checkable->GetFlappingThresholdHigh();
}
int CompatUtility::GetCheckableFreshnessChecksEnabled(const Checkable::Ptr& checkable)
icinga_checkresult/service_1attempt
icinga_checkresult/service_2attempts
icinga_checkresult/service_3attempts
- icinga_checkresult/host_flapping_notification
- icinga_checkresult/service_flapping_notification
- icinga_notification/state_filter
- icinga_notification/type_filter
+ icinga_checkresult/host_flapping_notification
+ icinga_checkresult/service_flapping_notification
+ icinga_notification/state_filter
+ icinga_notification/type_filter
icinga_macros/simple
icinga_perfdata/empty
icinga_perfdata/simple
TESTS livestatus/hosts livestatus/services
)
endif()
+
+set(icinga_checkable_test_SOURCES
+ icinga-checkable-flapping.cpp
+)
+
+add_boost_test(icinga_checkable
+ SOURCES icinga-checkable-test.cpp ${icinga_checkable_test_SOURCES}
+ LIBRARIES base config icinga cli
+ TESTS icinga_checkable_flapping/host_not_flapping
+ icinga_checkable_flapping/host_flapping
+ icinga_checkable_flapping/host_flapping_recover
+ icinga_checkable_flapping/host_flapping_docs_example
+)
+
--- /dev/null
+/******************************************************************************
+ * Icinga 2 *
+ * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) *
+ * *
+ * This program is free software; you can redistribute it and/or *
+ * modify it under the terms of the GNU General Public License *
+ * as published by the Free Software Foundation; either version 2 *
+ * of the License, or (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the Free Software Foundation *
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
+ ******************************************************************************/
+
+#include <boost/test/unit_test.hpp>
+#include <bitset>
+#include "icinga/host.hpp"
+#include <iostream>
+
+using namespace icinga;
+
+#ifdef I2_DEBUG
+static CheckResult::Ptr MakeCheckResult(ServiceState state)
+{
+ CheckResult::Ptr cr = new CheckResult();
+
+ cr->SetState(state);
+
+ double now = Utility::GetTime();
+ cr->SetScheduleStart(now);
+ cr->SetScheduleEnd(now);
+ cr->SetExecutionStart(now);
+ cr->SetExecutionEnd(now);
+
+ Utility::IncrementTime(60);
+
+ return cr;
+}
+
+static void LogFlapping(const Checkable::Ptr& obj)
+{
+ std::bitset<20> stateChangeBuf = obj->GetFlappingBuffer();
+ int oldestIndex = (obj->GetFlappingBuffer() & 0xFF00000) >> 20;
+
+ std::cout << "Flapping: " << obj->IsFlapping() << "\nHT: " << obj->GetFlappingThresholdHigh() << " LT: " << obj->GetFlappingThresholdLow()
+ << "\nOur value: " << obj->GetFlappingCurrent() << "\nPtr: " << oldestIndex << " Buf: " << stateChangeBuf << '\n';
+}
+
+
+static void LogHostStatus(const Host::Ptr &host)
+{
+ std::cout << "Current status: state: " << host->GetState() << " state_type: " << host->GetStateType()
+ << " check attempt: " << host->GetCheckAttempt() << "/" << host->GetMaxCheckAttempts() << std::endl;
+}
+#endif /* I2_DEBUG */
+
+BOOST_AUTO_TEST_SUITE(icinga_checkable_flapping)
+
+BOOST_AUTO_TEST_CASE(host_not_flapping)
+{
+#ifndef I2_DEBUG
+ BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+ std::cout << "Running test with a non-flapping host...\n";
+
+ Host::Ptr host = new Host();
+ host->SetName("test");
+ host->SetEnableFlapping(true);
+ host->SetMaxCheckAttempts(5);
+
+ // Host otherwise is soft down
+ host->SetState(HostUp);
+ host->SetStateType(StateTypeHard);
+
+ Utility::SetTime(0);
+
+ BOOST_CHECK(host->GetFlappingCurrent() == 0);
+
+ LogFlapping(host);
+ LogHostStatus(host);
+
+ // watch the state being stable
+ int i = 0;
+ while (i++ < 10) {
+ // For some reason, elusive to me, the first check is a state change
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+
+ LogFlapping(host);
+ LogHostStatus(host);
+
+ BOOST_CHECK(host->GetState() == 0);
+ BOOST_CHECK(host->GetCheckAttempt() == 1);
+ BOOST_CHECK(host->GetStateType() == StateTypeHard);
+
+ //Should not be flapping
+ BOOST_CHECK(!host->IsFlapping());
+ BOOST_CHECK(host->GetFlappingCurrent() < 30.0);
+ }
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping)
+{
+#ifndef I2_DEBUG
+ BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+ std::cout << "Running test with host changing state with every check...\n";
+
+ Host::Ptr host = new Host();
+ host->SetName("test");
+ host->SetEnableFlapping(true);
+ host->SetMaxCheckAttempts(5);
+
+ Utility::SetTime(0);
+
+ int i = 0;
+ while (i++ < 25) {
+ if (i % 2)
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ else
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+
+ LogFlapping(host);
+ LogHostStatus(host);
+
+ //30 Percent is our high Threshold
+ if (i >= 6) {
+ BOOST_CHECK(host->IsFlapping());
+ } else {
+ BOOST_CHECK(!host->IsFlapping());
+ }
+ }
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping_recover)
+{
+#ifndef I2_DEBUG
+ BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+ std::cout << "Running test with flapping recovery...\n";
+
+ Host::Ptr host = new Host();
+ host->SetName("test");
+ host->SetEnableFlapping(true);
+ host->SetMaxCheckAttempts(5);
+
+ // Host otherwise is soft down
+ host->SetState(HostUp);
+ host->SetStateType(StateTypeHard);
+
+ Utility::SetTime(0);
+
+ // A few warning
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+
+ LogFlapping(host);
+ LogHostStatus(host);
+ for (int i = 0; i <= 7; i++) {
+ if (i % 2)
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ else
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ }
+
+ LogFlapping(host);
+ LogHostStatus(host);
+
+ // We should be flapping now
+ BOOST_CHECK(host->GetFlappingCurrent() > 30.0);
+ BOOST_CHECK(host->IsFlapping());
+
+ // Now recover from flapping
+ int count = 0;
+ while (host->IsFlapping()) {
+ BOOST_CHECK(host->GetFlappingCurrent() > 25.0);
+ BOOST_CHECK(host->IsFlapping());
+
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ LogFlapping(host);
+ LogHostStatus(host);
+ count++;
+ }
+
+ std::cout << "Recovered from flapping after " << count << " Warning results.\n";
+
+ BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
+ BOOST_CHECK(!host->IsFlapping());
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping_docs_example)
+{
+#ifndef I2_DEBUG
+ BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+ std::cout << "Simulating the documentation example...\n";
+
+ Host::Ptr host = new Host();
+ host->SetName("test");
+ host->SetEnableFlapping(true);
+ host->SetMaxCheckAttempts(5);
+
+ // Host otherwise is soft down
+ host->SetState(HostUp);
+ host->SetStateType(StateTypeHard);
+
+ Utility::SetTime(0);
+
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+
+ LogFlapping(host);
+ LogHostStatus(host);
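+ // The value is computed with floating point arithmetic: compare with a small tolerance instead of ==
+ BOOST_CHECK(host->GetFlappingCurrent() > 39.09 && host->GetFlappingCurrent() < 39.11);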
+ BOOST_CHECK(host->IsFlapping());
+
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+ host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+
+ LogFlapping(host);
+ LogHostStatus(host);
+ BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
+ BOOST_CHECK(!host->IsFlapping());
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_SUITE_END()
--- /dev/null
+/******************************************************************************
+ * Icinga 2 *
+ * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) *
+ * *
+ * This program is free software; you can redistribute it and/or *
+ * modify it under the terms of the GNU General Public License *
+ * as published by the Free Software Foundation; either version 2 *
+ * of the License, or (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the Free Software Foundation *
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
+ ******************************************************************************/
+
+#define BOOST_TEST_MAIN
+#define BOOST_TEST_MODULE icinga2_test
+
+#include "cli/daemonutility.hpp"
+#include "base/application.hpp"
+#include "base/loader.hpp"
+#include <BoostTestTargetConfig.h>
+#include <fstream>
+
+using namespace icinga;
+
+struct IcingaCheckableFixture
+{
+ IcingaCheckableFixture(void)
+ {
+ BOOST_TEST_MESSAGE("setup running Icinga 2 core");
+
+ Application::InitializeBase();
+
+ /* start the Icinga application and load the configuration */
+ Application::DeclareSysconfDir("etc");
+ Application::DeclareLocalStateDir("var");
+
+ ActivationScope ascope;
+
+ Loader::LoadExtensionLibrary("icinga");
+ Loader::LoadExtensionLibrary("methods"); //loaded by ITL
+
+ std::vector<std::string> configs;
+ std::vector<ConfigItem::Ptr> newItems;
+
+ DaemonUtility::LoadConfigFiles(configs, newItems, "icinga2.debug", "icinga2.vars");
+
+ /* ignore config errors */
+ WorkQueue upq;
+ ConfigItem::ActivateItems(upq, newItems);
+ }
+
+ ~IcingaCheckableFixture(void)
+ {
+ BOOST_TEST_MESSAGE("cleanup Icinga 2 core");
+ Application::UninitializeBase();
+ }
+};
+
+BOOST_GLOBAL_FIXTURE(IcingaCheckableFixture);
+
#else /* I2_DEBUG */
boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));
- int softStateCount = 20;
int timeStepInterval = 60;
Host::Ptr host = new Host();
- host->SetMaxCheckAttempts(softStateCount);
host->Activate();
host->SetAuthority(true);
host->SetStateRaw(ServiceOK);
std::cout << "Inserting flapping check results" << std::endl;
- for (int i = 0; i < softStateCount; i++) {
+ for (int i = 0; i < 10; i++) {
ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
host->ProcessCheckResult(MakeCheckResult(state));
Utility::IncrementTime(timeStepInterval);
}
- std::cout << "Checking host state (must be flapping in SOFT state)" << std::endl;
- BOOST_CHECK(host->GetStateType() == StateTypeSoft);
BOOST_CHECK(host->IsFlapping() == true);
- std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
- CheckNotification(host, false, NotificationFlappingStart);
+ CheckNotification(host, true, NotificationFlappingStart);
+
+ std::cout << "Now calm down..." << std::endl;
+
+ for (int i = 0; i < 20; i++) {
+ host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ Utility::IncrementTime(timeStepInterval);
+ }
+
+ CheckNotification(host, true, NotificationFlappingEnd);
+
c.disconnect();
#else /* I2_DEBUG */
boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));
- int softStateCount = 20;
int timeStepInterval = 60;
Host::Ptr service = new Host();
- service->SetMaxCheckAttempts(softStateCount);
service->Activate();
service->SetAuthority(true);
service->SetStateRaw(ServiceOK);
std::cout << "Inserting flapping check results" << std::endl;
- for (int i = 0; i < softStateCount; i++) {
+ for (int i = 0; i < 10; i++) {
ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
service->ProcessCheckResult(MakeCheckResult(state));
Utility::IncrementTime(timeStepInterval);
}
- std::cout << "Checking service state (must be flapping in SOFT state)" << std::endl;
- BOOST_CHECK(service->GetStateType() == StateTypeSoft);
BOOST_CHECK(service->IsFlapping() == true);
- std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
- CheckNotification(service, false, NotificationFlappingStart);
+ CheckNotification(service, true, NotificationFlappingStart);
+
+ std::cout << "Now calm down..." << std::endl;
+
+ for (int i = 0; i < 20; i++) {
+ service->ProcessCheckResult(MakeCheckResult(ServiceOK));
+ Utility::IncrementTime(timeStepInterval);
+ }
+
+ CheckNotification(service, true, NotificationFlappingEnd);
c.disconnect();