Fix flapping
author Jean Flach <jean-marcel.flach@icinga.com>
Thu, 19 Oct 2017 15:32:52 +0000 (17:32 +0200)
committer Jean Flach <jean-marcel.flach@icinga.com>
Tue, 24 Oct 2017 13:54:05 +0000 (15:54 +0200)
Re-implement flapping following the 'old way' of just observing the last
20 state changes.

refs #4982
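
The calculation this commit introduces can be tried in isolation. The following is a minimal sketch, not the code from the commit: `FlappingState` and `UpdateFlapping` are hypothetical names, and the ring-buffer index is kept in its own field instead of being packed into the upper bits of the buffer integer. It assumes the weights (0.8 plus 0.02 per slot), the window of 20 results and the default thresholds (25% and 30%) shown in the diff below.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the flapping state kept on a checkable object.
struct FlappingState {
    std::bitset<20> changes;  // ring buffer: a set bit means "this check was a state change"
    std::size_t oldest = 0;   // slot that the next result overwrites
    double current = 0.0;     // last computed flapping value in percent
    bool flapping = false;
};

// Records one check result and re-evaluates the flapping flag.
// The defaults for low and high are the thresholds named in the commit.
inline void UpdateFlapping(FlappingState& s, bool stateChange,
                           double low = 25.0, double high = 30.0)
{
    assert(low <= high);
    const std::size_t n = s.changes.size();

    // Overwrite the oldest entry, then advance the ring-buffer position.
    s.changes[s.oldest] = stateChange;
    s.oldest = (s.oldest + 1) % n;

    // Walk from the oldest entry (weight 0.80) to the newest (weight 1.18).
    double weighted = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        if (s.changes[(s.oldest + i) % n])
            weighted += 0.8 + 0.02 * i;
    }

    s.current = 100.0 * weighted / n;

    // Hysteresis: starting to flap requires a value above the high threshold,
    // while staying in the flapping state only requires a value above the low one.
    s.flapping = s.current > (s.flapping ? low : high);
}
```

Feeding this sketch the eight state changes from the documentation example (weights 0.84, 0.86, 0.88, 0.9, 0.98, 1.06, 1.12 and 1.18) gives a value of about 39.1%, which is above the high threshold. Under the same assumptions, a run of results without state changes lowers the value step by step until it falls below 25% and the flag clears.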

14 files changed:
doc/08-advanced-topics.md
doc/09-object-types.md
lib/compat/compatlogger.cpp
lib/compat/statusdatawriter.cpp
lib/db_ido/dbevents.cpp
lib/icinga/checkable-check.cpp
lib/icinga/checkable-flapping.cpp
lib/icinga/checkable.hpp
lib/icinga/checkable.ti
lib/icinga/compatutility.cpp
test/CMakeLists.txt
test/icinga-checkable-flapping.cpp [new file with mode: 0644]
test/icinga-checkable-test.cpp [new file with mode: 0644]
test/icinga-checkresult.cpp

index a7d129ae1ca3a11a33743e6493b205e176b5ec25..b3cd68c48144e6e26122015ac19fd42f8bb239f7 100644 (file)
@@ -414,19 +414,42 @@ Example output in Icinga Web 2:
 
 Icinga 2 supports optional detection of hosts and services that are "flapping".
 
-Flapping occurs when a service or host changes state too frequently, resulting
-in a storm of problem and recovery notifications. Flapping can be the source of
-configuration problems (i.e. thresholds set too low), troublesome services,
-or real network problems.
+Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
+recovery notifications. With flapping detection enabled, a flapping notification is sent and other notifications are
+suppressed until the object calms down, that is, until checks have returned the same state a few times. Flapping
+detection can help to find configuration problems (wrong thresholds), troublesome services, or network problems.
 
 Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
-The `flapping_threshold` attributes allows to specify the percentage of state changes
-when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to flap.
+The `flapping_threshold_high` and `flapping_threshold_low` attributes specify the thresholds that control
+when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to be flapping.
 
-Note: There are known issues with flapping detection. Please refrain from enabling
-flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed.
+The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold, a
+host or service is considered flapping until the value drops below the low flapping threshold.
 
-## Volatile Services <a id="volatile-services"></a>
+`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
+[notifications](alert-notifications) for details.
+
+> Note: There is no distinction between hard and soft states with flapping. All state changes count and notifications
+> will be sent out regardless of the object's state.
+
+### How it works <a id="how-it-works"></a>
+
+Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
+
+![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
+
+All the states are weighted, with the most recent one being worth the most (1.18) and the 20th the least (0.8). The
+states in between are evenly distributed. The final flapping value is the sum of the weighted state changes divided by
+the total count of 20.
+
+In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
+This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
+considered flapping.
+
+If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold
+of 25% and the host or service would therefore recover from flapping.
+
+# Volatile Services <a id="volatile-services"></a>
 
 By default all services remain in a non-volatile state. When a problem
 occurs, the `SOFT` state applies and once `max_check_attempts` attribute
index 036534344a8cecf342636a1ed4ffc52bdbf7a962..d09e323271fb21befe5e4f620cfa9002a7b34e6e 100644 (file)
@@ -730,7 +730,8 @@ Configuration Attributes:
   enable\_flapping          | Boolean               | **Optional.** Whether flap detection is enabled. Defaults to false.
   enable\_perfdata          | Boolean               | **Optional.** Whether performance data processing is enabled. Defaults to true.
   event\_command            | Object name           | **Optional.** The name of an event command that should be executed every time the host's state changes or the host is in a `SOFT` state.
-  flapping\_threshold       | Number                | **Optional.** The flapping threshold in percent when a host is considered to be flapping.
+  flapping\_threshold\_high | Number                | **Optional.** Flapping upper bound in percent for a host to be considered flapping. Defaults to `30.0`.
+  flapping\_threshold\_low  | Number                | **Optional.** Flapping lower bound in percent for a host to be considered not flapping. Defaults to `25.0`.
   volatile                  | Boolean               | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to false.
   zone                     | Object name           | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
   command\_endpoint         | Object name           | **Optional.** The endpoint where commands are executed on.
@@ -767,6 +768,7 @@ Runtime Attributes:
   downtime\_depth           | Number                | Whether the host has one or more active downtimes.
   flapping\_last\_change    | Timestamp             | When the last flapping change occurred (as a UNIX timestamp).
   flapping                  | Boolean               | Whether the host is flapping between states.
+  flapping\_current         | Number                | Current flapping value in percent (see flapping\_threshold\_low and flapping\_threshold\_high).
   state                     | Number                | The current state (0 = UP, 1 = DOWN).
   last\_state               | Number                | The previous state (0 = UP, 1 = DOWN).
   last\_hard\_state         | Number                | The last hard state (0 = UP, 1 = DOWN).
@@ -1465,9 +1467,10 @@ Configuration Attributes:
   enable\_passive\_checks   | Boolean               | **Optional.** Whether passive checks are enabled. Defaults to `true`.
   enable\_event\_handler    | Boolean               | **Optional.** Enables event handlers for this host. Defaults to `true`.
   enable\_flapping          | Boolean               | **Optional.** Whether flap detection is enabled. Defaults to `false`.
+  flapping\_threshold\_high | Number                | **Optional.** Flapping upper bound in percent for a service to be considered flapping. Defaults to `30.0`.
+  flapping\_threshold\_low  | Number                | **Optional.** Flapping lower bound in percent for a service to be considered not flapping. Defaults to `25.0`.
   enable\_perfdata          | Boolean               | **Optional.** Whether performance data processing is enabled. Defaults to `true`.
   event\_command            | Object name           | **Optional.** The name of an event command that should be executed every time the service's state changes or the service is in a `SOFT` state.
-  flapping\_threshold       | Number                | **Optional.** The flapping threshold in percent when a service is considered to be flapping.
   volatile                  | Boolean               | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to `false`.
   zone                     | Object name           | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
   name                      | String                | **Required.** The service name. Must be unique on a per-host basis. For advanced usage in [apply rules](03-monitoring-basics.md#using-apply) only.
@@ -1502,6 +1505,7 @@ Runtime Attributes:
   acknowledgement\_expiry   | Timestamp             | When the acknowledgement expires (as a UNIX timestamp; 0 = no expiry).
   downtime\_depth           | Number                | Whether the service has one or more active downtimes.
   flapping\_last\_change    | Timestamp             | When the last flapping change occurred (as a UNIX timestamp).
+  flapping\_current         | Number                | Current flapping value in percent (see flapping\_threshold\_low and flapping\_threshold\_high).
   flapping                  | Boolean               | Whether the host is flapping between states.
   state                     | Number                | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
   last\_state               | Number                | The previous state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
index a7d3d91f015fecfad0da690a2fbf8507e7f94252..dabefe785a2b393d861816e016baa0ad3ba9d3de 100644 (file)
@@ -317,10 +317,10 @@ void CompatLogger::FlappingChangedHandler(const Checkable::Ptr& checkable)
        String flapping_output;
        
        if (checkable->IsFlapping()) {
-               flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+               flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
                flapping_state_str = "STARTED";
        } else {
-               flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+               flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
                flapping_state_str = "STOPPED";
        }
 
index 080be18928b4ec095198d0f55196b1951dedf074..22a22f7dfdb270a14a401ffe95fb76f031b9ad5e 100644 (file)
@@ -296,8 +296,8 @@ void StatusDataWriter::DumpHostObject(std::ostream& fp, const Host::Ptr& host)
        fp << "\n";
 
        fp << "\t" << "initial_state" "\t" "o" "\n"
-             "\t" "low_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
-             "\t" "high_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
+             "\t" "low_flap_threshold" "\t" << host->GetFlappingThresholdLow() << "\n"
+             "\t" "high_flap_threshold" "\t" << host->GetFlappingThresholdHigh() << "\n"
              "\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(host) << "\n"
              "\t" "check_freshness" "\t" "1" "\n";
 
@@ -470,8 +470,8 @@ void StatusDataWriter::DumpServiceObject(std::ostream& fp, const Service::Ptr& s
                String icon_image_alt = service->GetIconImageAlt();
 
                fp << "\t" "initial_state" "\t" "o" "\n"
-                     "\t" "low_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
-                     "\t" "high_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
+                     "\t" "low_flap_threshold" "\t" << service->GetFlappingThresholdLow() << "\n"
+                     "\t" "high_flap_threshold" "\t" << service->GetFlappingThresholdHigh() << "\n"
                      "\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(service) << "\n"
                      "\t" "check_freshness" << "\t" "1" "\n";
                if (!notes.IsEmpty())
index 28f6604bb35fb3f03dfc8eee68ba70d13cb54dd7..706e3ddd357016a2c332582dc11453daeabe93a0 100644 (file)
@@ -1194,10 +1194,10 @@ void DbEvents::AddFlappingChangedLogHistory(const Checkable::Ptr& checkable)
        String flapping_output;
        
        if (checkable->IsFlapping()) {
-               flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+               flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
                flapping_state_str = "STARTED";
        } else {
-               flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
+               flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
                flapping_state_str = "STOPPED";
        }
 
@@ -1323,8 +1323,8 @@ void DbEvents::AddFlappingChangedHistory(const Checkable::Ptr& checkable)
        fields1->Set("flapping_type", service ? 1 : 0);
        fields1->Set("object_id", checkable);
        fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
-       fields1->Set("low_threshold", checkable->GetFlappingThreshold());
-       fields1->Set("high_threshold", checkable->GetFlappingThreshold());
+       fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
+       fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
 
        fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
 
@@ -1369,8 +1369,8 @@ void DbEvents::AddEnableFlappingChangedHistory(const Checkable::Ptr& checkable)
        fields1->Set("flapping_type", service ? 1 : 0);
        fields1->Set("object_id", checkable);
        fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
-       fields1->Set("low_threshold", checkable->GetFlappingThreshold());
-       fields1->Set("high_threshold", checkable->GetFlappingThreshold());
+       fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
+       fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
 
        fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
 
index 235f3a9dd38fe52b565ae58c733675c44fa202d0..a89fa1463b88f011cabe0126b46c5bbe4ceb925a 100644 (file)
@@ -315,14 +315,11 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
        olock.Lock();
        SetLastCheckResult(cr);
 
-       bool was_flapping, is_flapping;
+       bool was_flapping = IsFlapping();
 
-       was_flapping = IsFlapping();
+       UpdateFlappingStatus(old_state != cr->GetState());
 
-       if (GetStateType() == StateTypeHard)
-               UpdateFlappingStatus(stateChange);
-
-       is_flapping = IsFlapping();
+       bool is_flapping = IsFlapping();
 
        if (cr->GetActive()) {
                UpdateNextCheck(origin);
@@ -368,13 +365,13 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
                ExecuteEventHandler();
 
        /* Flapping start/end notifications */
-       if (send_notification && !was_flapping && is_flapping) {
+       if (!in_downtime && !was_flapping && is_flapping) {
                /* FlappingStart notifications happen on state changes, not in downtimes */
                if (!IsPaused())
                        OnNotificationsRequested(this, NotificationFlappingStart, cr, "", "", MessageOrigin::Ptr());
 
                Log(LogNotice, "Checkable")
-                       << "Flapping: Checkable '" << GetName() << "' started flapping (" << GetFlappingThreshold() << "% < " << GetFlappingCurrent() << "%).";
+                       << "Flapping: Checkable '" << GetName() << "' started flapping (Current flapping value " << GetFlappingCurrent() << "% > threshold " << GetFlappingThresholdHigh() << "%).";
 
                NotifyFlapping(origin);
        } else if (!in_downtime && was_flapping && !is_flapping) {
@@ -383,7 +380,7 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
                        OnNotificationsRequested(this, NotificationFlappingEnd, cr, "", "", MessageOrigin::Ptr());
 
                Log(LogNotice, "Checkable")
-                       << "Flapping: Checkable '" << GetName() << "' stopped flapping (" << GetFlappingThreshold() << "% >= " << GetFlappingCurrent() << "%).";
+                       << "Flapping: Checkable '" << GetName() << "' stopped flapping (Current flapping value " << GetFlappingCurrent() << "% < threshold " << GetFlappingThresholdLow() << "%).";
 
                NotifyFlapping(origin);
        }
index 3a17772a0d86e051f75d543e02a71798bb97dc72..af5ced0502049dc542178b62c68466b6846aa05a 100644 (file)
  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.             *
  ******************************************************************************/
 
+#include <bitset>
 #include "icinga/checkable.hpp"
 #include "icinga/icingaapplication.hpp"
 #include "base/utility.hpp"
 
 using namespace icinga;
 
-#define FLAPPING_INTERVAL (30 * 60)
-
-double Checkable::GetFlappingCurrent(void) const
-{
-       if (GetFlappingPositive() + GetFlappingNegative() <= 0)
-               return 0;
-
-       return 100 * GetFlappingPositive() / (GetFlappingPositive() + GetFlappingNegative());
-}
-
 void Checkable::UpdateFlappingStatus(bool stateChange)
 {
-       double ts, now;
-       long positive, negative;
-
-       now = Utility::GetTime();
+       std::bitset<20> stateChangeBuf = GetFlappingBuffer();
+       int oldestIndex = (GetFlappingBuffer() & 0xFF00000) >> 20;
 
-       ts = GetFlappingLastChange();
-       positive = GetFlappingPositive();
-       negative = GetFlappingNegative();
+       stateChangeBuf[oldestIndex] = stateChange;
+       oldestIndex = (oldestIndex + 1) % 20;
 
-       double diff = now - ts;
+       double stateChanges = 0;
 
-       if (positive + negative > FLAPPING_INTERVAL) {
-               double pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
-               positive -= pct * positive;
-               negative -= pct * negative;
+       for (int i = 0; i < 20; i++) {
+               if (stateChangeBuf[(oldestIndex + i) % 20])
+                       stateChanges += 0.8 + (0.02 * i);
        }
 
-       if (stateChange)
-               positive += diff;
-       else
-               negative += diff;
+       double flappingValue = 100.0 * stateChanges / 20.0;
 
-       if (positive < 0)
-               positive = 0;
+       bool flapping;
 
-       if (negative < 0)
-               negative = 0;
+       if (GetFlapping())
+               flapping = flappingValue > GetFlappingThresholdLow();
+       else
+               flapping = flappingValue > GetFlappingThresholdHigh();
 
-//     Log(LogDebug, "Checkable")
-//         << "Flapping counter for '" << GetName() << "' is positive=" << positive << ", negative=" << negative;
+       if (flapping != GetFlapping())
+               SetFlappingLastChange(Utility::GetTime());
 
-       SetFlappingLastChange(now);
-       SetFlappingPositive(positive);
-       SetFlappingNegative(negative);
+       SetFlappingBuffer((stateChangeBuf.to_ulong() | (oldestIndex << 20)));
+       SetFlappingCurrent(flappingValue);
+       SetFlapping(flapping);
 }
 
 bool Checkable::IsFlapping(void) const
@@ -76,5 +61,5 @@ bool Checkable::IsFlapping(void) const
        if (!GetEnableFlapping() || !IcingaApplication::GetInstance()->GetEnableFlapping())
                return false;
        else
-               return GetFlappingCurrent() > GetFlappingThreshold();
+               return GetFlapping();
 }
index 03fd5ae141e9eae8fad937e37ac91b5199c47646..0617eca127260eb5229e526e3cd9be440b2e75bd 100644 (file)
@@ -180,8 +180,6 @@ public:
        intrusive_ptr<EventCommand> GetEventCommand(void) const;
 
        /* Flapping Detection */
-       double GetFlappingCurrent(void) const;
-
        bool IsFlapping(void) const;
        void UpdateFlappingStatus(bool stateChange);
 
index f90a85dbe67d874bf70d149a7fa5e5dd156a1640..ced78b8e9fa2ac96239536efb170c529bb4e02b6 100644 (file)
@@ -70,9 +70,7 @@ abstract class Checkable : CustomVarObject
                }}}
        };
        [config] bool volatile;
-       [config] double flapping_threshold {
-               default {{{ return 30; }}}
-       };
+
        [config] bool enable_active_checks {
                default {{{ return true; }}}
        };
@@ -92,6 +90,16 @@ abstract class Checkable : CustomVarObject
                default {{{ return true; }}}
        };
 
+       [config, deprecated] double flapping_threshold;
+
+       [config] double flapping_threshold_low {
+               default {{{ return 25; }}}
+       };
+
+       [config] double flapping_threshold_high{
+               default {{{ return 30; }}}
+       };
+
        [config] String notes;
        [config] String notes_url;
        [config] String action_url;
@@ -139,12 +147,6 @@ abstract class Checkable : CustomVarObject
        };
        [state] Timestamp acknowledgement_expiry;
        [state] bool force_next_notification;
-       [state] int flapping_positive;
-       [state] int flapping_negative;
-       [state] Timestamp flapping_last_change;
-       [no_storage, protected] bool flapping {
-               get {{{ return false; }}}
-       };
        [no_storage] Timestamp last_check {
                get;
        };
@@ -152,6 +154,13 @@ abstract class Checkable : CustomVarObject
                get;
        };
 
+       [state] double flapping_current {
+               default {{{ return 0; }}}
+       };
+       [state] Timestamp flapping_last_change;
+       [state, no_user_view, no_user_modify] int flapping_buffer;
+       [state, protected] bool flapping;
+
        [config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) {
                navigate {{{
                        return Endpoint::GetByName(GetCommandEndpointRaw());
index b33674c6b3df994671f5b99001607e5e72e50da6..b66f4b7b6f01ffce82e121436b0354c67c4b52c5 100644 (file)
@@ -300,12 +300,12 @@ int CompatUtility::GetCheckableIsVolatile(const Checkable::Ptr& checkable)
 
 double CompatUtility::GetCheckableLowFlapThreshold(const Checkable::Ptr& checkable)
 {
-       return checkable->GetFlappingThreshold();
+       return checkable->GetFlappingThresholdLow();
 }
 
 double CompatUtility::GetCheckableHighFlapThreshold(const Checkable::Ptr& checkable)
 {
-       return checkable->GetFlappingThreshold();
+       return checkable->GetFlappingThresholdHigh();
 }
 
 int CompatUtility::GetCheckableFreshnessChecksEnabled(const Checkable::Ptr& checkable)
index 85b8b59cb2b705ac4f7702dee20473b65d57ae9b..2a234419d301ec9dedd032a99295362de04aae02 100644 (file)
@@ -99,10 +99,10 @@ add_boost_test(base
         icinga_checkresult/service_1attempt
         icinga_checkresult/service_2attempts
         icinga_checkresult/service_3attempts
-       icinga_checkresult/host_flapping_notification
-       icinga_checkresult/service_flapping_notification
-       icinga_notification/state_filter
-       icinga_notification/type_filter
+        icinga_checkresult/host_flapping_notification
+        icinga_checkresult/service_flapping_notification
+        icinga_notification/state_filter
+        icinga_notification/type_filter
         icinga_macros/simple
         icinga_perfdata/empty
         icinga_perfdata/simple
@@ -136,3 +136,17 @@ if(ICINGA2_WITH_LIVESTATUS)
     TESTS livestatus/hosts livestatus/services
   )
 endif()
+
+set(icinga_checkable_test_SOURCES
+    icinga-checkable-flapping.cpp
+)
+
+add_boost_test(icinga_checkable
+  SOURCES icinga-checkable-test.cpp ${icinga_checkable_test_SOURCES}
+  LIBRARIES base config icinga cli
+  TESTS icinga_checkable_flapping/host_not_flapping
+        icinga_checkable_flapping/host_flapping
+        icinga_checkable_flapping/host_flapping_recover
+        icinga_checkable_flapping/host_flapping_docs_example
+)
+
diff --git a/test/icinga-checkable-flapping.cpp b/test/icinga-checkable-flapping.cpp
new file mode 100644 (file)
index 0000000..1618fd3
--- /dev/null
@@ -0,0 +1,260 @@
+/******************************************************************************
+ * Icinga 2                                                                   *
+ * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/)  *
+ *                                                                            *
+ * This program is free software; you can redistribute it and/or              *
+ * modify it under the terms of the GNU General Public License                *
+ * as published by the Free Software Foundation; either version 2             *
+ * of the License, or (at your option) any later version.                     *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU General Public License for more details.                               *
+ *                                                                            *
+ * You should have received a copy of the GNU General Public License          *
+ * along with this program; if not, write to the Free Software Foundation     *
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.             *
+ ******************************************************************************/
+
+#include <boost/test/unit_test.hpp>
+#include <bitset>
+#include "icinga/host.hpp"
+#include <iostream>
+
+using namespace icinga;
+
+#ifdef I2_DEBUG
+static CheckResult::Ptr MakeCheckResult(ServiceState state)
+{
+       CheckResult::Ptr cr = new CheckResult();
+
+       cr->SetState(state);
+
+       double now = Utility::GetTime();
+       cr->SetScheduleStart(now);
+       cr->SetScheduleEnd(now);
+       cr->SetExecutionStart(now);
+       cr->SetExecutionEnd(now);
+
+       Utility::IncrementTime(60);
+
+       return cr;
+}
+
+static void LogFlapping(const Checkable::Ptr& obj)
+{
+       std::bitset<20> stateChangeBuf = obj->GetFlappingBuffer();
+       int oldestIndex = (obj->GetFlappingBuffer() & 0xFF00000) >> 20;
+
+    std::cout << "Flapping: " << obj->IsFlapping() << "\nHT: " << obj->GetFlappingThresholdHigh() << " LT: " << obj->GetFlappingThresholdLow()
+       << "\nOur value: " << obj->GetFlappingCurrent() << "\nPtr: " << oldestIndex << " Buf: " << stateChangeBuf << '\n';
+}
+
+
+static void LogHostStatus(const Host::Ptr &host)
+{
+       std::cout << "Current status: state: " << host->GetState() << " state_type: " << host->GetStateType()
+           << " check attempt: " << host->GetCheckAttempt() << "/" << host->GetMaxCheckAttempts() << std::endl;
+}
+#endif /* I2_DEBUG */
+
+BOOST_AUTO_TEST_SUITE(icinga_checkable_flapping)
+
+BOOST_AUTO_TEST_CASE(host_not_flapping)
+{
+#ifndef I2_DEBUG
+    BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+       std::cout << "Running test with a non-flapping host...\n";
+
+       Host::Ptr host = new Host();
+       host->SetName("test");
+       host->SetEnableFlapping(true);
+       host->SetMaxCheckAttempts(5);
+
+       // Host otherwise is soft down
+       host->SetState(HostUp);
+       host->SetStateType(StateTypeHard);
+
+       Utility::SetTime(0);
+
+       BOOST_CHECK(host->GetFlappingCurrent() == 0);
+
+       LogFlapping(host);
+       LogHostStatus(host);
+
+       // watch the state being stable
+       int i = 0;
+       while (i++ < 10) {
+               // For some reason, elusive to me, the first check is a state change
+               host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+
+               LogFlapping(host);
+               LogHostStatus(host);
+
+               BOOST_CHECK(host->GetState() == 0);
+               BOOST_CHECK(host->GetCheckAttempt() == 1);
+               BOOST_CHECK(host->GetStateType() == StateTypeHard);
+
+               //Should not be flapping
+               BOOST_CHECK(!host->IsFlapping());
+               BOOST_CHECK(host->GetFlappingCurrent() < 30.0);
+       }
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping)
+{
+#ifndef I2_DEBUG
+    BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+       std::cout << "Running test with host changing state with every check...\n";
+
+       Host::Ptr host = new Host();
+       host->SetName("test");
+       host->SetEnableFlapping(true);
+       host->SetMaxCheckAttempts(5);
+
+       Utility::SetTime(0);
+
+       int i = 0;
+       while (i++ < 25) {
+               if (i % 2)
+                       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+               else
+                       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+
+               LogFlapping(host);
+               LogHostStatus(host);
+
+               //30 Percent is our high Threshold
+               if (i >= 6) {
+                       BOOST_CHECK(host->IsFlapping());
+               } else {
+                       BOOST_CHECK(!host->IsFlapping());
+               }
+       }
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping_recover)
+{
+#ifndef I2_DEBUG
+    BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+       std::cout << "Running test with flapping recovery...\n";
+
+       Host::Ptr host = new Host();
+       host->SetName("test");
+       host->SetEnableFlapping(true);
+       host->SetMaxCheckAttempts(5);
+
+       // Host otherwise is soft down
+       host->SetState(HostUp);
+       host->SetStateType(StateTypeHard);
+
+       Utility::SetTime(0);
+
+       // A few warning 
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+
+       LogFlapping(host);
+       LogHostStatus(host);
+       for (int i = 0; i <= 7; i++) {
+               if (i % 2)
+                       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+               else
+                       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       }
+
+       LogFlapping(host);
+       LogHostStatus(host);
+
+       // We should be flapping now
+       BOOST_CHECK(host->GetFlappingCurrent() > 30.0);
+       BOOST_CHECK(host->IsFlapping());
+
+       // Now recover from flapping
+       int count = 0;
+       while (host->IsFlapping()) {
+               BOOST_CHECK(host->GetFlappingCurrent() > 25.0);
+               BOOST_CHECK(host->IsFlapping());
+
+               host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+               LogFlapping(host);
+               LogHostStatus(host);
+               count++;
+       }
+
+       std::cout << "Recovered from flapping after " << count << " Warning results.\n";
+
+       BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
+       BOOST_CHECK(!host->IsFlapping());
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_CASE(host_flapping_docs_example)
+{
+#ifndef I2_DEBUG
+    BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
+#else /* I2_DEBUG */
+       std::cout << "Simulating the documentation example...\n";
+
+       Host::Ptr host = new Host();
+       host->SetName("test");
+       host->SetEnableFlapping(true);
+       host->SetMaxCheckAttempts(5);
+
+       // Set a hard UP state explicitly; the host is otherwise soft down
+       host->SetState(HostUp);
+       host->SetStateType(StateTypeHard);
+
+       Utility::SetTime(0);
+
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+
+       LogFlapping(host);
+       LogHostStatus(host);
+       BOOST_CHECK(host->GetFlappingCurrent() > 39.05 && host->GetFlappingCurrent() < 39.15);
+       BOOST_CHECK(host->IsFlapping());
+
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+       host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
+
+       LogFlapping(host);
+       LogHostStatus(host);
+       BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
+       BOOST_CHECK(!host->IsFlapping());
+#endif /* I2_DEBUG */
+}
+
+BOOST_AUTO_TEST_SUITE_END()
diff --git a/test/icinga-checkable-test.cpp b/test/icinga-checkable-test.cpp
new file mode 100644 (file)
index 0000000..dee1adc
--- /dev/null
@@ -0,0 +1,66 @@
+/******************************************************************************
+ * Icinga 2                                                                   *
+ * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/)  *
+ *                                                                            *
+ * This program is free software; you can redistribute it and/or              *
+ * modify it under the terms of the GNU General Public License                *
+ * as published by the Free Software Foundation; either version 2             *
+ * of the License, or (at your option) any later version.                     *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU General Public License for more details.                               *
+ *                                                                            *
+ * You should have received a copy of the GNU General Public License          *
+ * along with this program; if not, write to the Free Software Foundation     *
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.             *
+ ******************************************************************************/
+
+#define BOOST_TEST_MAIN
+#define BOOST_TEST_MODULE icinga2_test
+
+#include "cli/daemonutility.hpp"
+#include "base/application.hpp"
+#include "base/loader.hpp"
+#include <BoostTestTargetConfig.h>
+#include <fstream>
+
+using namespace icinga;
+
+struct IcingaCheckableFixture
+{
+       IcingaCheckableFixture(void)
+       {
+               BOOST_TEST_MESSAGE("setup running Icinga 2 core");
+
+               Application::InitializeBase();
+
+               /* start the Icinga application and load the configuration */
+               Application::DeclareSysconfDir("etc");
+               Application::DeclareLocalStateDir("var");
+
+               ActivationScope ascope;
+
+               Loader::LoadExtensionLibrary("icinga");
+               Loader::LoadExtensionLibrary("methods"); // loaded by ITL
+
+               std::vector<std::string> configs;
+               std::vector<ConfigItem::Ptr> newItems;
+
+               DaemonUtility::LoadConfigFiles(configs, newItems, "icinga2.debug", "icinga2.vars");
+
+               /* ignore config errors */
+               WorkQueue upq;
+               ConfigItem::ActivateItems(upq, newItems);
+       }
+
+       ~IcingaCheckableFixture(void)
+       {
+               BOOST_TEST_MESSAGE("cleanup Icinga 2 core");
+               Application::UninitializeBase();
+       }
+};
+
+BOOST_GLOBAL_FIXTURE(IcingaCheckableFixture);
+
index a128e9519c1d9b77e6bd37356a1db94619544c1c..f7a2387acee865caf6a5a53a58808e799ea8e08a 100644 (file)
@@ -395,11 +395,9 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification)
 #else /* I2_DEBUG */
        boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));
 
-       int softStateCount = 20;
        int timeStepInterval = 60;
 
        Host::Ptr host = new Host();
-       host->SetMaxCheckAttempts(softStateCount);
        host->Activate();
        host->SetAuthority(true);
        host->SetStateRaw(ServiceOK);
@@ -418,18 +416,25 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification)
 
        std::cout << "Inserting flapping check results" << std::endl;
 
-       for (int i = 0; i < softStateCount; i++) {
+       for (int i = 0; i < 10; i++) {
                ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
                host->ProcessCheckResult(MakeCheckResult(state));
                Utility::IncrementTime(timeStepInterval);
        }
 
-       std::cout << "Checking host state (must be flapping in SOFT state)" << std::endl;
-       BOOST_CHECK(host->GetStateType() == StateTypeSoft);
        BOOST_CHECK(host->IsFlapping() == true);
 
-       std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
-       CheckNotification(host, false, NotificationFlappingStart);
+       CheckNotification(host, true, NotificationFlappingStart);
+
+       std::cout << "Now calm down..." << std::endl;
+
+       for (int i = 0; i < 20; i++) {
+               host->ProcessCheckResult(MakeCheckResult(ServiceOK));
+               Utility::IncrementTime(timeStepInterval);
+       }
+
+       CheckNotification(host, true, NotificationFlappingEnd);
+
 
        c.disconnect();
 
@@ -443,11 +448,9 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification)
 #else /* I2_DEBUG */
        boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));
 
-       int softStateCount = 20;
        int timeStepInterval = 60;
 
        Host::Ptr service = new Host();
-       service->SetMaxCheckAttempts(softStateCount);
        service->Activate();
        service->SetAuthority(true);
        service->SetStateRaw(ServiceOK);
@@ -466,18 +469,24 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification)
 
        std::cout << "Inserting flapping check results" << std::endl;
 
-       for (int i = 0; i < softStateCount; i++) {
+       for (int i = 0; i < 10; i++) {
                ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
                service->ProcessCheckResult(MakeCheckResult(state));
                Utility::IncrementTime(timeStepInterval);
        }
 
-       std::cout << "Checking service state (must be flapping in SOFT state)" << std::endl;
-       BOOST_CHECK(service->GetStateType() == StateTypeSoft);
        BOOST_CHECK(service->IsFlapping() == true);
 
-       std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
-       CheckNotification(service, false, NotificationFlappingStart);
+       CheckNotification(service, true, NotificationFlappingStart);
+
+       std::cout << "Now calm down..." << std::endl;
+
+       for (int i = 0; i < 20; i++) {
+               service->ProcessCheckResult(MakeCheckResult(ServiceOK));
+               Utility::IncrementTime(timeStepInterval);
+       }
+
+       CheckNotification(service, true, NotificationFlappingEnd);
 
        c.disconnect();