From: Michael Friedrich <michael.friedrich@icinga.com>
Date: Fri, 10 May 2019 10:48:34 +0000 (+0200)
Subject: API: Automatically repair broken packages
X-Git-Tag: v2.11.0-rc1~104^2~1
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=6cce9c0fdd60b7474bb8c3ee6aa6ebb3ffe5ce1a;p=icinga2

API: Automatically repair broken packages

This partially reverts #7150 and avoids exceptions
inside the flow. Each time an empty active stage
is detected, Icinga tries to repair it from the
given directory tree.

Also, the package storage is now created on startup:
within the API object if the API feature is enabled,
otherwise inside the application.

Caching the active stages for packages in memory
is only in effect when the API feature is enabled.
This is useful for other deployed config packages,
not only the internal one.

fixes #7173
refs #7150
refs #7119
fixes #6959
---
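Note: the repair step described in the commit message can be sketched
in isolation as below. This is a minimal illustration using C++17
std::filesystem, not the code from this commit. The helper names
FindStageCandidate and RepairActiveStage are hypothetical, and the
sketch only rewrites the `active-stage` marker file, whereas the commit
calls ConfigPackageUtility::ActivateStage(). It also scans only the top
level of the package directory and uses filename() rather than stem().

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: return the name of the first sub-directory found
// directly below packageDir, or std::nullopt if there is none (or the
// directory cannot be read).
std::optional<std::string> FindStageCandidate(const fs::path& packageDir)
{
	std::error_code ec;

	for (fs::directory_iterator it(packageDir, ec), end; !ec && it != end; it.increment(ec)) {
		if (it->is_directory(ec))
			return it->path().filename().string();
	}

	return std::nullopt;
}

// Hypothetical helper: write the candidate stage name into the
// active-stage marker file. Returns true when the marker was written.
bool RepairActiveStage(const fs::path& packageDir)
{
	auto stage = FindStageCandidate(packageDir);

	if (!stage)
		return false; // Nothing to repair from, the caller must report this.

	std::ofstream fp(packageDir / "active-stage", std::ios::trunc);
	fp << *stage;
	fp.close();

	return !fp.fail();
}
```

Returning a bool instead of throwing keeps the sketch in line with the
intent of this change, which is to avoid exceptions inside the flow.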

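Note: the lookup order for the active stage (in-memory cache when the
API feature is enabled, marker file otherwise, empty string instead of
an exception) can be illustrated with the following sketch. StageCache,
ReadStageFromFile and LookupActiveStage are hypothetical stand-ins for
illustration only. A null cache pointer models a setup without the API
feature.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical stand-in for the in-memory cache kept by the API feature.
using StageCache = std::map<std::string, std::string>;

// Read the stage name from a marker file. Returns an empty string if the
// file is missing, unreadable or empty, so that the caller can decide
// what to do instead of handling an exception.
std::string ReadStageFromFile(const std::string& markerPath)
{
	std::ifstream fp(markerPath);
	std::string stage;

	if (!fp || !std::getline(fp, stage))
		return "";

	// Trim trailing whitespace and line endings.
	stage.erase(stage.find_last_not_of(" \t\r\n") + 1);

	return stage;
}

// Look up the active stage for a package. With no cache, only the file
// is used. With a cache, a miss falls back to the file and then
// corrects the cache when something was read.
std::string LookupActiveStage(StageCache* cache, const std::string& package,
	const std::string& markerPath)
{
	if (!cache)
		return ReadStageFromFile(markerPath);

	auto it = cache->find(package);

	if (it != cache->end() && !it->second.empty())
		return it->second;

	std::string stage = ReadStageFromFile(markerPath);

	if (!stage.empty())
		(*cache)[package] = stage;

	return stage;
}
```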
diff --git a/doc/15-troubleshooting.md b/doc/15-troubleshooting.md
index f7d46f7b8..6f4a965a9 100644
--- a/doc/15-troubleshooting.md
+++ b/doc/15-troubleshooting.md
@@ -780,7 +780,7 @@ Wrong:
 Correct:
 
 ```
-/var/lib/icinga2/api/packages/_api/abcd-ef12-3456-7890/conf.d/downtimes/1234-5678-9012-3456.conf
+/var/lib/icinga2/api/packages/_api/dbe0bef8-c72c-4cc9-9779-da7c4527c5b2/conf.d/downtimes/1234-5678-9012-3456.conf
 ```
 
 At creation time, the object lives in memory but its storage is broken. Upon restart,
@@ -792,16 +792,17 @@ read by the Icinga daemon. This information is stored in `/var/lib/icinga2/api/p
 2.11 now limits the direct active-stage file access (this is hidden from the user),
 and caches active stages for packages in-memory.
 
-Bonus on startup/config validation: Icinga now logs a critical message when a deployed
-config package is broken.
+It also tries to repair the broken package, and logs a new message:
 
 ```
-icinga2 daemon -C
+systemctl restart icinga2
+
+tail -f /var/log/icinga2/icinga2.log
 
-[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
+[2019-05-10 12:27:15 +0200] information/ConfigObjectUtility: Repairing config package '_api' with stage 'dbe0bef8-c72c-4cc9-9779-da7c4527c5b2'.
 ```
 
-In order to fix the broken config package, and mark a deployed stage as active
+If this does not happen, you can manually fix the broken config package, and mark a deployed stage as active
 again, carefully do the following steps with creating a backup before:
 
 Navigate into the API package prefix.
@@ -820,7 +821,7 @@ ls -lahtr
 drwx------  4 michi  wheel   128B Mar 27 14:39 ..
 -rw-r--r--  1 michi  wheel    25B Mar 27 14:39 include.conf
 -rw-r--r--  1 michi  wheel   405B Mar 27 14:39 active.conf
-drwx------  7 michi  wheel   224B Mar 27 15:01 abcd-ef12-3456-7890
+drwx------  7 michi  wheel   224B Mar 27 15:01 dbe0bef8-c72c-4cc9-9779-da7c4527c5b2
 drwx------  5 michi  wheel   160B Apr 26 12:47 .
 ```
 
@@ -832,16 +833,22 @@ directory. Copy the directory name `abcd-ef12-3456-7890` and
 add it into a new file `active-stage`. This can be done like this:
 
 ```
-echo "abcd-ef12-3456-7890" > active-stage
+echo "dbe0bef8-c72c-4cc9-9779-da7c4527c5b2" > active-stage
 ```
 
-Re-run config validation.
+`active.conf` needs to have the correct active stage too. Add it again
+like this. Note: This is deep down in the code, use with care!
 
 ```
-icinga2 daemon -C
+sed -i 's/ActiveStages\["_api"\].*/ActiveStages\["_api"\] = "dbe0bef8-c72c-4cc9-9779-da7c4527c5b2"/g' /var/lib/icinga2/api/packages/_api/active.conf
+```
+
+Restart Icinga 2.
+
+```
+systemctl restart icinga2
 ```
 
-The validation should not show an error.
 
 > **Note**
 >
diff --git a/doc/16-upgrading-icinga-2.md b/doc/16-upgrading-icinga-2.md
index f3575fdd4..5b2ba51a9 100644
--- a/doc/16-upgrading-icinga-2.md
+++ b/doc/16-upgrading-icinga-2.md
@@ -123,12 +123,13 @@ directory path, because the active-stage file was empty/truncated/unreadable at
 this point.
 
 2.11 makes this mechanism more stable and detects broken config packages.
+It will also attempt to fix them. The following log entry is expected and harmless.
 
 ```
-[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
+[2019-05-10 12:12:09 +0200] information/ConfigObjectUtility: Repairing config package '_api' with stage 'dbe0bef8-c72c-4cc9-9779-da7c4527c5b2'.
 ```
 
-In order to fix this, please follow [this troubleshooting entry](15-troubleshooting.md#troubleshooting-api-missing-runtime-objects).
+If you still encounter problems, please follow [this troubleshooting entry](15-troubleshooting.md#troubleshooting-api-missing-runtime-objects).
 
 
 ## Upgrading to v2.10 <a id="upgrading-to-2-10"></a>
diff --git a/lib/cli/daemoncommand.cpp b/lib/cli/daemoncommand.cpp
index adbb1afa5..1da4b1cc5 100644
--- a/lib/cli/daemoncommand.cpp
+++ b/lib/cli/daemoncommand.cpp
@@ -288,6 +288,9 @@ int DaemonCommand::Run(const po::variables_map& vm, const std::vector<std::strin
 		Logger::DisableConsoleLog();
 	}
 
+	/* Create the internal API object storage. Do this here too with setups without API. */
+	ConfigObjectUtility::CreateStorage();
+
 	/* Remove ignored Downtime/Comment objects. */
 	try {
 		String configDir = ConfigObjectUtility::GetConfigDir();
diff --git a/lib/remote/apilistener.cpp b/lib/remote/apilistener.cpp
index f4903e6a6..cfd207a31 100644
--- a/lib/remote/apilistener.cpp
+++ b/lib/remote/apilistener.cpp
@@ -7,6 +7,7 @@
 #include "remote/jsonrpc.hpp"
 #include "remote/apifunction.hpp"
 #include "remote/configpackageutility.hpp"
+#include "remote/configobjectutility.hpp"
 #include "base/convert.hpp"
 #include "base/defer.hpp"
 #include "base/io-engine.hpp"
@@ -135,6 +136,9 @@ void ApiListener::OnConfigLoaded()
 		Log(LogWarning, "ApiListener", "Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/");
 	}
 
+	/* Create the internal API object storage. */
+	ConfigObjectUtility::CreateStorage();
+
 	/* Cache API packages and their active stage name. */
 	UpdateActivePackageStagesCache();
 
diff --git a/lib/remote/configobjectutility.cpp b/lib/remote/configobjectutility.cpp
index c1a0f6543..9aeb30c06 100644
--- a/lib/remote/configobjectutility.cpp
+++ b/lib/remote/configobjectutility.cpp
@@ -10,15 +10,21 @@
 #include "base/dependencygraph.hpp"
 #include "base/utility.hpp"
 #include <boost/algorithm/string/case_conv.hpp>
+#include <boost/filesystem.hpp>
+#include <boost/system/error_code.hpp>
 #include <fstream>
 
 using namespace icinga;
 
 String ConfigObjectUtility::GetConfigDir()
 {
-	/* This may throw an exception the caller above must handle. */
-	return ConfigPackageUtility::GetPackageDir() + "/_api/" +
-		ConfigPackageUtility::GetActiveStage("_api");
+	String prefix = ConfigPackageUtility::GetPackageDir() + "/_api/";
+	String activeStage = ConfigPackageUtility::GetActiveStage("_api");
+
+	if (activeStage.IsEmpty())
+		RepairPackage("_api");
+
+	return prefix + ConfigPackageUtility::GetActiveStage("_api");
 }
 
 String ConfigObjectUtility::GetObjectConfigPath(const Type::Ptr& type, const String& fullName)
@@ -33,6 +39,59 @@ String ConfigObjectUtility::GetObjectConfigPath(const Type::Ptr& type, const Str
 		"/" + EscapeName(fullName) + ".conf";
 }
 
+void ConfigObjectUtility::RepairPackage(const String& package)
+{
+	/* Try to fix the active stage, whenever we find a directory in there.
+	 * This automatically heals packages < 2.11 which remained broken.
+	 */
+	namespace fs = boost::filesystem;
+
+	fs::path path(ConfigPackageUtility::GetPackageDir() + "/" + package + "/");
+
+	fs::recursive_directory_iterator end;
+
+	String foundActiveStage;
+
+	for (fs::recursive_directory_iterator it(path); it != end; it++) {
+		boost::system::error_code ec;
+
+		const fs::path d = *it;
+		if (fs::is_directory(d, ec)) {
+			/* Extract the relative directory name. */
+			foundActiveStage = d.stem().string();
+
+			break; // Use the first found directory.
+		}
+	}
+
+	if (!foundActiveStage.IsEmpty()) {
+		Log(LogInformation, "ConfigObjectUtility")
+			<< "Repairing config package '" << package << "' with stage '" << foundActiveStage << "'.";
+
+		ConfigPackageUtility::ActivateStage(package, foundActiveStage);
+	} else {
+		BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot repair package '" + package + "', please check the troubleshooting docs."));
+	}
+}
+
+void ConfigObjectUtility::CreateStorage()
+{
+	boost::mutex::scoped_lock lock(ConfigPackageUtility::GetStaticPackageMutex());
+
+	/* For now, we only use _api as our creation target. */
+	String package = "_api";
+
+	if (!ConfigPackageUtility::PackageExists(package)) {
+		Log(LogNotice, "ConfigObjectUtility")
+			<< "Package " << package << " doesn't exist yet, creating it.";
+
+		ConfigPackageUtility::CreatePackage(package);
+
+		String stage = ConfigPackageUtility::CreateStage(package);
+		ConfigPackageUtility::ActivateStage(package, stage);
+	}
+}
+
 String ConfigObjectUtility::EscapeName(const String& name)
 {
 	return Utility::EscapeString(name, "<>:\"/\\|?*", true);
@@ -88,16 +147,7 @@ String ConfigObjectUtility::CreateObjectConfig(const Type::Ptr& type, const Stri
 bool ConfigObjectUtility::CreateObject(const Type::Ptr& type, const String& fullName,
 	const String& config, const Array::Ptr& errors, const Array::Ptr& diagnosticInformation)
 {
-	{
-		boost::mutex::scoped_lock lock(ConfigPackageUtility::GetStaticPackageMutex());
-
-		if (!ConfigPackageUtility::PackageExists("_api")) {
-			ConfigPackageUtility::CreatePackage("_api");
-
-			String stage = ConfigPackageUtility::CreateStage("_api");
-			ConfigPackageUtility::ActivateStage("_api", stage);
-		}
-	}
+	CreateStorage();
 
 	ConfigItem::Ptr item = ConfigItem::GetByTypeAndName(type, fullName);
 
diff --git a/lib/remote/configobjectutility.hpp b/lib/remote/configobjectutility.hpp
index 5de16d2c9..404bc3bad 100644
--- a/lib/remote/configobjectutility.hpp
+++ b/lib/remote/configobjectutility.hpp
@@ -23,6 +23,8 @@ class ConfigObjectUtility
 public:
 	static String GetConfigDir();
 	static String GetObjectConfigPath(const Type::Ptr& type, const String& fullName);
+	static void RepairPackage(const String& package);
+	static void CreateStorage();
 
 	static String CreateObjectConfig(const Type::Ptr& type, const String& fullName,
 		bool ignoreOnError, const Array::Ptr& templates, const Dictionary::Ptr& attrs);
diff --git a/lib/remote/configpackageutility.cpp b/lib/remote/configpackageutility.cpp
index d0bf90061..ac877d16c 100644
--- a/lib/remote/configpackageutility.cpp
+++ b/lib/remote/configpackageutility.cpp
@@ -265,7 +265,7 @@ String ConfigPackageUtility::GetActiveStageFromFile(const String& packageName)
 	fp.close();
 
 	if (fp.fail())
-		BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation."));
+		return ""; /* Don't use exceptions here. The caller must deal with empty stages at this point. Happens on initial package creation for example. */
 
 	return stage.Trim();
 }
@@ -283,13 +283,16 @@ void ConfigPackageUtility::SetActiveStageToFile(const String& packageName, const
 
 String ConfigPackageUtility::GetActiveStage(const String& packageName)
 {
+	String activeStage;
+
 	ApiListener::Ptr listener = ApiListener::GetInstance();
 
-	/* config packages without API make no sense. */
+	/* If we don't have an API feature, just use the file storage without caching this.
+	 * This happens when ScheduledDowntime objects generate Downtime objects.
+	 * TODO: Make the API a first class citizen.
+	 */
 	if (!listener)
-		BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured."));
-
-	String activeStage;
+		return GetActiveStageFromFile(packageName);
 
 	/* First use runtime state. */
 	try {
@@ -301,8 +304,6 @@ String ConfigPackageUtility::GetActiveStage(const String& packageName)
 		/* When we've read something, correct memory. */
 		if (!activeStage.IsEmpty())
 			listener->SetActivePackageStage(packageName, activeStage);
-		else
-			BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation."));
 	}
 
 	return activeStage;
@@ -310,16 +311,16 @@ String ConfigPackageUtility::GetActiveStage(const String& packageName)
 
 void ConfigPackageUtility::SetActiveStage(const String& packageName, const String& stageName)
 {
+	/* Update the marker on disk for restarts. */
+	SetActiveStageToFile(packageName, stageName);
+
 	ApiListener::Ptr listener = ApiListener::GetInstance();
 
-	/* config packages without API make no sense. */
+	/* No API, no caching. */
 	if (!listener)
-		BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured."));
+		return;
 
 	listener->SetActivePackageStage(packageName, stageName);
-
-	/* Also update the marker on disk for restarts. */
-	SetActiveStageToFile(packageName, stageName);
 }
 
 std::vector<std::pair<String, bool> > ConfigPackageUtility::GetFiles(const String& packageName, const String& stageName)