From: Daniel Berlin
Date: Wed, 8 Feb 2017 15:22:52 +0000 (+0000)
Subject: LVI: Add a per-value worklist limit to LazyValueInfo.
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=f48edbb0e13e2f5dfd17df3111855fc327d6630b;p=llvm

LVI: Add a per-value worklist limit to LazyValueInfo.

Summary:
LVI is now depth first, which is the optimal iteration strategy in terms
of work per call.  However, the way the results get cached means it can
still go very badly, N^2 or worse, right now.  The overdefined cache is
per-block, because LVI wants to try to get different results for the same
name in different blocks (IE solve the problem PredicateInfo solves).
This means that even if we discover a value is overdefined after going
very deep, that information does not get cached, so we end up trying to
rediscover it again and again.  The same is true for values along the
way.  In practice, overdefined anywhere should mean overdefined
everywhere (this is how, for example, SCCP works).

Until we get around to reworking the overdefined cache, we need to limit
the number of worklist items we process.  Note that permanently reverting
the switch to the DFS strategy seems like the wrong move (doing so
temporarily would be fine if we really want).  BFS is clearly the wrong
approach; it just gets luckier on some testcases.  It is also very hard
to design an effective throttle for BFS.  For DFS, the throttle is
directly related to the depth of the CFG, so really deep CFGs get cut off
and smaller ones do not; as the CFG simplifies, you get better results.
For BFS, the limit is related to the fan-out times the average block
size, which is much harder to reason about or make good choices for.

A bug is being filed about the overdefined cache, but fixing it will
require major surgery (plumbing PredicateInfo through CVP or LVI).

Note: I did not make this number configurable because I'm not sure anyone
really needs to tweak this knob.  We run CVP 3 times.  On the testcases I
have, the slow runs happen in the middle, where CVP is doing cleanup work
that other passes are effective at.  Over the course of the 3 runs, we
don't seem to have any real loss of performance.

I haven't gotten a minimized testcase yet, but just imagine a testcase
where, going *up* the CFG, you have branches, one of which leads 50000
blocks deep, and the other to something where the answer is overdefined
immediately.  BFS would discover the overdefined answer faster than DFS,
but would do more work to do so.  In practice, the right answer is "once
DFS discovers overdefined for a value, stop trying to get more info about
that value" (and so DFS would normally cache the overdefined results for
every value it passed through in those 50k blocks, and never do that work
again.  But it doesn't, because of the naming problem).

Reviewers: chandlerc, djasper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29715

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294463 91177308-0d34-0410-b5e6-96231b3b80d8
---
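The throttle is easiest to see in isolation before reading the diff.
Below is a minimal, self-contained sketch, not the LLVM code: names like
Key, Lattice, Solver, and MaxProcessedPerQuery are made up for
illustration, with Key standing in for LVI's (BasicBlock *, Value *) pair
and MaxProcessedPerQuery for the patch's MaxProcessedPerValue.  It shows
a DFS worklist that counts how many items it processes for a single
query and, once the limit is hit, records a pessimistic overdefined
answer for the original queries and abandons the rest, which is the same
shape as the change to LazyValueInfoImpl::solve() in the diff below.

    // Hypothetical sketch of a per-query DFS worklist throttle.
    #include <cstdio>
    #include <map>
    #include <set>
    #include <vector>

    using Key = int;
    enum class Lattice { Constant, Overdefined };

    static const unsigned MaxProcessedPerQuery = 500;

    struct Solver {
      std::vector<Key> Worklist;    // DFS stack of pending work items.
      std::set<Key> InWorklist;     // Mirrors Worklist for membership tests.
      std::map<Key, Lattice> Cache; // Per-query result cache.

      // Returns true if K is fully solved; otherwise pushes its (toy)
      // dependency and returns false, mirroring solveBlockValue's contract.
      bool solveOne(Key K) {
        if (K == 0) { // Leaf of the dependence chain.
          Cache[K] = Lattice::Constant;
          return true;
        }
        Key Dep = K - 1; // Toy dependency: K needs K-1 solved first.
        if (Cache.count(Dep)) {
          Cache[K] = Cache[Dep];
          return true;
        }
        if (InWorklist.insert(Dep).second)
          Worklist.push_back(Dep);
        return false;
      }

      void solve() {
        // Remember the original queries so they can be pessimized if we bail.
        std::vector<Key> StartingStack(Worklist.begin(), Worklist.end());
        unsigned ProcessedCount = 0;
        while (!Worklist.empty()) {
          if (++ProcessedCount > MaxProcessedPerQuery) {
            // Too much work for one query: record Overdefined for the
            // original queries and drop everything else.
            for (Key K : StartingStack)
              Cache[K] = Lattice::Overdefined;
            Worklist.clear();
            InWorklist.clear();
            return;
          }
          Key K = Worklist.back();
          if (solveOne(K)) { // Completed: pop it.
            Worklist.pop_back();
            InWorklist.erase(K);
          }
          // Otherwise a dependency was pushed; K is revisited later.
        }
      }
    };

    int main() {
      Solver Shallow;
      Shallow.Worklist.push_back(100); // Short chain: solved exactly.
      Shallow.InWorklist.insert(100);
      Shallow.solve();
      std::printf("depth 100   -> %s\n",
                  Shallow.Cache[100] == Lattice::Overdefined ? "overdefined"
                                                             : "solved");

      Solver Deep;
      Deep.Worklist.push_back(50000); // Deep chain: hits the throttle.
      Deep.InWorklist.insert(50000);
      Deep.solve();
      std::printf("depth 50000 -> %s\n",
                  Deep.Cache[50000] == Lattice::Overdefined ? "overdefined"
                                                            : "solved");
      return 0;
    }

The key property, as the commit message argues, is that the cutoff tracks
how deep the DFS has to go for this one query, so shallow CFGs are still
solved exactly and only pathologically deep chains get the pessimistic
answer.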
diff --git a/lib/Analysis/LazyValueInfo.cpp b/lib/Analysis/LazyValueInfo.cpp
index b3b1acb4b1b..e8f2d536c19 100644
--- a/lib/Analysis/LazyValueInfo.cpp
+++ b/lib/Analysis/LazyValueInfo.cpp
@@ -39,6 +39,10 @@ using namespace PatternMatch;
 
 #define DEBUG_TYPE "lazy-value-info"
 
+// This is the number of worklist items we will process to try to discover an
+// answer for a given value.
+static const unsigned MaxProcessedPerValue = 500;
+
 char LazyValueInfoWrapperPass::ID = 0;
 INITIALIZE_PASS_BEGIN(LazyValueInfoWrapperPass, "lazy-value-info",
                 "Lazy Value Information Analysis", false, true)
@@ -563,7 +567,7 @@ namespace {
     /// This stack holds the state of the value solver during a query.
     /// It basically emulates the callstack of the naive
    /// recursive value lookup process.
-    std::stack<std::pair<BasicBlock*, Value*>> BlockValueStack;
+    SmallVector<std::pair<BasicBlock *, Value *>, 8> BlockValueStack;
 
     /// Keeps track of which block-value pairs are in BlockValueStack.
     DenseSet<std::pair<BasicBlock*, Value*> > BlockValueSet;
@@ -576,7 +580,7 @@ namespace {
 
       DEBUG(dbgs() << "PUSH: " << *BV.second << " in " << BV.first->getName()
                    << "\n");
-      BlockValueStack.push(BV);
+      BlockValueStack.push_back(BV);
       return true;
     }
 
@@ -646,24 +650,50 @@ namespace {
 } // end anonymous namespace
 
 void LazyValueInfoImpl::solve() {
+  SmallVector<std::pair<BasicBlock *, Value *>, 8> StartingStack(
+      BlockValueStack.begin(), BlockValueStack.end());
+
+  unsigned processedCount = 0;
   while (!BlockValueStack.empty()) {
-    std::pair<BasicBlock*, Value*> &e = BlockValueStack.top();
+    processedCount++;
+    // Abort if we have to process too many values to get a result for this one.
+    // Because of the design of the overdefined cache currently being per-block
+    // to avoid naming-related issues (IE it wants to try to give different
+    // results for the same name in different blocks), overdefined results don't
+    // get cached globally, which in turn means we will often try to rediscover
+    // the same overdefined result again and again.  Once something like
+    // PredicateInfo is used in LVI or CVP, we should be able to make the
+    // overdefined cache global, and remove this throttle.
+    if (processedCount > MaxProcessedPerValue) {
+      DEBUG(dbgs() << "Giving up on stack because we are getting too deep\n");
+      // Fill in the original values
+      while (!StartingStack.empty()) {
+        std::pair<BasicBlock *, Value *> &e = StartingStack.back();
+        TheCache.insertResult(e.second, e.first,
+                              LVILatticeVal::getOverdefined());
+        StartingStack.pop_back();
+      }
+      BlockValueSet.clear();
+      BlockValueStack.clear();
+      return;
+    }
+    std::pair<BasicBlock *, Value *> &e = BlockValueStack.back();
     assert(BlockValueSet.count(e) && "Stack value should be in BlockValueSet!");
 
     if (solveBlockValue(e.second, e.first)) {
       // The work item was completely processed.
-      assert(BlockValueStack.top() == e && "Nothing should have been pushed!");
+      assert(BlockValueStack.back() == e && "Nothing should have been pushed!");
       assert(TheCache.hasCachedValueInfo(e.second, e.first) &&
             "Result should be in cache!");
 
       DEBUG(dbgs() << "POP " << *e.second << " in " << e.first->getName()
                    << " = " << TheCache.getCachedValueInfo(e.second, e.first) << "\n");
 
-      BlockValueStack.pop();
+      BlockValueStack.pop_back();
       BlockValueSet.erase(e);
     } else {
       // More work needs to be done before revisiting.
-      assert(BlockValueStack.top() != e && "Stack should have been pushed!");
+      assert(BlockValueStack.back() != e && "Stack should have been pushed!");
     }
   }
 }
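For contrast, here is a rough sketch of the eventual fix the commit
message alludes to, once something like PredicateInfo lets the
overdefined cache become value-keyed rather than block-keyed: overdefined
anywhere means overdefined everywhere, so a query can be short-circuited
before any work is pushed.  This is not LVI's actual API; the type Value
and the names GlobalOverdefinedCache, shouldSkip, and markOverdefined are
hypothetical stand-ins.

    #include <set>

    // Hypothetical sketch only: a value-keyed (not block-keyed) overdefined
    // cache.  Once a value is known overdefined in any block, later queries
    // can bail out before pushing a single work item.
    using Value = int; // Stand-in for a pointer to an IR value.

    struct GlobalOverdefinedCache {
      std::set<Value> OverdefinedValues; // Overdefined anywhere == everywhere.

      // Checked before pushing (Block, Value) work items onto the DFS stack.
      bool shouldSkip(Value V) const { return OverdefinedValues.count(V) != 0; }

      // Recorded when any block's solve discovers the value is overdefined.
      void markOverdefined(Value V) { OverdefinedValues.insert(V); }
    };

With that in place, the per-value worklist limit added by this patch
would no longer be needed, since a deep walk that ends in overdefined
would be paid for at most once per value.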