<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1 style="color:red">This Page Is Under Construction</h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and
relies on a set of checkers to implement the logic for detecting bugs and
constructing bug reports. This page provides hints and guidelines for anyone
who is interested in implementing their own checker. The static analyzer is a
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information. </p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Analyzer Overview</a></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#skeleton">Checker Skeleton</a></li>
      <li><a href="#node">Exploded Node</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive and every possible path through
  the program is explored. The explored execution traces are represented by an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when and how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols, the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values and checker-specific data
  </ul>
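<p>Here is a minimal C++ sketch that makes the node structure concrete under simplifying assumptions. All the <tt>Toy*</tt> types and the <tt>getNode</tt> function are hypothetical stand-ins, not the Clang API. Each node is keyed by its program point together with its state. As a result, two paths that reach the same point with the same state share a single node. The sketch approximates how such a graph is built; it does not reproduce the real implementation.</p>

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT the Clang API.
struct ToyProgramPoint {
  int cfgLocation;   // which statement/block in the CFG
  std::string kind;  // what happened there, e.g. "PostStmt"
  bool operator<(const ToyProgramPoint &o) const {
    return std::tie(cfgLocation, kind) < std::tie(o.cfgLocation, o.kind);
  }
};

// Abstract state: variable name -> symbolic value (rendered as text).
using ToyProgramState = std::map<std::string, std::string>;

struct ToyExplodedNode {
  ToyProgramPoint point;
  ToyProgramState state;
  std::vector<ToyExplodedNode *> preds;  // edges back along explored paths
};

class ToyExplodedGraph {
public:
  // Returns the unique node for (point, state), creating it on first use,
  // and records an edge from `pred`. Two paths that reach the same point
  // with the same state share one node.
  ToyExplodedNode *getNode(const ToyProgramPoint &point,
                           const ToyProgramState &state,
                           ToyExplodedNode *pred) {
    auto key = std::make_pair(point, state);
    auto it = nodes_.find(key);
    if (it == nodes_.end()) {
      auto fresh = std::make_unique<ToyExplodedNode>();
      fresh->point = point;
      fresh->state = state;
      it = nodes_.emplace(std::move(key), std::move(fresh)).first;
    }
    if (pred != nullptr)
      it->second->preds.push_back(pred);
    return it->second.get();
  }

  std::size_t size() const { return nodes_.size(); }

private:
  std::map<std::pair<ToyProgramPoint, ToyProgramState>,
           std::unique_ptr<ToyExplodedNode>> nodes_;
};
```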

  <h3>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core's changes; they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless: per-path data belongs in the program
  state, not in checker member variables.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain to the
  <tt>ExplodedGraph</tt>.
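<p>The sketch below illustrates the calling convention described above, using a drastically simplified model in which the checker-specific data is a plain map. <tt>ToyChecker</tt>, <tt>ToyState</tt>, and <tt>runCheckers</tt> are hypothetical names, not Clang classes. Each callback receives the state produced by the previous checker and returns a new one. The engine collects the resulting chain of states.</p>

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, NOT the Clang API. The checker-defined part of the
// state (the role GenericDataMap plays) is a simple string-keyed map here.
using ToyState = std::map<std::string, int>;

struct ToyChecker {
  std::string name;
  // Stateless callback: it reads the incoming state and returns a new one
  // instead of storing per-path data in the checker object.
  std::function<ToyState(const ToyState &, const std::string &)> checkPreStmt;
};

// Notifies every checker, in registration order, about one statement.
// Each checker sees the state produced by the previous one, so the result is
// a chain: the initial state followed by one new state per checker.
std::vector<ToyState> runCheckers(const std::vector<ToyChecker> &checkers,
                                  const ToyState &initial,
                                  const std::string &stmt) {
  std::vector<ToyState> chain{initial};
  for (const ToyChecker &checker : checkers)
    chain.push_back(checker.checkPreStmt(chain.back(), stmt));
  return chain;
}
```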

  <h3>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions. They can
  represent things like concrete integers, symbolic values, or memory locations
  (which are memory regions). They are a discriminated union of "values",
  symbolic and otherwise.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value.
  Symbolic values can have constraints associated with them. A symbol represents
  an actual (immutable) value. We might not know its specific value, but
  we can associate constraints with that value as we analyze a path.
  <p>

  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  and a <tt>FieldRegion</tt>, which is a subregion of the <tt>VarRegion</tt>, could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
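<p>The relationship between values, symbols, and regions can be approximated with a <tt>std::variant</tt>. This is a hypothetical toy model: <tt>ToySVal</tt>, <tt>ToySymbol</tt>, <tt>ToyRegion</tt>, and <tt>baseRegion</tt> are illustrative names, not Clang's classes. It shows the discriminated union, a field region layered on top of a variable region, and a symbolic region named by its symbol.</p>

```cpp
#include <memory>
#include <optional>
#include <string>
#include <variant>

// Hypothetical toy model, NOT Clang's SVal, SymExpr, or MemRegion classes.
struct ToySymbol {
  unsigned id;  // $0, $1, ...: unknown but named (and immutable) values
};

struct ToyRegion {
  std::string name;                        // e.g. "s", "s.f", "SymRegion{$0}"
  std::shared_ptr<const ToyRegion> super;  // enclosing region for subregions
  std::optional<ToySymbol> symbol;         // set only for symbolic regions
};
using ToyRegionRef = std::shared_ptr<const ToyRegion>;

// A discriminated union: concrete integer, symbol, or memory location.
using ToySVal = std::variant<long long, ToySymbol, ToyRegionRef>;

bool isSymbolic(const ToySVal &v) { return std::holds_alternative<ToySymbol>(v); }

// Follows the layering outward, e.g. from a field region to its variable.
ToyRegionRef baseRegion(ToyRegionRef r) {
  while (r->super)
    r = r->super;
  return r;
}
```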
  <p>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     /* ... */
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
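<p>The walkthrough of <tt>foo</tt> can be replayed with a tiny sketch. <tt>ToyEngine</tt> and its methods are hypothetical, and values are plain strings. The expression the text calls <tt>$1</tt> is spelled out here by its structure, <tt>($0 * 2)</tt>. The point is that <tt>y</tt> becomes bound to a new symbolic expression, while <tt>z</tt> ends up bound to the very same symbol as <tt>x</tt>.</p>

```cpp
#include <map>
#include <string>

// Hypothetical toy model of the walkthrough above, NOT the Clang engine.
// Values are strings: "$N" for symbols and "($a * c)" for symbolic products.
struct ToyEngine {
  std::map<std::string, std::string> store;  // region name -> bound value
  unsigned nextSymbol = 0;

  std::string freshSymbol() { return "$" + std::to_string(nextSymbol++); }

  // Function entry: the parameter's region is bound to a fresh symbol.
  void enterFunction(const std::string &param) { store[param] = freshSymbol(); }

  // lvalue-to-rvalue conversion: read whatever is bound to the region.
  std::string load(const std::string &region) const { return store.at(region); }

  // Multiplying a symbolic value by a constant yields a new symbolic expression.
  std::string multiply(const std::string &lhs, long long c) const {
    return "(" + lhs + " * " + std::to_string(c) + ")";
  }

  // Assignment: bind the right-hand side value to the left-hand side region.
  void bind(const std::string &region, const std::string &value) {
    store[region] = value;
  }
};
```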

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple constraint solver that
    tracks ranges of possible values for symbols.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>
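<p>The following sketch shows how range tracking prunes paths and where it loses precision. It assumes that one symbol is constrained to a single interval. <tt>ToyRange</tt> and <tt>assumeGreaterThan</tt> are hypothetical, and Clang's actual constraint manager is more elaborate than this. When an assumption empties the interval, the corresponding branch is treated as infeasible.</p>

```cpp
#include <algorithm>
#include <climits>
#include <optional>

// Hypothetical toy model, NOT Clang's constraint manager: one symbol is
// constrained to a single closed interval [lo, hi].
struct ToyRange {
  long long lo = LLONG_MIN;
  long long hi = LLONG_MAX;
};

// Assumes "sym > c" when `taken` is true, or "sym <= c" otherwise.
// Returns std::nullopt when the assumption is infeasible, meaning that
// branch of the path can be pruned.
std::optional<ToyRange> assumeGreaterThan(ToyRange r, long long c, bool taken) {
  if (taken) {
    if (c == LLONG_MAX)
      return std::nullopt;  // nothing is greater than the maximum
    r.lo = std::max(r.lo, c + 1);
  } else {
    r.hi = std::min(r.hi, c);
  }
  if (r.lo > r.hi)
    return std::nullopt;
  return r;
}
```

<p>A single interval cannot express a constraint such as <tt>x != 0</tt>, which needs two disjoint ranges. This is one example of how a simplified solver can keep infeasible paths alive and produce false alarms.</p>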
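
<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
  folder. Follow the steps below to register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
#include "ClangSACheckers.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Subscribes to the PreStmt event for every CallExpr.
class NewChecker : public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  // Called before each call expression is evaluated. The callback is const
  // because per-path data belongs in the program state, not in the checker.
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
} // end anonymous namespace

// Entry point the analyzer uses to create and register the checker.
void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre>
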
<li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
first be developed as experimental. Suppose our new checker performs security-related
checks; then we should add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre>

<li>Make the source code file visible to CMake by adding it to
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
</ol>
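<p>Conceptually, registration maps a package-qualified name to a checker. The <tt>-analyzer-checker</tt> option accepts either a full checker name or a package name, as in <tt>-analyzer-checker=core</tt>. The sketch below models only that naming scheme. <tt>ToyCheckerRegistry</tt> is a hypothetical class, not the TableGen-generated registry or <tt>CheckerManager</tt>, and it ignores any special handling the real registry may apply to particular packages.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical toy model of checker naming, NOT Clang's registry. A checker's
// full name is its package path plus its own name, e.g.
// "security.experimental" + "." + "NewChecker".
struct ToyCheckerInfo {
  std::string fullName;
  std::string helpText;
};

class ToyCheckerRegistry {
public:
  void add(const std::string &package, const std::string &name,
           const std::string &help) {
    checkers_.push_back({package + "." + name, help});
  }

  // Mimics an "enable" argument: it may name a single checker, or a package,
  // in which case every checker whose name starts with "<package>." matches.
  std::vector<std::string> enabled(const std::string &arg) const {
    std::vector<std::string> result;
    for (const ToyCheckerInfo &c : checkers_)
      if (c.fullName == arg ||
          c.fullName.compare(0, arg.size() + 1, arg + ".") == 0)
        result.push_back(c.fullName);
    return result;
  }

private:
  std::vector<ToyCheckerInfo> checkers_;
};
```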


<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking.
    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
    for the list of available checker callbacks.</li>
    <li> What data you want to store as part of the checker-specific program
    state. Try to minimize the checker state as much as possible. </li>
  </ul>
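<p>As an illustration of the two decisions, here is a hypothetical double-close checker written in plain C++ rather than against the Clang API. It tracks two events, "after <tt>open()</tt>" and "before <tt>close()</tt>". Its entire checker-specific state is one flag per tracked symbol. <tt>ToyDoubleCloseChecker</tt>, <tt>ToyCheckerState</tt>, and the callback names are assumptions made for this sketch. A real checker would subscribe to the corresponding callbacks listed in <tt>CheckerDocumentation</tt>.</p>

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical toy checker, NOT a Clang checker or API.
enum class HandleState { Opened, Closed };

// The entire checker-specific state: one flag per tracked handle symbol.
using ToyCheckerState = std::map<std::string, HandleState>;

struct ToyDoubleCloseChecker {
  // Event 1: after a call to open(), start tracking the returned symbol.
  void checkPostOpen(ToyCheckerState &state, const std::string &sym) const {
    state[sym] = HandleState::Opened;
  }

  // Event 2: before a call to close(). Returns a bug message if this path
  // already closed the handle; otherwise records the close.
  std::optional<std::string> checkPreClose(ToyCheckerState &state,
                                           const std::string &sym) const {
    auto it = state.find(sym);
    if (it != state.end() && it->second == HandleState::Closed)
      return "Handle " + sym + " closed twice";
    state[sym] = HandleState::Closed;
    return std::nullopt;
  }
};
```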

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.
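<p>A small sketch shows the difference between an AST walk and path-sensitive analysis. <tt>ToyExpr</tt>, <tt>node</tt>, and <tt>findLiteralDivByZero</tt> are hypothetical. They stand in for the kind of walk that <tt>checkASTCodeBody</tt> would let you perform over the real AST. The walk visits every node once, without any program state, and flags a division whose divisor is the literal <tt>0</tt>.</p>

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST, NOT Clang's AST classes or RecursiveASTVisitor.
struct ToyExpr {
  std::string kind;  // "IntLiteral", "DeclRef", or "BinaryOperator"
  std::string text;  // literal value, variable name, or operator spelling
  std::vector<std::unique_ptr<ToyExpr>> children;
};

// Path-insensitive check: visit every node once, with no program state,
// and collect each "<expr> / 0" whose divisor is a literal zero.
void findLiteralDivByZero(const ToyExpr &e, std::vector<const ToyExpr *> &out) {
  if (e.kind == "BinaryOperator" && e.text == "/" && e.children.size() == 2) {
    const ToyExpr &rhs = *e.children[1];
    if (rhs.kind == "IntLiteral" && rhs.text == "0")
      out.push_back(&e);
  }
  for (const auto &child : e.children)
    findLiteralDivByZero(*child, out);
}

// Convenience helper for building small trees.
std::unique_ptr<ToyExpr> node(std::string kind, std::string text,
                              std::unique_ptr<ToyExpr> lhs = nullptr,
                              std::unique_ptr<ToyExpr> rhs = nullptr) {
  auto e = std::make_unique<ToyExpr>();
  e->kind = std::move(kind);
  e->text = std::move(text);
  if (lhs) e->children.push_back(std::move(lhs));
  if (rhs) e->children.push_back(std::move(rhs));
  return e;
}
```

<p>Because the walk has no state, it cannot see that <tt>y</tt> is zero in <tt>int y = 0; return x / y;</tt>. Catching that case requires the path-sensitive engine.</p>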

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>
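<p>A regression test is typically a source file with a <tt>RUN:</tt> line and <tt>expected-warning</tt> comments that <tt>-verify</tt> checks. The following is a sketch of such a file for the existing division-by-zero check. The <tt>safeDiv</tt> function is made up. The exact warning text is an assumption, so check it against the checker's actual output before relying on it.</p>

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s

// On the path where y == 0 the analyzer is expected to report a division
// by zero; on the other path it should stay silent.
int safeDiv(int x, int y) {
  if (y != 0)
    return x / y;  // no warning expected: y is non-zero on this path
  return x / y;    // expected-warning{{Division by zero}}
}
```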

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command.
<br><tt>
(gdb) <b>p State->dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a SourceManager object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of a method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p ENode->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>