From: Anna Zaks
Date: Sat, 18 May 2013 22:51:28 +0000 (+0000)
Subject: [analyzer] Extend the checker developer manual. A patch by Sam Handler!
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=08cf30eb32f938c70c6f3214e6be4fddc782a333;p=clang

[analyzer] Extend the checker developer manual. A patch by Sam Handler!

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@182204 91177308-0d34-0410-b5e6-96231b3b80d8
---

diff --git a/www/analyzer/checker_dev_manual.html b/www/analyzer/checker_dev_manual.html
index a824953031..2216176a13 100644
--- a/www/analyzer/checker_dev_manual.html
+++ b/www/analyzer/checker_dev_manual.html
@@ -14,7 +14,7 @@
-

This Page Is Under Construction

+

This Page Is Under Construction

Checker Developer Manual

@@ -33,15 +33,20 @@ for developer guidelines and send your questions and proposals to

Getting Started

@@ -108,7 +113,7 @@ for developer guidelines and send your questions and proposals to
  • GenericDataMap - constraints on symbolic values -

    Interaction with Checkers

    +

    Interaction with Checkers

    Checkers are not merely passive receivers of the analyzer core changes - they actively participate in the ProgramState construction through the GenericDataMap which can be used to store the checker-defined part @@ -119,7 +124,7 @@ for developer guidelines and send your questions and proposals to in the predefined order; thus, calling all the checkers adds a chain to the ExplodedGraph. -

    Representing Values

    +

    Representing Values

    During symbolic execution, SVal objects are used to represent the semantic evaluation of expressions. They can represent things like concrete @@ -132,7 +137,7 @@ for developer guidelines and send your questions and proposals to number. In some cases, SVal is not a symbol, but it really should be a symbolic value. This happens when the analyzer cannot reason about something (yet). An example is floating point numbers. In such cases, the - SVal will evaluate to UnknownVal. + SVal will evaluate to UnknownVal. This represents a case that is outside the realm of the analyzer's reasoning capabilities. SVals are value objects and their values can be viewed using the .dump() method. Often they wrap persistent objects such as @@ -201,6 +206,7 @@ values (e.g., the number 1). Symbols
    FunctionalObjects are used throughout. --> +

    Idea for a Checker

    Here are several questions which you should consider when evaluating your checker idea: @@ -223,61 +229,274 @@ values (e.g., the number 1). bugs in the existing checkers.
  • +

    Once an idea for a checker has been chosen, there are two key decisions that +need to be made: +

    + +

    Checker Registration

    - All checker implementation files are located in clang/lib/StaticAnalyzer/Checkers - folder. Follow the steps below to register a new checker with the analyzer. + All checker implementation files are located in the + clang/lib/StaticAnalyzer/Checkers folder. The steps below describe + how the checker SimpleStreamChecker, which checks for misuses of + stream APIs, was registered with the analyzer. + Similar steps should be followed for a new checker.
      -
    1. Create a new checker implementation file, for example ./lib/StaticAnalyzer/Checkers/NewChecker.cpp +
    2. A new checker implementation file, SimpleStreamChecker.cpp, was + created in the directory lib/StaticAnalyzer/Checkers. +
    3. The following registration code was added to the implementation file:
      -using namespace clang;
      -using namespace ento;
      +void ento::registerSimpleStreamChecker(CheckerManager &mgr) {
      +  mgr.registerChecker<SimpleStreamChecker>();
      +}
      +
      +
    4. A package was selected for the checker and the checker was defined in the +table of checkers at lib/StaticAnalyzer/Checkers/Checkers.td. Since all +checkers should first be developed as "alpha", and the SimpleStreamChecker +performs UNIX API checks, the correct package is "alpha.unix", and the following +was added to the corresponding UnixAlpha section of Checkers.td: +
      +let ParentPackage = UnixAlpha in {
      +...
      +def SimpleStreamChecker : Checker<"SimpleStream">,
      +  HelpText<"Check for misuses of stream APIs">,
      +  DescFile<"SimpleStreamChecker.cpp">;
      +...
      +} // end "alpha.unix"
      +
      + +
    5. The source code file was made visible to CMake by adding it to +lib/StaticAnalyzer/Checkers/CMakeLists.txt. + +
    + +After adding a new checker to the analyzer, one can verify that the new checker +was successfully added by seeing if it appears in the list of available checkers: +
    $ clang -cc1 -analyzer-checker-help + +

    Events, Callbacks, and Checker Class Structure

    + +

    All checkers inherit from the +Checker template class; the template parameter(s) describe the type of +events that the checker is interested in processing. The various types of events +that are available are described in the file +CheckerDocumentation.cpp -namespace { -class NewChecker: public Checker< check::PreStmt<CallExpr> > { +

    For each event type requested, a corresponding callback function must be +defined in the checker class ( +CheckerDocumentation.cpp shows the +correct function name and signature for each event type). + +

    As an example, consider SimpleStreamChecker. This checker needs to +take action at the following times: + +

    + +

    The events that will be used for these actions are, respectively, PreCall, +PostCall, +DeadSymbols, +and PointerEscape. +The high-level structure of the checker's class is thus: +

    +class SimpleStreamChecker : public Checker<check::PreCall,
    +                                           check::PostCall,
    +                                           check::DeadSymbols,
    +                                           check::PointerEscape> {
     public:
    -  void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
    -}
    -}
    -void ento::registerNewChecker(CheckerManager &mgr) {
    -  mgr.registerChecker<NewChecker>();
    -}
    +
    +  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
    +
    +  void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
    +
    +  void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;
    +
    +  ProgramStateRef checkPointerEscape(ProgramStateRef State,
    +                                     const InvalidatedSymbols &Escaped,
    +                                     const CallEvent *Call,
    +                                     PointerEscapeKind Kind) const;
    +};
    +
    + +
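The link between the template parameters and the callbacks can be pictured with a small standalone model. This is only a sketch under stated assumptions: ToyManager, ToyStreamChecker, Trace, registerWith, and registerAll are hypothetical names invented for illustration, calls are reduced to plain strings, and nothing here is Clang's actual implementation. It shows one plausible way for each event tag to demand a matching callback from the checker at compile time.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy model only: none of these types are Clang's real classes.
using Trace = std::vector<std::string>;
using Callback = std::function<void(const std::string &, Trace &)>;

// Hypothetical stand-in for CheckerManager: one callback list per event.
struct ToyManager {
  std::vector<Callback> PreCall;
  std::vector<Callback> PostCall;

  // Simulates the core reaching a call to Callee on some path.
  void runCall(const std::string &Callee, Trace &Out) const {
    for (const Callback &F : PreCall)
      F(Callee, Out);
    for (const Callback &F : PostCall)
      F(Callee, Out);
  }
};

namespace check {
// Each event tag knows which callback it requires from the checker.
struct PreCall {
  template <typename CheckerT>
  static void registerWith(const CheckerT *Chk, ToyManager &Mgr) {
    Mgr.PreCall.push_back([Chk](const std::string &Callee, Trace &Out) {
      Chk->checkPreCall(Callee, Out);
    });
  }
};

struct PostCall {
  template <typename CheckerT>
  static void registerWith(const CheckerT *Chk, ToyManager &Mgr) {
    Mgr.PostCall.push_back([Chk](const std::string &Callee, Trace &Out) {
      Chk->checkPostCall(Callee, Out);
    });
  }
};
} // namespace check

// The template parameters list the events a checker subscribes to.
template <typename... Checks> struct Checker {
  template <typename CheckerT>
  static void registerAll(const CheckerT *Chk, ToyManager &Mgr) {
    // Expand the pack, calling registerWith once per requested event.
    int Expand[] = {0, (Checks::registerWith(Chk, Mgr), 0)...};
    (void)Expand;
  }
};

// A checker that subscribes to two events and defines both callbacks.
class ToyStreamChecker : public Checker<check::PreCall, check::PostCall> {
public:
  void checkPreCall(const std::string &Callee, Trace &Out) const {
    Out.push_back("pre:" + Callee);
  }
  void checkPostCall(const std::string &Callee, Trace &Out) const {
    Out.push_back("post:" + Callee);
  }
};
```

In this model, leaving out checkPostCall while still listing check::PostCall would fail to compile once registerAll is called, which mirrors the requirement that every requested event has a corresponding callback. The callbacks are const and write only to the Trace passed in, so the checker object itself holds no per-path data.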

    Custom Program States

    + +

    Checkers often need to keep track of information specific to the checks they +perform. However, since checkers have no guarantee about the order in which the +program will be explored, or even that all possible paths will be explored, this +state information cannot be kept within individual checkers. Therefore, if +checkers need to store custom information, they need to add new categories of +data to the ProgramState. The preferred way to do so is to use one of +several macros designed for this purpose. They are: + +

    + +

    All of these macros take as parameters the name to be used for the custom +category of state information and the data type(s) to be used for storage. The +data type(s) specified will become the parameter type and/or return type of the +methods that manipulate the new category of state information. Each of these +methods is templated with the name of the custom data type. +

    For example, a common case is the need to track data associated with a +symbolic expression; a map type is the most logical way to implement this. The +key for this map will be a pointer to a symbolic expression +(SymbolRef). If the data type to be associated with the symbolic +expression is an integer, then the custom category of state information would be +declared as + +

    +REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
     
    -
  • Pick the package name for your checker and add the registration code to -./lib/StaticAnalyzer/Checkers/Checkers.td. Note, all checkers should -first be developed as experimental. Suppose our new checker performs security -related checks, then we should add the following lines under -SecurityExperimental package: +The data would be accessed with the function +
    -let ParentPackage = SecurityExperimental in {
    +ProgramStateRef state;
    +SymbolRef Sym;
     ...
    -def NewChecker : Checker<"NewChecker">,
    -  HelpText<"This text should give a short description of the checks performed.">,
    -  DescFile<"NewChecker.cpp">;
    +const int *currentValue = state->get<ExampleDataType>(Sym);
    +
    + +and set with the function + +
    +ProgramStateRef state;
    +SymbolRef Sym;
    +int newValue;
     ...
    -} // end "security.experimental"
    +ProgramStateRef newState = state->set<ExampleDataType>(Sym, newValue);
     
    -
  • Make the source code file visible to CMake by adding it to -./lib/StaticAnalyzer/Checkers/CMakeLists.txt. +

    In addition, the macros define a data type used for storing the data of the +new data category; the name of this type is the name of the data category with +"Ty" appended. For REGISTER_TRAIT_WITH_PROGRAMSTATE, this will simply +be the data type that was passed; for the other three macros, this will be a specialized +version of the llvm::ImmutableList, +llvm::ImmutableSet, +or llvm::ImmutableMap +class template. For the ExampleDataType example above, the type +created would be equivalent to writing the declaration: -

  • Compile and see your checker in the list of available checkers by running:
    -$clang -cc1 -analyzer-checker-help - - +
    +typedef llvm::ImmutableMap<SymbolRef, int> ExampleDataTypeTy;
    +
    -

    Checker Skeleton

    - There are two main decisions you need to make: -
      -
    • Which events the checker should be tracking. - See CheckerDocumentation - for the list of available checker callbacks.
    • -
    • What data you want to store as part of the checker-specific program - state. Try to minimize the checker state as much as possible.
    • -
    +

    These macros will cover a majority of use cases; however, they still have a +few limitations. They cannot be used inside namespaces (since they expand to +contain top-level namespace references), and the data types that they define +cannot be referenced from more than one file. +

    Note that ProgramStates are immutable; instead of modifying an existing +one, functions that update the state return a copy of the previous state +with the change applied. This updated state must then be provided to the +analyzer core by calling the CheckerContext::addTransition function.
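The copy-on-update behavior can be illustrated with a small standalone model. This is a sketch under stated assumptions: ToyState, ToyStateRef, and SymbolId are hypothetical stand-ins built on the standard library, not the analyzer's ProgramState, ProgramStateRef, or SymbolRef, and a plain std::map takes the place of llvm::ImmutableMap.

```cpp
#include <map>
#include <memory>

// Toy stand-in for the analyzer's ProgramState; NOT the real class.
// SymbolId is a hypothetical substitute for SymbolRef.
using SymbolId = int;

class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

class ToyState {
public:
  // Returns a pointer to the stored value, or nullptr if Sym is untracked.
  const int *get(SymbolId Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Returns a NEW state with Sym bound to Value; *this is left untouched.
  ToyStateRef set(SymbolId Sym, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Sym] = Value;
    return Copy;
  }

private:
  std::map<SymbolId, int> Data;
};
```

Calling set on an empty state S0 yields a second state S1 that holds the binding, while a lookup in S0 still finds nothing. Discarding the returned state therefore loses the update, which is why the new state has to be handed back to the core.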

    Bug Reports

    + +

    When a checker detects a mistake in the analyzed code, it needs a way to +report it to the analyzer core so that it can be displayed. The two classes used +to construct this report are BugType +and +BugReport. + +

    +BugType, as the name would suggest, represents a type of bug. The +constructor for BugType takes two parameters: the name of the bug +type and the name of the category of the bug. These are used, for example, in the +summary page generated by the scan-build tool. +

    + The BugReport class represents a specific occurrence of a bug. In + the most common case, three parameters are used to form a BugReport: +

      +
    1. The type of bug, specified as an instance of the BugType class. +
    2. A short descriptive string. This is placed at the location of the bug in +the detailed line-by-line output generated by scan-build. +
    3. The context in which the bug occurred. This includes both the location of +the bug in the program and the program's state when the location is reached. These are +both encapsulated in an ExplodedNode. +
    + +

    In order to obtain the correct ExplodedNode, a decision must be made +as to whether or not analysis can continue along the current path. This decision +is based on whether the detected bug is one that would prevent the program under +analysis from continuing. For example, leaking of a resource should not stop +analysis, as the program can continue to run after the leak. Dereferencing a +null pointer, on the other hand, should stop analysis, as there is no way for +the program to meaningfully continue after such an error. + +

    If analysis can continue, then the most recent ExplodedNode +generated by the checker can be passed to the BugReport constructor +without additional modification. This ExplodedNode will be the one +returned by the most recent call to CheckerContext::addTransition. +If no transition has been performed during the current callback, the checker should call CheckerContext::addTransition() +and use the returned node for bug reporting. + +

    If analysis cannot continue, then the current state should be transitioned +into a so-called sink node, a node from which no further analysis will be +performed. This is done by calling the +CheckerContext::generateSink function; this function is the same as the +addTransition function, but marks the state as a sink node. Like +addTransition, this returns an ExplodedNode with the updated +state, which can then be passed to the BugReport constructor. +

    +After a BugReport is created, it should be passed to the analyzer core +by calling CheckerContext::emitReport. +
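The choice between continuing and sinking can be sketched with a standalone model. Everything below is an assumption made for illustration: ToyNode, ToyReport, ToyContext, reportLeak, and reportNullDereference are hypothetical, program states are reduced to strings, and the member functions only imitate the roles of addTransition, generateSink, and emitReport described above.

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy model only; none of these types are the analyzer's real classes.
struct ToyNode {
  std::string State; // stand-in for the program state at this point
  bool IsSink;       // true if exploration must stop at this node
};

struct ToyReport {
  std::string BugType;     // stand-in for a BugType instance
  std::string Description; // short message shown at the bug location
  const ToyNode *Node;     // where, and in which state, the bug occurred
};

class ToyContext {
public:
  // Plays the role of addTransition: the path stays alive.
  const ToyNode *addTransition(const std::string &State) {
    Nodes.push_back({State, false});
    return &Nodes.back();
  }

  // Plays the role of generateSink: same as above, but ends the path.
  const ToyNode *generateSink(const std::string &State) {
    Nodes.push_back({State, true});
    return &Nodes.back();
  }

  // Plays the role of emitReport: hands the report to the core.
  void emitReport(const ToyReport &R) { Reports.push_back(R); }

  const std::vector<ToyReport> &reports() const { return Reports; }

private:
  std::deque<ToyNode> Nodes; // deque keeps node addresses stable on push_back
  std::vector<ToyReport> Reports;
};

// A leak does not stop the program, so analysis continues afterwards.
// Returns true if the path can still be explored.
bool reportLeak(ToyContext &C, const std::string &State) {
  const ToyNode *N = C.addTransition(State);
  C.emitReport({"Resource leak", "Opened file is never closed", N});
  return !N->IsSink;
}

// A null dereference is fatal, so the path is ended with a sink node.
bool reportNullDereference(ToyContext &C, const std::string &State) {
  const ToyNode *N = C.generateSink(State);
  C.emitReport({"Null dereference", "Dereference of a null pointer", N});
  return !N->IsSink;
}
```

In both helpers the report is built from the node returned by the transition call, so the location and state attached to the report are exactly the ones the core recorded for that step.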

    AST Visitors

    Some checks might not require path-sensitivity to be effective. Simple AST walk might be sufficient. If that is the case, consider implementing a Clang @@ -361,6 +580,31 @@ To dump AST of a method that the current ExplodedNode belongs to:
  • +

    Additional Sources of Information

    + +Here are some additional resources that are useful when working on the Clang +Static Analyzer: + + +