=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic gives the size of the offsets that follow (``0x64`` for
64-bit offsets, ``0x32`` for 32-bit offsets). The rest of the data is the
offsets in the corresponding binary/DSO that were executed during the run.

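As a rough illustration of this layout, the sketch below decodes the contents
of one ``*.sancov`` file that has already been read into memory. The function
name ``ParseSancov`` and the constant names are made up for this example and
are not part of any shipped tool. The sketch also assumes that the file was
written on a host with the same byte order as the one reading it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>
    #include <vector>

    // Magic values taken from the format description above.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Hypothetical helper: returns the offsets stored in one .sancov file.
    // Assumption: values are in the reader's native byte order.
    std::vector<uint64_t> ParseSancov(const std::vector<uint8_t> &bytes) {
      uint64_t magic = 0;
      if (bytes.size() < sizeof(magic))
        throw std::runtime_error("too short to hold the magic");
      std::memcpy(&magic, bytes.data(), sizeof(magic));

      // The last byte of the magic selects the width of each offset.
      std::size_t width = 0;
      if (magic == kMagic64)
        width = 8;
      else if (magic == kMagic32)
        width = 4;
      else
        throw std::runtime_error("unrecognized magic");

      if ((bytes.size() - sizeof(magic)) % width != 0)
        throw std::runtime_error("trailing partial offset");

      std::vector<uint64_t> offsets;
      for (std::size_t pos = sizeof(magic); pos < bytes.size(); pos += width) {
        if (width == 8) {
          uint64_t value;
          std::memcpy(&value, bytes.data() + pos, sizeof(value));
          offsets.push_back(value);
        } else {
          uint32_t value;
          std::memcpy(&value, bytes.data() + pos, sizeof(value));
          offsets.push_back(value);  // widened to 64 bits for a uniform result
        }
      }
      return offsets;
    }

A buffer holding the 32-bit magic followed by two 4-byte values decodes to a
two-element vector. For real files, ``sancov.py`` remains the tool to use.
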
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

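The subtraction itself is an ordinary set difference. A minimal sketch,
assuming both PC sets are already available as sorted containers (the helper
name ``MissingPCs`` is invented for this example and does not describe how
``sancov.py`` is implemented):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Hypothetical helper: instrumented PCs that never showed up in coverage.
    std::vector<uint64_t> MissingPCs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      // std::set iterates in sorted order, as std::set_difference requires.
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs and two covered ones, the result holds the single
remaining PC, which matches the shape of the ``sancov.py missing`` output above.
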
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know whether the edge A=>C was
executed. Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_.
Edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

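To make the notion of a critical edge concrete: an edge is critical when its
source block has more than one successor and its destination block has more
than one predecessor. The sketch below applies that test to a control flow
graph given as a plain edge list. The ``Edge`` alias and the ``CriticalEdges``
function are illustrative only and say nothing about how the compiler pass is
written.

.. code-block:: c++

    #include <map>
    #include <utility>
    #include <vector>

    // An edge from block `first` to block `second`; blocks are named by a char.
    using Edge = std::pair<char, char>;

    // Hypothetical helper: returns the edges that would need a dummy block.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<char, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges) {
        // Neither the only way out of the source nor the only way into
        // the destination.
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      }
      return critical;
    }

For the three-block graph above (edges A=>B, A=>C, B=>C), only A=>C satisfies
both conditions, which is the edge that receives the dummy block D.
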
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage is also
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that have not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (unless
dlopen/dlclose come into play), so the bitset coverage can easily be used for
bitset-based corpus distillation.

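One way to picture bitset-based distillation is to keep a cumulative bitset and
retain an input only if its run sets a bit that was not set before. The sketch
below shows that merge step on the text form of the bitset. ``MergeBitset`` is
a name invented for this example, and it assumes that each file has already
been read into a string with any trailing newline removed.

.. code-block:: c++

    #include <cstddef>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: folds one run's bitset into `cumulative` and returns
    // how many blocks were covered for the first time by this run.
    std::size_t MergeBitset(std::string &cumulative, const std::string &run) {
      if (cumulative.empty())
        cumulative.assign(run.size(), '0');
      if (cumulative.size() != run.size())
        throw std::invalid_argument("bitset length mismatch");

      std::size_t added = 0;
      for (std::size_t i = 0; i < run.size(); ++i) {
        if (run[i] == '1' && cumulative[i] == '0') {
          cumulative[i] = '1';
          ++added;
        }
      }
      return added;
    }

Merging the two bitsets shown above in order, ``01101`` contributes three new
blocks and ``11011`` contributes two more, so both inputs would be kept. A
third run that repeats ``01101`` adds nothing and could be dropped.
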
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

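Reading the caller/callee file back is a matter of consuming lines two at a
time. A minimal sketch, assuming that module names contain no whitespace and
that addresses are hexadecimal as in the sample above (the ``Location`` and
``CallPair`` types and both function names are invented for illustration):

.. code-block:: c++

    #include <cstdint>
    #include <istream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Location {
      std::string module;
      uint64_t address;
    };

    struct CallPair {
      Location caller;
      Location callee;
    };

    // Reads one "module address" line; returns false at end of input.
    static bool ReadLocation(std::istream &in, Location &loc) {
      std::string address;
      if (!(in >> loc.module >> address))
        return false;
      loc.address = std::stoull(address, nullptr, 16);  // accepts a 0x prefix
      return true;
    }

    // Hypothetical helper: odd lines are callers, even lines are callees.
    std::vector<CallPair> ParseCallerCallee(std::istream &in) {
      std::vector<CallPair> pairs;
      Location caller, callee;
      while (ReadLocation(in, caller) && ReadLocation(in, callee))
        pairs.push_back({caller, callee});
      return pairs;
    }

Applied to the four sample lines, this yields two pairs that share the caller
``0x4a2e0c``.
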
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

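The range mapping can be written out as a small standalone function. This is
only a sketch of the scheme described above, not the runtime's code. It assumes
that the first range corresponds to the least significant bit and that a
counter of zero sets no bit. ``CounterToRangeBit`` and ``FoldCounters`` are
hypothetical names.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Maps a counter value to the single bit for its range:
    // 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    // Assumption: range i is bit i, counted from the least significant bit.
    uint8_t CounterToRangeBit(uint32_t counter) {
      if (counter == 0) return 0;
      if (counter == 1) return 1u << 0;
      if (counter == 2) return 1u << 1;
      if (counter == 3) return 1u << 2;
      if (counter < 8) return 1u << 3;
      if (counter < 16) return 1u << 4;
      if (counter < 32) return 1u << 5;
      if (counter < 128) return 1u << 6;
      return 1u << 7;
    }

    // Folds a snapshot of counters into `bitset` (one byte per counter) and
    // returns how many bits were set for the first time.
    std::size_t FoldCounters(const std::vector<uint32_t> &counters,
                             std::vector<uint8_t> &bitset) {
      if (bitset.size() < counters.size())
        bitset.resize(counters.size(), 0);
      std::size_t new_bits = 0;
      for (std::size_t i = 0; i < counters.size(); ++i) {
        uint8_t bit = CounterToRangeBit(counters[i]);
        if (bit != 0 && (bitset[i] & bit) == 0) {
          bitset[i] |= bit;
          ++new_bits;
        }
      }
      return new_bits;
    }

Under these assumptions, a block that ran once maps to ``0x01`` and a block
that ran five times maps to ``0x08``, so a fuzzer can tell the two apart even
though both count as "covered".
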
Tracing basic blocks
====================

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).

Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra
instrumentation around comparison instructions and switch statements.
The fuzzer needs to define the following functions, which are called by the
instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.

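A fuzzer-side definition of these two callbacks might look like the sketch
below, which only unpacks the arguments and logs them. The ``CmpEvent`` and
``SwitchEvent`` types and the two global logs are invented for illustration.
The ``extern "C"`` linkage is an assumption about how the instrumented code
refers to these symbols. Like the interface itself, the sketch is not
thread-safe.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    struct CmpEvent {
      uint32_t size_in_bits;  // bits [63:32] of SizeAndType
      uint32_t type;          // bits [31:0] of SizeAndType
      uint64_t arg1;
      uint64_t arg2;
    };

    struct SwitchEvent {
      uint64_t value;
      uint64_t value_bits;          // Cases[1]
      std::vector<uint64_t> cases;  // Cases[2:]
    };

    // Hypothetical in-memory logs a fuzzer could inspect after each input.
    static std::vector<CmpEvent> g_cmp_events;
    static std::vector<SwitchEvent> g_switch_events;

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      CmpEvent event;
      event.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
      event.type = static_cast<uint32_t>(SizeAndType & 0xffffffffu);
      event.arg1 = Arg1;
      event.arg2 = Arg2;
      g_cmp_events.push_back(event);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
      SwitchEvent event;
      event.value = Val;
      event.value_bits = Cases[1];
      // Cases[0] holds the number of case constants that follow.
      event.cases.assign(Cases + 2, Cases + 2 + Cases[0]);
      g_switch_events.push_back(event);
    }

The file that defines these callbacks should presumably be compiled without
``-fsanitize-coverage=trace-cmp``, since the callbacks themselves perform
comparisons and would otherwise be instrumented too.
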
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. Calling it after testing each new input tells
the fuzzer whether the coverage has increased.

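The feedback loop this enables can be sketched as follows. To keep the example
buildable without the sanitizer runtime, the coverage query is passed in as a
callable. In an instrumented build that callable would simply wrap
``__sanitizer_get_total_unique_coverage()``. The ``Input`` alias and the
``KeepInteresting`` function are hypothetical and are not taken from any
existing fuzzer.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    using Input = std::vector<uint8_t>;

    // Runs every candidate through `target` and keeps those after which the
    // value reported by `coverage_now` has grown.
    std::vector<Input> KeepInteresting(
        const std::vector<Input> &candidates,
        const std::function<void(const Input &)> &target,
        const std::function<std::size_t()> &coverage_now) {
      std::vector<Input> corpus;
      std::size_t best = coverage_now();
      for (const Input &input : candidates) {
        target(input);
        std::size_t after = coverage_now();
        if (after > best) {  // this input reached something new
          corpus.push_back(input);
          best = after;
        }
      }
      return corpus;
    }

Because the coverage value only ever grows within a process, comparing it
before and after each input is enough to decide whether the input is worth
keeping.
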
If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
     benchmark       cov0       cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.