Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low cost.

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

.. code-block:: console

    % cat cov.cc
    #include <stdio.h>
    __attribute__((noinline))
    void foo() { printf("foo\n"); }

    int main(int argc, char **argv) {
      if (argc > 1)  // call foo() only when an argument is given
        foo();
    }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov


Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic number defines the size of the offsets that follow. The rest
of the data is the offsets in the corresponding binary/DSO that were executed
during the run.
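
The layout described above can be read with a few lines of code. The following
is a minimal sketch, not the logic of ``sancov.py``: ``ParseSancov`` is a
hypothetical helper, and it assumes that a magic ending in ``0x64`` means 64-bit
offsets, that a magic ending in ``0x32`` means 32-bit offsets, and that the file
was written in the byte order of the machine reading it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Magic values quoted in the text above.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Parses the raw bytes of a .sancov file into a list of offsets.
    // Returns false if the magic is unknown or the payload size does not
    // match the offset width. Assumes host byte order (see the lead-in).
    bool ParseSancov(const std::vector<uint8_t> &data,
                     std::vector<uint64_t> *offsets) {
      uint64_t magic = 0;
      if (data.size() < sizeof(magic)) return false;
      std::memcpy(&magic, data.data(), sizeof(magic));

      size_t width = 0;  // size of one offset in bytes
      if (magic == kMagic64) width = 8;
      else if (magic == kMagic32) width = 4;
      else return false;

      const size_t payload = data.size() - sizeof(magic);
      if (payload % width != 0) return false;

      offsets->clear();
      for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
        if (width == 8) {
          uint64_t value = 0;
          std::memcpy(&value, &data[pos], sizeof(value));
          offsets->push_back(value);
        } else {
          uint32_t value = 0;
          std::memcpy(&value, &data[pos], sizeof(value));
          offsets->push_back(value);
        }
      }
      return true;
    }

A caller would read the whole file into a ``std::vector<uint8_t>`` first and
then pass it to ``ParseSancov``.
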

``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out


How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage

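
The subtraction described above is a plain set difference. A minimal sketch,
assuming both PC lists have already been loaded into memory (``MissingPCs`` is
a hypothetical helper, not part of ``sancov.py``):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    // Returns the instrumented PCs that are absent from the covered set,
    // in ascending order. Both inputs may be unsorted and contain duplicates.
    std::vector<uint64_t> MissingPCs(std::vector<uint64_t> instrumented,
                                     std::vector<uint64_t> covered) {
      // std::set_difference requires sorted ranges.
      std::sort(instrumented.begin(), instrumented.end());
      instrumented.erase(
          std::unique(instrumented.begin(), instrumented.end()),
          instrumented.end());
      std::sort(covered.begin(), covered.end());

      std::vector<uint64_t> missing;
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs and two covered ones, as in the session above, the
result holds the single PC that was never reached.
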

It contains 3 basic blocks, let's name them A, B, and C, connected by the edges
A=>B, B=>C, and A=>C.

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks. In
this example, a dummy block is inserted on the edge A=>C.
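
To make the notion of a critical edge concrete, the sketch below flags an edge
as critical when its source block has more than one successor and its
destination block has more than one predecessor (the usual textbook
definition). ``CriticalEdges`` and the string block names are illustrative
assumptions; this is not how the compiler pass is implemented.

.. code-block:: c++

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using Edge = std::pair<std::string, std::string>;  // (source, destination)

    // Returns the critical edges of a control flow graph given as an edge list.
    // Assumes the list contains no duplicate edges.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<std::string, int> successors;    // outgoing edges per block
      std::map<std::string, int> predecessors;  // incoming edges per block
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }

      std::vector<Edge> critical;
      for (const Edge &e : edges) {
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      }
      return critical;
    }

For the three-block graph discussed above (A=>B, B=>C, A=>C) the only edge
returned is A=>C, which is the one that edge-level coverage would split.
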

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    ==> a.out.38214.bitset-sancov <==
    ==> a.out.6128.bitset-sancov <==

For a given executable, the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
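
One way to use the bitsets for distillation is a greedy pass that keeps an
input only if its bitset covers a block that no previously kept input covered.
The sketch below assumes each bitset has already been read from its
``*.bitset-sancov`` file into a string of ``0`` and ``1`` characters with any
trailing newline removed. ``DistillCorpus`` is a hypothetical helper, and the
greedy strategy is just one simple choice.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Returns the indices of the inputs worth keeping. bitsets[i] is the
    // coverage bitset produced by running input i ('1' = block executed).
    std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
      std::string seen;          // union of the bitsets of all kept inputs
      std::vector<size_t> kept;
      for (size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size()) seen.resize(bits.size(), '0');

        bool adds_coverage = false;
        for (size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage) kept.push_back(i);
      }
      return kept;
    }

The result depends on input order: inputs that come earlier are preferred over
later ones with the same coverage.
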

Caller-callee coverage
======================

Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown time, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees).

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.
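
The line-pair layout of ``caller-callee.PID.sancov`` described above (odd lines
are callers, even lines are callees) can be consumed as follows. This is a
minimal sketch: ``ReadCallerCalleePairs`` is a hypothetical helper, and each
line is kept as an opaque string because the exact per-line syntax is not
spelled out here.

.. code-block:: c++

    #include <istream>
    #include <string>
    #include <utility>
    #include <vector>

    using CallPair = std::pair<std::string, std::string>;  // (caller, callee)

    // Reads caller/callee pairs from a stream: the first line of each pair is
    // the caller, the second is the callee. A trailing unpaired line is
    // ignored.
    std::vector<CallPair> ReadCallerCalleePairs(std::istream &in) {
      std::vector<CallPair> pairs;
      std::string caller, callee;
      while (std::getline(in, caller) && std::getline(in, callee))
        pairs.emplace_back(caller, callee);
      return pairs;
    }

In practice the stream would be a ``std::ifstream`` opened on the dumped file.
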

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01


These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

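
The range mapping described in the comments above can be modelled in a few
lines. The sketch below is not the runtime's implementation:
``CounterToRangeBit`` and ``UpdateBitsetAndClear`` are hypothetical names, and
the choice of the least significant bit for the range ``1`` is an assumption.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Maps a counter value to a one-hot mask for the ranges
    // 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+ (bit 0 = first range).
    // Returns 0 for a counter that was never incremented.
    uint8_t CounterToRangeBit(uint64_t counter) {
      if (counter == 0) return 0;
      if (counter >= 128) return 1u << 7;
      if (counter >= 32) return 1u << 6;
      if (counter >= 16) return 1u << 5;
      if (counter >= 8) return 1u << 4;
      if (counter >= 4) return 1u << 3;
      return static_cast<uint8_t>(1u << (counter - 1));  // 1, 2, 3
    }

    // Models the documented update step: ORs the range bit of every counter
    // into the matching byte of 'bitset', clears all counters, and returns
    // the number of bits that were newly set.
    size_t UpdateBitsetAndClear(std::vector<uint64_t> &counters,
                                std::vector<uint8_t> &bitset) {
      size_t new_bits = 0;
      for (size_t i = 0; i < counters.size(); ++i) {
        const uint8_t bit = CounterToRangeBit(counters[i]);
        if (i < bitset.size() && bit != 0 && (bitset[i] & bit) == 0) {
          bitset[i] |= bit;
          ++new_bits;
        }
        counters[i] = 0;
      }
      return new_bits;
    }

A fuzzer can treat a non-zero return value as a sign that the last input
reached a block a different number of times than any earlier input did.
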

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer needs to define the following functions, which are called by the
instrumented code:

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
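
As an illustration of the packed arguments described above, a fuzzer could
define the two callbacks roughly as follows. This is a sketch under
assumptions: the ``CmpRecord`` type and the two global logs are hypothetical,
``extern "C"`` linkage is assumed, and like the current implementation the
code is only suitable for single-threaded targets.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    // One observed comparison (hypothetical bookkeeping type).
    struct CmpRecord {
      uint32_t size_in_bits;  // SizeAndType[63:32]
      uint32_t type;          // SizeAndType[31:0]
      uint64_t arg1;
      uint64_t arg2;
    };

    static std::vector<CmpRecord> g_cmp_log;
    static std::vector<uint64_t> g_switch_constants;

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      CmpRecord record;
      record.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
      record.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
      record.arg1 = Arg1;
      record.arg2 = Arg2;
      g_cmp_log.push_back(record);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val,
                                                 uint64_t *Cases) {
      (void)Val;  // the operand itself is not needed in this sketch
      const uint64_t num_cases = Cases[0];
      // Cases[1] holds the bit width of Val; the constants start at Cases[2].
      for (uint64_t i = 0; i < num_cases; ++i)
        g_switch_constants.push_back(Cases[2 + i]);
    }

The recorded operands and case constants could then be used by a fuzzer as
candidate values when it mutates inputs.
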

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov


Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    7036.sancov.map 7036.sancov.raw a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.
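
The feedback loop described above can be sketched as follows. To keep the
example buildable without the sanitizer runtime, the coverage query is passed
in as a callable. In an instrumented build it would wrap
``__sanitizer_get_total_unique_coverage()``, which is assumed here to return an
integer count. ``KeepInterestingInputs`` is a hypothetical helper and not the
fuzzer from the LLVM tree.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Runs every candidate through 'target' and keeps those inputs after
    // which the total coverage reported by 'coverage_total' has grown.
    std::vector<std::string> KeepInterestingInputs(
        const std::vector<std::string> &candidates,
        const std::function<void(const std::string &)> &target,
        const std::function<uintptr_t()> &coverage_total) {
      std::vector<std::string> corpus;
      uintptr_t best = coverage_total();
      for (const std::string &input : candidates) {
        target(input);
        const uintptr_t now = coverage_total();
        if (now > best) {  // this input reached something new
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

A real in-process fuzzer would generate candidates by mutating the kept inputs
instead of taking a fixed list.
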

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
from benchmark to benchmark.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========


Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.