From 51b642c4827997ad34d2de86f9a240d6427a603c Mon Sep 17 00:00:00 2001
From: Kostya Serebryany
Date: Tue, 2 May 2017 00:32:57 +0000
Subject: [PATCH] [sanitizer-coverage] update the SanitizerCoverage docs to reflect the current state

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@301888 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/SanitizerCoverage.rst | 400 ++++++++++++++++---------------------
 1 file changed, 167 insertions(+), 233 deletions(-)

diff --git a/docs/SanitizerCoverage.rst b/docs/SanitizerCoverage.rst
index 43d6a07fa6..1c42b4236a 100644
--- a/docs/SanitizerCoverage.rst
+++ b/docs/SanitizerCoverage.rst
@@ -8,202 +8,12 @@ SanitizerCoverage
 Introduction
 ============
 
-Sanitizer tools have a very simple code coverage tool built in. It allows to
-get function-level, basic-block-level, and edge-level coverage at a very low
-cost.
-
-How to build and run
-====================
-
-SanitizerCoverage can be used with :doc:`AddressSanitizer`,
-:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
-UndefinedBehaviorSanitizer, or without any sanitizer. Pass one of the
-following compile-time flags:
-
-* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
-* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
-  **extra** slowdown).
-* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).
-
-At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
-``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
-appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.
-
-Example:
-
-.. code-block:: console
-
-    % cat -n cov.cc
-         1  #include
-         2  __attribute__((noinline))
-         3  void foo() { printf("foo\n"); }
-         4
-         5  int main(int argc, char **argv) {
-         6    if (argc == 2)
-         7      foo();
-         8    printf("main\n");
-         9  }
-    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
-    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
-    main
-    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
-    foo
-    main
-    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
-
-Every time you run an executable instrumented with SanitizerCoverage
-one ``*.sancov`` file is created during the process shutdown.
-If the executable is dynamically linked against instrumented DSOs,
-one ``*.sancov`` file will be also created for every DSO.
-
-Postprocessing
-==============
-
-The format of ``*.sancov`` files is very simple: the first 8 bytes is the magic,
-one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte of the
-magic defines the size of the following offsets. The rest of the data is the
-offsets in the corresponding binary/DSO that were executed during the run.
-
-A simple script
-``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
-provided to dump these offsets.
-
-.. code-block:: console
-
-    % sancov.py print a.out.22679.sancov a.out.22673.sancov
-    sancov.py: read 2 PCs from a.out.22679.sancov
-    sancov.py: read 1 PCs from a.out.22673.sancov
-    sancov.py: 2 files merged; 2 PCs total
-    0x465250
-    0x4652a0
-
-You can then filter the output of ``sancov.py`` through ``addr2line --exe
-ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
-numbers:
-
-.. code-block:: console
-
-    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
-    cov.cc:3
-    cov.cc:5
-
-Sancov Tool
-===========
-
-A new experimental ``sancov`` tool is developed to process coverage files.
-The tool is part of LLVM project and is currently supported only on Linux.
-It can handle symbolization tasks autonomously without any extra support
-from the environment. You need to pass .sancov files (named
-``..sancov`` and paths to all corresponding binary elf files.
-Sancov matches these files using module names and binaries file names.
-
-.. code-block:: console
-
-    USAGE: sancov [options] (|<.sancov file>)...
-
-    Action (required)
-      -print                    - Print coverage addresses
-      -covered-functions        - Print all covered functions.
-      -not-covered-functions    - Print all not covered functions.
-      -symbolize                - Symbolizes the report.
-
-    Options
-      -blacklist=               - Blacklist file (sanitizer blacklist format).
-      -demangle                 - Print demangled function name.
-      -strip_path_prefix=       - Strip this prefix from file paths in reports
-
-
-Coverage Reports (Experimental)
-================================
-
-``.sancov`` files do not contain enough information to generate a source-level
-coverage report. The missing information is contained
-in debug info of the binary. Thus the ``.sancov`` has to be symbolized
-to produce a ``.symcov`` file first:
-
-.. code-block:: console
-
-    sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov
-
-The ``.symcov`` file can be browsed overlayed over the source code by
-running ``tools/sancov/coverage-report-server.py`` script that will start
-an HTTP server.
-
-
-How good is the coverage?
-=========================
-
-It is possible to find out which PCs are not covered, by subtracting the covered
-set from the set of all instrumented PCs. The latter can be obtained by listing
-all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
-can do this for you. Just supply the path to binary and a list of covered PCs:
-
-.. code-block:: console
-
-    % sancov.py print a.out.12345.sancov > covered.txt
-    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
-    sancov.py: 1 file merged; 2 PCs total
-    % sancov.py missing a.out < covered.txt
-    sancov.py: found 3 instrumented PCs in a.out
-    sancov.py: read 2 PCs from stdin
-    sancov.py: 1 PCs missing from coverage
-    0x4cc61c
-
-Edge coverage
-=============
-
-Consider this code:
-
-.. code-block:: c++
-
-    void foo(int *a) {
-      if (a)
-        *a = 0;
-    }
-
-It contains 3 basic blocks, let's name them A, B, C:
-
-.. code-block:: none
-
-    A
-    |\
-    | \
-    |  B
-    | /
-    |/
-    C
-
-If blocks A, B, and C are all covered we know for certain that the edges A=>B
-and B=>C were executed, but we still don't know if the edge A=>C was executed.
-Such edges of control flow graph are called
-`critical `_. The
-edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
-edges by introducing new dummy blocks and then instruments those blocks:
-
-.. code-block:: none
-
-    A
-    |\
-    | \
-    D  B
-    | /
-    |/
-    C
-
-Tracing PCs
-===========
-
-*Experimental* feature similar to tracing basic blocks, but with a different API.
-With ``-fsanitize-coverage=trace-pc`` the compiler will insert
-``__sanitizer_cov_trace_pc()`` on every edge.
-With an additional ``...=trace-pc,indirect-calls`` flag
-``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
-These callbacks are not implemented in the Sanitizer run-time and should be defined
-by the user. So, these flags do not require the other sanitizer to be used.
-This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
-and can be used with `AFL `__.
+LLVM has a simple code coverage instrumentation built in (SanitizerCoverage).
+It inserts calls to user-defined functions on function-, basic-block-, and edge- levels.
+Default implementations of those callbacks are provided and implement
+simple coverage reporting and visualization,
+however if you need *just* coverage visualization you may want to use
+:doc:`SourceBasedCodeCoverage ` instead.
 
 Tracing PCs with guards
 =======================
@@ -217,7 +27,7 @@ on every edge:
 
 Every edge will have its own `guard_variable` (uint32_t).
 
-The compler will also insert a module constructor that will call
+The compler will also insert calls to a module constructor:
 
 .. code-block:: c++
 
@@ -226,7 +36,7 @@ The compler will also insert a module constructor that will call
     // more than once with the same values of start/stop.
     __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);
 
-With `trace-pc-guards,indirect-calls`
+With an additional ``...=trace-pc,indirect-calls`` flag
 ``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
 
 The functions `__sanitizer_cov_trace_pc_*` should be defined by the user.
@@ -309,6 +119,75 @@ Example:
     guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17
     guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14
 
+Tracing PCs
+===========
+
+With ``-fsanitize-coverage=trace-pc`` the compiler will insert
+``__sanitizer_cov_trace_pc()`` on every edge.
+With an additional ``...=trace-pc,indirect-calls`` flag
+``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
+These callbacks are not implemented in the Sanitizer run-time and should be defined
+by the user.
+This mechanism is used for fuzzing the Linux kernel
+(https://github.com/google/syzkaller).
+
+
+Instrumentation points
+======================
+Sanitizer Coverage offers different levels of instrumentation.
+
+* ``edge`` (default): edges are instrumented (see below).
+* ``bb``: basic blocks are instrumented.
+* ``func``: only the entry block of every function will be instrumented.
+
+Use these flags together with ``trace-pc-guard`` or ``trace-pc``,
+like this: ``-fsanitize-coverage=func,trace-pc-guard``.
+
+When ``edge`` or ``bb`` is used, some of the edges/blocks may still be left
+uninstrumented if such instrumentation is considered redundant.
+**TODO**: add a user-visible option to disable the optimization.
+
+
+Edge coverage
+-------------
+
+Consider this code:
+
+.. code-block:: c++
+
+    void foo(int *a) {
+      if (a)
+        *a = 0;
+    }
+
+It contains 3 basic blocks, let's name them A, B, C:
+
+.. code-block:: none
+
+    A
+    |\
+    | \
+    |  B
+    | /
+    |/
+    C
+
+If blocks A, B, and C are all covered we know for certain that the edges A=>B
+and B=>C were executed, but we still don't know if the edge A=>C was executed.
+Such edges of control flow graph are called
+`critical `_. The
+edge-level coverage simply splits all critical
+edges by introducing new dummy blocks and then instruments those blocks:
+
+.. code-block:: none
+
+    A
+    |\
+    | \
+    D  B
+    | /
+    |/
+    C
 
 Tracing data flow
 =================
@@ -349,52 +228,107 @@ the `LLVM GEP instructions `_
 
 
 This interface is a subject to change.
-The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
 
-Output directory
-================
+Default implementation
+======================
 
-By default, .sancov files are created in the current working directory.
-This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
+The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc) provide a
+default implementations of some of the coverage callbacks.
+You may use this implementation to dump the coverage on disk at the process
+exit.
+
+Example:
 
 .. code-block:: console
 
-    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
-    % ls -l /tmp/cov/*sancov
-    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
+    % cat -n cov.cc
+         1  #include
+         2  __attribute__((noinline))
+         3  void foo() { printf("foo\n"); }
+         4
+         5  int main(int argc, char **argv) {
+         6    if (argc == 2)
+         7      foo();
+         8    printf("main\n");
+         9  }
+    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
+    % ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
+    main
+    SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
+    24 a.out.7312.sancov
+    % ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
+    foo
+    main
+    SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
+    24 a.out.7312.sancov
+    32 a.out.7316.sancov
 
-Sudden death
-============
+Every time you run an executable instrumented with SanitizerCoverage
+one ``*.sancov`` file is created during the process shutdown.
+If the executable is dynamically linked against instrumented DSOs,
+one ``*.sancov`` file will be also created for every DSO.
 
-*Deprecated, don't use*
+Sancov data format
+------------------
 
-Normally, coverage data is collected in memory and saved to disk when the
-program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
-``__sanitizer_cov_dump()`` is called.
+The format of ``*.sancov`` files is very simple: the first 8 bytes is the magic,
+one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte of the
+magic defines the size of the following offsets. The rest of the data is the
+offsets in the corresponding binary/DSO that were executed during the run.
 
-If the program ends with a signal that ASan does not handle (or can not handle
-at all, like SIGKILL), coverage data will be lost. This is a big problem on
-Android, where SIGKILL is a normal way of evicting applications from memory.
+Sancov Tool
+-----------
 
-With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
-memory-mapped file as soon as it collected.
+An simple ``sancov`` tool is provided to process coverage files.
+The tool is part of LLVM project and is currently supported only on Linux.
+It can handle symbolization tasks autonomously without any extra support
+from the environment. You need to pass .sancov files (named
+``..sancov`` and paths to all corresponding binary elf files.
+Sancov matches these files using module names and binaries file names.
 
 .. code-block:: console
 
-    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
-    main
-    % ls
-    7036.sancov.map 7036.sancov.raw a.out
-    % sancov.py rawunpack 7036.sancov.raw
-    sancov.py: reading map 7036.sancov.map
-    sancov.py: unpacking 7036.sancov.raw
-    writing 1 PCs to a.out.7036.sancov
-    % sancov.py print a.out.7036.sancov
-    sancov.py: read 1 PCs from a.out.7036.sancov
-    sancov.py: 1 files merged; 1 PCs total
-    0x4b2bae
-
-Note that on 64-bit platforms, this method writes 2x more data than the default,
-because it stores full PC values instead of 32-bit offsets.
+    USAGE: sancov [options] (|<.sancov file>)...
+
+    Action (required)
+      -print                    - Print coverage addresses
+      -covered-functions        - Print all covered functions.
+      -not-covered-functions    - Print all not covered functions.
+      -symbolize                - Symbolizes the report.
+
+    Options
+      -blacklist=               - Blacklist file (sanitizer blacklist format).
+      -demangle                 - Print demangled function name.
+      -strip_path_prefix=       - Strip this prefix from file paths in reports
+
+
+Coverage Reports
+----------------
+**Experimental**
+
+``.sancov`` files do not contain enough information to generate a source-level
+coverage report. The missing information is contained
+in debug info of the binary. Thus the ``.sancov`` has to be symbolized
+to produce a ``.symcov`` file first:
+
+.. code-block:: console
+
+    sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov
+
+The ``.symcov`` file can be browsed overlayed over the source code by
+running ``tools/sancov/coverage-report-server.py`` script that will start
+an HTTP server.
+
+Output directory
+----------------
+
+By default, .sancov files are created in the current working directory.
+This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
+
+.. code-block:: console
+
+    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
+    % ls -l /tmp/cov/*sancov
+    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
+    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
--
2.40.0