From: Matthew Fernandez
Date: Sat, 7 Nov 2020 22:20:27 +0000 (-0800)
Subject: add a section of performance and profiling to dev docs
X-Git-Tag: 2.46.0~14^2
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=bfd08ce02f62e2c7700f71e36afa5c771062e842;p=graphviz

add a section of performance and profiling to dev docs
---

diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index cc3cef457..44b24f25d 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -128,6 +128,87 @@ is green
 1. Merge the merge request
 
+## Performance and profiling
+
+The runtime and memory usage of Graphviz depend on the user’s graph. It is
+easy to create “expensive” graphs without realizing it, even with only
+moderately sized input. For this reason, users regularly encounter
+performance bottlenecks that they need help with. This situation is likely to
+persist even with hardware advances, as the size and complexity of the graphs
+users construct will expand as well.
+
+This section documents how to build performance-optimized Graphviz binaries
+and how to identify performance bottlenecks. Note that this information
+assumes you are working in a Linux environment.
+
+### Building an optimized Graphviz
+
+The first step to getting an optimized build is to make sure you are using a
+recent compiler. If you have not upgraded your C and C++ compilers for a
+while, do this first.
+
+The simplest way to change the flags used during compilation is by setting
+the `CFLAGS` and `CXXFLAGS` environment variables:
+
+```sh
+env CFLAGS="..." CXXFLAGS="..." ./configure
+```
+
+You should use the maximum optimization level for your compiler, e.g. `-O3`
+for GCC and Clang. If your toolchain supports it, it is recommended to also
+enable link-time optimization (`-flto`).
+
+You can further optimize compilation for the machine you are building on with
+`-march=native -mtune=native`. Be aware that the resulting binaries will no
+longer be portable (they may not run if copied to another machine). These
+flags are also not recommended if you are debugging a user issue, because you
+will end up profiling and optimizing different code than what may execute on
+the user’s machine.
+
+Most profilers need a symbol table and/or debugging metadata to give you
+useful feedback. You can enable this on GCC and Clang with `-g`.
+
+Putting this all together:
+
+```sh
+env CFLAGS="-O3 -flto -march=native -mtune=native -g" \
+  CXXFLAGS="-O3 -flto -march=native -mtune=native -g" ./configure
+```
+
+### Profiling
+
+#### [Callgrind](https://valgrind.org/docs/manual/cl-manual.html)
+
+Callgrind is a [Valgrind](https://valgrind.org/) tool that can measure how
+many times a function is called and how expensive a function’s execution is
+relative to overall runtime. Once you have built an optimized binary
+according to the above instructions, you can run it under Callgrind:
+
+```sh
+valgrind --tool=callgrind dot -Tsvg test.dot
+```
+
+This will produce a file like `callgrind.out.2534` in the current directory.
+You can use [KCachegrind](https://kcachegrind.github.io/) to view the results
+by running it in the same directory with no arguments:
+
+```sh
+kcachegrind
+```
+
+If you have multiple trace results in the current directory, KCachegrind will
+load all of them and even let you compare them to each other. See the
+KCachegrind documentation for more information about how to use this tool.
+
+Be aware that execution under Callgrind will be a lot slower than a normal
+run. If you need to see instruction-level execution costs, you can pass
+`--dump-instr=yes` to Valgrind, but this will further slow execution and is
+usually not necessary.
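+
+If you do need that level of detail, the run might look something like this
+(reusing the same `test.dot` example input as above):
+
+```sh
+# as before, but count events per instruction; expect this to be slower still
+valgrind --tool=callgrind --dump-instr=yes dot -Tsvg test.dot
+```
+
+With per-instruction counting enabled, KCachegrind can also annotate costs at
+the assembly level for a selected function.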
+
+To profile with less overhead, you can use a statistical profiler like Linux
+Perf.
+
+#### [Linux Perf](https://perf.wiki.kernel.org/index.php/Main_Page)
+
+TODO
+
 ## TODO with this guide
 
 * Update with new example commits after next stable release.