From bfd08ce02f62e2c7700f71e36afa5c771062e842 Mon Sep 17 00:00:00 2001
From: Matthew Fernandez <matthew.fernandez@gmail.com>
Date: Sat, 7 Nov 2020 14:20:27 -0800
Subject: [PATCH] add a section of performance and profiling to dev docs

---
 DEVELOPERS.md | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index cc3cef457..44b24f25d 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -128,6 +128,87 @@ is green
 
 1. Merge the merge request
 
+## Performance and profiling
+
+The runtime and memory usage of Graphviz is dependent on the user’s graph. It is
+easy to create “expensive” graphs without realizing it using only moderately
+sized input. For this reason, users regularly encounter performance bottlenecks
+that they need help with. This situation is likely to persist even with hardware
+advances, as the size and complexity of the graphs users construct will expand
+as well.
+
+This section documents how to build performance-optimized Graphviz binaries and
+how to identify performance bottlenecks. Note that this information assumes you
+are working in a Linux environment.
+
+### Building an optimized Graphviz
+
+The first step to getting an optimized build is to make sure you are using a
+recent compiler. If you have not upgraded your C and C++ compilers for a while,
+make sure you do this first.
+
+The simplest way to change flags used during compilation is by setting the
+`CFLAGS` and `CXXFLAGS` environment variables:
+
+```sh
+env CFLAGS="..." CXXFLAGS="..." ./configure
+```
+
+You should use the maximum optimization level for your compiler, e.g. `-O3` for
+GCC and Clang. If your toolchain supports it, it is recommended to also enable
+link-time optimization (`-flto`).
+
+You can further optimize compilation for the machine you are building on with
+`-march=native -mtune=native`. Be aware that the resulting binaries will no
+longer be portable (may not run if copied to another machine). These flags are
+also not recommended if you are debugging a user issue, because you will end up
+profiling and optimizing different code than what may execute on their machine.
+
+Most profilers need a symbol table and/or debugging metadata to give you useful
+feedback. You can enable this on GCC and Clang with `-g`.
+
+Putting this all together:
+
+```sh
+env CFLAGS="-O3 -flto -march=native -mtune=native -g" \
+  CXXFLAGS="-O3 -flto -march=native -mtune=native -g" ./configure
+```
+
+### Profiling
+
+#### [Callgrind](https://valgrind.org/docs/manual/cl-manual.html)
+
+Callgrind is a tool of [Valgrind](https://valgrind.org/) that can measure how
+many times a function is called and how expensive the execution of a function is
+compared to overall runtime. When you have built an optimized binary according
+to the above instructions, you can run it under Callgrind:
+
+```sh
+valgrind --tool=callgrind dot -Tsvg test.dot
+```
+
+This will produce a file like `callgrind.out.2534` in the current directory (the
+suffix is the process ID). You can use [KCachegrind](https://kcachegrind.github.io/)
+to view the results by running it in the same directory with no arguments:
+
+```sh
+kcachegrind
+```
+
+If you have multiple trace results in the current directory, KCachegrind will
+load all of them and even let you compare them to each other. See the
+KCachegrind documentation for more information about how to use this tool.
+
+Be aware that execution under Callgrind will be a lot slower than a normal run.
+If you need to see instruction-level execution costs, you can pass
+`--dump-instr=yes` to Valgrind, but this will further slow execution and is
+usually not necessary. To profile with less overhead, you can use a statistical
+profiler like Linux Perf.
+
+#### [Linux Perf](https://perf.wiki.kernel.org/index.php/Main_Page)
+
+TODO
+
 ## TODO with this guide
 
 * Update with new example commits after next stable release.
-- 
2.40.0