=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so a search for
"clang" plus the AST node's class name usually turns up the Doxygen
page for the class you're looking for (for example, search for:
clang ParenExpr).

A good way to familiarize yourself with the Clang AST is to look at it
on some simple example code. Clang has builtin AST-dump modes, which
can be enabled with the flags ``-ast-dump`` and ``-ast-dump-xml``. Note
that ``-ast-dump-xml`` currently only works with debug builds of clang.

Let's look at a simple example AST:

::

    # cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -cc1 tells it to directly
    # use the C++ compiler mode. -undef leaves out some internal declarations.
    $ clang -cc1 -undef -ast-dump-xml test.cc
    ... cutting out internal declarations of clang ...
    <TranslationUnit ptr="0x4871160">
    <Function ptr="0x48a5800" name="f" prototype="true">
    <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
    <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
    <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
    <ParmVar ptr="0x4871d80" name="x" initstyle="c">
    <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
    (CompoundStmt 0x48a5a38 <t2.cc:1:14, line:4:1>
      (DeclStmt 0x48a59c0 <line:2:3, col:24>
        0x48a58c0 "int result =
          (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
            (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
              (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
                (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
              (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
      (ReturnStmt 0x48a5a18 <line:3:3, col:10>
        (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
          (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

In general, ``-ast-dump-xml`` dumps declarations in an XML-style format and
statements in an S-expression-style format. The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.
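
To make the shape of the statement part of the dump more concrete, here
is a minimal, self-contained sketch that prints a small tree in the same
nested S-expression style. It models only the initializer ``(x / 42)``
from the example. The ``Node`` struct and the ``dumpSExpr`` and
``buildExample`` functions are hypothetical stand-ins written for this
illustration; they are not Clang classes, and the output leaves out the
addresses, source ranges, and types that Clang prints.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not a Clang class): a kind name,
// an optional detail string, and an ordered list of child nodes.
struct Node {
  std::string Kind;
  std::string Detail;
  std::vector<Node> Children;
};

// Renders a node as "(Kind Detail", followed by each child on its own line
// indented by two more spaces, and a closing ")" after the last child.
std::string dumpSExpr(const Node &N, unsigned Depth = 0) {
  std::string Out(2 * Depth, ' ');
  Out += "(" + N.Kind;
  if (!N.Detail.empty())
    Out += " " + N.Detail;
  for (const Node &Child : N.Children)
    Out += "\n" + dumpSExpr(Child, Depth + 1);
  Out += ")";
  return Out;
}

// Builds a tree shaped like the initializer "(x / 42)" in the dump above.
Node buildExample() {
  Node X{"DeclRefExpr", "'x'", {}};
  Node Cast{"ImplicitCastExpr", "<LValueToRValue>", {X}};
  Node Literal{"IntegerLiteral", "42", {}};
  Node Div{"BinaryOperator", "'/'", {Cast, Literal}};
  return Node{"ParenExpr", "", {Div}};
}
```

Calling ``dumpSExpr(buildExample())`` yields a string with ``ParenExpr``
as the outermost node and ``BinaryOperator`` nested inside it, which
mirrors how the parentheses from the source stay visible as their own
node in Clang's dump.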

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and provides access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.
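
The inheritance shape described above can be sketched with a few empty
classes. This is only an illustration: the structs below are simplified,
hypothetical stand-ins that reuse Clang's class names, the
``derivesFrom`` helper is made up for this example, and the real classes
have additional intermediate bases and members.

```cpp
#include <type_traits>

// Simplified, hypothetical stand-ins: empty structs that mirror only the
// inheritance relationships described above, not the real Clang classes.
struct Type {};
struct Decl {};
struct DeclContext {};
struct Stmt {};

// Some declarations are also contexts that contain other declarations,
// so they derive from both Decl and DeclContext.
struct FunctionDecl : Decl, DeclContext {};

// Convenience wrapper around std::is_base_of for the checks below.
template <typename Base, typename Derived>
constexpr bool derivesFrom() {
  return std::is_base_of<Base, Derived>::value;
}

// FunctionDecl can be used both as a Decl and as a DeclContext.
static_assert(derivesFrom<Decl, FunctionDecl>(), "FunctionDecl is a Decl");
static_assert(derivesFrom<DeclContext, FunctionDecl>(),
              "FunctionDecl is a DeclContext");

// The basic hierarchies are unrelated: there is no common root class.
static_assert(!derivesFrom<Decl, Stmt>() && !derivesFrom<Stmt, Decl>(),
              "Decl and Stmt do not derive from each other");
static_assert(!derivesFrom<Type, Decl>() && !derivesFrom<Type, Stmt>(),
              "Type is not a base of Decl or Stmt");
```

Because there is no shared root, code that handles "any node" needs one
entry point per hierarchy rather than a single base-class pointer.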

There are also many nodes in the AST that are not part of a larger
hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. The information about what can be reached has to be encoded for
each specific node type. This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
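
The traversal idea can be sketched without Clang itself. The toy classes
below are hypothetical and heavily simplified: they reuse a few Clang
class names but are not the real classes, and ``ToyVisitor`` is an
invented name, not the ``RecursiveASTVisitor`` API. The sketch only shows
why the visitor needs one entry point per hierarchy and why the knowledge
of each node's children has to be written down per node type.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified node classes. Statements and
// declarations form two separate hierarchies with no shared base class.
struct Stmt {
  virtual ~Stmt() = default;
  virtual std::string kind() const = 0;
};
struct DeclStmt : Stmt {
  std::string kind() const override { return "DeclStmt"; }
};
struct ReturnStmt : Stmt {
  std::string kind() const override { return "ReturnStmt"; }
};
struct CompoundStmt : Stmt {
  std::vector<std::unique_ptr<Stmt>> Body;
  std::string kind() const override { return "CompoundStmt"; }
};

struct Decl {
  virtual ~Decl() = default;
  virtual std::string kind() const = 0;
};
struct FunctionDecl : Decl {
  std::unique_ptr<Stmt> Body;
  std::string kind() const override { return "FunctionDecl"; }
};
struct TranslationUnitDecl : Decl {
  std::vector<std::unique_ptr<Decl>> Decls;
  std::string kind() const override { return "TranslationUnitDecl"; }
};

// Toy visitor: records each node in pre-order. Which children a node has
// is encoded here separately for every node type that has any.
struct ToyVisitor {
  std::vector<std::string> Visited;

  void traverseDecl(const Decl &D) {
    Visited.push_back(D.kind());
    if (const auto *TU = dynamic_cast<const TranslationUnitDecl *>(&D)) {
      for (const auto &Child : TU->Decls)
        traverseDecl(*Child);
    } else if (const auto *FD = dynamic_cast<const FunctionDecl *>(&D)) {
      if (FD->Body)
        traverseStmt(*FD->Body);  // crosses from the Decl to the Stmt hierarchy
    }
  }

  void traverseStmt(const Stmt &S) {
    Visited.push_back(S.kind());
    if (const auto *CS = dynamic_cast<const CompoundStmt *>(&S)) {
      for (const auto &Child : CS->Body)
        traverseStmt(*Child);
    }
  }
};

// Builds a tree shaped like the example: one function whose body holds a
// declaration statement and a return statement.
std::unique_ptr<TranslationUnitDecl> buildExample() {
  auto Body = std::make_unique<CompoundStmt>();
  Body->Body.push_back(std::make_unique<DeclStmt>());
  Body->Body.push_back(std::make_unique<ReturnStmt>());
  auto F = std::make_unique<FunctionDecl>();
  F->Body = std::move(Body);
  auto TU = std::make_unique<TranslationUnitDecl>();
  TU->Decls.push_back(std::move(F));
  return TU;
}
```

Starting ``traverseDecl`` at the translation unit returned by
``buildExample()`` records the nodes in pre-order, from the
``TranslationUnitDecl`` down to the ``ReturnStmt``. The real
``RecursiveASTVisitor`` covers many more node types and is considerably
more elaborate; see its documentation for the actual interface.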

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.