From: Manuel Klimek Date: Wed, 25 Jul 2012 07:28:11 +0000 (+0000) Subject: Adds a introductionary document to the Clang AST. X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=a705f1f507e4a773808ee7743caf4b50fa993d67;p=clang Adds a introductionary document to the Clang AST. Next steps are adding information about types & source location handling. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@160708 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/IntroductionToTheClangAST.html b/docs/IntroductionToTheClangAST.html new file mode 100644 index 0000000000..28175dd81b --- /dev/null +++ b/docs/IntroductionToTheClangAST.html @@ -0,0 +1,139 @@ + + + +Introduction to the Clang AST + + + + + + + +
+ +

Introduction to the Clang AST

+

This document gives a gentle introduction to the mysteries of the Clang AST. +It is aimed at developers who want either to contribute to Clang or to use +tools built on Clang's AST, such as the AST matchers.

+ + + +

Introduction

+ + +

Clang's AST is different from ASTs produced by some other compilers in that it closely +resembles both the written C++ code and the C++ standard. For example, +parenthesized expressions and compile-time constants are available in an unreduced +form in the AST. This makes Clang's AST a good fit for refactoring tools.
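To make "unreduced" concrete, here is a minimal sketch of an expression tree that keeps parentheses as a node of their own, so the tree can reproduce an expression such as (x / 42) exactly as written. All class names below (Expr, NameRef, IntLiteral, BinaryOp, Paren) are toy stand-ins invented for this illustration; they are not Clang's classes and say nothing about how Clang implements ParenExpr.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for AST nodes; these are NOT Clang's classes.
struct Expr {
  virtual ~Expr() = default;
  // Renders the expression back to source text.
  virtual std::string print() const = 0;
};

struct NameRef : Expr {
  std::string name;
  explicit NameRef(std::string n) : name(std::move(n)) {}
  std::string print() const override { return name; }
};

struct IntLiteral : Expr {
  int value;
  explicit IntLiteral(int v) : value(v) {}
  std::string print() const override { return std::to_string(value); }
};

struct BinaryOp : Expr {
  char op;
  std::unique_ptr<Expr> lhs, rhs;
  BinaryOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
  std::string print() const override {
    return lhs->print() + " " + op + " " + rhs->print();
  }
};

// Keeps the parentheses the programmer wrote as a separate node.
struct Paren : Expr {
  std::unique_ptr<Expr> inner;
  explicit Paren(std::unique_ptr<Expr> e) : inner(std::move(e)) {}
  std::string print() const override { return "(" + inner->print() + ")"; }
};

// Builds the tree for the expression `(x / 42)`.
std::unique_ptr<Expr> makeExample() {
  auto div = std::make_unique<BinaryOp>('/', std::make_unique<NameRef>("x"),
                                        std::make_unique<IntLiteral>(42));
  return std::make_unique<Paren>(std::move(div));
}
```

Calling makeExample()->print() yields "(x / 42)". A tree that dropped the Paren node would still evaluate the same way, but a tool rewriting the source could no longer tell whether the parentheses were there.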

+ +

Documentation for all Clang AST nodes is available via the generated +Doxygen. The online Doxygen +documentation is also indexed by search engines, so a search for +clang and the AST node's class name usually turns up the Doxygen page +of the class you're looking for (for example, search for: clang ParenExpr).

+ + +

Examining the AST

+ + +

A good way to familiarize yourself with the Clang AST is to actually look +at it on some simple example code. Clang has built-in AST-dump modes, which +can be enabled with the flags -ast-dump and -ast-dump-xml. Note that -ast-dump-xml +currently only works with debug builds of Clang.

+ +

Let's look at a simple example AST:

+
+$ cat test.cc
+int f(int x) {
+  int result = (x / 42);
+  return result;
+}
+
+# Clang by default is a frontend for many tools; -cc1 tells it to directly
+# use the C++ compiler mode. -undef leaves out some internal declarations.
+$ clang -cc1 -undef -ast-dump-xml test.cc
+... cutting out internal declarations of clang ...
+<TranslationUnit ptr="0x4871160">
+ <Function ptr="0x48a5800" name="f" prototype="true">
+  <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
+   <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
+   <parameters>
+    <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
+   </parameters>
+  </FunctionProtoType>
+  <ParmVar ptr="0x4871d80" name="x" initstyle="c">
+   <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
+  </ParmVar>
+  <Stmt>
+(CompoundStmt 0x48a5a38 <test.cc:1:14, line:4:1>
+  (DeclStmt 0x48a59c0 <line:2:3, col:24>
+    0x48a58c0 "int result =
+      (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
+        (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
+          (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
+            (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
+          (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
+  (ReturnStmt 0x48a5a18 <line:3:3, col:10>
+    (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
+      (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))
+
+  </Stmt>
+ </Function>
+</TranslationUnit>
+
+

In general, -ast-dump-xml dumps declarations in an XML-style format and +statements in an S-expression-style format. +The top-level declaration in a translation unit is always the +translation unit declaration. +In this example, our first user-written declaration is the +function declaration +of 'f'. The body of 'f' is a compound statement, +whose child nodes are a declaration statement +that declares our result variable, and the +return statement.

+ + +

AST Context

+ + +

All information about the AST for a translation unit is bundled up in the class +ASTContext. +It allows traversing the whole translation unit starting from +getTranslationUnitDecl, +and gives access to Clang's table of identifiers +for the parsed translation unit.

+ + +

AST Nodes

+ + +

Clang's AST nodes are modeled on a class hierarchy that does not have a common +ancestor. Instead, there are multiple larger hierarchies for basic node types like +Decl and +Stmt. Many +important AST nodes derive from Type, +Decl, +DeclContext or +Stmt, +with some classes deriving from both Decl and DeclContext.
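The shape described above can be sketched in a few lines. This is a deliberately simplified model: the structs reuse the names Decl, DeclContext, Stmt, VarDecl and FunctionDecl for readability, but their members (name, decls, addDecl) and the helper listDecls are assumptions made up for this illustration and do not reflect Clang's real definitions.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-ins, NOT Clang's real class definitions.
struct Decl {
  std::string name;
  explicit Decl(std::string n) : name(std::move(n)) {}
  virtual ~Decl() = default;
};

// Something that can contain declarations. It is not itself a Decl.
struct DeclContext {
  std::vector<Decl *> decls;  // non-owning pointers, in insertion order
  void addDecl(Decl *d) { decls.push_back(d); }
};

// Statements form a separate hierarchy with no base shared with Decl.
struct Stmt {
  virtual ~Stmt() = default;
};

struct VarDecl : Decl {
  using Decl::Decl;
};

// A function is a declaration and also a context for other declarations,
// so it derives from both Decl and DeclContext.
struct FunctionDecl : Decl, DeclContext {
  using Decl::Decl;
};

static_assert(!std::is_base_of<Decl, Stmt>::value &&
                  !std::is_base_of<Stmt, Decl>::value,
              "Decl and Stmt are independent hierarchies in this model");

// Hypothetical helper: joins the names of all declarations in a context.
std::string listDecls(const DeclContext &ctx) {
  std::string out;
  for (const Decl *d : ctx.decls) {
    if (!out.empty()) out += ", ";
    out += d->name;
  }
  return out;
}
```

Because there is no common base class, code that wants to handle "any node" cannot take a single base pointer; it has to know which hierarchy it is dealing with.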

+

There are also a multitude of nodes in the AST that are not part of a +larger hierarchy, and are only reachable from specific other nodes, +like CXXBaseSpecifier. +

+ +

Thus, to traverse the full AST, one starts from the TranslationUnitDecl +and then recursively traverses everything that can be reached from that node. +Which children are reachable has to be encoded for each specific node type. This algorithm +is implemented in the RecursiveASTVisitor. +See the RecursiveASTVisitor tutorial.
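The idea of the traversal can be sketched with a toy visitor. The sketch borrows the curiously recurring template pattern and the TraverseStmt/VisitStmt naming used by RecursiveASTVisitor, but everything else is an assumption for illustration: the Stmt struct with a kind string and a children vector, RecursiveVisitor, KindCollector, makeNode and makeExampleBody are all hypothetical and much simpler than the real classes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node type; NOT Clang's Stmt. Each node just stores its kind name
// and owns its children.
struct Stmt {
  std::string kind;
  std::vector<std::unique_ptr<Stmt>> children;
  explicit Stmt(std::string k) : kind(std::move(k)) {}
};

// CRTP base: Derived overrides VisitStmt, the base drives the recursion.
template <typename Derived>
class RecursiveVisitor {
public:
  // Visits S, then recurses into its children in order (pre-order).
  // Returns false if a Visit callback asked to abort the traversal.
  bool TraverseStmt(const Stmt *S) {
    if (!S) return true;
    if (!derived().VisitStmt(S)) return false;
    for (const auto &Child : S->children)
      if (!derived().TraverseStmt(Child.get())) return false;
    return true;
  }

  // Default callback: do nothing and continue.
  bool VisitStmt(const Stmt *) { return true; }

private:
  Derived &derived() { return *static_cast<Derived *>(this); }
};

// Example visitor that records the kind of every node it sees.
class KindCollector : public RecursiveVisitor<KindCollector> {
public:
  std::vector<std::string> kinds;
  bool VisitStmt(const Stmt *S) {
    kinds.push_back(S->kind);
    return true;
  }
};

std::unique_ptr<Stmt> makeNode(std::string kind) {
  return std::make_unique<Stmt>(std::move(kind));
}

// Builds a reduced version of the body of 'f': a compound statement
// holding a declaration statement and a return statement.
std::unique_ptr<Stmt> makeExampleBody() {
  auto cast = makeNode("ImplicitCastExpr");
  cast->children.push_back(makeNode("DeclRefExpr"));
  auto ret = makeNode("ReturnStmt");
  ret->children.push_back(std::move(cast));
  auto body = makeNode("CompoundStmt");
  body->children.push_back(makeNode("DeclStmt"));
  body->children.push_back(std::move(ret));
  return body;
}
```

Running a KindCollector over makeExampleBody() records CompoundStmt, DeclStmt, ReturnStmt, ImplicitCastExpr and DeclRefExpr, in that order. In this toy every node exposes its children the same way; in a real AST the per-node knowledge of what is reachable is the part the visitor has to supply.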

+ +

The two most basic nodes in the Clang AST are statements (Stmt) +and declarations (Decl). +Note that expressions (Expr) +are also statements in Clang's AST.

+ +
+ + +