From: Douglas Gregor Date: Fri, 30 Sep 2011 21:32:37 +0000 (+0000) Subject: Add a section detailing the steps required to add an expression or X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=1f634c6dc91805320bb13983faf5e86a2bd07421;p=clang Add a section detailing the steps required to add an expression or statement to Clang. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@140888 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/InternalsManual.html b/docs/InternalsManual.html index 5d97609373..2829dbdd75 100644 --- a/docs/InternalsManual.html +++ b/docs/InternalsManual.html @@ -71,6 +71,7 @@ td {
  • Howto guides
  • @@ -1785,6 +1786,228 @@ Check for the attribute's presence using Decl::getAttr<YourAttr>().<

    Update the Clang Language Extensions document to describe your new attribute.

    + +

    How to add an expression or statement

    + + +

    Expressions and statements are among the most fundamental constructs within a compiler, because they interact with many different parts of the AST, semantic analysis, and IR generation. Therefore, adding a new expression or statement kind into Clang requires some care. The following list details the various places in Clang where an expression or statement needs to be introduced, along with patterns to follow to ensure that the new expression or statement works well across all of the C languages. We focus on expressions, but statements are similar.

    + +
      +
    1. Introduce parsing actions into the parser. Recursive-descent parsing is mostly self-explanatory, but there are a few things that are worth keeping in mind:
      • Keep as much source location information as possible! You'll want it later to produce great diagnostics and support Clang's various features that map between source code and the AST.
      • Write tests for all of the "bad" parsing cases, to make sure your recovery is good. If you have matched delimiters (e.g., parentheses, square brackets, etc.), use Parser::MatchRHSPunctuation to give nice diagnostics when things go wrong.
      +
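The two parsing notes above can be illustrated with a small standalone sketch. It is a toy recursive-descent parser, not Clang code: ToyParser, ParsedExpr, and SourceRange are hypothetical names, and the "to match this" note only loosely imitates the kind of diagnostic a matched-delimiter helper gives. The point is that the opening delimiter's location is kept so the error can refer back to it, and that the resulting node records its full source range.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names are Clang APIs.
struct SourceRange { std::size_t Begin, End; };  // half-open [Begin, End)

struct ParsedExpr {
  SourceRange Range;  // full extent, including any parentheses
  long Value;
};

struct ToyParser {
  std::string Text;
  std::size_t Pos = 0;
  std::vector<std::string> Diags;

  explicit ToyParser(std::string T) : Text(std::move(T)) {}

  bool atDigit() const {
    return Pos < Text.size() &&
           std::isdigit(static_cast<unsigned char>(Text[Pos]));
  }

  void skipSpace() {
    while (Pos < Text.size() &&
           std::isspace(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
  }

  // primary := digits | '(' primary ')'
  std::optional<ParsedExpr> parsePrimary() {
    skipSpace();
    std::size_t Begin = Pos;
    if (Pos < Text.size() && Text[Pos] == '(') {
      std::size_t LParenLoc = Pos++;  // remember where the '(' was
      std::optional<ParsedExpr> Inner = parsePrimary();
      if (!Inner)
        return std::nullopt;
      skipSpace();
      if (Pos >= Text.size() || Text[Pos] != ')') {
        Diags.push_back("expected ')' at offset " + std::to_string(Pos));
        Diags.push_back("note: to match this '(' at offset " +
                        std::to_string(LParenLoc));
        return std::nullopt;
      }
      ++Pos;
      assert(Pos > Begin && "a parsed expression covers at least one character");
      return ParsedExpr{{Begin, Pos}, Inner->Value};
    }
    if (atDigit()) {
      long V = 0;
      while (atDigit())
        V = V * 10 + (Text[Pos++] - '0');
      return ParsedExpr{{Begin, Pos}, V};
    }
    Diags.push_back("expected expression at offset " + std::to_string(Pos));
    return std::nullopt;
  }
};
```

Parsing "( 42 )" with this sketch yields a range covering the parentheses, while "(7" produces an error plus a note pointing back at the unmatched opening parenthesis. These are the "bad" cases worth covering in tests.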
    2. Introduce semantic analysis actions into Sema. Semantic analysis should always involve two functions: an ActOnXXX function that will be called directly from the parser, and a BuildXXX function that performs the actual semantic analysis and will (eventually!) build the AST node. It's fairly common for the ActOnXXX function to do very little (often just some minor translation from the parser's representation to Sema's representation of the same thing), but the separation is still important: C++ template instantiation, for example, should always call the BuildXXX variant. Several notes on semantic analysis before we get into construction of the AST:
      • Your expression probably involves some types and some subexpressions. Make sure to fully check that those types, and the types of those subexpressions, meet your expectations. Add implicit conversions where necessary to make sure that all of the types line up exactly the way you want them. Write extensive tests to check that you're getting good diagnostics for mistakes and that you can use various forms of subexpressions with your expression.
      • When type-checking a type or subexpression, make sure to first check whether the type is "dependent" (Type::isDependentType()) or whether a subexpression is type-dependent (Expr::isTypeDependent()). If either of these returns true, then you're inside a template and you can't do much type-checking now. That's normal, and your AST node (when you get there) will have to deal with this case. At this point, you can write tests that use your expression within templates, but don't try to instantiate the templates.
      • For each subexpression, be sure to call Sema::CheckPlaceholderExpr() to deal with "weird" expressions that don't behave well as subexpressions. Then, determine whether you need to perform lvalue-to-rvalue conversions (Sema::DefaultLvalueConversion) or the usual unary conversions (Sema::UsualUnaryConversions), for places where the subexpression is producing a value you intend to use.
      • Your BuildXXX function will probably just return ExprError() at this point, since you don't have an AST. That's perfectly fine, and shouldn't impact your testing.
      +
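The ActOnXXX/BuildXXX split and the dependence check can be sketched with a self-contained toy model. Nothing here is Clang's API: ToySema, ToyExpr, ToyType, and the "negate" expression are hypothetical stand-ins, and a null pointer plays the role of ExprError(). The sketch only shows the shape: the ActOn function translates what the parser hands over, and the Build function does the checking and defers it when the operand is type-dependent.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy model only; these types are hypothetical stand-ins, not Clang classes.
enum class ToyType { Int, Bool, Dependent };

struct ToyExpr {
  ToyType Type;
  std::string Spelling;
  bool isTypeDependent() const { return Type == ToyType::Dependent; }
};

// A null result plays the role of ExprError() in this sketch.
using ExprResult = std::unique_ptr<ToyExpr>;

struct ToySema {
  std::vector<std::string> Diags;

  // The real checking lives here; a template instantiator would call
  // this function directly, bypassing the parser entry point.
  ExprResult BuildNegate(const ToyExpr &Operand) {
    if (Operand.isTypeDependent()) {
      // Inside a template: defer type-checking, and mark the result
      // as dependent too.
      return std::make_unique<ToyExpr>(
          ToyExpr{ToyType::Dependent, "-" + Operand.Spelling});
    }
    if (Operand.Type != ToyType::Int) {
      Diags.push_back("operand of negate must have type int");
      return nullptr;
    }
    return std::make_unique<ToyExpr>(
        ToyExpr{ToyType::Int, "-" + Operand.Spelling});
  }

  // Parser entry point: translate the parser's view of the operand
  // into the semantic representation, then delegate.
  ExprResult ActOnNegate(const std::string &Tok, ToyType TokType) {
    assert(!Tok.empty() && "parser should never hand over an empty token");
    ToyExpr Operand{TokType, Tok};
    return BuildNegate(Operand);
  }
};
```

In this model, a dependent operand produces a dependent result with no diagnostics, and a later "instantiation" would call BuildNegate again with the concrete type, at which point the deferred check actually runs.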
    3. Introduce an AST node for your new expression. This starts with declaring the node in include/Basic/StmtNodes.td and creating a new class for your expression in the appropriate include/AST/Expr*.h header. It's best to look at the class for a similar expression to get ideas, and there are some specific things to watch for:
      • If you need to allocate memory, use the ASTContext allocator to allocate memory. Never use raw malloc or new, and never hold any resources in an AST node, because the destructor of an AST node is never called.
      • Make sure that getSourceRange() covers the exact source range of your expression. This is needed for diagnostics and for IDE support.
      • Make sure that children() visits all of the subexpressions. This is important for a number of features (e.g., IDE support, C++ variadic templates). If you have sub-types, you'll also need to visit those sub-types in the RecursiveASTVisitor.
      • Add printing support (StmtPrinter.cpp) and dumping support (StmtDumper.cpp) for your expression.
      • Add profiling support (StmtProfile.cpp) for your AST node, noting the distinguishing (non-source location) characteristics of an instance of your expression. Omitting this step will lead to hard-to-diagnose failures regarding matching of template declarations.
      +
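As a rough illustration of the allocation, children(), and profiling notes above, here is a standalone toy model. ToyContext and ToyNode are hypothetical names and are far simpler than ASTContext and Clang's statement classes. The sketch assumes nodes are trivially destructible, placed in context-owned memory, and never destroyed individually, and that a profile records everything that distinguishes a node except its source location.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy model only; hypothetical names, not Clang's ASTContext or Stmt classes.
class ToyContext {
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;

public:
  // One slab per allocation keeps the sketch short. Memory is released
  // only when the whole context goes away; node destructors never run.
  void *Allocate(std::size_t Size) {
    Slabs.push_back(std::make_unique<unsigned char[]>(Size));
    return Slabs.back().get();
  }
};

struct ToyNode {
  enum Kind { Literal, Add } K;
  int Value;           // meaningful for Literal only
  ToyNode *LHS, *RHS;  // meaningful for Add only
  std::size_t Loc;     // source offset: deliberately left out of Profile

  static ToyNode *Create(ToyContext &C, Kind K, int V, ToyNode *L, ToyNode *R,
                         std::size_t Loc) {
    assert((K != Add || (L && R)) && "Add needs both operands");
    void *Mem = C.Allocate(sizeof(ToyNode));
    return new (Mem) ToyNode{K, V, L, R, Loc};  // placement new into the slab
  }

  // Every subexpression must be reachable from here.
  std::vector<ToyNode *> children() const {
    if (K == Add)
      return {LHS, RHS};
    return {};
  }

  // Record the distinguishing, non-location characteristics of the node.
  void Profile(std::vector<long> &ID) const {
    ID.push_back(K);
    if (K == Literal)
      ID.push_back(Value);
    for (ToyNode *Child : children())
      Child->Profile(ID);
  }
};
```

Two structurally identical trees written at different source offsets produce the same profile in this model, which is the property that matching of equivalent expressions relies on.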
    4. Teach semantic analysis to build your AST node! At this point, you can wire up your Sema::BuildXXX function to actually create your AST. A few things to check at this point:
      • If your expression can construct a new C++ class or return a new Objective-C object, be sure to update and then call Sema::MaybeBindToTemporary for your just-created AST node to be sure that the object gets properly destructed. An easy way to test this is to return a C++ class with a private destructor: semantic analysis should flag an error here with the attempt to call the destructor.
      • Inspect the generated AST by printing it using clang -cc1 -ast-print, to make sure you're capturing all of the important information about how the AST was written.
      • Inspect the generated AST under clang -cc1 -ast-dump to verify that all of the types in the generated AST line up the way you want them. Remember that clients of the AST should never have to "think" to understand what's going on. For example, all implicit conversions should show up explicitly in the AST.
      • Write tests that use your expression as a subexpression of other, well-known expressions. Can you call a function using your expression as an argument? Can you use the ternary operator?
      +
    5. Teach code generation to create IR for your AST node. This step is the first (and only) one that requires knowledge of LLVM IR. There are several things to keep in mind:
      • Code generation is separated into scalar/aggregate/complex and lvalue/rvalue paths, depending on what kind of result your expression produces. On occasion, this requires some careful factoring of code to avoid duplication.
      • CodeGenFunction contains functions ConvertType and ConvertTypeForMem that convert Clang's types (clang::Type* or clang::QualType) to LLVM types. Use the former for values, and the latter for memory locations: test with the C++ "bool" type to check this. If you find that you are having to use LLVM bitcasts to make the subexpressions of your expression have the type that your expression expects, STOP! Go fix semantic analysis and the AST so that you don't need these bitcasts.
      • The CodeGenFunction class has a number of helper functions to make certain operations easy, such as generating code to produce an lvalue or an rvalue, or to initialize a memory location with a given value. Prefer to use these functions rather than directly writing loads and stores, because these functions take care of some of the tricky details for you (e.g., for exceptions).
      • If your expression requires some special behavior in the event of an exception, look at the push*Cleanup functions in CodeGenFunction to introduce a cleanup. You shouldn't have to deal with exception-handling directly.
      • Testing is extremely important in IR generation. Use clang -cc1 -emit-llvm and FileCheck to verify that you're generating the right IR.
      +
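The value-versus-memory distinction behind ConvertType and ConvertTypeForMem is easiest to see with "bool", which Clang treats as i1 when it is a value and as i8 when it is stored. The sketch below is a toy model under stated assumptions: the functions return strings rather than llvm::Type*, the names convertTypeToy, convertTypeForMemToy, and emitStoreToy are hypothetical, "int" is assumed to be 32 bits, and the emitted text is only IR-like, not output from Clang.

```cpp
#include <cassert>
#include <string>

// Toy model only; the real functions return llvm::Type*, not strings.
enum class ToyType { Bool, Int, Double };

// Representation used when the type is a value being computed with.
std::string convertTypeToy(ToyType T) {
  switch (T) {
  case ToyType::Bool:
    return "i1";
  case ToyType::Int:
    return "i32";  // assumption: a target with 32-bit int
  case ToyType::Double:
    return "double";
  }
  assert(false && "unknown ToyType");
  return "";
}

// Representation used when the type lives in memory.
std::string convertTypeForMemToy(ToyType T) {
  return T == ToyType::Bool ? "i8" : convertTypeToy(T);
}

// Emit IR-like text for storing a value, widening it first when the
// value and memory representations differ (as they do for bool).
std::string emitStoreToy(ToyType T, const std::string &Val,
                         const std::string &Addr) {
  const std::string ValTy = convertTypeToy(T);
  const std::string MemTy = convertTypeForMemToy(T);
  std::string IR;
  std::string Stored = Val;
  if (ValTy != MemTy) {
    Stored = "%frombool";
    IR += Stored + " = zext " + ValTy + " " + Val + " to " + MemTy + "\n";
  }
  IR += "store " + MemTy + " " + Stored + ", ptr " + Addr + "\n";
  return IR;
}
```

A helper like emitStoreToy stands in for the kind of CodeGenFunction helper the text recommends: callers never write the widening themselves, so the bool case cannot be forgotten at individual store sites.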
    6. Teach template instantiation how to cope with your AST node, which requires some fairly simple code:
      • Make sure that your expression's constructor properly computes the flags for type dependence (i.e., the type your expression produces can change from one instantiation to the next), value dependence (i.e., the constant value your expression produces can change from one instantiation to the next), instantiation dependence (i.e., a template parameter occurs anywhere in your expression), and whether your expression contains a parameter pack (for variadic templates). Often, computing these flags just means combining the results from the various types and subexpressions.
      • Add TransformXXX and RebuildXXX functions to the TreeTransform class template in Sema. TransformXXX should (recursively) transform all of the subexpressions and types within your expression, using getDerived().TransformYYY. If all of the subexpressions and types transform without error, it will then call the RebuildXXX function, which will in turn call getSema().BuildXXX to perform semantic analysis and build your expression.
      • To test template instantiation, take those tests you wrote to make sure that you were type checking with type-dependent expressions and dependent types (from step #2) and instantiate those templates with various types, some of which type-check and some that don't, and test the error messages in each case.
      +
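The TransformXXX/RebuildXXX pattern is a use of the curiously recurring template pattern, and a stripped-down model may make the control flow clearer. Everything below is hypothetical (ToyTransform, ToyInstantiator, ToyExpr); it mirrors only the shape of the pattern. In particular, RebuildNegate here just creates a node, whereas the text above says the real RebuildXXX forwards to getSema().BuildXXX so that semantic analysis runs again.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy model only; hypothetical names that mirror the pattern,
// not Clang's TreeTransform.
struct ToyExpr {
  enum Kind { Param, Literal, Negate } K;
  std::string Text;              // literal spelling or parameter name
  std::shared_ptr<ToyExpr> Sub;  // operand of Negate, otherwise null
};
using ExprPtr = std::shared_ptr<ToyExpr>;

ExprPtr makeExpr(ToyExpr::Kind K, std::string Text, ExprPtr Sub = nullptr) {
  return std::make_shared<ToyExpr>(
      ToyExpr{K, std::move(Text), std::move(Sub)});
}

template <typename Derived> class ToyTransform {
public:
  Derived &getDerived() { return static_cast<Derived &>(*this); }

  // Dispatch on the node kind; a null result signals an error.
  ExprPtr TransformExpr(const ExprPtr &E) {
    assert(E && "cannot transform a null expression");
    switch (E->K) {
    case ToyExpr::Param:
      return getDerived().TransformParam(E);
    case ToyExpr::Negate:
      return getDerived().TransformNegate(E);
    case ToyExpr::Literal:
      return E;  // nothing to substitute
    }
    return nullptr;
  }

  // Default behavior: leave parameters untouched.
  ExprPtr TransformParam(const ExprPtr &E) { return E; }

  // Transform the subexpression first, then rebuild the node.
  ExprPtr TransformNegate(const ExprPtr &E) {
    ExprPtr Sub = getDerived().TransformExpr(E->Sub);
    if (!Sub)
      return nullptr;  // propagate the error
    return getDerived().RebuildNegate(std::move(Sub));
  }

  // Stand-in for the step that would re-run semantic analysis.
  ExprPtr RebuildNegate(ExprPtr Sub) {
    return makeExpr(ToyExpr::Negate, "-", std::move(Sub));
  }
};

// An "instantiator" that replaces every parameter with a literal.
class ToyInstantiator : public ToyTransform<ToyInstantiator> {
  std::string Replacement;

public:
  explicit ToyInstantiator(std::string R) : Replacement(std::move(R)) {}

  ExprPtr TransformParam(const ExprPtr &) {
    return makeExpr(ToyExpr::Literal, Replacement);
  }
};
```

Transforming a pattern such as the negation of parameter T with ToyInstantiator("42") yields a new tree whose operand is the literal 42, and the original pattern is left unmodified, so it can be instantiated again with other arguments.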
    7. There are some "extras" that make other features work better. It's worth handling these extras to give your expression complete integration into Clang:
      • Add code completion support for your expression in SemaCodeComplete.cpp.
      • If your expression has types in it, or has any "interesting" features other than subexpressions, extend libclang's CursorVisitor to provide proper visitation for your expression, enabling various IDE features such as syntax highlighting, cross-referencing, and so on. The c-index-test helper program can be used to test these features.
      +
    +