From 3932fe05a12a27cb36b131ea89202311ee8cb66d Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Tue, 6 Jan 2009 06:02:08 +0000
Subject: [PATCH] document annotation tokens.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@61792 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/InternalsManual.html | 102 +++++++++++++++++++++++++++++++++++---
 1 file changed, 95 insertions(+), 7 deletions(-)

diff --git a/docs/InternalsManual.html b/docs/InternalsManual.html
index aa96c0df40..8fd42e7ad0 100644
--- a/docs/InternalsManual.html
+++ b/docs/InternalsManual.html
@@ -31,6 +31,7 @@ td {
@@ -488,7 +489,11 @@
front-end periodically needs to buffer tokens up for tentative parsing and various pieces of look-ahead. As such, the size of a Token matters. On a 32-bit system, sizeof(Token) is currently 16 bytes.

-

Tokens contain the following information:

+

Tokens occur in two forms: "Annotation +Tokens" and normal tokens. Normal tokens are those returned by the lexer; +annotation tokens represent semantic information and are produced by the parser, +replacing normal tokens in the token stream. Normal tokens contain the +following information:

-

One interesting (and somewhat unusual) aspect of tokens is that they don't -contain any semantic information about the lexed value. For example, if the -token was a pp-number token, we do not represent the value of the number that -was lexed (this is left for later pieces of code to decide). Additionally, the -lexer library has no notion of typedef names vs variable names: both are +

One interesting (and somewhat unusual) aspect of normal tokens is that they +don't contain any semantic information about the lexed value. For example, if +the token was a pp-number token, we do not represent the value of the number +that was lexed (this is left for later pieces of code to decide). Additionally, +the lexer library has no notion of typedef names vs variable names: both are returned as identifiers, and the parser is left to decide whether a specific identifier is a typedef or a variable (tracking this requires scope information -among other things).

+among other things). The parser can do this translation by replacing tokens +returned by the preprocessor with "Annotation Tokens".

+ + +

Annotation Tokens

+ + +

Annotation Tokens are tokens that are synthesized by the parser and injected +into the preprocessor's token stream (replacing existing tokens) to record +semantic information found by the parser. For example, if "foo" is found to be +a typedef, the "foo" tok::identifier token is replaced with a +tok::annot_typename. This is useful for two reasons: 1) it +makes it easy to handle qualified type names (e.g. "foo::bar::baz<42>::t") +in C++ as a single "token" in the parser. 2) If the parser backtracks, the +reparse does not need to redo semantic analysis to determine whether a token +sequence is a variable, type, template, etc.
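
The replacement described above can be sketched with a toy token stream. This is only an illustration under stated assumptions: ToyKind, ToyToken, and annotateTypedefs are hypothetical names invented here, not Clang's Token class or parser API, and a plain set of names stands in for real semantic lookup.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Clang's real types.
enum class ToyKind { Identifier, Other, AnnotTypename };

struct ToyToken {
  ToyKind Kind;
  std::string Spelling;               // source text covered by this token
  const std::string *AnnotationValue; // non-null only on annotation tokens
};

// Replace every identifier that names a known typedef with an AnnotTypename
// token, so that a later (re)parse can skip the name lookup. The set of names
// is an assumed substitute for asking semantic analysis.
inline void annotateTypedefs(std::vector<ToyToken> &Stream,
                             const std::set<std::string> &Typedefs) {
  for (ToyToken &Tok : Stream) {
    if (Tok.Kind != ToyKind::Identifier)
      continue;
    auto It = Typedefs.find(Tok.Spelling);
    if (It == Typedefs.end())
      continue; // not a typedef: the token stays a plain identifier
    Tok.Kind = ToyKind::AnnotTypename;
    Tok.AnnotationValue = &*It; // points into the caller's set
  }
}
```

Because the annotation is stored in the stream itself, a parser that backtracks over the same tokens sees AnnotTypename directly instead of repeating the lookup.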

+ +

Annotation Tokens are created by the parser and reinjected into the parser's +token stream (when backtracking is enabled). Because they only ever stand in for +tokens that the preprocessor proper is done with, they don't need to keep around +flags like "start of line" that the preprocessor uses to do its job. +Additionally, an annotation token may "cover" a sequence of preprocessor tokens +(e.g. a::b::c is five preprocessor tokens). As such, the valid fields +of an annotation token are different from the fields for a normal token (but +they are multiplexed into the normal Token fields):

+ + + +

Annotation tokens currently come in three kinds:

+ +
    +
  1. tok::annot_typename: This annotation token represents a +resolved typename token that is potentially qualified. The AnnotationValue +field contains a pointer returned by Action::isTypeName(). In the case of the +Sema actions module, this is a Decl* for the type.
+ +
  2. tok::annot_cxxscope: This annotation token represents a C++ scope +specifier, such as "A::B::". This corresponds to the grammar productions "::" +and ":: [opt] nested-name-specifier". The AnnotationValue pointer is returned +by the Action::ActOnCXXGlobalScopeSpecifier and +Action::ActOnCXXNestedNameSpecifier callbacks. In the case of Sema, this is a +DeclContext*.
+ +
  3. tok::annot_template_id: This annotation token represents a C++ +template-id such as "foo<int, 4>", which may refer to a function or type +depending on whether foo is a function template or class template. The +AnnotationValue pointer is a pointer to a malloc'd TemplateIdAnnotation object. +FIXME: I don't think the parsing logic is right for this. Shouldn't type +templates be turned into annot_typename??
+ +
+ +

As mentioned above, annotation tokens are not returned by the preprocessor; +they are formed on demand by the parser. This means that the parser has to be +aware of cases where an annotation could occur and form it where appropriate. +This is somewhat similar to how the parser handles Translation Phase 6 of C99: +String Concatenation (see C99 5.1.1.2). In the case of string concatenation, +the preprocessor just returns distinct tok::string_literal and +tok::wide_string_literal tokens and the parser eats a sequence of them wherever +the grammar indicates that a string literal can occur.
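
The string concatenation analogy can be sketched as follows. LitKind, LitToken, and parseStringLiteral are hypothetical names made up for this illustration (they are not part of Clang), and the toy ignores wide literals and escape handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types for illustration; not Clang's real Token class.
enum class LitKind { StringLiteral, Identifier, Other };

struct LitToken {
  LitKind Kind;
  std::string Text; // for string literals: the contents without the quotes
};

// Starting at Pos, consume every adjacent string literal token and return the
// concatenated contents. Pos is left at the first token that is not a string
// literal. If the token at Pos is not a string literal, nothing is consumed
// and the result is empty.
inline std::string parseStringLiteral(const std::vector<LitToken> &Stream,
                                      std::size_t &Pos) {
  std::string Result;
  while (Pos < Stream.size() && Stream[Pos].Kind == LitKind::StringLiteral)
    Result += Stream[Pos++].Text;
  return Result;
}
```

The token source stays simple (one token per literal) and the consumer does the merging at the one place where the grammar allows a string literal, which is the same division of labor used for annotation tokens.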

+ +

In order to do this, whenever the parser expects a tok::identifier or +tok::coloncolon, it should call the TryAnnotateTypeOrScopeToken or +TryAnnotateCXXScopeToken methods to form the annotation token. These methods +will maximally form the specified annotation tokens and replace the current +token with them, if applicable. If the current token is not valid for an +annotation token, it will remain an identifier or :: token.
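
A rough sketch of what "maximally form ... and replace the current token" could look like on a toy stream is given here. It is purely syntactic and rests on assumptions: ScopeKind, ScopeToken, and tryAnnotateScope are hypothetical names, the signature is invented for the example, and the real TryAnnotateCXXScopeToken consults the semantic actions rather than only matching token kinds.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model; the names below are not Clang's real API.
enum class ScopeKind { Identifier, ColonColon, Other, AnnotScope };

struct ScopeToken {
  ScopeKind Kind;
  std::string Text; // source text covered by this token
};

// If the tokens at Pos begin a scope specifier ("::" or "A::B::"), replace the
// longest such run with a single AnnotScope token and return true. Otherwise
// leave the stream untouched and return false.
inline bool tryAnnotateScope(std::vector<ScopeToken> &Stream, std::size_t Pos) {
  std::size_t End = Pos;
  std::string Covered;

  // Optional leading "::" (global scope).
  if (End < Stream.size() && Stream[End].Kind == ScopeKind::ColonColon)
    Covered += Stream[End++].Text;

  // Then as many "identifier ::" pairs as possible.
  while (End + 1 < Stream.size() &&
         Stream[End].Kind == ScopeKind::Identifier &&
         Stream[End + 1].Kind == ScopeKind::ColonColon) {
    Covered += Stream[End].Text + Stream[End + 1].Text;
    End += 2;
  }

  if (End == Pos)
    return false; // not a scope specifier: the current token is unchanged

  // One annotation token now covers the whole run [Pos, End).
  Stream[Pos] = ScopeToken{ScopeKind::AnnotScope, Covered};
  Stream.erase(Stream.begin() + static_cast<std::ptrdiff_t>(Pos + 1),
               Stream.begin() + static_cast<std::ptrdiff_t>(End));
  return true;
}
```

For the tokens of "A::B::x;", a call at position 0 leaves three tokens: an AnnotScope covering "A::B::", the identifier x, and the semicolon. A call at a position that does not start a scope specifier changes nothing.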

+ +

The Lexer class

-- 2.40.0