From: Chris Lattner

Tokens occur in two forms: "Annotation Tokens" and normal tokens. Normal tokens are those returned by the lexer; annotation tokens represent semantic information and are produced by the parser, replacing normal tokens in the token stream. Normal tokens contain the following information: a token kind, a SourceLocation for the start of the token, a length, an IdentifierInfo* for identifier tokens, and a set of flags (such as "start of line" and "has leading space").
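
As a rough illustration of that information, the sketch below queries a normal token through the clang::Token interface; the accessor names come from the Clang tree, but treat the exact set and signatures as assumptions, and the helper function name is hypothetical.

  // Sketch: dump the purely lexical information carried by a normal token.
  // PrintTokenInfo is a hypothetical helper, not part of Clang.
  #include "clang/Lex/Preprocessor.h"
  #include "clang/Lex/Token.h"
  #include "llvm/Support/raw_ostream.h"

  void PrintTokenInfo(clang::Preprocessor &PP, const clang::Token &Tok) {
    llvm::errs() << "kind: " << Tok.getName()           // e.g. "identifier"
                 << " length: " << Tok.getLength()
                 << " at start of line: " << Tok.isAtStartOfLine() << "\n";
    // Note: no semantic information is available here.  An identifier is
    // just an identifier; whether it names a typedef or a variable is unknown.
    if (clang::IdentifierInfo *II = Tok.getIdentifierInfo())
      llvm::errs() << "spelling: " << II->getName() << "\n";
    Tok.getLocation().print(llvm::errs(), PP.getSourceManager());
  }
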
One interesting (and somewhat unusual) aspect of normal tokens is that they don't contain any semantic information about the lexed value. For example, if the token was a pp-number token, we do not represent the value of the number that was lexed (this is left for later pieces of code to decide). Additionally, the lexer library has no notion of typedef names vs variable names: both are returned as identifiers, and the parser is left to decide whether a specific identifier is a typedef or a variable (tracking this requires scope information among other things). The parser can do this translation by replacing tokens returned by the preprocessor with "Annotation Tokens".

Annotation Tokens are tokens that are synthesized by the parser and injected into the preprocessor's token stream (replacing existing tokens) to record semantic information found by the parser. For example, if "foo" is found to be a typedef, the "foo" tok::identifier token is replaced with a tok::annot_typename. This is useful for a couple of reasons: 1) it makes it easy to handle qualified type names (e.g. "foo::bar::baz<42>::t") in C++ as a single "token" in the parser, and 2) if the parser backtracks, the reparse does not need to redo semantic analysis to determine whether a token sequence is a variable, type, template, etc.
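
To make the mechanics concrete, here is a hedged sketch of how an identifier token could be rewritten into a type annotation token. The real logic lives in the parser's annotation routines and in Sema; the helper below, including its name and the opaque type argument, is illustrative only.

  // Illustrative only: turn the current tok::identifier into a
  // tok::annot_typename once the name has been resolved to a type.
  #include "clang/Lex/Preprocessor.h"
  #include "clang/Lex/Token.h"

  void AnnotateAsType(clang::Preprocessor &PP, clang::Token &Tok,
                      void *ResolvedType /* opaque semantic value */) {
    Tok.setKind(clang::tok::annot_typename);
    Tok.setAnnotationValue(ResolvedType);        // semantic payload
    Tok.setAnnotationEndLoc(Tok.getLocation());  // last token covered
    // For a qualified name like foo::bar::baz<42>::t, the parser would also
    // reset Tok's location to the start of the covered token sequence.
    PP.AnnotateCachedTokens(Tok);  // splice the annotation into the stream
  }

Since the annotation replaces tokens the preprocessor has already handed out, the splice goes through the preprocessor's token cache, which is why the call above is named AnnotateCachedTokens.
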
Annotation Tokens are created by the parser and reinjected into the parser's token stream (when backtracking is enabled). Because they can only replace tokens that the preprocessor-proper is already done with, they don't need to keep around flags like "start of line" that the preprocessor uses to do its job. Additionally, an annotation token may "cover" a sequence of preprocessor tokens (e.g. a::b::c is five preprocessor tokens). As such, the valid fields of an annotation token are different from the fields of a normal token (but they are multiplexed into the normal Token fields): the location of the annotation, the location of the last token it covers, an opaque value holding the semantic information, and the annotation kind.
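
For reference, a consumer can distinguish and inspect annotation tokens roughly as follows; the accessors shown exist on clang::Token, though how the annotation value is interpreted depends on the annotation kind.

  // Sketch: inspecting a token that may be an annotation token.  The
  // annotation-specific accessors reuse (multiplex) the normal Token storage.
  #include "clang/Basic/SourceLocation.h"
  #include "clang/Lex/Token.h"

  void InspectToken(const clang::Token &Tok) {
    if (!Tok.isAnnotation())
      return;  // a normal token: kind, location, length, flags, etc.

    clang::SourceLocation Begin = Tok.getLocation();          // start of covered range
    clang::SourceLocation End   = Tok.getAnnotationEndLoc();  // last covered token
    clang::SourceRange Covered  = Tok.getAnnotationRange();   // [Begin, End]
    void *Value = Tok.getAnnotationValue();  // opaque semantic payload
    (void)Begin; (void)End; (void)Covered; (void)Value;
  }
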
Annotation tokens currently come in three kinds:

As mentioned above, annotation tokens are not returned by the preprocessor; they are formed on demand by the parser. This means that the parser has to be aware of the cases where an annotation could occur and form it where appropriate. This is somewhat similar to how the parser handles Translation Phase 6 of C99: String Concatenation (see C99 5.1.1.2). In the case of string concatenation, the preprocessor just returns distinct tok::string_literal and tok::wide_string_literal tokens, and the parser eats a sequence of them wherever the grammar indicates that a string literal can occur.
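
For comparison, the string-concatenation case looks roughly like the loop below; this is a hedged sketch of the idea rather than the parser's actual code, and the free-standing helper is hypothetical.

  // Sketch: collect a run of adjacent string-literal tokens (C99 translation
  // phase 6).  The parser would hand the collected tokens to semantic
  // analysis to build a single concatenated string literal.
  #include "clang/Lex/Preprocessor.h"
  #include "clang/Lex/Token.h"
  #include "llvm/ADT/SmallVector.h"

  void CollectStringToks(clang::Preprocessor &PP, clang::Token &Tok,
                         llvm::SmallVectorImpl<clang::Token> &StringToks) {
    while (Tok.is(clang::tok::string_literal) ||
           Tok.is(clang::tok::wide_string_literal)) {
      StringToks.push_back(Tok);
      PP.Lex(Tok);  // advance past the literal; concatenation is left to Sema
    }
  }
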
In order to do this, whenever the parser expects a tok::identifier or tok::coloncolon, it should call the TryAnnotateTypeOrScopeToken or TryAnnotateCXXScopeToken methods to form the annotation token. These methods will maximally form the specified annotation tokens and replace the current token with them, if applicable. If the current token is not valid for an annotation token, it will remain an identifier or :: token.
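
A typical call site looks roughly like the following. This is a sketch only: the enclosing member function is hypothetical (it would need a matching declaration in the Parser class), and the convention that a true return from TryAnnotateTypeOrScopeToken signals an error reflects recent Clang and is an assumption here.

  // Hypothetical Parser member, for illustration of the call pattern only.
  #include "clang/Parse/Parser.h"
  using namespace clang;

  bool Parser::ParseSomethingWithPossibleType() {
    if (Tok.is(tok::identifier) || Tok.is(tok::coloncolon))
      if (TryAnnotateTypeOrScopeToken())
        return true;  // error: diagnostics have already been emitted

    if (Tok.is(tok::annot_typename)) {
      // The name resolved to a type; a qualified name such as
      // foo::bar::baz<42>::t is now a single annotation token.
      ConsumeAnyToken();
      return false;
    }
    // Otherwise Tok is still a plain identifier or '::' token.
    return false;
  }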