doc/manual/features/generic_api/generic_api.rst_ \
doc/manual/features/conditions/conditions.rst_ \
doc/manual/features/state/state.rst_ \
+ doc/manual/features/submatch/submatch.rst_ \
doc/manual/features/encodings/encodings.rst_ \
doc/manual/options/options_list.rst
DOC = doc/re2c.1
.TP
.B \fB\-T \-\-tags\fP
Enable submatch extraction with tags.
-This option is implied by \fB\-\-posix\-captures\fP\&.
.TP
.B \fB\-P \-\-posix\-captures\fP
Enable submatch extraction with POSIX\-style capturing groups.
-This option implies \fB\-T \-\-tags\fP\&.
.TP
.B \fB\-u \-\-unicode\fP
Generate a parser that supports UTF\-32. The generated
.B \fB\-\-no\-generation\-date\fP
Suppress date output in the generated file.
.TP
+.B \fB\-\-no\-lookahead\fP
+Use TDFA(0) instead of TDFA(1).
+This option has an effect only with the \fB\-\-tags\fP or \fB\-\-posix\-captures\fP options.
+.TP
.B \fB\-\-no\-optimize\-tags\fP
-Suppress tag optimization (mostly used for debugging).
+Suppress optimization of tag variables (mostly used for debugging).
.TP
.B \fB\-\-no\-version\fP
Suppress version output in the generated file.
Warn if a symbol is escaped when it shouldn\(aqt be.
By default, re2c silently ignores such escapes, but this may as well indicate a
typo or error in the escape sequence.
+.TP
+.B \fB\-Wnondeterministic\-tags\fP
+Warn if a tag has \fBn\fP\-th degree of nondeterminism, where \fBn\fP is greater than 1.
.UNINDENT
.SH INTERFACE CODE
.sp
depends on the particular use case.
.INDENT 0.0
.TP
+.B \fBYYBACKUP ()\fP
+Back up the current input position (used only with the generic API).
+.TP
+.B \fBYYBACKUPCTX ()\fP
+Back up the current input position for trailing context (used only with the generic API).
+.TP
.B \fBYYCONDTYPE\fP
In \fB\-c\fP mode, you can use \fB\-t\fP to generate a file that
contains the enumeration used as conditions. Each of the values refers
case, the scanner will resume operations right after where the last
\fBYYFILL (n)\fP was called.
.TP
+.B \fBYYLESSTHAN (n)\fP
+Check if fewer than \fBn\fP input characters are left (used only with the generic API).
+.TP
.B \fBYYLIMIT\fP
An expression of type \fBYYCTYPE *\fP that marks the end of the buffer \fBYYLIMIT[\-1]\fP
is the last character in the buffer). The generated code repeatedly
The generated code saves backtracking information in \fBYYMARKER\fP\&. Some
simple scanners might not use this.
.TP
+.B \fBYYMTAGP (t)\fP
+Append current input position to the history of tag \fBt\fP\&.
+.TP
+.B \fBYYMTAGN (t)\fP
+Append default value to the history of tag \fBt\fP\&.
+.TP
.B \fBYYMAXFILL\fP
This will be automatically defined by \fB/*!max:re2c*/\fP blocks as explained above.
.TP
+.B \fBYYMAXNMATCH\fP
+This will be automatically defined by \fB/*!maxnmatch:re2c*/\fP\&.
+.TP
+.B \fBYYPEEK ()\fP
+Get the current input character (used only with the generic API).
+.TP
+.B \fBYYRESTORE ()\fP
+Restore the input position saved by \fBYYBACKUP ()\fP (used only with the generic API).
+.TP
+.B \fBYYRESTORECTX ()\fP
+Restore the input position saved by \fBYYBACKUPCTX ()\fP for trailing context (used only with the generic API).
+.TP
+.B \fBYYRESTORETAG (t)\fP
+Restore the input position from the value of tag \fBt\fP (used only with the generic API).
+.TP
.B \fBYYSETCONDITION (c)\fP
This define is used to set the condition in
transition rules. This is only being used when \fB\-c\fP is active and
\fBYYGETSTATE ()\fP and resume execution right where it left off. The
generated code will contain both \fBYYSETSTATE (s)\fP and \fBYYGETSTATE\fP even
if \fBYYFILL (n)\fP is disabled.
+.TP
+.B \fBYYSKIP ()\fP
+Advance the input position to the next character (used only with the generic API).
+.TP
+.B \fBYYSTAGP (t)\fP
+Save the current input position in tag \fBt\fP (used only with the generic API).
+.TP
+.B \fBYYSTAGN (t)\fP
+Save the default value in tag \fBt\fP (used only with the generic API).
.UNINDENT
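A minimal sketch of how the generic API operations listed above might be defined in C. It assumes a hypothetical input_t structure whose fields cur, mar, ctx and lim are illustrative names, not part of re2c; match_ab is a hand-written stand-in for generated code, not actual re2c output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input state; these field names are illustrative, not re2c's. */
typedef struct {
    const char *cur;  /* current position */
    const char *mar;  /* position saved by YYBACKUP () */
    const char *ctx;  /* position saved by YYBACKUPCTX () */
    const char *lim;  /* one past the last input character */
} input_t;

/* One possible set of generic API definitions, all operating on pointer `in`. */
#define YYPEEK()        (*in->cur)
#define YYSKIP()        (++in->cur)
#define YYBACKUP()      (in->mar = in->cur)
#define YYRESTORE()     (in->cur = in->mar)
#define YYBACKUPCTX()   (in->ctx = in->cur)
#define YYRESTORECTX()  (in->cur = in->ctx)
#define YYLESSTHAN(n)   ((size_t)(in->lim - in->cur) < (size_t)(n))
#define YYSTAGP(t)      ((t) = in->cur)
#define YYSTAGN(t)      ((t) = NULL)
#define YYRESTORETAG(t) (in->cur = (t))

static input_t input_init(const char *s) {
    input_t r;
    assert(s != NULL);
    r.cur = r.mar = r.ctx = s;
    r.lim = s + strlen(s);
    return r;
}

/* Hand-written stand-in for generated code: match "ab", tagging the
   position of 'b'; on failure, roll back to the position saved by YYBACKUP. */
static int match_ab(input_t *in, const char **tag) {
    YYBACKUP();
    YYSTAGN(*tag);
    if (!YYLESSTHAN(2) && YYPEEK() == 'a') {
        YYSKIP();
        YYSTAGP(*tag);
        if (YYPEEK() == 'b') {
            YYSKIP();
            return 1;
        }
    }
    YYRESTORE();
    YYSTAGN(*tag);
    return 0;
}
```

Because every operation goes through a macro, the same scanner logic could run over a stream, a rope, or any other input representation by redefining only these macros.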
.SH SYNTAX
.sp
-Code for \fBre2c\fP consists of a set of \fBRULES\fP, \fBNAMED DEFINITIONS\fP, and
+Code for \fBre2c\fP consists of a set of \fBRULES\fP, \fBNAMED DEFINITIONS\fP, \fBCODE\fP, and
\fBINPLACE CONFIGURATIONS\fP\&.
.SS RULES
.sp
allowed and \fBre2c\fP stops looking for code at the first line that does
not begin with whitespace. If two or more rules overlap, the first rule
is preferred.
-.INDENT 0.0
-.INDENT 3.5
-\fBregular\-expression { C/C++ code }\fP
.sp
-\fBregular\-expression := C/C++ code\fP
-.UNINDENT
-.UNINDENT
+There is one special rule that can be used instead of a regular expression: the default rule \fB*\fP\&.
+Note that the default rule \fB*\fP differs from \fB[^]\fP: the default rule has the lowest priority,
+matches any code unit (either valid or invalid), and always consumes exactly one code unit.
+\fB[^]\fP, on the other hand, matches any valid code point (not the same as a code unit) and can consume multiple
+code units. In fact, when a variable\-length encoding is used, \fB*\fP
+is the only possible way to match an invalid input character.
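To illustrate the difference between * and [^] on UTF-8 input, the following sketch computes which of the two rules would match at a given position, and how many code units it would consume, in a lexer that has only those two rules. It assumes strict RFC 3629 validity (no overlong forms, no surrogates, nothing above U+10FFFF); how re2c itself treats surrogates depends on the --encoding-policy option. utf8_cp_len, classify and rule_t are hypothetical helpers, not re2c output.

```c
#include <assert.h>
#include <stddef.h>

/* Length in code units of the valid UTF-8 code point at p (strict
   RFC 3629 rules), or 0 if the bytes at p do not form one. */
static size_t utf8_cp_len(const unsigned char *p, size_t n) {
    unsigned char b, lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
    size_t len, i;
    if (n == 0) return 0;
    b = p[0];
    if (b <= 0x7F) return 1;
    else if (b >= 0xC2 && b <= 0xDF) len = 2;
    else if (b == 0xE0) { len = 3; lo = 0xA0; }  /* reject overlong forms */
    else if (b == 0xED) { len = 3; hi = 0x9F; }  /* reject surrogates */
    else if (b >= 0xE1 && b <= 0xEF) len = 3;
    else if (b == 0xF0) { len = 4; lo = 0x90; }  /* reject overlong forms */
    else if (b >= 0xF1 && b <= 0xF3) len = 4;
    else if (b == 0xF4) { len = 4; hi = 0x8F; }  /* reject > U+10FFFF */
    else return 0;
    if (n < len || p[1] < lo || p[1] > hi) return 0;
    for (i = 2; i < len; ++i)
        if (p[i] < 0x80 || p[i] > 0xBF) return 0;
    return len;
}

typedef enum { RULE_ANY_CP, RULE_DEFAULT } rule_t;

/* Which rule a lexer with just a [^] rule and a * rule would pick at p,
   and how many code units it would consume (n must be > 0). */
static rule_t classify(const unsigned char *p, size_t n, size_t *consumed) {
    size_t len;
    assert(n > 0);
    len = utf8_cp_len(p, n);
    if (len > 0) {
        *consumed = len;   /* [^] takes the whole code point */
        return RULE_ANY_CP;
    }
    *consumed = 1;         /* the default rule takes exactly one code unit */
    return RULE_DEFAULT;
}
```

For example, the three-byte sequence E2 82 AC (U+20AC) is consumed whole by [^], while the invalid sequence C0 80 can only be matched by *, one code unit at a time.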
.sp
-There is one special rule: the default rule (\fB*\fP)
+In general, all rules have the form:
.INDENT 0.0
.INDENT 3.5
-\fB* { C/C++ code }\fP
-.sp
-\fB* := C/C++ code\fP
+\fBregular\-expression\-or\-* code\fP
.UNINDENT
.UNINDENT
.sp
-Note that the default rule (\fB*\fP) differs from \fB[^]\fP: the default rule has the lowest priority,
-matches any code unit (either valid or invalid) and always consumes exactly one character.
-\fB[^]\fP, on the other hand, matches any valid code point (not the same as a code unit) and can consume multiple
-code units. In fact, when a variable\-length encoding is used, \fB*\fP
-is the only possible way to match an invalid input character.
-.sp
If \fB\-c\fP is active, then each regular expression is preceded by a list
of comma\-separated condition names. Besides the normal naming rules, there
are two special cases: \fB<*>\fP (these rules are merged to all conditions)
can insert it with \fB<!>\fP pseudo\-rules.
.INDENT 0.0
.INDENT 3.5
-\fB<condition\-list> regular\-expression { C/C++ code }\fP
-.sp
-\fB<condition\-list> regular\-expression := C/C++ code\fP
-.sp
-\fB<condition\-list> * { C/C++ code }\fP
-.sp
-\fB<condition\-list> * := C/C++ code\fP
-.sp
-\fB<condition\-list> regular\-expression => condition { C/C++ code }\fP
-.sp
-\fB<condition\-list> regular\-expression => condition := C/C++ code\fP
-.sp
-\fB<condition\-list> * => condition { C/C++ code }\fP
-.sp
-\fB<condition\-list> * => condition := C/C++ code\fP
-.sp
-\fB<condition\-list> regular\-expression :=> condition\fP
-.sp
-\fB<*> regular\-expression { C/C++ code }\fP
-.sp
-\fB<*> regular\-expression := C/C++ code\fP
-.sp
-\fB<*> * { C/C++ code }\fP
-.sp
-\fB<*> * := C/C++ code\fP
-.sp
-\fB<*> regular\-expression => condition { C/C++ code }\fP
-.sp
-\fB<*> regular\-expression => condition := C/C++ code\fP
+\fB<condition\-list\-or\-*> regular\-expression\-or\-* code\fP
.sp
-\fB<*> * => condition { C/C++ code }\fP
+\fB<condition\-list\-or\-*> regular\-expression\-or\-* => condition code\fP
.sp
-\fB<*> * => condition := C/C++ code\fP
+\fB<condition\-list\-or\-*> regular\-expression\-or\-* :=> condition\fP
.sp
-\fB<*> regular\-expression :=> condition\fP
+\fB<> code\fP
.sp
-\fB<> { C/C++ code }\fP
-.sp
-\fB<> := C/C++ code\fP
-.sp
-\fB<> => condition { C/C++ code }\fP
-.sp
-\fB<> => condition := C/C++ code\fP
+\fB<> => condition code\fP
.sp
\fB<> :=> condition\fP
.sp
-\fB<> :=> condition\fP
-.sp
-\fB<! condition\-list> { C/C++ code }\fP
-.sp
-\fB<! condition\-list> := C/C++ code\fP
+\fB<!condition\-list> code\fP
.sp
-\fB<!> { C/C++ code }\fP
-.sp
-\fB<!> := C/C++ code\fP
+\fB<!> code\fP
.UNINDENT
.UNINDENT
.SS NAMED DEFINITIONS
.SS INPLACE CONFIGURATIONS
.INDENT 0.0
.TP
-.B \fBre2c:condprefix = yyc;\fP
-Allows to specify the prefix used for
-condition labels. That is, the text to be prepended to condition labels
-in the generated output file.
-.TP
-.B \fBre2c:condenumprefix = yyc;\fP
-Allows to specify the prefix used for
-condition values. That is, the text to be prepended to condition enum
-values in the generated output file.
+.B \fBre2c:cgoto:threshold = 9;\fP
+When \fB\-g\fP is active, this value specifies
+the complexity threshold that triggers the generation of jump tables rather
+than nested ifs and decision bitfields. The threshold is compared
+against an estimate of the number of ifs needed, where every bitmap used
+divides the threshold by 2.
.TP
-.B \fBre2c:cond:divider = "/* *********************************** */";\fP
+.B \fBre2c:cond:divider = \(aq/* *********************************** */\(aq;\fP
Allows to customize the divider for condition blocks. You can use \fB@@\fP
to put the name of the condition or customize the placeholder using
\fBre2c:cond:divider@cond\fP\&.
Specifies the placeholder that will be
replaced with the condition name in \fBre2c:cond:divider\fP\&.
.TP
-.B \fBre2c:cond:goto = "goto @@;";\fP
+.B \fBre2c:condenumprefix = yyc;\fP
+Specifies the prefix used for
+condition values, that is, the text prepended to condition enum
+values in the generated output file.
+.TP
+.B \fBre2c:cond:goto@cond = @@;\fP
+Specifies the placeholder that will be replaced with the condition label in \fBre2c:cond:goto\fP\&.
+.TP
+.B \fBre2c:cond:goto = \(aqgoto @@;\(aq;\fP
Allows to customize the condition goto statements used with \fB:=>\fP style rules. You can use \fB@@\fP
to put the name of the condition or customize the placeholder using
\fBre2c:cond:goto@cond\fP\&. You can also change this to \fBcontinue;\fP, which
would allow you to continue with the next loop cycle including any code
between your loop start and your re2c block.
.TP
-.B \fBre2c:cond:goto@cond = @@;\fP
-Specifies the placeholder that will be replaced with the condition label in \fBre2c:cond:goto\fP\&.
+.B \fBre2c:condprefix = yyc;\fP
+Specifies the prefix used for
+condition labels, that is, the text prepended to condition labels
+in the generated output file.
.TP
-.B \fBre2c:indent:top = 0;\fP
-Specifies the minimum amount of indentation to
-use. Requires a numeric value greater than or equal to zero.
+.B \fBre2c:define:YYBACKUPCTX = \(aqYYBACKUPCTX\(aq;\fP
+Replaces the \fBYYBACKUPCTX\fP identifier with the specified string.
.TP
-.B \fBre2c:indent:string = "\et";\fP
-Specifies the string to use for indentation. Requires a string that should
-contain only whitespace unless you need something else for external tools. The easiest
-way to specify spaces is to enclose them in single or double quotes.
-If you do not want any indentation at all, you can simply set this to "".
+.B \fBre2c:define:YYBACKUP = \(aqYYBACKUP\(aq;\fP
+Replaces the \fBYYBACKUP\fP identifier with the specified string.
.TP
-.B \fBre2c:yych:conversion = 0;\fP
-When this setting is non zero, \fBre2c\fP automatically generates
-conversion code whenever yych gets read. In this case, the type must be
-defined using \fBre2c:define:YYCTYPE\fP\&.
-.TP
-.B \fBre2c:yych:emit = 1;\fP
-Set this to zero to suppress the generation of \fIyych\fP\&.
+.B \fBre2c:define:YYCONDTYPE = \(aqYYCONDTYPE\(aq;\fP
+Enumeration used for condition support with \fB\-c\fP mode.
.TP
-.B \fBre2c:yybm:hex = 0;\fP
-If set to zero, a decimal table will be used. Otherwise, a hexadecimal table will be generated.
+.B \fBre2c:define:YYCTXMARKER = \(aqYYCTXMARKER\(aq;\fP
+Replaces the \fBYYCTXMARKER\fP placeholder with the specified identifier.
.TP
-.B \fBre2c:yyfill:enable = 1;\fP
-Set this to zero to suppress the generation of \fBYYFILL (n)\fP\&. When using this, be sure to verify that the generated
-scanner does not read beyond the available input, as allowing such behavior might
-introduce severe security issues to your programs.
+.B \fBre2c:define:YYCTYPE = \(aqYYCTYPE\(aq;\fP
+Replaces the \fBYYCTYPE\fP placeholder with the specified type.
.TP
-.B \fBre2c:yyfill:check = 1;\fP
-This can be set to 0 to suppress the generations of
-\fBYYCURSOR\fP and \fBYYLIMIT\fP based precondition checks. This option is useful when
-\fBYYLIMIT + YYMAXFILL\fP is always accessible.
+.B \fBre2c:define:YYCURSOR = \(aqYYCURSOR\(aq;\fP
+Replaces the \fBYYCURSOR\fP placeholder with the specified identifier.
.TP
-.B \fBre2c:define:YYFILL = "YYFILL";\fP
-Define a substitution for \fBYYFILL\fP\&. Note that by default,
-\fBre2c\fP generates an argument in parentheses and a semicolon after
-\fBYYFILL\fP\&. If you need to make \fBYYFILL\fP an arbitrary statement rather
-than a call, set \fBre2c:define:YYFILL:naked\fP to a non\-zero value and use
-\fBre2c:define:YYFILL@len\fP to set a placeholder for the formal parameter inside of your \fBYYFILL\fP
-body.
+.B \fBre2c:define:YYDEBUG = \(aqYYDEBUG\(aq;\fP
+Replaces the \fBYYDEBUG\fP placeholder with the specified identifier.
.TP
-.B \fBre2c:define:YYFILL@len = "@@";\fP
+.B \fBre2c:define:YYFILL@len = \(aq@@\(aq;\fP
Any occurrence of this text
inside of a \fBYYFILL\fP call will be replaced with the actual argument.
.TP
-.B \fBre2c:yyfill:parameter = 1;\fP
-Controls the argument in the parentheses that follow \fBYYFILL\fP\&. If zero, the argument is omitted.
-If non\-zero, the argument is generated unless \fBre2c:define:YYFILL:naked\fP is set to non\-zero.
-.TP
.B \fBre2c:define:YYFILL:naked = 0;\fP
Controls the argument in the parentheses after \fBYYFILL\fP and
the following semicolon. If zero, both the argument and the semicolon are
\fBre2c:yyfill:parameter\fP is set to zero; the semicolon is generated
unconditionally.
.TP
-.B \fBre2c:startlabel = 0;\fP
-If set to a non zero integer, then the start
-label of the next scanner block will be generated even if it isn\(aqt used by
-the scanner itself. Otherwise, the normal \fByy0\fP\-like start label is only
-generated if needed. If set to a text value, then a label with that
-text will be generated regardless of whether the normal start label is
-used or not. This setting is reset to 0 after a start label has been generated.
+.B \fBre2c:define:YYFILL = \(aqYYFILL\(aq;\fP
+Define a substitution for \fBYYFILL\fP\&. Note that by default,
+\fBre2c\fP generates an argument in parentheses and a semicolon after
+\fBYYFILL\fP\&. If you need to make \fBYYFILL\fP an arbitrary statement rather
+than a call, set \fBre2c:define:YYFILL:naked\fP to a non\-zero value and use
+\fBre2c:define:YYFILL@len\fP to set a placeholder for the formal parameter inside of your \fBYYFILL\fP
+body.
.TP
-.B \fBre2c:labelprefix = "yy";\fP
-Allows to change the prefix of numbered
-labels. The default is \fByy\fP\&. Can be set any string that is valid in
-a label name.
+.B \fBre2c:define:YYGETCONDITION:naked = 0;\fP
+Controls the parentheses after
+\fBYYGETCONDITION\fP\&. If zero, the parentheses are generated. If non\-zero, the parentheses are
+omitted.
.TP
-.B \fBre2c:state:abort = 0;\fP
-When not zero and the \fB\-f\fP switch is active, then
-the \fBYYGETSTATE\fP block will contain a default case that aborts and a \-1
-case will be used for initialization.
+.B \fBre2c:define:YYGETCONDITION = \(aqYYGETCONDITION\(aq;\fP
+Substitution for
+\fBYYGETCONDITION\fP\&. Note that by default, \fBre2c\fP generates parentheses after
+\fBYYGETCONDITION\fP\&. Set \fBre2c:define:YYGETCONDITION:naked\fP to non\-zero to
+omit the parentheses.
.TP
-.B \fBre2c:state:nextlabel = 0;\fP
-Used when \fB\-f\fP is active to control
-whether the \fBYYGETSTATE\fP block is followed by a \fByyNext:\fP label line.
-Instead of using \fByyNext\fP, you can usually also use configuration
-\fBstartlabel\fP to force a specific start label or default to \fByy0\fP as
-a start label. Instead of using a dedicated label, it is often better to
-separate the \fBYYGETSTATE\fP code from the actual scanner code by placing a
-\fB/*!getstate:re2c*/\fP comment.
+.B \fBre2c:define:YYGETSTATE:naked = 0;\fP
+Controls the parentheses that follow
+\fBYYGETSTATE\fP\&. If zero, the parentheses are generated. If non\-zero, they are
+omitted.
.TP
-.B \fBre2c:cgoto:threshold = 9;\fP
-When \fB\-g\fP is active, this value specifies
-the complexity threshold that triggers the generation of jump tables rather
-than nested ifs and decision bitfields. The threshold is compared
-against a calculated estimation of ifs needed where every used bitmap
-divides the threshold by 2.
+.B \fBre2c:define:YYGETSTATE = \(aqYYGETSTATE\(aq;\fP
+Substitution for
+\fBYYGETSTATE\fP\&. Note that by default, \fBre2c\fP generates parentheses after
+\fBYYGETSTATE\fP\&. Set \fBre2c:define:YYGETSTATE:naked\fP to non\-zero to omit
+the parentheses.
.TP
-.B \fBre2c:yych:conversion = 0;\fP
-When input uses signed characters and the
-\fB\-s\fP or \fB\-b\fP switches are in effect, re2c allows automatic conversion
-to the unsigned character type that is then necessary for its internal
-single character. When this setting is zero or an empty string, the
-conversion is disabled. If a non zero number is used, the conversion is taken
-from \fBYYCTYPE\fP\&. If \fBYYCTYPE\fP is overridden by an inplace configuration setting, that setting is
-is used instead of a \fBYYCTYPE\fP cast. Otherwise, it will be \fB(YYCTYPE)\fP and changes to that
-configuration are no longer possible. When this setting is a string, it must contain the casting
-parentheses. Now assuming your input is a \fBchar *\fP buffer and you are using the above mentioned switches, you can set
-\fBYYCTYPE\fP to \fBunsigned char\fP and this setting to either 1 or \fB(unsigned char)\fP\&.
-.TP
-.B \fBre2c:define:YYCONDTYPE = "YYCONDTYPE";\fP
-Enumeration used for condition support with \fB\-c\fP mode.
+.B \fBre2c:define:YYLESSTHAN = \(aqYYLESSTHAN\(aq;\fP
+Replaces the \fBYYLESSTHAN\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYCTXMARKER = "YYCTXMARKER";\fP
-Replaces the \fBYYCTXMARKER\fP placeholder with the specified identifier.
+.B \fBre2c:define:YYLIMIT = \(aqYYLIMIT\(aq;\fP
+Replaces the \fBYYLIMIT\fP placeholder with the specified identifier.
.TP
-.B \fBre2c:define:YYCTYPE = "YYCTYPE";\fP
-Replaces the \fBYYCTYPE\fP placeholder with the specified type.
+.B \fBre2c:define:YYMARKER = \(aqYYMARKER\(aq;\fP
+Replaces the \fBYYMARKER\fP placeholder with the specified identifier.
.TP
-.B \fBre2c:define:YYCURSOR = "YYCURSOR";\fP
-Replaces the \fBYYCURSOR\fP placeholder with the specified identifier.
+.B \fBre2c:define:YYMTAGN = \(aqYYMTAGN\(aq;\fP
+Replaces the \fBYYMTAGN\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYDEBUG = "YYDEBUG";\fP
-Replaces the \fBYYDEBUG\fP placeholder with the specified identifier.
+.B \fBre2c:define:YYMTAGP = \(aqYYMTAGP\(aq;\fP
+Replaces the \fBYYMTAGP\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYGETCONDITION = "YYGETCONDITION";\fP
-Substitution for
-\fBYYGETCONDITION\fP\&. Note that by default, \fBre2c\fP generates parentheses after
-\fBYYGETCONDITION\fP\&. Set \fBre2c:define:YYGETCONDITION:naked\fP to non\-zero to
-omit the parentheses.
+.B \fBre2c:define:YYPEEK = \(aqYYPEEK\(aq;\fP
+Replaces the \fBYYPEEK\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYGETCONDITION:naked = 0;\fP
-Controls the parentheses after
-\fBYYGETCONDITION\fP\&. If zero, the parentheses are omitted. If non\-zero, the parentheses are
-generated.
+.B \fBre2c:define:YYRESTORECTX = \(aqYYRESTORECTX\(aq;\fP
+Replaces the \fBYYRESTORECTX\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYSETCONDITION = "YYSETCONDITION";\fP
-Substitution for
-\fBYYSETCONDITION\fP\&. Note that by default, \fBre2c\fP generates an argument in
-parentheses followed by semicolon after \fBYYSETCONDITION\fP\&. If you need to make
-\fBYYSETCONDITION\fP an arbitrary statement rather than a call, set
-\fBre2c:define:YYSETCONDITION:naked\fP to non\-zero and use
-\fBre2c:define:YYSETCONDITION@cond\fP to denote the formal parameter inside of the
-\fBYYSETCONDITION\fP body.
+.B \fBre2c:define:YYRESTORE = \(aqYYRESTORE\(aq;\fP
+Replaces the \fBYYRESTORE\fP identifier with the specified string.
+.TP
+.B \fBre2c:define:YYRESTORETAG = \(aqYYRESTORETAG\(aq;\fP
+Replaces the \fBYYRESTORETAG\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYSETCONDITION@cond = "@@";\fP
+.B \fBre2c:define:YYSETCONDITION@cond = \(aq@@\(aq;\fP
Any occurrence of this
text inside of \fBYYSETCONDITION\fP will be replaced with the actual
argument.
the semicolon are generated. If non\-zero, both the argument and the semicolon are
omitted.
.TP
-.B \fBre2c:define:YYGETSTATE = "YYGETSTATE";\fP
+.B \fBre2c:define:YYSETCONDITION = \(aqYYSETCONDITION\(aq;\fP
Substitution for
-\fBYYGETSTATE\fP\&. Note that by default, \fBre2c\fP generates parentheses after
-\fBYYGETSTATE\fP\&. Set \fBre2c:define:YYGETSTATE:naked\fP to non\-zero to omit
-the parentheses.
+\fBYYSETCONDITION\fP\&. Note that by default, \fBre2c\fP generates an argument in
+parentheses followed by semicolon after \fBYYSETCONDITION\fP\&. If you need to make
+\fBYYSETCONDITION\fP an arbitrary statement rather than a call, set
+\fBre2c:define:YYSETCONDITION:naked\fP to non\-zero and use
+\fBre2c:define:YYSETCONDITION@cond\fP to denote the formal parameter inside of the
+\fBYYSETCONDITION\fP body.
.TP
-.B \fBre2c:define:YYGETSTATE:naked = 0;\fP
-Controls the parentheses that follow
-\fBYYGETSTATE\fP\&. If zero, the parentheses are omitted. If non\-zero, they are
-generated.
+.B \fBre2c:define:YYSETSTATE:naked = 0;\fP
+Controls the argument in parentheses and the
+semicolon after \fBYYSETSTATE\fP\&. If zero, both the argument and the semicolon are
+generated. If non\-zero, both the argument and the semicolon are omitted.
+.TP
+.B \fBre2c:define:YYSETSTATE@state = \(aq@@\(aq;\fP
+Any occurrence of this text
+inside of \fBYYSETSTATE\fP will be replaced with the actual argument.
.TP
-.B \fBre2c:define:YYSETSTATE = "YYSETSTATE";\fP
+.B \fBre2c:define:YYSETSTATE = \(aqYYSETSTATE\(aq;\fP
Substitution for
\fBYYSETSTATE\fP\&. Note that by default, \fBre2c\fP generates an argument in parentheses
followed by a semicolon after \fBYYSETSTATE\fP\&. If you need to make \fBYYSETSTATE\fP an
\fBre2c:define:YYSETSTATE@state\fP to denote the formal parameter inside of
your \fBYYSETSTATE\fP body.
.TP
-.B \fBre2c:define:YYSETSTATE@state = "@@";\fP
-Any occurrence of this text
-inside of \fBYYSETSTATE\fP will be replaced with the actual argument.
+.B \fBre2c:define:YYSKIP = \(aqYYSKIP\(aq;\fP
+Replaces the \fBYYSKIP\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYSETSTATE:naked = 0;\fP
-Controls the argument in parentheses and the
-semicolon after \fBYYSETSTATE\fP\&. If zero, both argument and the semicolon are
-omitted. If non\-zero, both the argument and the semicolon are generated.
+.B \fBre2c:define:YYSTAGN = \(aqYYSTAGN\(aq;\fP
+Replaces the \fBYYSTAGN\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYLIMIT = "YYLIMIT";\fP
-Replaces the \fBYYLIMIT\fP placeholder with the specified identifier.
-needed.
+.B \fBre2c:define:YYSTAGP = \(aqYYSTAGP\(aq;\fP
+Replaces the \fBYYSTAGP\fP identifier with the specified string.
.TP
-.B \fBre2c:define:YYMARKER = "YYMARKER";\fP
-Replaces the \fBYYMARKER\fP placeholder with the specified identifier.
+.B \fBre2c:flags:8\fP or \fBre2c:flags:utf\-8\fP
+Same as \fB\-8 \-\-utf\-8\fP command\-line option.
+.TP
+.B \fBre2c:flags:b\fP or \fBre2c:flags:bit\-vectors\fP
+Same as \fB\-b \-\-bit\-vectors\fP command\-line option.
+.TP
+.B \fBre2c:flags:case\-insensitive = 0;\fP
+Same as \fB\-\-case\-insensitive\fP command\-line option.
+.TP
+.B \fBre2c:flags:case\-inverted = 0;\fP
+Same as \fB\-\-case\-inverted\fP command\-line option.
+.TP
+.B \fBre2c:flags:d\fP or \fBre2c:flags:debug\-output\fP
+Same as \fB\-d \-\-debug\-output\fP command\-line option.
+.TP
+.B \fBre2c:flags:dfa\-minimization = \(aqmoore\(aq;\fP
+Same as \fB\-\-dfa\-minimization\fP command\-line option.
+.TP
+.B \fBre2c:flags:eager\-skip = 0;\fP
+Same as \fB\-\-eager\-skip\fP command\-line option.
+.TP
+.B \fBre2c:flags:e\fP or \fBre2c:flags:ecb\fP
+Same as \fB\-e \-\-ecb\fP command\-line option.
+.TP
+.B \fBre2c:flags:empty\-class = \(aqmatch\-empty\(aq;\fP
+Same as \fB\-\-empty\-class\fP command\-line option.
+.TP
+.B \fBre2c:flags:encoding\-policy = \(aqignore\(aq;\fP
+Same as \fB\-\-encoding\-policy\fP command\-line option.
+.TP
+.B \fBre2c:flags:g\fP or \fBre2c:flags:computed\-gotos\fP
+Same as \fB\-g \-\-computed\-gotos\fP command\-line option.
+.TP
+.B \fBre2c:flags:i\fP or \fBre2c:flags:no\-debug\-info\fP
+Same as \fB\-i \-\-no\-debug\-info\fP command\-line option.
+.TP
+.B \fBre2c:flags:input = \(aqdefault\(aq;\fP
+Same as \fB\-\-input\fP command\-line option.
+.TP
+.B \fBre2c:flags:lookahead = 1;\fP
+Same as inverted \fB\-\-no\-lookahead\fP command\-line option.
+.TP
+.B \fBre2c:flags:optimize\-tags = 1;\fP
+Same as inverted \fB\-\-no\-optimize\-tags\fP command\-line option.
+.TP
+.B \fBre2c:flags:P\fP or \fBre2c:flags:posix\-captures\fP
+Same as \fB\-P \-\-posix\-captures\fP command\-line option.
+.TP
+.B \fBre2c:flags:s\fP or \fBre2c:flags:nested\-ifs\fP
+Same as \fB\-s \-\-nested\-ifs\fP command\-line option.
+.TP
+.B \fBre2c:flags:T\fP or \fBre2c:flags:tags\fP
+Same as \fB\-T \-\-tags\fP command\-line option.
+.TP
+.B \fBre2c:flags:u\fP or \fBre2c:flags:unicode\fP
+Same as \fB\-u \-\-unicode\fP command\-line option.
+.TP
+.B \fBre2c:flags:w\fP or \fBre2c:flags:wide\-chars\fP
+Same as \fB\-w \-\-wide\-chars\fP command\-line option.
+.TP
+.B \fBre2c:flags:x\fP or \fBre2c:flags:utf\-16\fP
+Same as \fB\-x \-\-utf\-16\fP command\-line option.
+.TP
+.B \fBre2c:indent:string = \(aq\et\(aq;\fP
+Specifies the string to use for indentation. Requires a string that should
+contain only whitespace unless you need something else for external tools. The easiest
+way to specify spaces is to enclose them in single or double quotes.
+If you do not want any indentation at all, you can simply set this to \(aq\(aq.
+.TP
+.B \fBre2c:indent:top = 0;\fP
+Specifies the minimum amount of indentation to
+use. Requires a numeric value greater than or equal to zero.
+.TP
+.B \fBre2c:labelprefix = \(aqyy\(aq;\fP
+Specifies the prefix of numbered
+labels. The default is \fByy\fP\&. It can be set to any string that is valid in
+a label name.
.TP
-.B \fBre2c:label:yyFillLabel = "yyFillLabel";\fP
+.B \fBre2c:label:yyFillLabel = \(aqyyFillLabel\(aq;\fP
Overrides the name of the \fByyFillLabel\fP label.
.TP
-.B \fBre2c:label:yyNext = "yyNext";\fP
+.B \fBre2c:label:yyNext = \(aqyyNext\(aq;\fP
Overrides the name of the \fByyNext\fP label.
.TP
+.B \fBre2c:startlabel = 0;\fP
+If set to a non\-zero integer, then the start
+label of the next scanner block will be generated even if it isn\(aqt used by
+the scanner itself. Otherwise, the normal \fByy0\fP\-like start label is only
+generated if needed. If set to a text value, then a label with that
+text will be generated regardless of whether the normal start label is
+used or not. This setting is reset to 0 after a start label has been generated.
+.TP
+.B \fBre2c:state:abort = 0;\fP
+When not zero and the \fB\-f\fP switch is active, then
+the \fBYYGETSTATE\fP block will contain a default case that aborts and a \-1
+case will be used for initialization.
+.TP
+.B \fBre2c:state:nextlabel = 0;\fP
+Used when \fB\-f\fP is active to control
+whether the \fBYYGETSTATE\fP block is followed by a \fByyNext:\fP label line.
+Instead of using \fByyNext\fP, you can usually also use the \fBre2c:startlabel\fP
+configuration to force a specific start label, or rely on the default \fByy0\fP
+start label. Instead of using a dedicated label, it is often better to
+separate the \fBYYGETSTATE\fP code from the actual scanner code by placing a
+\fB/*!getstate:re2c*/\fP comment.
+.TP
+.B \fBre2c:tags:expression = \(aq@@\(aq;\fP
+Customizes the way \fBre2c\fP addresses tag variables:
+by default it emits expressions of the form \fByyt<N>\fP,
+which may be inconvenient if tag variables are defined as fields of a struct
+or otherwise require special accessors.
+For example, setting \fBre2c:tags:expression = p\->@@\fP results in \fBp\->yyt<N>\fP\&.
+.TP
+.B \fBre2c:tags:prefix = \(aqyyt\(aq;\fP
+Overrides the prefix of tag variables.
+.TP
.B \fBre2c:variable:yyaccept = yyaccept;\fP
Overrides the name of the \fByyaccept\fP variable.
.TP
-.B \fBre2c:variable:yybm = "yybm";\fP
+.B \fBre2c:variable:yybm = \(aqyybm\(aq;\fP
Overrides the name of the \fByybm\fP variable.
.TP
-.B \fBre2c:variable:yych = "yych";\fP
+.B \fBre2c:variable:yych = \(aqyych\(aq;\fP
Overrides the name of the \fByych\fP variable.
.TP
-.B \fBre2c:variable:yyctable = "yyctable";\fP
+.B \fBre2c:variable:yyctable = \(aqyyctable\(aq;\fP
When both \fB\-c\fP and \fB\-g\fP are active, \fBre2c\fP will use this variable to generate a static jump table
for \fBYYGETCONDITION\fP\&.
.TP
-.B \fBre2c:variable:yystable = "yystable";\fP
+.B \fBre2c:variable:yystable = \(aqyystable\(aq;\fP
Deprecated.
.TP
-.B \fBre2c:variable:yytarget = "yytarget";\fP
+.B \fBre2c:variable:yytarget = \(aqyytarget\(aq;\fP
Overrides the name of the \fByytarget\fP variable.
+.TP
+.B \fBre2c:yybm:hex = 0;\fP
+If set to zero, a decimal table will be used. Otherwise, a hexadecimal table will be generated.
+.TP
+.B \fBre2c:yych:conversion = 0;\fP
+When this setting is non\-zero, \fBre2c\fP automatically generates
+conversion code whenever \fIyych\fP is read. In this case, the type must be
+defined using \fBre2c:define:YYCTYPE\fP\&.
+.TP
+.B \fBre2c:yych:emit = 1;\fP
+Set this to zero to suppress the generation of \fIyych\fP\&.
+.TP
+.B \fBre2c:yyfill:check = 1;\fP
+This can be set to 0 to suppress the generation of
+\fBYYCURSOR\fP\- and \fBYYLIMIT\fP\-based precondition checks. This option is useful when
+\fBYYLIMIT + YYMAXFILL\fP is always accessible.
+.TP
+.B \fBre2c:yyfill:enable = 1;\fP
+Set this to zero to suppress the generation of \fBYYFILL (n)\fP\&. When using this, be sure to verify that the generated
+scanner does not read beyond the available input, as allowing such behavior might
+introduce severe security issues to your programs.
+.TP
+.B \fBre2c:yyfill:parameter = 1;\fP
+Controls the argument in the parentheses that follow \fBYYFILL\fP\&. If zero, the argument is omitted.
+If non\-zero, the argument is generated unless \fBre2c:define:YYFILL:naked\fP is set to non\-zero.
.UNINDENT
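Several settings above (re2c:cond:goto, re2c:define:YYFILL@len, re2c:define:YYSETSTATE@state, re2c:tags:expression) work by replacing a placeholder such as @@ with an actual value. The following is a minimal C model of that textual substitution; expand_placeholder is a hypothetical helper written for illustration, not re2c's implementation, and the values "yyt1" and "yyc_init" are example strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of placeholder `ph` in `tmpl` with `val`.
   At most cap - 1 characters plus a terminating NUL are written to `out`;
   the return value is the length of the full expansion, as with snprintf. */
static size_t expand_placeholder(const char *tmpl, const char *ph,
                                 const char *val, char *out, size_t cap) {
    size_t phlen, vlen, n = 0, i;
    const char *s;
    assert(tmpl != NULL && ph != NULL && val != NULL);
    phlen = strlen(ph);
    vlen = strlen(val);
    for (s = tmpl; *s != '\0';) {
        const char *piece = s;  /* by default, copy one template char */
        size_t plen = 1;
        if (phlen > 0 && strncmp(s, ph, phlen) == 0) {
            piece = val;        /* placeholder found: copy the value */
            plen = vlen;
            s += phlen;
        } else {
            s += 1;
        }
        for (i = 0; i < plen; ++i, ++n)
            if (n + 1 < cap) out[n] = piece[i];
    }
    if (cap > 0) out[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

With re2c:tags:expression = p->@@, a tag variable yyt1 would be spelled as expand_placeholder("p->@@", "@@", "yyt1", ...) yields, namely p->yyt1.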
.SS REGULAR EXPRESSIONS
.INDENT 0.0
matches a named definition as specified by \fBname\fP only if \fB\-F\fP is
off. If \fB\-F\fP is active then this behaves like it was enclosed in double
quotes and matches the string "name".
+.TP
+.B \fB@stag\fP
+save input position at which \fB@stag\fP matches in a variable named \fBstag\fP
+.TP
+.B \fB#mtag\fP
+save all input positions at which \fB#mtag\fP matches in a variable named \fBmtag\fP
+(multiple positions are possible if \fB#mtag\fP is enclosed in a repetition subexpression that matches several times)
.UNINDENT
.sp
Character classes and string literals may contain octal or hexadecimal
and eight hexadecimal digits (e.g., \fB\eU12345678\fP).
.sp
The only portable "any" rule is the default rule, \fB*\fP\&.
+.SH SUBMATCH EXTRACTION
+.sp
+\fBre2c\fP supports two kinds of submatch extraction.
+.sp
+The first option is \fB\-P \-\-posix\-captures\fP: it enables POSIX\-compliant capturing groups.
+In this mode, parentheses in regular expressions denote the beginning and the end of capturing groups;
+the whole regular expression is group number zero.
+The number of groups for the matching rule is stored in the variable \fByynmatch\fP,
+and submatch results are stored in the \fByypmatch\fP array.
+Both \fByynmatch\fP and \fByypmatch\fP should be defined by the user;
+note that \fByypmatch\fP must have at least \fByynmatch * 2\fP elements.
+\fBre2c\fP provides a directive \fB/*!maxnmatch:re2c*/\fP that defines a constant \fBYYMAXNMATCH\fP: the maximal value of \fByynmatch\fP among all rules,
+so an array of \fBYYMAXNMATCH * 2\fP elements is large enough for every rule.
+Note that \fBre2c\fP implements POSIX\-compliant disambiguation:
+each subexpression matches as long as possible,
+and subexpressions that start earlier in the regular expression have priority over those starting later.
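A sketch of reading submatch results after a rule matches in \-P mode. It assumes that yypmatch holds input pointers, that group i spans yypmatch[2*i] (start) to yypmatch[2*i + 1] (end), and that a group which did not participate in the match has NULL bounds; group_copy is a hypothetical helper, and in the usage below the array is filled by hand to mimic a match of ([a-z]+)=([0-9]+) against key=42.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy capturing group i into out (NUL-terminated, truncated to cap - 1
   characters). Assumes group i spans yypmatch[2*i] .. yypmatch[2*i + 1]
   and that a group which did not participate has NULL bounds.
   Returns the full group length, or -1 if the group is absent. */
static long group_copy(const char *const *yypmatch, size_t yynmatch,
                       size_t i, char *out, size_t cap) {
    const char *b, *e;
    size_t len, n;
    assert(cap > 0);
    if (i >= yynmatch || yypmatch[2 * i] == NULL) {
        out[0] = '\0';
        return -1;
    }
    b = yypmatch[2 * i];
    e = yypmatch[2 * i + 1];
    len = (size_t)(e - b);
    n = len < cap - 1 ? len : cap - 1;
    memcpy(out, b, n);
    out[n] = '\0';
    return (long)len;
}
```

Here group 0 is the whole match key=42, group 1 is key, and group 2 is 42, so yynmatch would be 3 for this rule.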
+.sp
+The second option is \fB\-T \-\-tags\fP\&.
+With this option, one can use standalone tags of the form \fB@stag\fP and \fB#mtag\fP instead of capturing parentheses,
+where \fBstag\fP and \fBmtag\fP are arbitrary user\-defined names.
+Tags can be used anywhere inside of a regular expression; semantically they are just position markers.
+Tags of the form \fB@stag\fP are called \fIs\-tags\fP: they denote a single submatch value (the last input position where this tag matched).
+Tags of the form \fB#mtag\fP are called \fIm\-tags\fP: they denote multiple submatch values (the whole history of repetitions of this tag).
+All tags should be defined by the user as variables with the corresponding names.
+With standalone tags, \fBre2c\fP uses leftmost greedy disambiguation:
+submatch positions correspond to the leftmost matching path through the regular expression.
+.sp
+In both modes (\fB\-\-posix\-captures\fP and \fB\-\-tags\fP), \fBre2c\fP generates a number of tag variables
+that are used by the lexer to track multiple possible versions of each tag
+(multiple versions arise because a submatch may be ambiguous).
+When a rule matches, ambiguity is resolved and all tags of this rule (or capturing parentheses, which are also implemented as tags)
+are initialized with the values of the appropriate tag variables.
+Note that there is no one\-to\-one correspondence between tag variables and tags:
+the same tag variable may be reused for different tags, and one tag may require multiple tag variables to hold all its ambiguous versions.
+The exact number of tag variables is unknown to the user; this number is determined by \fBre2c\fP\&.
+However, tag variables should be defined by the user, because it might be necessary to update them in \fBYYFILL\fP
+and store them between invocations of the lexer with the \fB\-\-storable\-state\fP option.
+Therefore \fBre2c\fP provides the directives \fB/*!stags:re2c ... */\fP and \fB/*!mtags:re2c ... */\fP
+that can be used to declare, initialize and manipulate tag variables.
+.sp
+\fIS\-tags\fP must support the following operations:
+.INDENT 0.0
+.IP \(bu 2
+save input position to \fIs\-tag\fP:
+\fBt = YYCURSOR\fP with default API, or user\-defined operation \fBYYSTAGP (t)\fP with generic API
+.IP \(bu 2
+save default value to \fIs\-tag\fP:
+\fBt = NULL\fP with default API, or user\-defined operation \fBYYSTAGN (t)\fP with generic API
+.IP \(bu 2
+copy one \fIs\-tag\fP to another:
+\fBt1 = t2\fP
+.UNINDENT
+.sp
+\fIM\-tags\fP must support the following operations:
+.INDENT 0.0
+.IP \(bu 2
+append input position to \fIm\-tag\fP:
+user\-defined operation \fBYYMTAGP (t)\fP with both default and generic API
+.IP \(bu 2
+append default value to \fIm\-tag\fP:
+user\-defined operation \fBYYMTAGN (t)\fP with both default and generic API
+.IP \(bu 2
+copy one \fIm\-tag\fP to another:
+\fBt1 = t2\fP
+.UNINDENT
+.sp
+\fIS\-tags\fP can be implemented as scalar values (pointers or offsets).
+\fIM\-tags\fP need a more complex representation, as they need to store a sequence of tag values.
+The naive and inefficient representation of an \fIm\-tag\fP is a list (array, vector) of tag values;
+a more efficient representation is to store all \fIm\-tags\fP in a prefix tree
+represented as an array of nodes \fB(v, p)\fP, where \fBv\fP is a tag value and \fBp\fP is a pointer to the parent node
+(copying an \fIm\-tag\fP then copies only a pointer to its last node, and histories with a common prefix share nodes).
+.sp
+For further details, see the \fBhttp://re2c.org/examples/examples.html\fP page on the website
+or the \fBre2c/examples/\fP subdirectory of the \fBre2c\fP distribution.
.SH SCANNER WITH STORABLE STATES
.sp
When the \fB\-f\fP flag is specified, \fBre2c\fP generates a scanner that can
T{
\fBYYBACKUP ()\fP
T} T{
-back up current input position
+back up current input position
T}
_
T{
\fBYYBACKUPCTX ()\fP
T} T{
-back up current input position for trailing context
+back up current input position for trailing context
+T}
+_
+T{
+\fBYYSTAGP (t)\fP
+T} T{
+save current input position to tag \fBt\fP
+T}
+_
+T{
+\fBYYSTAGN (t)\fP
+T} T{
+save default value to tag \fBt\fP
+T}
+_
+T{
+\fBYYMTAGP (t)\fP
+T} T{
+append input position to the history of tag \fBt\fP
+T}
+_
+T{
+\fBYYMTAGN (t)\fP
+T} T{
+append default value to the history of tag \fBt\fP
T}
_
T{
T}
_
T{
+\fBYYRESTORETAG (t)\fP
+T} T{
+restore current input position from tag \fBt\fP
+T}
+_
+T{
\fBYYLESSTHAN (n)\fP
T} T{
check if less than \fBn\fP input characters are left
" condition support. This can only be activated when -c is in use.\n"
"\n"
" -T --tags\n"
-" Enable submatch extraction with tags. This option is implied by\n"
-" --posix-captures.\n"
+" Enable submatch extraction with tags.\n"
"\n"
" -P --posix-captures\n"
-" Enable submatch extraction with POSIX-style capturing groups.\n"
-" This option implies -T --tags.\n"
+" Enable submatch extraction with POSIX-style capturing groups.\n"
"\n"
" -u --unicode\n"
" Generate a parser that supports UTF-32. The generated code can\n"
" --no-generation-date\n"
" Suppress date output in the generated file.\n"
"\n"
+" --no-lookahead\n"
+" Use TDFA(0) instead of TDFA(1). This option only has effect\n"
+" with --tags or --posix-captures options.\n"
+"\n"
" --no-optimize-tags\n"
-" Suppress tag optimization (mostly used for debugging).\n"
+" Suppress optimization of tag variables (mostly used for debug‐\n"
+" ging).\n"
"\n"
" --no-version\n"
" Suppress version output in the generated file.\n"
" re2c silently ignores such escapes, but this may as well indi‐\n"
" cate a typo or error in the escape sequence.\n"
"\n"
+" -Wnondeterministic-tags\n"
+" Warn if tag has n-th degree of nondeterminism, where n is\n"
+" greater than 1.\n"
+"\n"
;
SYNTAX
------
-Code for ``re2c`` consists of a set of ``RULES``, ``NAMED DEFINITIONS``, and
+Code for ``re2c`` consists of a set of ``RULES``, ``NAMED DEFINITIONS``, ``CODE``, and
``INPLACE CONFIGURATIONS``.
.. include:: @top_srcdir@/doc/manual/syntax/regular_expressions.rst_
+SUBMATCH EXTRACTION
+-------------------
+
+.. include:: @top_srcdir@/doc/manual/features/submatch/submatch.rst_
+
+
SCANNER WITH STORABLE STATES
----------------------------
customizing input operations. In this mode, ``re2c`` will express all
operations on input in terms of the following primitives:
- +---------------------+-----------------------------------------------------+
- | ``YYPEEK ()`` | get current input character |
- +---------------------+-----------------------------------------------------+
- | ``YYSKIP ()`` | advance to next character |
- +---------------------+-----------------------------------------------------+
- | ``YYBACKUP ()`` | back up current input position |
- +---------------------+-----------------------------------------------------+
- | ``YYBACKUPCTX ()`` | back up current input position for trailing context |
- +---------------------+-----------------------------------------------------+
- | ``YYRESTORE ()`` | restore current input position |
- +---------------------+-----------------------------------------------------+
- | ``YYRESTORECTX ()`` | restore current input position for trailing context |
- +---------------------+-----------------------------------------------------+
- | ``YYLESSTHAN (n)`` | check if less than ``n`` input characters are left |
- +---------------------+-----------------------------------------------------+
+    +----------------------+-----------------------------------------------------+
+    | ``YYPEEK ()``        | get current input character                         |
+    +----------------------+-----------------------------------------------------+
+    | ``YYSKIP ()``        | advance to next character                           |
+    +----------------------+-----------------------------------------------------+
+    | ``YYBACKUP ()``      | back up current input position                      |
+    +----------------------+-----------------------------------------------------+
+    | ``YYBACKUPCTX ()``   | back up current input position for trailing context |
+    +----------------------+-----------------------------------------------------+
+    | ``YYSTAGP (t)``      | save current input position to tag ``t``            |
+    +----------------------+-----------------------------------------------------+
+    | ``YYSTAGN (t)``      | save default value to tag ``t``                     |
+    +----------------------+-----------------------------------------------------+
+    | ``YYMTAGP (t)``      | append input position to the history of tag ``t``   |
+    +----------------------+-----------------------------------------------------+
+    | ``YYMTAGN (t)``      | append default value to the history of tag ``t``    |
+    +----------------------+-----------------------------------------------------+
+    | ``YYRESTORE ()``     | restore current input position                      |
+    +----------------------+-----------------------------------------------------+
+    | ``YYRESTORECTX ()``  | restore current input position for trailing context |
+    +----------------------+-----------------------------------------------------+
+    | ``YYRESTORETAG (t)`` | restore current input position from tag ``t``       |
+    +----------------------+-----------------------------------------------------+
+    | ``YYLESSTHAN (n)``   | check if less than ``n`` input characters are left  |
+    +----------------------+-----------------------------------------------------+
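
The following is a minimal sketch of how the primitives in the table above might be defined in C. It assumes a hypothetical ``Input`` struct whose fields ``cur``, ``mar``, ``ctx`` and ``lim`` stand in for ``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``. With these definitions the generic API behaves like the default pointer-based API. ``skip_digits`` is a hand-written illustration of how the primitives are used, not ``re2c`` output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state; the field names are illustrative only. */
typedef struct {
    const char *cur; /* plays the role of YYCURSOR    */
    const char *mar; /* plays the role of YYMARKER    */
    const char *ctx; /* plays the role of YYCTXMARKER */
    const char *lim; /* plays the role of YYLIMIT     */
} Input;

/* Generic API primitives; each expects a variable `Input *in` in scope. */
#define YYPEEK()        (*in->cur)
#define YYSKIP()        (++in->cur)
#define YYBACKUP()      (in->mar = in->cur)
#define YYBACKUPCTX()   (in->ctx = in->cur)
#define YYRESTORE()     (in->cur = in->mar)
#define YYRESTORECTX()  (in->cur = in->ctx)
#define YYRESTORETAG(t) (in->cur = (t))
#define YYLESSTHAN(n)   ((in->lim - in->cur) < (n))
#define YYSTAGP(t)      ((t) = in->cur)
#define YYSTAGN(t)      ((t) = NULL)

/* Hand-written stand-in for generated code: skips a run of decimal digits
   and returns an s-tag marking where the run ends, or NULL (the default
   value) if there were no digits. */
static const char *skip_digits(Input *in) {
    const char *t;
    assert(in != NULL);
    YYSTAGN(t);
    while (!YYLESSTHAN(1) && YYPEEK() >= '0' && YYPEEK() <= '9') {
        YYSKIP();
        YYSTAGP(t);
    }
    return t;
}
```

Macros over a struct are only one possible design. With the generic API the primitives could just as well be inline functions or operate on offsets, as long as they have the semantics listed in the table.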
--- /dev/null
+``re2c`` supports two kinds of submatch extraction.
+
+
+The first option is ``-P --posix-captures``: it enables POSIX-compliant capturing groups.
+In this mode parentheses in regular expressions denote the beginning and the end of capturing groups;
+the whole regular expression is group number zero.
+The number of groups for the matching rule is stored in a variable ``yynmatch``,
+and submatch results are stored in the ``yypmatch`` array.
+Both ``yynmatch`` and ``yypmatch`` should be defined by the user;
+note that the ``yypmatch`` array must have at least ``yynmatch * 2`` elements (a start and an end position for each group).
+``re2c`` provides a directive ``/*!maxnmatch:re2c*/`` that defines a constant ``YYMAXNMATCH``: the maximum value of ``yynmatch`` among all rules.
+Note that ``re2c`` implements POSIX-compliant disambiguation:
+each subexpression matches as long as possible,
+and subexpressions that start earlier in the regular expression have priority over those starting later.
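
The following sketch shows how a rule action might read a capturing group. It assumes that group ``i`` occupies ``yypmatch[2 * i]`` (start) and ``yypmatch[2 * i + 1]`` (end), and that a group which did not take part in the match holds ``NULL``. ``Group``, ``get_group`` and the fallback value of ``YYMAXNMATCH`` are hypothetical. In a real lexer, ``YYMAXNMATCH`` comes from the ``/*!maxnmatch:re2c*/`` directive.

```c
#include <assert.h>
#include <stddef.h>

#ifndef YYMAXNMATCH
#define YYMAXNMATCH 3 /* hypothetical fallback; re2c emits the real value */
#endif

/* One capturing group: begin is NULL if the group did not participate. */
typedef struct {
    const char *begin;
    size_t len;
} Group;

/* Assumed layout: group i spans [yypmatch[2 * i], yypmatch[2 * i + 1]),
   and group 0 is the whole match. */
static Group get_group(const char *const *yypmatch, size_t yynmatch, size_t i) {
    Group g = { NULL, 0 };
    assert(i < yynmatch && yynmatch <= YYMAXNMATCH);
    if (yypmatch[2 * i] != NULL) {
        g.begin = yypmatch[2 * i];
        g.len = (size_t)(yypmatch[2 * i + 1] - yypmatch[2 * i]);
    }
    return g;
}
```

For example, the enclosing function could declare ``const char *yypmatch[YYMAXNMATCH * 2];`` and an integer ``yynmatch``, and the action of the matching rule could then call ``get_group(yypmatch, yynmatch, i)``.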
+
+
+The second option is ``-T --tags``.
+With this option one can use standalone tags of the form ``@stag`` and ``#mtag`` instead of capturing parentheses,
+where ``stag`` and ``mtag`` are arbitrary user-defined names.
+Tags can be used anywhere inside a regular expression; semantically they are just position markers.
+Tags of the form ``@stag`` are called *s-tags*: they denote a single submatch value (the last input position where this tag matched).
+Tags of the form ``#mtag`` are called *m-tags*: they denote multiple submatch values (the whole history of repetitions of this tag).
+All tags should be defined by the user as variables with the corresponding names.
+With standalone tags ``re2c`` uses leftmost greedy disambiguation:
+submatch positions correspond to the leftmost matching path through the regular expression.
+
+
+In both modes (``--posix-captures`` and ``--tags``), ``re2c`` generates a number of tag variables
+that are used by the lexer to track multiple possible versions of each tag
+(multiple versions arise because a submatch may be ambiguous).
+When a rule matches, ambiguity is resolved and all tags of this rule (or capturing parentheses, which are also implemented as tags)
+are initialized with the values of the appropriate tag variables.
+Note that there is no one-to-one correspondence between tag variables and tags:
+the same tag variable may be reused for different tags, and one tag may require multiple tag variables to hold all its ambiguous versions.
+The exact number of tag variables is unknown to the user; this number is determined by ``re2c``.
+However, tag variables should be defined by the user, because it might be necessary to update them in ``YYFILL``
+and store them between invocations of the lexer with the ``--storable-state`` option.
+Therefore ``re2c`` provides the directives ``/*!stags:re2c ... */`` and ``/*!mtags:re2c ... */``
+that can be used to declare, initialize and manipulate tag variables.
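
The following sketch shows why tag variables must be visible to ``YYFILL``. When the buffer is shifted to make room for new input, every pointer into it, tag variables included, must move by the same amount. The ``Lexer`` struct, ``NTAGS`` and ``shift_buffer`` are hypothetical. Keeping the tag variables in an array is an assumption of this sketch; in a real lexer their declarations would be produced with ``/*!stags:re2c ... */``. Reading new input into the freed space is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NTAGS 4 /* hypothetical; the real number is chosen by re2c */

/* Hypothetical lexer state; all pointers point into buf. */
typedef struct {
    char buf[16];
    const char *lim, *cur, *mar, *tok;
    const char *tags[NTAGS]; /* tag variables; NULL is the default value */
} Lexer;

/* Drop the input before tok, slide the rest to the front of buf and move
   every pointer back by the same distance, so that each one still points
   at the same character. Tags are assumed to lie within [tok, lim]. */
static void shift_buffer(Lexer *l) {
    assert(l->buf <= l->tok && l->tok <= l->lim);
    size_t shift = (size_t)(l->tok - l->buf);
    memmove(l->buf, l->tok, (size_t)(l->lim - l->tok));
    l->lim -= shift;
    l->cur -= shift;
    l->mar -= shift;
    l->tok -= shift;
    for (size_t i = 0; i < NTAGS; ++i) {
        if (l->tags[i] != NULL) { /* the default value must stay NULL */
            l->tags[i] -= shift;
        }
    }
}
```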
+
+*S-tags* must support the following operations:
+
+* save input position to *s-tag*:
+ ``t = YYCURSOR`` with default API, or user-defined operation ``YYSTAGP (t)`` with generic API
+* save default value to *s-tag*:
+ ``t = NULL`` with default API, or user-defined operation ``YYSTAGN (t)`` with generic API
+* copy one *s-tag* to another:
+ ``t1 = t2``
+
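Below is a minimal sketch of the *s-tag* operations for the generic API. It assumes that s-tags are stored as offsets from the start of the buffer and that ``-1`` serves as the default value. ``OffInput`` and ``stag_ptr`` are hypothetical names. Unlike pointers, offsets stay valid if the buffer is reallocated to a different address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state that tracks positions as offsets into buf. */
typedef struct {
    const char *buf;
    ptrdiff_t cur; /* current position, as an offset from buf */
} OffInput;

/* S-tag operations with offsets as tag values and -1 as the default value.
   Each macro expects a variable `OffInput *in` in scope.
   Copying one s-tag to another is a plain assignment: t1 = t2. */
#define YYSTAGP(t)      ((t) = in->cur)
#define YYSTAGN(t)      ((t) = -1)
#define YYRESTORETAG(t) (in->cur = (t))

/* Convert an s-tag back to a pointer; NULL for the default value. */
static const char *stag_ptr(const OffInput *in, ptrdiff_t t) {
    assert(t >= -1);
    return t < 0 ? NULL : in->buf + t;
}
```
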
+*M-tags* must support the following operations:
+
+* append input position to *m-tag*:
+ user-defined operation ``YYMTAGP (t)`` with both default and generic API
+* append default value to *m-tag*:
+ user-defined operation ``YYMTAGN (t)`` with both default and generic API
+* copy one *m-tag* to another:
+ ``t1 = t2``
+
+*S-tags* can be implemented as scalar values (pointers or offsets).
+*M-tags* need a more complex representation, as they need to store a sequence of tag values.
+The naive and inefficient representation of an *m-tag* is a list (array, vector) of tag values;
+a more efficient representation is to store all *m-tags* in a prefix tree
+represented as an array of nodes ``(v, p)``, where ``v`` is a tag value and ``p`` is a pointer to the parent node
+(copying an *m-tag* then copies only a pointer to its last node, and histories with a common prefix share nodes).
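
The following sketches the prefix-tree representation of *m-tags* described above. It assumes that tag values are input offsets, with ``-1`` as the default value. For the parent links it uses array indices rather than raw pointers, so that the node array can grow with ``realloc``. ``MtagTree``, ``mtag_append``, ``mtag_history`` and the ``tree`` and ``cur`` variables referenced by the macros are hypothetical. Appending creates one node, and copying an m-tag (``t1 = t2``) copies a single index, so both histories share their common prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A node (v, p): v is a tag value (an input offset, or -1 for the default
   value) and p is the index of the parent node (-1 for none). */
typedef struct { long v; long p; } MtagNode;

/* All m-tags share one growable array of nodes. */
typedef struct {
    MtagNode *nodes;
    size_t len, cap;
} MtagTree;

/* An m-tag is the index of the newest node in its history (-1 = empty). */
typedef long mtag_t;

/* Append value v to the history t and return the extended history.
   Other m-tags that share t's nodes are left unchanged. */
static mtag_t mtag_append(MtagTree *tree, mtag_t t, long v) {
    if (tree->len == tree->cap) {
        size_t cap = tree->cap ? tree->cap * 2 : 16;
        MtagNode *n = realloc(tree->nodes, cap * sizeof *n);
        if (n == NULL) abort(); /* out of memory */
        tree->nodes = n;
        tree->cap = cap;
    }
    tree->nodes[tree->len].v = v;
    tree->nodes[tree->len].p = t;
    return (mtag_t)tree->len++;
}

/* Hypothetical generic API operations; they expect an MtagTree `tree`
   and the current input offset `cur` to be in scope. */
#define YYMTAGP(t) ((t) = mtag_append(&tree, (t), cur))
#define YYMTAGN(t) ((t) = mtag_append(&tree, (t), -1))

/* Store the history of t in out, oldest value first; return its length. */
static size_t mtag_history(const MtagTree *tree, mtag_t t, long *out, size_t max) {
    size_t n = 0;
    for (mtag_t i = t; i >= 0; i = tree->nodes[i].p) ++n;
    assert(n <= max);
    size_t k = n;
    for (mtag_t i = t; i >= 0; i = tree->nodes[i].p) out[--k] = tree->nodes[i].v;
    return n;
}
```

This sketch only ever appends nodes. One way to reclaim memory is to reset ``len`` to zero once no m-tag value is needed any more, for example after the action of a matched rule has consumed them.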
+
+
+For further details, see the ``http://re2c.org/examples/examples.html`` page on the website
+or the ``re2c/examples/`` subdirectory of the ``re2c`` distribution.
+
``-T --tags``
Enable submatch extraction with tags.
- This option is implied by ``--posix-captures``.
``-P --posix-captures``
Enable submatch extraction with POSIX-style capturing groups.
- This option implies ``-T --tags``.
``-u --unicode``
Generate a parser that supports UTF-32. The generated
``--no-generation-date``
Suppress date output in the generated file.
+``--no-lookahead``
+ Use TDFA(0) instead of TDFA(1).
+ This option only has effect with ``--tags`` or ``--posix-captures`` options.
+
``--no-optimize-tags``
- Suppress tag optimization (mostly used for debugging).
+ Suppress optimization of tag variables (mostly used for debugging).
``--no-version``
Suppress version output in the generated file.
-``re2c:condprefix = yyc;``
- Allows to specify the prefix used for
- condition labels. That is, the text to be prepended to condition labels
- in the generated output file.
-
-
-``re2c:condenumprefix = yyc;``
- Allows to specify the prefix used for
- condition values. That is, the text to be prepended to condition enum
- values in the generated output file.
+``re2c:cgoto:threshold = 9;``
+ When ``-g`` is active, this value specifies
+ the complexity threshold that triggers the generation of jump tables rather
+ than nested ifs and decision bitfields. The threshold is compared
+ against a calculated estimation of ifs needed where every used bitmap
+ divides the threshold by 2.
-``re2c:cond:divider = "/* *********************************** */";``
+``re2c:cond:divider = '/* *********************************** */';``
Allows to customize the divider for condition blocks. You can use ``@@``
to put the name of the condition or customize the placeholder using
``re2c:cond:divider@cond``.
Specifies the placeholder that will be
replaced with the condition name in ``re2c:cond:divider``.
-``re2c:cond:goto = "goto @@;";``
+``re2c:condenumprefix = yyc;``
+ Specifies the prefix used for
+ condition values, that is, the text prepended to condition enum
+ values in the generated output file.
+
+``re2c:cond:goto@cond = @@;``
+ Specifies the placeholder that will be replaced with the condition label in ``re2c:cond:goto``.
+
+``re2c:cond:goto = 'goto @@;';``
Allows to customize the condition goto statements used with ``:=>`` style rules. You can use ``@@``
to put the name of the condition or customize the placeholder using
``re2c:cond:goto@cond``. You can also change this to ``continue;``, which
would allow you to continue with the next loop cycle including any code
between your loop start and your re2c block.
-``re2c:cond:goto@cond = @@;``
- Specifies the placeholder that will be replaced with the condition label in ``re2c:cond:goto``.
-
-``re2c:indent:top = 0;``
- Specifies the minimum amount of indentation to
- use. Requires a numeric value greater than or equal to zero.
+``re2c:condprefix = yyc;``
+ Specifies the prefix used for
+ condition labels, that is, the text prepended to condition labels
+ in the generated output file.
-``re2c:indent:string = "\t";``
- Specifies the string to use for indentation. Requires a string that should
- contain only whitespace unless you need something else for external tools. The easiest
- way to specify spaces is to enclose them in single or double quotes.
- If you do not want any indentation at all, you can simply set this to "".
+``re2c:define:YYBACKUPCTX = 'YYBACKUPCTX';``
+ Replaces the ``YYBACKUPCTX`` identifier with the specified string.
-``re2c:yych:conversion = 0;``
- When this setting is non zero, ``re2c`` automatically generates
- conversion code whenever yych gets read. In this case, the type must be
- defined using ``re2c:define:YYCTYPE``.
+``re2c:define:YYBACKUP = 'YYBACKUP';``
+ Replaces the ``YYBACKUP`` identifier with the specified string.
-``re2c:yych:emit = 1;``
- Set this to zero to suppress the generation of *yych*.
+``re2c:define:YYCONDTYPE = 'YYCONDTYPE';``
+ Enumeration used for condition support with ``-c`` mode.
-``re2c:yybm:hex = 0;``
- If set to zero, a decimal table will be used. Otherwise, a hexadecimal table will be generated.
+``re2c:define:YYCTXMARKER = 'YYCTXMARKER';``
+ Replaces the ``YYCTXMARKER`` placeholder with the specified identifier.
-``re2c:yyfill:enable = 1;``
- Set this to zero to suppress the generation of ``YYFILL (n)``. When using this, be sure to verify that the generated
- scanner does not read beyond the available input, as allowing such behavior might
- introduce severe security issues to your programs.
+``re2c:define:YYCTYPE = 'YYCTYPE';``
+ Replaces the ``YYCTYPE`` placeholder with the specified type.
-``re2c:yyfill:check = 1;``
- This can be set to 0 to suppress the generations of
- ``YYCURSOR`` and ``YYLIMIT`` based precondition checks. This option is useful when
- ``YYLIMIT + YYMAXFILL`` is always accessible.
+``re2c:define:YYCURSOR = 'YYCURSOR';``
+ Replaces the ``YYCURSOR`` placeholder with the specified identifier.
-``re2c:define:YYFILL = "YYFILL";``
- Define a substitution for ``YYFILL``. Note that by default,
- ``re2c`` generates an argument in parentheses and a semicolon after
- ``YYFILL``. If you need to make ``YYFILL`` an arbitrary statement rather
- than a call, set ``re2c:define:YYFILL:naked`` to a non-zero value and use
- ``re2c:define:YYFILL@len`` to set a placeholder for the formal parameter inside of your ``YYFILL``
- body.
+``re2c:define:YYDEBUG = 'YYDEBUG';``
+ Replaces the ``YYDEBUG`` placeholder with the specified identifier.
-``re2c:define:YYFILL@len = "@@";``
+``re2c:define:YYFILL@len = '@@';``
Any occurrence of this text
inside of a ``YYFILL`` call will be replaced with the actual argument.
-``re2c:yyfill:parameter = 1;``
- Controls the argument in the parentheses that follow ``YYFILL``. If zero, the argument is omitted.
- If non-zero, the argument is generated unless ``re2c:define:YYFILL:naked`` is set to non-zero.
-
``re2c:define:YYFILL:naked = 0;``
Controls the argument in the parentheses after ``YYFILL`` and
the following semicolon. If zero, both the argument and the semicolon are
``re2c:yyfill:parameter`` is set to zero; the semicolon is generated
unconditionally.
-``re2c:startlabel = 0;``
- If set to a non zero integer, then the start
- label of the next scanner block will be generated even if it isn't used by
- the scanner itself. Otherwise, the normal ``yy0``-like start label is only
- generated if needed. If set to a text value, then a label with that
- text will be generated regardless of whether the normal start label is
- used or not. This setting is reset to 0 after a start label has been generated.
+``re2c:define:YYFILL = 'YYFILL';``
+ Define a substitution for ``YYFILL``. Note that by default,
+ ``re2c`` generates an argument in parentheses and a semicolon after
+ ``YYFILL``. If you need to make ``YYFILL`` an arbitrary statement rather
+ than a call, set ``re2c:define:YYFILL:naked`` to a non-zero value and use
+ ``re2c:define:YYFILL@len`` to set a placeholder for the formal parameter inside of your ``YYFILL``
+ body.
-``re2c:labelprefix = "yy";``
- Allows to change the prefix of numbered
- labels. The default is ``yy``. Can be set any string that is valid in
- a label name.
+``re2c:define:YYGETCONDITION:naked = 0;``
+ Controls the parentheses after
+ ``YYGETCONDITION``. If zero, the parentheses are omitted. If non-zero, the parentheses are
+ generated.
-``re2c:state:abort = 0;``
- When not zero and the ``-f`` switch is active, then
- the ``YYGETSTATE`` block will contain a default case that aborts and a -1
- case will be used for initialization.
+``re2c:define:YYGETCONDITION = 'YYGETCONDITION';``
+ Substitution for
+ ``YYGETCONDITION``. Note that by default, ``re2c`` generates parentheses after
+ ``YYGETCONDITION``. Set ``re2c:define:YYGETCONDITION:naked`` to non-zero to
+ omit the parentheses.
-``re2c:state:nextlabel = 0;``
- Used when ``-f`` is active to control
- whether the ``YYGETSTATE`` block is followed by a ``yyNext:`` label line.
- Instead of using ``yyNext``, you can usually also use configuration
- ``startlabel`` to force a specific start label or default to ``yy0`` as
- a start label. Instead of using a dedicated label, it is often better to
- separate the ``YYGETSTATE`` code from the actual scanner code by placing a
- ``/*!getstate:re2c*/`` comment.
+``re2c:define:YYGETSTATE:naked = 0;``
+ Controls the parentheses that follow
+ ``YYGETSTATE``. If zero, the parentheses are omitted. If non-zero, they are
+ generated.
-``re2c:cgoto:threshold = 9;``
- When ``-g`` is active, this value specifies
- the complexity threshold that triggers the generation of jump tables rather
- than nested ifs and decision bitfields. The threshold is compared
- against a calculated estimation of ifs needed where every used bitmap
- divides the threshold by 2.
+``re2c:define:YYGETSTATE = 'YYGETSTATE';``
+ Substitution for
+ ``YYGETSTATE``. Note that by default, ``re2c`` generates parentheses after
+ ``YYGETSTATE``. Set ``re2c:define:YYGETSTATE:naked`` to non-zero to omit
+ the parentheses.
-``re2c:yych:conversion = 0;``
- When input uses signed characters and the
- ``-s`` or ``-b`` switches are in effect, re2c allows automatic conversion
- to the unsigned character type that is then necessary for its internal
- single character. When this setting is zero or an empty string, the
- conversion is disabled. If a non zero number is used, the conversion is taken
- from ``YYCTYPE``. If ``YYCTYPE`` is overridden by an inplace configuration setting, that setting is
- is used instead of a ``YYCTYPE`` cast. Otherwise, it will be ``(YYCTYPE)`` and changes to that
- configuration are no longer possible. When this setting is a string, it must contain the casting
- parentheses. Now assuming your input is a ``char *`` buffer and you are using the above mentioned switches, you can set
- ``YYCTYPE`` to ``unsigned char`` and this setting to either 1 or ``(unsigned char)``.
-
-``re2c:define:YYCONDTYPE = "YYCONDTYPE";``
- Enumeration used for condition support with ``-c`` mode.
+``re2c:define:YYLESSTHAN = 'YYLESSTHAN';``
+ Replaces the ``YYLESSTHAN`` identifier with the specified string.
-``re2c:define:YYCTXMARKER = "YYCTXMARKER";``
- Replaces the ``YYCTXMARKER`` placeholder with the specified identifier.
+``re2c:define:YYLIMIT = 'YYLIMIT';``
+ Replaces the ``YYLIMIT`` placeholder with the specified identifier.
-``re2c:define:YYCTYPE = "YYCTYPE";``
- Replaces the ``YYCTYPE`` placeholder with the specified type.
+``re2c:define:YYMARKER = 'YYMARKER';``
+ Replaces the ``YYMARKER`` placeholder with the specified identifier.
-``re2c:define:YYCURSOR = "YYCURSOR";``
- Replaces the ``YYCURSOR`` placeholder with the specified identifier.
+``re2c:define:YYMTAGN = 'YYMTAGN';``
+ Replaces the ``YYMTAGN`` identifier with the specified string.
-``re2c:define:YYDEBUG = "YYDEBUG";``
- Replaces the ``YYDEBUG`` placeholder with the specified identifier.
+``re2c:define:YYMTAGP = 'YYMTAGP';``
+ Replaces the ``YYMTAGP`` identifier with the specified string.
-``re2c:define:YYGETCONDITION = "YYGETCONDITION";``
- Substitution for
- ``YYGETCONDITION``. Note that by default, ``re2c`` generates parentheses after
- ``YYGETCONDITION``. Set ``re2c:define:YYGETCONDITION:naked`` to non-zero to
- omit the parentheses.
+``re2c:define:YYPEEK = 'YYPEEK';``
+ Replaces the ``YYPEEK`` identifier with the specified string.
-``re2c:define:YYGETCONDITION:naked = 0;``
- Controls the parentheses after
- ``YYGETCONDITION``. If zero, the parentheses are omitted. If non-zero, the parentheses are
- generated.
+``re2c:define:YYRESTORECTX = 'YYRESTORECTX';``
+ Replaces the ``YYRESTORECTX`` identifier with the specified string.
-``re2c:define:YYSETCONDITION = "YYSETCONDITION";``
- Substitution for
- ``YYSETCONDITION``. Note that by default, ``re2c`` generates an argument in
- parentheses followed by semicolon after ``YYSETCONDITION``. If you need to make
- ``YYSETCONDITION`` an arbitrary statement rather than a call, set
- ``re2c:define:YYSETCONDITION:naked`` to non-zero and use
- ``re2c:define:YYSETCONDITION@cond`` to denote the formal parameter inside of the
- ``YYSETCONDITION`` body.
+``re2c:define:YYRESTORE = 'YYRESTORE';``
+ Replaces the ``YYRESTORE`` identifier with the specified string.
+
+``re2c:define:YYRESTORETAG = 'YYRESTORETAG';``
+ Replaces the ``YYRESTORETAG`` identifier with the specified string.
-``re2c:define:YYSETCONDITION@cond = "@@";``
+``re2c:define:YYSETCONDITION@cond = '@@';``
Any occurrence of this
text inside of ``YYSETCONDITION`` will be replaced with the actual
argument.
the semicolon are omitted. If non-zero, both the argument and the semicolon are
generated.
-``re2c:define:YYGETSTATE = "YYGETSTATE";``
+``re2c:define:YYSETCONDITION = 'YYSETCONDITION';``
Substitution for
- ``YYGETSTATE``. Note that by default, ``re2c`` generates parentheses after
- ``YYGETSTATE``. Set ``re2c:define:YYGETSTATE:naked`` to non-zero to omit
- the parentheses.
+ ``YYSETCONDITION``. Note that by default, ``re2c`` generates an argument in
+ parentheses followed by semicolon after ``YYSETCONDITION``. If you need to make
+ ``YYSETCONDITION`` an arbitrary statement rather than a call, set
+ ``re2c:define:YYSETCONDITION:naked`` to non-zero and use
+ ``re2c:define:YYSETCONDITION@cond`` to denote the formal parameter inside of the
+ ``YYSETCONDITION`` body.
-``re2c:define:YYGETSTATE:naked = 0;``
- Controls the parentheses that follow
- ``YYGETSTATE``. If zero, the parentheses are omitted. If non-zero, they are
- generated.
+``re2c:define:YYSETSTATE:naked = 0;``
+ Controls the argument in parentheses and the
+ semicolon after ``YYSETSTATE``. If zero, both the argument and the semicolon are
+ omitted. If non-zero, both the argument and the semicolon are generated.
+
+``re2c:define:YYSETSTATE@state = '@@';``
+ Any occurrence of this text
+ inside of ``YYSETSTATE`` will be replaced with the actual argument.
-``re2c:define:YYSETSTATE = "YYSETSTATE";``
+``re2c:define:YYSETSTATE = 'YYSETSTATE';``
Substitution for
``YYSETSTATE``. Note that by default, ``re2c`` generates an argument in parentheses
followed by a semicolon after ``YYSETSTATE``. If you need to make ``YYSETSTATE`` an
``re2c:define:YYSETSTATE@state`` to denote the formal parameter inside of
your ``YYSETSTATE`` body.
-``re2c:define:YYSETSTATE@state = "@@";``
- Any occurrence of this text
- inside of ``YYSETSTATE`` will be replaced with the actual argument.
+``re2c:define:YYSKIP = 'YYSKIP';``
+ Replaces the ``YYSKIP`` identifier with the specified string.
-``re2c:define:YYSETSTATE:naked = 0;``
- Controls the argument in parentheses and the
- semicolon after ``YYSETSTATE``. If zero, both argument and the semicolon are
- omitted. If non-zero, both the argument and the semicolon are generated.
+``re2c:define:YYSTAGN = 'YYSTAGN';``
+ Replaces the ``YYSTAGN`` identifier with the specified string.
-``re2c:define:YYLIMIT = "YYLIMIT";``
- Replaces the ``YYLIMIT`` placeholder with the specified identifier.
- needed.
+``re2c:define:YYSTAGP = 'YYSTAGP';``
+ Replaces the ``YYSTAGP`` identifier with the specified string.
-``re2c:define:YYMARKER = "YYMARKER";``
- Replaces the ``YYMARKER`` placeholder with the specified identifier.
+``re2c:flags:8`` or ``re2c:flags:utf-8``
+ Same as the ``-8 --utf-8`` command-line option.
+
+``re2c:flags:b`` or ``re2c:flags:bit-vectors``
+ Same as the ``-b --bit-vectors`` command-line option.
+
+``re2c:flags:case-insensitive = 0;``
+ Same as the ``--case-insensitive`` command-line option.
+
+``re2c:flags:case-inverted = 0;``
+ Same as the ``--case-inverted`` command-line option.
+
+``re2c:flags:d`` or ``re2c:flags:debug-output``
+ Same as the ``-d --debug-output`` command-line option.
+
+``re2c:flags:dfa-minimization = 'moore';``
+ Same as the ``--dfa-minimization`` command-line option.
+
+``re2c:flags:eager-skip = 0;``
+ Same as the ``--eager-skip`` command-line option.
+
+``re2c:flags:e`` or ``re2c:flags:ecb``
+ Same as the ``-e --ecb`` command-line option.
+
+``re2c:flags:empty-class = 'match-empty';``
+ Same as the ``--empty-class`` command-line option.
+
+``re2c:flags:encoding-policy = 'ignore';``
+ Same as the ``--encoding-policy`` command-line option.
+
+``re2c:flags:g`` or ``re2c:flags:computed-gotos``
+ Same as the ``-g --computed-gotos`` command-line option.
+
+``re2c:flags:i`` or ``re2c:flags:no-debug-info``
+ Same as the ``-i --no-debug-info`` command-line option.
+
+``re2c:flags:input = 'default';``
+ Same as the ``--input`` command-line option.
+
+``re2c:flags:lookahead = 1;``
+ Setting this to zero is the same as the ``--no-lookahead`` command-line option.
+
+``re2c:flags:optimize-tags = 1;``
+ Setting this to zero is the same as the ``--no-optimize-tags`` command-line option.
+
+``re2c:flags:P`` or ``re2c:flags:posix-captures``
+ Same as the ``-P --posix-captures`` command-line option.
+
+``re2c:flags:s`` or ``re2c:flags:nested-ifs``
+ Same as the ``-s --nested-ifs`` command-line option.
+
+``re2c:flags:T`` or ``re2c:flags:tags``
+ Same as the ``-T --tags`` command-line option.
+
+``re2c:flags:u`` or ``re2c:flags:unicode``
+ Same as the ``-u --unicode`` command-line option.
-``re2c:label:yyFillLabel = "yyFillLabel";``
+``re2c:flags:w`` or ``re2c:flags:wide-chars``
+ Same as the ``-w --wide-chars`` command-line option.
+
+``re2c:flags:x`` or ``re2c:flags:utf-16``
+ Same as the ``-x --utf-16`` command-line option.
+
+``re2c:indent:string = '\t';``
+ Specifies the string to use for indentation. Requires a string that should
+ contain only whitespace unless you need something else for external tools. The easiest
+ way to specify spaces is to enclose them in single or double quotes.
+ If you do not want any indentation at all, you can simply set this to ''.
+
+``re2c:indent:top = 0;``
+ Specifies the minimum amount of indentation to
+ use. Requires a numeric value greater than or equal to zero.
+
+``re2c:labelprefix = 'yy';``
+ Changes the prefix of numbered
+ labels. The default is ``yy``. It can be set to any string that is valid in
+ a label name.
+
+``re2c:label:yyFillLabel = 'yyFillLabel';``
Overrides the name of the ``yyFillLabel`` label.
-``re2c:label:yyNext = "yyNext";``
+``re2c:label:yyNext = 'yyNext';``
Overrides the name of the ``yyNext`` label.
+``re2c:startlabel = 0;``
+ If set to a non-zero integer, then the start
+ label of the next scanner block will be generated even if it isn't used by
+ the scanner itself. Otherwise, the normal ``yy0``-like start label is only
+ generated if needed. If set to a text value, then a label with that
+ text will be generated regardless of whether the normal start label is
+ used or not. This setting is reset to 0 after a start label has been generated.
+
+``re2c:state:abort = 0;``
+ When not zero and the ``-f`` switch is active, then
+ the ``YYGETSTATE`` block will contain a default case that aborts and a -1
+ case will be used for initialization.
+
+``re2c:state:nextlabel = 0;``
+ Used when ``-f`` is active to control
+ whether the ``YYGETSTATE`` block is followed by a ``yyNext:`` label line.
+ Instead of using ``yyNext``, you can usually use the ``re2c:startlabel``
+ configuration to force a specific start label, or default to ``yy0`` as
+ the start label. Instead of using a dedicated label, it is often better to
+ separate the ``YYGETSTATE`` code from the actual scanner code by placing a
+ ``/*!getstate:re2c*/`` comment.
+
+``re2c:tags:expression = '@@';``
+ Customizes the way ``re2c`` addresses tag variables.
+ By default it emits expressions of the form ``yyt<N>``,
+ which may be inconvenient if tag variables are defined as fields in a struct
+ or for any other reason require special accessors.
+ For example, setting ``re2c:tags:expression = 'p->@@';`` will result in ``p->yyt<N>``.
+
+``re2c:tags:prefix = 'yyt';``
+ Overrides the prefix of tag variables.
+
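A hand-written sketch of what ``re2c:tags:expression = 'p->@@';`` is meant to enable, assuming tag variables live in a struct reached through a pointer ``p``. The ``struct scanner`` type, the ``match_tag`` function and the tag number ``yyt1`` are hypothetical, and the code only imitates the shape of a matcher for ``"a"* @t "b"*``; it is not actual ``re2c`` output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state. With re2c:tags:expression = 'p->@@';
   a tag variable such as yyt1 would be addressed as p->yyt1. */
struct scanner {
    const char *cur;   /* current input position (YYCURSOR) */
    const char *yyt1;  /* tag variable stored as a struct field */
};

/* Hand-written imitation of a matcher for  "a"* @t "b"*  .
   Returns the offset recorded by tag t, or -1 if the input does not match. */
static long match_tag(struct scanner *p, const char *s)
{
    assert(p != NULL && s != NULL);
    p->cur = s;
    while (*p->cur == 'a') ++p->cur;
    p->yyt1 = p->cur;  /* set tag t through the p->@@ expression */
    while (*p->cur == 'b') ++p->cur;
    if (*p->cur != '\0') return -1;
    return (long)(p->yyt1 - s);
}
```

Because every tag access goes through ``p->``, several scanner instances can keep independent tag state.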
``re2c:variable:yyaccept = yyaccept;``
Overrides the name of the ``yyaccept`` variable.
-``re2c:variable:yybm = "yybm";``
+``re2c:variable:yybm = 'yybm';``
Overrides the name of the ``yybm`` variable.
-``re2c:variable:yych = "yych";``
+``re2c:variable:yych = 'yych';``
Overrides the name of the ``yych`` variable.
-``re2c:variable:yyctable = "yyctable";``
+``re2c:variable:yyctable = 'yyctable';``
When both ``-c`` and ``-g`` are active, ``re2c`` will use this variable to generate a static jump table
for ``YYGETCONDITION``.
-``re2c:variable:yystable = "yystable";``
+``re2c:variable:yystable = 'yystable';``
Deprecated.
-``re2c:variable:yytarget = "yytarget";``
+``re2c:variable:yytarget = 'yytarget';``
Overrides the name of the ``yytarget`` variable.
+``re2c:yybm:hex = 0;``
+ If set to zero, the ``yybm`` table is generated in decimal. Otherwise, it is generated in hexadecimal.
+
+``re2c:yych:conversion = 0;``
+ When this setting is non-zero, ``re2c`` automatically generates
+ conversion code whenever ``yych`` is read. In this case, the type must be
+ defined using ``re2c:define:YYCTYPE``.
+
+``re2c:yych:emit = 1;``
+ Set this to zero to suppress the generation of ``yych``.
+
+``re2c:yyfill:check = 1;``
+ Set this to 0 to suppress the generation of
+ precondition checks based on ``YYCURSOR`` and ``YYLIMIT``. This is useful when
+ ``YYLIMIT + YYMAXFILL`` is always accessible.
+
+``re2c:yyfill:enable = 1;``
+ Set this to zero to suppress the generation of ``YYFILL (n)``. When doing so, make sure that the generated
+ scanner cannot read beyond the available input; allowing such reads may
+ introduce severe security issues into your program.
+
+``re2c:yyfill:parameter = 1;``
+ Controls the argument in the parentheses that follow ``YYFILL``. If zero, the argument is omitted.
+ If non-zero, the argument is generated unless ``re2c:define:YYFILL:naked`` is set to non-zero.
+
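The precondition check that ``re2c:yyfill:check`` controls compares the remaining input against the number of characters the next step needs, roughly ``if ((YYLIMIT - YYCURSOR) < n) YYFILL (n);``. The sketch below reproduces that check by hand; ``struct input``, ``fill`` and ``ensure`` are hypothetical, and the stand-in ``YYFILL`` only records the request instead of reading more input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state; YYCURSOR and YYLIMIT map onto these fields. */
struct input {
    const char *cur;      /* YYCURSOR */
    const char *lim;      /* YYLIMIT */
    size_t last_request;  /* n from the most recent YYFILL (n) */
    int fills;            /* number of YYFILL (n) calls so far */
};

/* Stand-in for YYFILL (n): a real one would make sure at least n
   characters are available from YYCURSOR (or stop the scanner);
   this one only records the request. */
static void fill(struct input *in, size_t n)
{
    assert(n > 0);
    in->last_request = n;
    in->fills++;
}

#define YYCURSOR  (in->cur)
#define YYLIMIT   (in->lim)
#define YYFILL(n) fill(in, (n))

/* The check suppressed by re2c:yyfill:check = 0;  call YYFILL (n)
   only if fewer than n characters remain before YYLIMIT. */
static void ensure(struct input *in, size_t n)
{
    if ((size_t)(YYLIMIT - YYCURSOR) < n) YYFILL(n);
}
```

With ``re2c:yyfill:parameter = 0;`` the call would be emitted without its argument, so a one-parameter ``YYFILL`` macro like this one would have to be redefined.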
+
+``YYBACKUP ()``
+ Back up the current input position (used only with the generic API).
+
+``YYBACKUPCTX ()``
+ Back up the current input position for trailing context (used only with the generic API).
+
``YYCONDTYPE``
In ``-c`` mode, you can use ``-t`` to generate a file that
contains the enumeration used as conditions. Each of the values refers
case, the scanner will resume operations right after where the last
``YYFILL (n)`` was called.
+``YYLESSTHAN (n)``
+ Check whether fewer than ``n`` input characters are left (used only with the generic API).
+
``YYLIMIT``
An expression of type ``YYCTYPE *`` that marks the end of the buffer ``YYLIMIT[-1]``
is the last character in the buffer). The generated code repeatedly
The generated code saves backtracking information in ``YYMARKER``. Some
simple scanners might not use this.
+``YYMTAGP (t)``
+ Append the current input position to the history of tag ``t``.
+
+``YYMTAGN (t)``
+ Append the default value to the history of tag ``t``.
+
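How the history behind ``YYMTAGP (t)`` and ``YYMTAGN (t)`` is stored is up to the code that defines them. A minimal sketch, assuming each m-tag is a fixed-capacity array of pointers and that ``NULL`` serves as the default value; ``struct mtag``, ``mtag_push``, ``MTAG_CAP`` and ``collect`` are hypothetical names, and ``collect`` is a hand-written loop that imitates a scanner for ``(#x "a")*`` rather than ``re2c`` output.

```c
#include <assert.h>
#include <stddef.h>

#define MTAG_CAP 16  /* hypothetical fixed capacity */

/* History of one m-tag: every position it was set to, in order. */
struct mtag {
    const char *pos[MTAG_CAP];
    size_t len;
};

static void mtag_push(struct mtag *t, const char *p)
{
    assert(t->len < MTAG_CAP);  /* a real history would grow instead */
    t->pos[t->len++] = p;
}

/* YYMTAGP appends the current position (YYCURSOR must be in scope);
   YYMTAGN appends the default value, NULL in this sketch. */
#define YYMTAGP(t) mtag_push(&(t), YYCURSOR)
#define YYMTAGN(t) mtag_push(&(t), NULL)

/* Hand-written loop imitating a scanner for  (#x "a")*  :
   here x gets one history entry per "a". Returns the history length. */
static size_t collect(const char *s, struct mtag *x)
{
    const char *YYCURSOR = s;
    x->len = 0;
    while (*YYCURSOR == 'a') {
        YYMTAGP(*x);  /* #x sits right before each "a" */
        ++YYCURSOR;
    }
    return x->len;
}
```

The fixed capacity keeps the sketch short; a real scanner would need histories that can grow with the input.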
``YYMAXFILL``
This will be automatically defined by ``/*!max:re2c*/`` blocks as explained above.
+``YYMAXNMATCH``
+ This will be automatically defined by ``/*!maxnmatch:re2c*/``.
+
+``YYPEEK ()``
+ Get the current input character (used only with the generic API).
+
+``YYRESTORE ()``
+ Restore the input position saved by ``YYBACKUP ()`` (used only with the generic API).
+
+``YYRESTORECTX ()``
+ Restore the input position saved by ``YYBACKUPCTX ()`` for trailing context (used only with the generic API).
+
+``YYRESTORETAG (t)``
+ Restore the input position from the value of tag ``t`` (used only with the generic API).
+
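One way to define the generic API primitives, assuming a simple pointer-based buffer; ``struct buffer``, its fields and the ``scan`` function are hypothetical. ``scan`` drives the primitives by hand the way a scanner for the two rules ``"ab"`` and ``"abcd"`` might: after ``ab`` it calls ``YYBACKUP ()``, and if ``abcd`` does not follow, it returns to that point with ``YYRESTORE ()``. Real generated code is structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-based input buffer. */
struct buffer {
    const char *cur;  /* current position */
    const char *mar;  /* position saved by YYBACKUP () */
    const char *lim;  /* end of input */
};

/* Generic API primitives; each expects a `struct buffer *in` in scope. */
#define YYPEEK()      (in->cur < in->lim ? *in->cur : '\0')
#define YYSKIP()      (assert(in->cur < in->lim), ++in->cur)
#define YYBACKUP()    (in->mar = in->cur)
#define YYRESTORE()   (in->cur = in->mar)
#define YYLESSTHAN(n) ((size_t)(in->lim - in->cur) < (size_t)(n))

/* Hand-written stand-in for a scanner with the rules "ab" (returns 1)
   and "abcd" (returns 2); returns 0 if neither matches. */
static int scan(struct buffer *in)
{
    if (YYPEEK() != 'a') return 0;
    YYSKIP();
    if (YYPEEK() != 'b') return 0;
    YYSKIP();
    YYBACKUP();              /* "ab" matched: remember this position */
    if (YYPEEK() != 'c') return 1;
    YYSKIP();
    if (YYPEEK() != 'd') {   /* "abc" leads nowhere ... */
        YYRESTORE();         /* ... so fall back to the end of "ab" */
        return 1;
    }
    YYSKIP();
    return 2;
}
```

Returning ``'\0'`` from ``YYPEEK ()`` at the end of input is a choice made for this sketch, not something the generic API prescribes.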
``YYSETCONDITION (c)``
This define is used to set the condition in
transition rules. This is only being used when ``-c`` is active and
generated code will contain both ``YYSETSTATE (s)`` and ``YYGETSTATE`` even
if ``YYFILL (n)`` is disabled.
+``YYSKIP ()``
+ Advance the input position to the next character (used only with the generic API).
+
+``YYSTAGP (t)``
+ Save the current input position in tag ``t`` (used only with the generic API).
+
+``YYSTAGN (t)``
+ Save the default value in tag ``t`` (used only with the generic API).
+
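A minimal sketch of the s-tag primitives, assuming tag variables are plain ``const char *`` pointers and ``NULL`` is the default value. ``scan_tag`` is a hypothetical hand-written stand-in for a scanner of ``[a-z]+ (@t [0-9]+)?``: in this sketch the tag is set with ``YYSTAGP (t)`` when the optional digits are present and with ``YYSTAGN (t)`` when they are not, and ``YYRESTORETAG (t)`` can move the cursor back to the tag.

```c
#include <assert.h>
#include <stddef.h>

/* S-tag primitives for pointer tags; YYCURSOR must be in scope. */
#define YYSTAGP(t)      ((t) = YYCURSOR)
#define YYSTAGN(t)      ((t) = NULL)
#define YYRESTORETAG(t) (assert((t) != NULL), YYCURSOR = (t))

/* Hand-written stand-in for a scanner of  [a-z]+ (@t [0-9]+)?  (the input
   is assumed to match). Returns the value of tag t and stores the final
   cursor in *end; if restore is non-zero and t is set, the cursor is
   first moved back to t. */
static const char *scan_tag(const char *s, int restore, const char **end)
{
    const char *YYCURSOR = s;
    const char *t;

    while (*YYCURSOR >= 'a' && *YYCURSOR <= 'z') ++YYCURSOR;
    if (*YYCURSOR >= '0' && *YYCURSOR <= '9') {
        YYSTAGP(t);  /* optional group taken: t = current position */
        while (*YYCURSOR >= '0' && *YYCURSOR <= '9') ++YYCURSOR;
    } else {
        YYSTAGN(t);  /* optional group skipped: t = default value */
    }
    if (restore && t != NULL) YYRESTORETAG(t);
    *end = YYCURSOR;
    return t;
}
```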
off. If ``-F`` is active then this behaves like it was enclosed in double
quotes and matches the string "name".
+``@stag``
+ save the input position at which ``@stag`` matches in a variable named ``stag``
+
+``#mtag``
+ save all input positions at which ``#mtag`` matches in a variable named ``mtag``
+ (multiple positions are possible if ``#mtag`` is enclosed in a repetition subexpression that matches several times)
+
Character classes and string literals may contain octal or hexadecimal
character definitions and the following set of escape sequences:
``\a``, ``\b``, ``\f``, ``\n``, ``\r``, ``\t``, ``\v``, ``\\``. An octal character is defined by a backslash
- ``regular-expression { C/C++ code }``
-
- ``regular-expression := C/C++ code``
-
-There is one special rule: the default rule (``*``)
-
- ``* { C/C++ code }``
-
- ``* := C/C++ code``
-
-Note that the default rule (``*``) differs from ``[^]``: the default rule has the lowest priority,
+There is one special rule that can be used instead of a regular expression: the default rule ``*``.
+Note that the default rule ``*`` differs from ``[^]``: the default rule has the lowest priority,
matches any code unit (either valid or invalid) and always consumes exactly one character.
``[^]``, on the other hand, matches any valid code point (not the same as a code unit) and can consume multiple
code units. In fact, when a variable-length encoding is used, ``*``
is the only possible way to match an invalid input character.
+In general, all rules have the form:
+
+ ``regular-expression-or-* code``
+
+
If ``-c`` is active, then each regular expression is preceded by a list
of comma-separated condition names. Besides the normal naming rules, there
are two special cases: ``<*>`` (these rules are merged to all conditions)
is changed to ``continue``. If some code is needed before all rules (though not before simple jumps), you
can insert it with ``<!>`` pseudo-rules.
- ``<condition-list> regular-expression { C/C++ code }``
-
- ``<condition-list> regular-expression := C/C++ code``
-
- ``<condition-list> * { C/C++ code }``
-
- ``<condition-list> * := C/C++ code``
-
- ``<condition-list> regular-expression => condition { C/C++ code }``
-
- ``<condition-list> regular-expression => condition := C/C++ code``
-
- ``<condition-list> * => condition { C/C++ code }``
-
- ``<condition-list> * => condition := C/C++ code``
+ ``<condition-list-or-*> regular-expression-or-* code``
- ``<condition-list> regular-expression :=> condition``
+ ``<condition-list-or-*> regular-expression-or-* => condition code``
+ ``<condition-list-or-*> regular-expression-or-* :=> condition``
- ``<*> regular-expression { C/C++ code }``
- ``<*> regular-expression := C/C++ code``
+ ``<> code``
- ``<*> * { C/C++ code }``
-
- ``<*> * := C/C++ code``
-
- ``<*> regular-expression => condition { C/C++ code }``
-
- ``<*> regular-expression => condition := C/C++ code``
-
- ``<*> * => condition { C/C++ code }``
-
- ``<*> * => condition := C/C++ code``
-
- ``<*> regular-expression :=> condition``
-
-
- ``<> { C/C++ code }``
-
- ``<> := C/C++ code``
-
- ``<> => condition { C/C++ code }``
-
- ``<> => condition := C/C++ code``
+ ``<> => condition code``
``<> :=> condition``
- ``<> :=> condition``
-
-
- ``<! condition-list> { C/C++ code }``
-
- ``<! condition-list> := C/C++ code``
- ``<!> { C/C++ code }``
+ ``<!condition-list> code``
- ``<!> := C/C++ code``
+ ``<!> code``
Warn if a symbol is escaped when it shouldn't be.
By default, re2c silently ignores such escapes, but this may as well indicate a
typo or error in the escape sequence.
+
+``-Wnondeterministic-tags``
+ Warn if a tag has ``n``-th degree of nondeterminism, where ``n`` is greater than 1.
+