From 08508a63422814cd3982afac26b7eb383f872665 Mon Sep 17 00:00:00 2001
From: Ulya Trofimovich
Date: Fri, 11 Oct 2019 07:42:50 +0100
Subject: [PATCH] Changed the decsription of tags in docs to avoid ambiguity.

Thanks to Des-Nerger for pointing out in bug #140.
---
 bootstrap/doc/re2c.1              | 21 +++++++++++++--------
 doc/manual/submatch/submatch.rst_ | 21 +++++++++++++--------
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/bootstrap/doc/re2c.1 b/bootstrap/doc/re2c.1
index e5545e0e..2b2d072c 100644
--- a/bootstrap/doc/re2c.1
+++ b/bootstrap/doc/re2c.1
@@ -1726,12 +1726,20 @@ struct State {
 .UNINDENT
 .SH SUBMATCH EXTRACTION
 .sp
-Re2c has two options for submatch extraction.
+Submatch extraction in re2c is based on the lookahead\-TDFA algorithm described
+in the
+\fI\%Tagged Deterministic Finite Automata with Lookahead\fP
+paper. The algorithm uses the notion of "tags" \-\-\- position markers that denote
+positions in the regular expression for which the lexer must determine the
+corresponding position in the input string.
+Re2c provides two options for submatch extraction: the first one allows to use
+raw tags, and the second one allows to use the more conventional parenthesized
+capturing groups.
 .sp
 The first option is \fB\-T \-\-tags\fP\&. With this option one can use standalone tags
 of the form \fB@stag\fP and \fB#mtag\fP, where \fBstag\fP and \fBmtag\fP are arbitrary
-used\-defined names. Tags can be used anywhere inside of a regular expression;
-semantically they are just position markers. Tags of the form \fB@stag\fP are
+used\-defined names. Tags can be used anywhere inside of a regular expression.
+Tags of the form \fB@stag\fP are
 called s\-tags: they denote a single submatch value (the last input position
 where this tag matched). Tags of the form \fB#mtag\fP are called m\-tags: they
 denote multiple submatch values (the whole history of repetitions of this tag).
@@ -1752,12 +1760,9 @@ maximal value of \fByynmatch\fP among all rules.
 Note that re2c implements POSIX\-compliant disambiguation: each subexpression matches
 as long as possible, and subexpressions that start earlier in regular expression have
 priority over those starting later. Capturing groups are translated into s\-tags under the
-hood, therefore we use the word "tag" to describe them as well.
+hood.
 .sp
-With both \fB\-P \-\-posix\-captures\fP and \fBT \-\-tags\fP options re2c uses efficient
-submatch extraction algorithm described in the
-\fI\%Tagged Deterministic Finite Automata with Lookahead\fP
-paper. The overhead on submatch extraction in the generated lexer grows with the
+The overhead on submatch extraction in the generated lexer grows with the
 number of tags \-\-\- if this number is moderate, the overhead is barely
 noticeable. In the lexer tags are implemented using a number of tag variables
 generated by re2c. There is no one\-to\-one correspondence between tag variables
diff --git a/doc/manual/submatch/submatch.rst_ b/doc/manual/submatch/submatch.rst_
index eebf0ec1..8d1bfd65 100644
--- a/doc/manual/submatch/submatch.rst_
+++ b/doc/manual/submatch/submatch.rst_
@@ -1,9 +1,17 @@
-Re2c has two options for submatch extraction.
+Submatch extraction in re2c is based on the lookahead-TDFA algorithm described
+in the
+`Tagged Deterministic Finite Automata with Lookahead `_
+paper. The algorithm uses the notion of "tags" --- position markers that denote
+positions in the regular expression for which the lexer must determine the
+corresponding position in the input string.
+Re2c provides two options for submatch extraction: the first one allows to use
+raw tags, and the second one allows to use the more conventional parenthesized
+capturing groups.
 
 The first option is ``-T --tags``. With this option one can use standalone tags
 of the form ``@stag`` and ``#mtag``, where ``stag`` and ``mtag`` are arbitrary
-used-defined names. Tags can be used anywhere inside of a regular expression;
-semantically they are just position markers. Tags of the form ``@stag`` are
+used-defined names. Tags can be used anywhere inside of a regular expression.
+Tags of the form ``@stag`` are
 called s-tags: they denote a single submatch value (the last input position
 where this tag matched). Tags of the form ``#mtag`` are called m-tags: they
 denote multiple submatch values (the whole history of repetitions of this tag).
@@ -24,12 +32,9 @@ maximal value of ``yynmatch`` among all rules.
 Note that re2c implements POSIX-compliant disambiguation: each subexpression matches
 as long as possible, and subexpressions that start earlier in regular expression have
 priority over those starting later. Capturing groups are translated into s-tags under the
-hood, therefore we use the word "tag" to describe them as well.
+hood.
 
-With both ``-P --posix-captures`` and ``T --tags`` options re2c uses efficient
-submatch extraction algorithm described in the
-`Tagged Deterministic Finite Automata with Lookahead `_
-paper. The overhead on submatch extraction in the generated lexer grows with the
+The overhead on submatch extraction in the generated lexer grows with the
 number of tags --- if this number is moderate, the overhead is barely
 noticeable. In the lexer tags are implemented using a number of tag variables
 generated by re2c. There is no one-to-one correspondence between tag variables
-- 
2.40.0