Changed the description of tags in docs to avoid ambiguity.

author Ulya Trofimovich <skvadrik@gmail.com>

Fri, 11 Oct 2019 06:42:50 +0000 (07:42 +0100)

committer Ulya Trofimovich <skvadrik@gmail.com>

Fri, 11 Oct 2019 06:42:50 +0000 (07:42 +0100)
diff --git a/bootstrap/doc/re2c.1 b/bootstrap/doc/re2c.1

index e5545e0eebf17ce177b2a69761f12ec9af073b07..2b2d072c89442e5404dedfdb49c0ce24fa4d31fd 100644
--- a/bootstrap/doc/re2c.1
+++ b/bootstrap/doc/re2c.1
@@ -1726,12 +1726,20 @@ struct State {
  .UNINDENT
  .SH SUBMATCH EXTRACTION
  .sp
-Re2c has two options for submatch extraction.
+Submatch extraction in re2c is based on the lookahead\-TDFA algorithm described
+in the
+\fI\%Tagged Deterministic Finite Automata with Lookahead\fP
+paper. The algorithm uses the notion of "tags" \-\-\- position markers that denote
+positions in the regular expression for which the lexer must determine the
+corresponding position in the input string.
+Re2c provides two options for submatch extraction: the first one allows to use
+raw tags, and the second one allows to use the more conventional parenthesized
+capturing groups.
  .sp
  The first option is \fB\-T \-\-tags\fP\&. With this option one can use standalone tags
  of the form \fB@stag\fP and \fB#mtag\fP, where \fBstag\fP and \fBmtag\fP are arbitrary
-used\-defined names. Tags can be used anywhere inside of a regular expression;
-semantically they are just position markers. Tags of the form \fB@stag\fP are
+used\-defined names. Tags can be used anywhere inside of a regular expression.
+Tags of the form \fB@stag\fP are
  called s\-tags: they denote a single submatch value (the last input position
  where this tag matched). Tags of the form \fB#mtag\fP are called m\-tags: they
  denote multiple submatch values (the whole history of repetitions of this tag).
@@ -1752,12 +1760,9 @@ maximal value of \fByynmatch\fP among all rules. Note that re2c implements
  POSIX\-compliant disambiguation: each subexpression matches as long as possible,
  and subexpressions that start earlier in regular expression have priority over
  those starting later. Capturing groups are translated into s\-tags under the
-hood, therefore we use the word "tag" to describe them as well.
+hood.
  .sp
-With both \fB\-P \-\-posix\-captures\fP and \fBT \-\-tags\fP options re2c uses efficient
-submatch extraction algorithm described in the
-\fI\%Tagged Deterministic Finite Automata with Lookahead\fP
-paper. The overhead on submatch extraction in the generated lexer grows with the
+The overhead on submatch extraction in the generated lexer grows with the
  number of tags \-\-\- if this number is moderate, the overhead is barely
  noticeable. In the lexer tags are implemented using a number of tag variables
  generated by re2c. There is no one\-to\-one correspondence between tag variables
diff --git a/doc/manual/submatch/submatch.rst_ b/doc/manual/submatch/submatch.rst_

index eebf0ec1acb810a9989153080782a1a43753a26b..8d1bfd658c742bd770d2a57e2ac7c81c7b1bd825 100644
--- a/doc/manual/submatch/submatch.rst_
+++ b/doc/manual/submatch/submatch.rst_
@@ -1,9 +1,17 @@
-Re2c has two options for submatch extraction.
+Submatch extraction in re2c is based on the lookahead-TDFA algorithm described
+in the
+`Tagged Deterministic Finite Automata with Lookahead <https://arxiv.org/abs/1907.08837>`_
+paper. The algorithm uses the notion of "tags" --- position markers that denote
+positions in the regular expression for which the lexer must determine the
+corresponding position in the input string.
+Re2c provides two options for submatch extraction: the first one allows to use
+raw tags, and the second one allows to use the more conventional parenthesized
+capturing groups.
  
  The first option is ``-T --tags``. With this option one can use standalone tags
  of the form ``@stag`` and ``#mtag``, where ``stag`` and ``mtag`` are arbitrary
-used-defined names. Tags can be used anywhere inside of a regular expression;
-semantically they are just position markers. Tags of the form ``@stag`` are
+used-defined names. Tags can be used anywhere inside of a regular expression.
+Tags of the form ``@stag`` are
  called s-tags: they denote a single submatch value (the last input position
  where this tag matched). Tags of the form ``#mtag`` are called m-tags: they
  denote multiple submatch values (the whole history of repetitions of this tag).
@@ -24,12 +32,9 @@ maximal value of ``yynmatch`` among all rules. Note that re2c implements
  POSIX-compliant disambiguation: each subexpression matches as long as possible,
  and subexpressions that start earlier in regular expression have priority over
  those starting later. Capturing groups are translated into s-tags under the
-hood, therefore we use the word "tag" to describe them as well.
+hood.
  
-With both ``-P --posix-captures`` and ``T --tags`` options re2c uses efficient
-submatch extraction algorithm described in the
-`Tagged Deterministic Finite Automata with Lookahead <https://arxiv.org/abs/1907.08837>`_
-paper. The overhead on submatch extraction in the generated lexer grows with the
+The overhead on submatch extraction in the generated lexer grows with the
  number of tags --- if this number is moderate, the overhead is barely
  noticeable. In the lexer tags are implemented using a number of tag variables
  generated by re2c. There is no one-to-one correspondence between tag variables