Added '-D' flag description (.dot generation), restructured features.

author Ulya Trofimovich <skvadrik@gmail.com>

Thu, 12 Nov 2015 15:17:45 +0000 (15:17 +0000)

committer Ulya Trofimovich <skvadrik@gmail.com>

Thu, 12 Nov 2015 15:17:45 +0000 (15:17 +0000)
diff --git a/src/css/default.css b/src/css/default.css

index 2c85c05f6f84314f02917f858762723b9d1d6ca6..d91b96df961ab2aeb2a52b3bba04cd1e36320151 100644 (file)
--- a/src/css/default.css
+++ b/src/css/default.css
@@ -55,3 +55,9 @@ pre.code .literal.number, code .literal.number { color: #ff5500; font-weight: bo
  pre.code .name.builtin, code .name.builtin { color: #352B84 }
  pre.code .deleted, code .deleted { background-color: #DEB0A1}
  pre.code .inserted, code .inserted { background-color: #A3D289}
+
+img {
+    display: block;
+    border: 1px dotted #557799;
+    margin: auto;
+}
diff --git a/src/manual/features/conditions/conditions.rst b/src/manual/features/conditions/conditions.rst

new file mode 100644 (file)

index 0000000..b12a2b1
--- /dev/null
+++ b/src/manual/features/conditions/conditions.rst
@@ -0,0 +1,36 @@
+Conditions
+----------
+
+.. include:: ../home.rst
+
+You can precede regular expressions with a list of condition names when
+using the ``-c`` switch. In this case ``re2c`` generates scanner blocks for
+each condition, and each of the generated blocks has its own
+precondition. The precondition is given by the interface define
+``YYGETCONDITION()`` and must be of type ``YYCONDTYPE``.
+
+There are two special rule types. First, the rules of the condition ``<*>``
+are merged into all conditions (note that they have lower priority than
+other rules of that condition). Second, the empty condition list
+lets you provide a code block that does not have a scanner part,
+meaning it does not allow any regular expression. The condition value
+referring to this special block is always the one with the enumeration
+value 0. This way the code of this special rule can be used to
+initialize a scanner. These rules are in no way necessary, but
+sometimes it is helpful to have a dedicated uninitialized condition
+state.
+
+Non-empty rules allow you to specify the new condition, which makes them
+transition rules. Besides generating calls for the define
+``YYSETCONDITION``, no other special code is generated.
+
+There is another kind of special rule that lets you prepend code to any
+code block of all rules of a certain set of conditions or to all code
+blocks of all rules. This can be helpful when some operation is common
+among rules. For instance, this can be used to store the length of the
+scanned string. These special setup rules start with an exclamation mark
+followed by either a list of conditions ``<! condition, ... >`` or a star
+``<!*>``. When ``re2c`` generates the code for a rule whose state does not have a
+setup rule and a starred setup rule is present, then that code will be
+used as setup code.
+
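To make the condition mechanism in conditions.rst more concrete, here is a minimal hand-written C sketch of what dispatching on a condition and switching conditions amounts to. It is not ``re2c`` output. Only ``YYCONDTYPE``, ``YYGETCONDITION`` and ``YYSETCONDITION`` come from the text; ``lexer_t``, ``count_string_bytes`` and the condition names ``yycJS`` and ``yycSTR`` are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical condition enumeration: one value per condition. */
typedef enum { yycJS, yycSTR } YYCONDTYPE;

/* Hypothetical lexer state that stores the current condition. */
typedef struct {
    YYCONDTYPE cond;
} lexer_t;

/* Interface defines: read and change the current condition. */
#define YYGETCONDITION()  (lx->cond)
#define YYSETCONDITION(c) (lx->cond = (c))

/* Count the bytes that occur between double quotes.
 * Each case plays the role of one scanner block guarded by its
 * precondition; a quote acts as a transition rule in both blocks. */
size_t count_string_bytes(lexer_t *lx, const char *p)
{
    size_t n = 0;
    for (; *p != '\0'; ++p) {
        switch (YYGETCONDITION()) {
        case yycJS:
            if (*p == '"') {
                YYSETCONDITION(yycSTR); /* transition JS => STR */
            }
            break;
        case yycSTR:
            if (*p == '"') {
                YYSETCONDITION(yycJS);  /* transition STR => JS */
            } else {
                ++n;
            }
            break;
        }
    }
    return n;
}
```

The condition lives in the caller-owned ``lexer_t``, so it persists between calls, which is what lets a scanner stop inside a string and continue later in the same condition.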
diff --git a/src/manual/features/dot/dot.rst b/src/manual/features/dot/dot.rst

new file mode 100644 (file)

index 0000000..2702c40
--- /dev/null
+++ b/src/manual/features/dot/dot.rst
@@ -0,0 +1,59 @@
+.dot
+----
+
+.. include:: ../home.rst
+
+With the ``-D, --emit-dot`` option ``re2c`` does not generate C/C++ code.
+Instead, it dumps the generated DFA in `DOT format <https://en.wikipedia.org/wiki/DOT_%28graph_description_language%29>`_.
+One can convert this dump to an image of the DFA using `graphviz <http://www.graphviz.org/>`_ or another library.
+
+Say we want a picture of the DFA that accepts any UTF-8 code point:
+
+.. include:: utf8_any.re
+    :code: cpp
+    :number-lines:
+
+Generate and render:
+
+.. code-block::
+
+    $ re2c -D8 -o utf8_any.dot utf8_any.re
+    $ dot -T png -o utf8_any.png utf8_any.dot
+
+Here is the picture:
+
+.. image:: utf8_any.png
+    :width: 70%
+
+Note that re2c performs additional transformations on the DFA:
+it inserts ``YYFILL`` `checkpoints <../../../examples/example_02.html>`_,
+binds actions, and applies basic code deduplication.
+During the transformations it splits certain states and adds lambda transitions.
+These transitions correspond to the unlabeled edges in the picture.
+
+A real-world example (JSON lexer, all non-re2c code stripped out):
+
+.. include:: php_json.re
+    :code: cpp
+    :number-lines:
+
+Generate .dot file:
+
+.. code-block::
+
+    $ re2c -Dc -o php_json.dot php_json.re
+
+Render with ``dot -G ratio=0.3 -T png -o php_json_dot.png php_json.dot``:
+
+.. image:: php_json_dot.png
+    :width: 80%
+
+Render with ``neato -E len=4 -T png -o php_json_neato.png php_json.dot``:
+
+.. image:: php_json_neato.png
+    :width: 50%
+
+The generated graph is sometimes very large and requires careful tuning of rendering parameters.
+
+
+
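For readers who have not seen DOT before, the following C sketch shows the general shape of such a dump: a ``digraph`` block with one line per transition, where an edge without a label stands for a lambda transition. This is an illustration under stated assumptions, not the format ``re2c -D`` actually emits; ``edge_t``, ``write_dot`` and the graph name ``dfa`` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical DFA edge; a NULL label means an unlabeled edge. */
typedef struct {
    int from;
    int to;
    const char *label;
} edge_t;

/* Write the edges as a DOT digraph into out.
 * Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the buffer is too small. */
int write_dot(char *out, size_t cap, const edge_t *edges, size_t n)
{
    size_t used = 0;
    size_t i;
    int w = snprintf(out, cap, "digraph dfa {\n");
    if (w < 0 || (size_t)w >= cap) {
        return -1;
    }
    used = (size_t)w;
    for (i = 0; i < n; ++i) {
        if (edges[i].label != NULL) {
            w = snprintf(out + used, cap - used, "  %d -> %d [label=\"%s\"]\n",
                         edges[i].from, edges[i].to, edges[i].label);
        } else {
            w = snprintf(out + used, cap - used, "  %d -> %d\n",
                         edges[i].from, edges[i].to);
        }
        if (w < 0 || (size_t)w >= cap - used) {
            return -1;
        }
        used += (size_t)w;
    }
    w = snprintf(out + used, cap - used, "}\n");
    if (w < 0 || (size_t)w >= cap - used) {
        return -1;
    }
    used += (size_t)w;
    return (int)used;
}
```

Text produced this way can be passed to ``dot`` or ``neato`` with the same command lines as in dot.rst.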
diff --git a/src/manual/features/dot/php_json.re b/src/manual/features/dot/php_json.re

new file mode 100644 (file)

index 0000000..5e838a8
--- /dev/null
+++ b/src/manual/features/dot/php_json.re
@@ -0,0 +1,76 @@
+/*!re2c
+       re2c:indent:top = 1;
+       re2c:yyfill:enable = 0;
+
+       DIGIT   = [0-9] ;
+       DIGITNZ = [1-9] ;
+       UINT    = "0" | ( DIGITNZ DIGIT* ) ;
+       INT     = "-"? UINT ;
+       HEX     = DIGIT | [a-fA-F] ;
+       HEXNZ   = DIGITNZ | [a-fA-F] ;
+       HEX7    = [0-7] ;
+       HEXC    = DIGIT | [a-cA-C] ;
+       FLOAT   = INT "." DIGIT+ ;
+       EXP     = ( INT | FLOAT ) [eE] [+-]? DIGIT+ ;
+       NL      = "\r"? "\n" ;
+       WS      = [ \t\r]+ ;
+       EOI     = "\000";
+       CTRL    = [\x00-\x1F] ;
+       UTF8T   = [\x80-\xBF] ;
+       UTF8_1  = [\x00-\x7F] ;
+       UTF8_2  = [\xC2-\xDF] UTF8T ;
+       UTF8_3A = "\xE0" [\xA0-\xBF] UTF8T ;
+       UTF8_3B = [\xE1-\xEC] UTF8T{2} ;
+       UTF8_3C = "\xED" [\x80-\x9F] UTF8T ;
+       UTF8_3D = [\xEE-\xEF] UTF8T{2} ;
+       UTF8_3  = UTF8_3A | UTF8_3B | UTF8_3C | UTF8_3D ;
+       UTF8_4A = "\xF0"[\x90-\xBF] UTF8T{2} ;
+       UTF8_4B = [\xF1-\xF3] UTF8T{3} ;
+       UTF8_4C = "\xF4" [\x80-\x8F] UTF8T{2} ;
+       UTF8_4  = UTF8_4A | UTF8_4B | UTF8_4C ;
+       UTF8    = UTF8_1 | UTF8_2 | UTF8_3 | UTF8_4 ;
+       ANY     = [^] ;
+       ESCPREF = "\\" ;
+       ESCSYM  = ( "\"" | "\\" | "/" | [bfnrt] ) ;
+       ESC     = ESCPREF ESCSYM ;
+       UTFSYM  = "u" ;
+       UTFPREF = ESCPREF UTFSYM ;
+       UCS2    = UTFPREF HEX{4} ;
+       UTF16_1 = UTFPREF "00" HEX7 HEX ;
+       UTF16_2 = UTFPREF "0" HEX7 HEX{2} ;
+       UTF16_3 = UTFPREF ( ( ( HEXC | [efEF] ) HEX ) | ( [dD] HEX7 ) ) HEX{2} ;
+       UTF16_4 = UTFPREF [dD] [89abAB] HEX{2} UTFPREF [dD] [c-fC-F] HEX{2} ;
+       
+       <JS>"{"                  {}
+       <JS>"}"                  {}
+       <JS>"["                  {}
+       <JS>"]"                  {}
+       <JS>":"                  {}
+       <JS>","                  {}
+       <JS>"null"               {}
+       <JS>"true"               {}
+       <JS>"false"              {}
+       <JS>INT                  {}
+       <JS>FLOAT|EXP            {}
+       <JS>NL|WS                {}
+       <JS>EOI                  {}
+       <JS>["]                  {}
+       <STR_P1>CTRL             {}
+       <STR_P1>UTF16_1          {}
+       <STR_P1>UTF16_2          {}
+       <STR_P1>UTF16_4          {}
+       <STR_P1>UCS2             {}
+       <STR_P1>ESC              {}
+       <STR_P1>ESCPREF          {}
+       <STR_P1>["]              {}
+       <STR_P1>UTF8             {}
+       <STR_P1>ANY              {}
+       <STR_P2>UTF16_1          {}
+       <STR_P2>UTF16_2          {}
+       <STR_P2>UTF16_4          {}
+       <STR_P2>UCS2             {}
+       <STR_P2>ESCPREF          {}
+       <STR_P2>["] => JS        {}
+       <STR_P2>ANY              {}
+       <*>ANY                   {}
+*/
diff --git a/src/manual/features/dot/php_json_dot.png b/src/manual/features/dot/php_json_dot.png

new file mode 100644 (file)

index 0000000..6374bcf

Binary files /dev/null and b/src/manual/features/dot/php_json_dot.png differ
diff --git a/src/manual/features/dot/php_json_neato.png b/src/manual/features/dot/php_json_neato.png

new file mode 100644 (file)

index 0000000..b627a2e

Binary files /dev/null and b/src/manual/features/dot/php_json_neato.png differ
diff --git a/src/manual/features/dot/utf8_any.png b/src/manual/features/dot/utf8_any.png

new file mode 100644 (file)

index 0000000..7e5a41b

Binary files /dev/null and b/src/manual/features/dot/utf8_any.png differ
diff --git a/src/manual/features/dot/utf8_any.re b/src/manual/features/dot/utf8_any.re

new file mode 100644 (file)

index 0000000..96668a8
--- /dev/null
+++ b/src/manual/features/dot/utf8_any.re
@@ -0,0 +1,4 @@
+/*!re2c
+    *   {}
+    [^] {}
+*/
diff --git a/src/manual/features/encodings/encodings.rst b/src/manual/features/encodings/encodings.rst

new file mode 100644 (file)

index 0000000..7e59e3d
--- /dev/null
+++ b/src/manual/features/encodings/encodings.rst
@@ -0,0 +1,61 @@
+Encodings
+---------
+
+.. include:: ../home.rst
+
+``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
+UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
+See also inplace configuration ``re2c:flags``.
+
+A few concepts should be clarified when talking about encodings.
+A *code point* is an abstract number that represents a single encoding
+symbol. A *code unit* is the smallest unit of memory used in the
+encoded text (it corresponds to one character in the input stream). One
+or more code units can be needed to represent a single code point,
+depending on the encoding. In a *fixed-length* encoding, each code point
+is represented with an equal number of code units. In a *variable-length*
+encoding, different code points can be represented with different numbers
+of code units.
+
+* ASCII is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly one
+  1-byte code unit, which has the same value as the code point. Size of
+  ``YYCTYPE`` must be 1 byte.
+
+* EBCDIC is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly
+  one 1-byte code unit, which has the same value as the code point. Size
+  of ``YYCTYPE`` must be 1 byte.
+
+* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
+  code points, from 0 to 0xFFFF. One code point is represented with
+  exactly one 2-byte code unit, which has the same value as the code
+  point. Size of ``YYCTYPE`` must be 2 bytes.
+
+* UTF-16 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with one or two 2-byte code units. Size of
+  ``YYCTYPE`` must be 2 bytes.
+
+* UTF-32 is a fixed-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with exactly one 4-byte code unit. Size of
+  ``YYCTYPE`` must be 4 bytes.
+
+* UTF-8 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with a sequence of one, two, three or four
+  1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
+
+In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
+valid Unicode code points. Any encoded sequence of code units that
+would map to Unicode code points in the range 0xD800-0xDFFF is
+ill-formed. The user can control how ``re2c`` treats such ill-formed
+sequences with the ``--encoding-policy <policy>`` flag.
+
+For some encodings, there are code units that never occur in a valid
+encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner must
+check for invalid input, the only reliable way to do so is to use the default
+rule ``*``. Note that the full range rule ``[^]`` won't catch invalid code units when a variable-length encoding is used
+(``[^]`` means "all valid code points", while the default rule ``*`` means "all possible code units").
+
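The difference between code points and code units in encodings.rst can be illustrated with a small C sketch that encodes one code point as UTF-8. It follows the standard UTF-8 bit layout and rejects surrogates and values above 0x10FFFF. The function name ``utf8_encode`` is an assumption made for the example and is not part of ``re2c``.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
 * Returns the number of 1-byte code units written (1 to 4),
 * or 0 if cp is a surrogate or lies above 0x10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* surrogates are not valid code points */
    }
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* outside the Unicode code space */
}
```

No valid code point produces the byte 0xFF in any position here, which is why a rule over valid code points cannot match it and only a rule over all code units can.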
diff --git a/src/manual/features/features.rst b/src/manual/features/features.rst

index baa38db0ad9072a992d337fa8ffd1f1f200eac3e..2e68265001c77e8f0c99b1b2d3584b575def937f 100644 (file)
--- a/src/manual/features/features.rst
+++ b/src/manual/features/features.rst
@@ -3,196 +3,14 @@ Features
  ========
  
  .. include:: ../home.rst
-.. include:: ../../contents.rst
  
-Conditions
-----------
+★
  
-You can preceed regular expressions with a list of condition names when
-using the ``-c`` switch. In this case ``re2c`` generates scanner blocks for
-each conditon. Where each of the generated blocks has its own
-precondition. The precondition is given by the interface define
-``YYGETCONDITON()`` and must be of type ``YYCONDTYPE``.
-
-There are two special rule types. First, the rules of the condition ``<*>``
-are merged to all conditions (note that they have lower priority than
-other rules of that condition). And second the empty condition list
-allows to provide a code block that does not have a scanner part.
-Meaning it does not allow any regular expression. The condition value
-referring to this special block is always the one with the enumeration
-value 0. This way the code of this special rule can be used to
-initialize a scanner. It is in no way necessary to have these rules: but
-sometimes it is helpful to have a dedicated uninitialized condition
-state.
-
-Non empty rules allow to specify the new condition, which makes them
-transition rules. Besides generating calls for the define
-``YYSETCONDTITION`` no other special code is generated.
-
-There is another kind of special rules that allow to prepend code to any
-code block of all rules of a certain set of conditions or to all code
-blocks to all rules. This can be helpful when some operation is common
-among rules. For instance this can be used to store the length of the
-scanned string. These special setup rules start with an exclamation mark
-followed by either a list of conditions ``<! condition, ... >`` or a star
-``<!*>``. When ``re2c`` generates the code for a rule whose state does not have a
-setup rule and a star'd setup rule is present, than that code will be
-used as setup code.
-
-State
------
-
-When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
-store its current state, return to the caller, and later resume
-operations exactly where it left off.
-
-The default operation of ``re2c`` is a
-"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
-the parsing loop, and that may not always be convenient.
-
-Typically, if there is a preprocessor ahead of the scanner in the
-stream, or for that matter any other procedural source of data, the
-scanner cannot "ask" for more data unless both scanner and source
-live in a separate threads.
-
-The ``-f`` flag is useful for just this situation: it lets users design
-scanners that work in a "push" model, i.e. where data is fed to the
-scanner chunk by chunk. When the scanner runs out of data to consume, it
-just stores its state, and return to the caller. When more input data is
-fed to the scanner, it resumes operations exactly where it left off.
-
-Changes needed compared to the "pull" model:
-
-* User has to supply macros ``YYSETSTATE ()`` and ``YYGETSTATE (state)``.
-
-* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``. So the
-  user has to declare these. Also the user has to save and restore these.
-  In the example ``examples/push_model/push.re`` these are declared as
-  fields of the (C++) class of which the scanner is a method, so they do
-  not need to be saved/restored explicitly. For C they could e.g. be made
-  macros that select fields from a structure passed in as parameter.
-  Alternatively, they could be declared as local variables, saved with
-  ``YYFILL (n)`` when it decides to return and restored at entry to the
-  function. Also, it could be more efficient to save the state from
-  ``YYFILL (n)`` because ``YYSETSTATE (state)`` is called unconditionally.
-  ``YYFILL (n)`` however does not get ``state`` as parameter, so we would have
-  to store state in a local variable by ``YYSETSTATE (state)``.
-
-* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
-
-* Modify caller to recognise if more input is needed and respond appropriately.
-
-* The generated code will contain a switch block that is used to
-  restores the last state by jumping behind the corrspoding ``YYFILL (n)``
-  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
-  block. It is possible to trigger generation of the ``YYGETSTATE ()``
-  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
-  wrapped inside a loop.
-
-Please see ``examples/push_model/push.re`` for "push" model scanner. The
-generated code can be tweaked using inplace configurations ``state:abort``
-and ``state:nextlabel``.
-
-Reuse
------
-
-Reuse mode is controlled by ``-r --reusable`` option.
-Allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
-In this mode no ``/*!re2c */`` block and exactly one ``/*!rules:re2c */`` must be present.
-The rules are being saved and used by every ``/*!use:re2c */`` block that follows.
-These blocks can contain inplace configurations, especially ``re2c:flags:e``,
-``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
-That way it is possible to create the same scanner multiple times for
-different character types, different input mechanisms or different output mechanisms.
-The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
-to the set of rules in ``/*!rules:re2c */``.
-
-Encodings
----------
-
-``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
-UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
-See also inplace configuration ``re2c:flags``.
-
-The following concepts should be clarified when talking about encoding.
-*Code point* is an abstract number, which represents single encoding
-symbol. *Code unit* is the smallest unit of memory, which is used in the
-encoded text (it corresponds to one character in the input stream). One
-or more code units can be needed to represent a single code point,
-depending on the encoding. In *fixed-length* encoding, each code point
-is represented with equal number of code units. In *variable-length*
-encoding, different code points can be represented with different number
-of code units.
-
-* ASCII is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly one
-  1-byte code unit, which has the same value as the code point. Size of
-  ``YYCTYPE`` must be 1 byte.
-
-* EBCDIC is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly
-  one 1-byte code unit, which has the same value as the code point. Size
-  of ``YYCTYPE`` must be 1 byte.
-
-* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
-  code points, from 0 to 0xFFFF. One code point is represented with
-  exactly one 2-byte code unit, which has the same value as the code
-  point. Size of ``YYCTYPE`` must be 2 bytes.
-
-* UTF-16 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with one or two 2-byte code units. Size of
-  ``YYCTYPE`` must be 2 bytes.
-
-* UTF-32 is a fixed-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with exactly one 4-byte code unit. Size of
-  ``YYCTYPE`` must be 4 bytes.
-
-* UTF-8 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with sequence of one, two, three or four
-  1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
-
-In Unicode, values from range 0xD800 to 0xDFFF (surrogates) are not
-valid Unicode code points, any encoded sequence of code units, that
-would map to Unicode code points in the range 0xD800-0xDFFF, is
-ill-formed. The user can control how ``re2c`` treats such ill-formed
-sequences with ``--encoding-policy <policy>`` flag.
-
-For some encodings, there are code units, that never occur in valid
-encoded stream (e.g. 0xFF byte in UTF-8). If the generated scanner must
-check for invalid input, the only true way to do so is to use default
-rule ``*``. Note, that full range rule ``[^]`` won't catch invalid code units when variable-length encoding is used
-(``[^]`` means "all valid code points", while default rule ``*`` means "all possible code units").
-
-Generic interface
------------------
-
-``re2c`` usually operates on input using pointer-like primitives
-``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
-
-Generic input API (enabled with ``--input custom`` switch) allows to
-customize input operations. In this mode, ``re2c`` will express all
-operations on input in terms of the following primitives:
-
-    +---------------------+-----------------------------------------------------+
-    | ``YYPEEK ()``       | get current input character                         |
-    +---------------------+-----------------------------------------------------+
-    | ``YYSKIP ()``       | advance to the next character                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUP ()``     | backup current input position                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORE ()``    | restore current input position                      |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
-    +---------------------+-----------------------------------------------------+
-    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
-    +---------------------+-----------------------------------------------------+
-
-This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
-has more details, and you can find some usage examples
-`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
+* `Conditions     <conditions/conditions.html>`_
+* `State          <state/state.html>`_
+* `Reuse          <reuse/reuse.html>`_
+* `Encodings      <encodings/encodings.html>`_
+* `Generic API    <generic_api/generic_api.html>`_
+* `Skeleton       <skeleton/skeleton.html>`_
+* `Dot            <dot/dot.html>`_
  
diff --git a/src/manual/features/generic_api/generic_api.rst b/src/manual/features/generic_api/generic_api.rst

new file mode 100644 (file)

index 0000000..930ee83
--- /dev/null
+++ b/src/manual/features/generic_api/generic_api.rst
@@ -0,0 +1,32 @@
+Generic API
+-----------
+
+.. include:: ../home.rst
+
+``re2c`` usually operates on input using pointer-like primitives
+``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
+
+The generic input API (enabled with the ``--input custom`` switch) lets you
+customize input operations. In this mode, ``re2c`` will express all
+operations on input in terms of the following primitives:
+
+    +---------------------+-----------------------------------------------------+
+    | ``YYPEEK ()``       | get current input character                         |
+    +---------------------+-----------------------------------------------------+
+    | ``YYSKIP ()``       | advance to the next character                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUP ()``     | backup current input position                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORE ()``    | restore current input position                      |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
+    +---------------------+-----------------------------------------------------+
+    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
+    +---------------------+-----------------------------------------------------+
+
+This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
+has more details, and you can find some usage examples
+`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
+
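As a sketch of how the primitives in generic_api.rst might be defined, the C fragment below implements them over a length-delimited buffer and uses them in a small hand-written loop. The loop stands in for generated code and is not ``re2c`` output. The struct ``input_t``, its fields, the function ``count_digits`` and the bounds check inside ``YYPEEK`` are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical input buffer; the field names are illustrative. */
typedef struct {
    const char *buf;
    size_t pos;     /* current input position */
    size_t mark;    /* position saved by YYBACKUP */
    size_t ctxmark; /* position saved by YYBACKUPCTX */
    size_t len;     /* number of bytes in buf */
} input_t;

/* The primitives, expressed in terms of a pointer named in. */
#define YYPEEK()       (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()       (++in->pos)
#define YYBACKUP()     (in->mark = in->pos)
#define YYBACKUPCTX()  (in->ctxmark = in->pos)
#define YYRESTORE()    (in->pos = in->mark)
#define YYRESTORECTX() (in->pos = in->ctxmark)
#define YYLESSTHAN(n)  (in->len - in->pos < (size_t)(n))

/* Hand-written stand-in for a generated scanner:
 * consume leading decimal digits and return how many there were. */
size_t count_digits(input_t *in)
{
    size_t n = 0;
    while (!YYLESSTHAN(1)) {
        char c = YYPEEK();
        if (c < '0' || c > '9') {
            break;
        }
        YYSKIP();
        ++n;
    }
    return n;
}
```

Because the scanner touches the input only through these macros, the same rules could be pointed at a different source, such as a file stream, by redefining the macros and leaving the rules unchanged.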
diff --git a/src/manual/features/home.rst b/src/manual/features/home.rst

new file mode 100644 (file)

index 0000000..ba039a4
--- /dev/null
+++ b/src/manual/features/home.rst
@@ -0,0 +1,4 @@
+.. |[manual_features_home]| replace:: [home]
+.. _[manual_features_home]: ../../../index.html
+.. header:: |[manual_features_home]|_ `[Manual] <../../manual.html>`_ `[Features] <../features.html>`_
+.. footer:: |[manual_features_home]|_ `[Manual] <../../manual.html>`_ `[Features] <../features.html>`_
diff --git a/src/manual/features/reuse/reuse.rst b/src/manual/features/reuse/reuse.rst

new file mode 100644 (file)

index 0000000..d9c1c56
--- /dev/null
+++ b/src/manual/features/reuse/reuse.rst
@@ -0,0 +1,16 @@
+Reuse
+-----
+
+.. include:: ../home.rst
+
+Reuse mode is controlled by the ``-r, --reusable`` option.
+It allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
+In this mode no ``/*!re2c */`` block may be present and exactly one ``/*!rules:re2c */`` block must be present.
+The rules are saved and used by every ``/*!use:re2c */`` block that follows.
+These blocks can contain inplace configurations, especially ``re2c:flags:e``,
+``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
+That way it is possible to create the same scanner multiple times for
+different character types, different input mechanisms or different output mechanisms.
+The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
+to the set of rules in ``/*!rules:re2c */``.
+
diff --git a/src/manual/features/skeleton/skeleton.rst b/src/manual/features/skeleton/skeleton.rst

new file mode 100644 (file)

index 0000000..103383a
--- /dev/null
+++ b/src/manual/features/skeleton/skeleton.rst
@@ -0,0 +1,7 @@
+Skeleton
+--------
+
+.. include:: ../home.rst
+
+
+
diff --git a/src/manual/features/state/state.rst b/src/manual/features/state/state.rst

new file mode 100644 (file)

index 0000000..b29ad2b
--- /dev/null
+++ b/src/manual/features/state/state.rst
@@ -0,0 +1,56 @@
+State
+-----
+
+.. include:: ../home.rst
+
+When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
+store its current state, return to the caller, and later resume
+operations exactly where it left off.
+
+The default operation of ``re2c`` is a
+"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
+of the parsing loop, and that may not always be convenient.
+
+Typically, if there is a preprocessor ahead of the scanner in the
+stream, or for that matter any other procedural source of data, the
+scanner cannot "ask" for more data unless both scanner and source
+live in separate threads.
+
+The ``-f`` flag is useful for just this situation: it lets users design
+scanners that work in a "push" model, i.e. where data is fed to the
+scanner chunk by chunk. When the scanner runs out of data to consume, it
+just stores its state and returns to the caller. When more input data is
+fed to the scanner, it resumes operations exactly where it left off.
+
+Changes needed compared to the "pull" model:
+
+* The user has to supply the macros ``YYGETSTATE ()`` and ``YYSETSTATE (state)``.
+
+* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``, so the
+  user has to declare them. The user also has to save and restore them.
+  In the example ``examples/push_model/push.re`` these are declared as
+  fields of the (C++) class of which the scanner is a method, so they do
+  not need to be saved/restored explicitly. For C they could e.g. be made
+  macros that select fields from a structure passed in as a parameter.
+  Alternatively, they could be declared as local variables, saved with
+  ``YYFILL (n)`` when it decides to return and restored at entry to the
+  function. Also, it could be more efficient to save the state from
+  ``YYFILL (n)`` because ``YYSETSTATE (state)`` is called unconditionally.
+  ``YYFILL (n)`` however does not get ``state`` as a parameter, so we would have
+  to store state in a local variable by ``YYSETSTATE (state)``.
+
+* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
+
+* Modify the caller to recognise if more input is needed and respond appropriately.
+
+* The generated code will contain a switch block that is used to
+  restore the last state by jumping behind the corresponding ``YYFILL (n)``
+  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
+  block. It is possible to trigger generation of the ``YYGETSTATE ()``
+  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
+  wrapped inside a loop.
+
+Please see ``examples/push_model/push.re`` for a "push" model scanner. The
+generated code can be tweaked using inplace configurations ``state:abort``
+and ``state:nextlabel``.
+
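The store-and-resume idea in state.rst can be sketched in plain C as follows. This is a hand-written illustration of the push model, not code generated by ``re2c -f``. ``scanner_t``, ``scan_number``, ``NEED_MORE`` and ``DONE`` are hypothetical names, and the early ``return`` stands in for a ``YYFILL (n)`` that returns to the caller.

```c
#include <stddef.h>

/* Hypothetical scanner object; all state that must survive a
 * return to the caller lives here rather than in locals. */
typedef struct {
    int state;           /* -1 = start fresh, 0 = resume after fill point */
    const char *cur;     /* next unread byte of the current chunk */
    const char *lim;     /* end of the current chunk */
    unsigned long value; /* number accumulated so far */
} scanner_t;

enum { NEED_MORE = 0, DONE = 1 };

#define YYGETSTATE()  (s->state)
#define YYSETSTATE(n) (s->state = (n))

/* Parse a decimal number that may be split across chunks and is
 * terminated by any non-digit byte. */
int scan_number(scanner_t *s)
{
    /* Dispatch on the saved state and jump behind the fill point. */
    switch (YYGETSTATE()) {
    case 0:
        goto fill0;
    default:
        s->value = 0;
        break;
    }
    for (;;) {
        YYSETSTATE(0); /* stored unconditionally before each fill point */
fill0:
        if (s->cur == s->lim) {
            return NEED_MORE; /* out of data: return to the caller */
        }
        if (*s->cur < '0' || *s->cur > '9') {
            YYSETSTATE(-1);
            return DONE;
        }
        s->value = s->value * 10 + (unsigned long)(*s->cur - '0');
        ++s->cur;
    }
}
```

A caller would set ``cur`` and ``lim`` to the next chunk and call ``scan_number`` again whenever it returns ``NEED_MORE``. The accumulated ``value`` survives between calls because it is stored in the scanner object, which mirrors the requirement to save and restore ``yych`` and ``yyaccept``.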
diff --git a/src/manual/warnings/condition_order/wcondition_order.rst b/src/manual/warnings/condition_order/wcondition_order.rst

index 18898307527aa0a172733e8b965a48a8f0af144e..26c29fe26ef6068156e0285fec1168548d45b663 100644 (file)
--- a/src/manual/warnings/condition_order/wcondition_order.rst
+++ b/src/manual/warnings/condition_order/wcondition_order.rst
@@ -1,5 +1,5 @@
  [-Wcondition-order]
---------------------------
+-------------------
  
  .. include:: ../home.rst
  .. include:: ../../../contents.rst