From: Ulya Trofimovich
Date: Thu, 12 Nov 2015 15:17:45 +0000 (+0000)
Subject: Added '-D' flag description (.dot generation), restructured features.
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=701ce8fbaadf5bfc08b3af606f9e3ed5f0e871fc;p=re2c

Added '-D' flag description (.dot generation), restructured features.
---
diff --git a/src/css/default.css b/src/css/default.css
index 2c85c05f..d91b96df 100644
--- a/src/css/default.css
+++ b/src/css/default.css
@@ -55,3 +55,9 @@ pre.code .literal.number, code .literal.number { color: #ff5500; font-weight: bo
 pre.code .name.builtin, code .name.builtin { color: #352B84 }
 pre.code .deleted, code .deleted { background-color: #DEB0A1}
 pre.code .inserted, code .inserted { background-color: #A3D289}
+
+img {
+    display: block;
+    border: 1px dotted #557799;
+    margin: auto;
+}
diff --git a/src/manual/features/conditions/conditions.rst b/src/manual/features/conditions/conditions.rst
new file mode 100644
index 00000000..b12a2b10
--- /dev/null
+++ b/src/manual/features/conditions/conditions.rst
@@ -0,0 +1,36 @@
+Conditions
+----------
+
+.. include:: ../home.rst
+
+You can precede regular expressions with a list of condition names when
+using the ``-c`` switch. In this case ``re2c`` generates a scanner block for
+each condition. Each of the generated blocks has its own
+precondition. The precondition is given by the interface define
+``YYGETCONDITION()`` and must be of type ``YYCONDTYPE``.
+
+There are two special rule types. First, the rules of the condition ``<*>``
+are merged into all conditions (note that they have lower priority than
+other rules of that condition). Second, the empty condition list
+allows one to provide a code block that does not have a scanner part,
+meaning it does not allow any regular expression. The condition value
+referring to this special block is always the one with the enumeration
+value 0. This way the code of this special rule can be used to
+initialize a scanner. It is in no way necessary to have these rules, but
+sometimes it is helpful to have a dedicated uninitialized condition
+state.
+
+Non-empty rules allow one to specify the new condition, which makes them
+transition rules. Besides generating calls for the define
+``YYSETCONDITION``, no other special code is generated.
+
+There is another kind of special rule that allows one to prepend code to any
+code block of all rules of a certain set of conditions or to all code
+blocks of all rules. This can be helpful when some operation is common
+among rules. For instance, this can be used to store the length of the
+scanned string. These special setup rules start with an exclamation mark
+followed by either a list of conditions ```` or a star
+````. When ``re2c`` generates the code for a rule whose state does not have a
+setup rule and a starred setup rule is present, then that code will be
+used as setup code.
+
diff --git a/src/manual/features/dot/dot.rst b/src/manual/features/dot/dot.rst
new file mode 100644
index 00000000..2702c400
--- /dev/null
+++ b/src/manual/features/dot/dot.rst
@@ -0,0 +1,59 @@
+.dot
+----
+
+.. include:: ../home.rst
+
+With the ``-D, --emit-dot`` option, re2c does not generate C/C++ code.
+Instead, it dumps the generated DFA in `DOT format `_.
+One can convert this dump to an image of the DFA using `graphviz `_ or another library.
+
+Say we want a picture of the DFA that accepts any UTF-8 code point:
+
+.. include:: utf8_any.re
+   :code: cpp
+   :number-lines:
+
+Generate and render:
+
+.. code-block::
+
+    $ re2c -D8 -o utf8_any.dot utf8_any.re
+    $ dot -T png -o utf8_any.png utf8_any.dot
+
+Here is the picture:
+
+.. image:: utf8_any.png
+   :width: 70%
+
+Note that re2c performs additional transformations on the DFA: it
+inserts ``YYFILL`` `checkpoints <../../../examples/example_02.html>`_,
+binds actions and applies basic code deduplication.
+During the transformations it splits certain states and adds lambda transitions.
+These transitions correspond to the unlabeled edges in the picture.
+
+A real-world example (a JSON lexer with all non-re2c code stripped out):
+
+.. include:: php_json.re
+   :code: cpp
+   :number-lines:
+
+Generate the .dot file:
+
+.. code-block::
+
+    $ re2c -Dc -o php_json.dot php_json.re
+
+Render with ``dot -G ratio=0.3 -T png -o php_json_dot.png php_json.dot``:
+
+.. image:: php_json_dot.png
+   :width: 80%
+
+Render with ``neato -E len=4 -T png -o php_json_neato.png php_json.dot``:
+
+.. image:: php_json_neato.png
+   :width: 50%
+
+The generated graph is sometimes very large and requires careful tuning of rendering parameters.
+
+
+
diff --git a/src/manual/features/dot/php_json.re b/src/manual/features/dot/php_json.re
new file mode 100644
index 00000000..5e838a83
--- /dev/null
+++ b/src/manual/features/dot/php_json.re
@@ -0,0 +1,76 @@
+/*!re2c
+    re2c:indent:top = 1;
+    re2c:yyfill:enable = 0;
+
+    DIGIT = [0-9] ;
+    DIGITNZ = [1-9] ;
+    UINT = "0" | ( DIGITNZ DIGIT* ) ;
+    INT = "-"? UINT ;
+    HEX = DIGIT | [a-fA-F] ;
+    HEXNZ = DIGITNZ | [a-fA-F] ;
+    HEX7 = [0-7] ;
+    HEXC = DIGIT | [a-cA-C] ;
+    FLOAT = INT "." DIGIT+ ;
+    EXP = ( INT | FLOAT ) [eE] [+-]? DIGIT+ ;
+    NL = "\r"? "\n" ;
+    WS = [ \t\r]+ ;
+    EOI = "\000";
+    CTRL = [\x00-\x1F] ;
+    UTF8T = [\x80-\xBF] ;
+    UTF8_1 = [\x00-\x7F] ;
+    UTF8_2 = [\xC2-\xDF] UTF8T ;
+    UTF8_3A = "\xE0" [\xA0-\xBF] UTF8T ;
+    UTF8_3B = [\xE1-\xEC] UTF8T{2} ;
+    UTF8_3C = "\xED" [\x80-\x9F] UTF8T ;
+    UTF8_3D = [\xEE-\xEF] UTF8T{2} ;
+    UTF8_3 = UTF8_3A | UTF8_3B | UTF8_3C | UTF8_3D ;
+    UTF8_4A = "\xF0"[\x90-\xBF] UTF8T{2} ;
+    UTF8_4B = [\xF1-\xF3] UTF8T{3} ;
+    UTF8_4C = "\xF4" [\x80-\x8F] UTF8T{2} ;
+    UTF8_4 = UTF8_4A | UTF8_4B | UTF8_4C ;
+    UTF8 = UTF8_1 | UTF8_2 | UTF8_3 | UTF8_4 ;
+    ANY = [^] ;
+    ESCPREF = "\\" ;
+    ESCSYM = ( "\"" | "\\" | "/" | [bfnrt] ) ;
+    ESC = ESCPREF ESCSYM ;
+    UTFSYM = "u" ;
+    UTFPREF = ESCPREF UTFSYM ;
+    UCS2 = UTFPREF HEX{4} ;
+    UTF16_1 = UTFPREF "00" HEX7 HEX ;
+    UTF16_2 = UTFPREF "0" HEX7 HEX{2} ;
+    UTF16_3 = UTFPREF ( ( ( HEXC | [efEF] ) HEX ) | ( [dD] HEX7 ) ) HEX{2} ;
+    UTF16_4 = UTFPREF [dD] [89abAB] HEX{2} UTFPREF [dD] [c-fC-F] HEX{2} ;
+
+    "{" {}
+    "}" {}
+    "[" {}
+    "]" {}
+    ":" {}
+    "," {}
+    "null" {}
+    "true" {}
+    "false" {}
+    INT {}
+    FLOAT|EXP {}
+    NL|WS {}
+    EOI {}
+    ["] {}
+    CTRL {}
+    UTF16_1 {}
+    UTF16_2 {}
+    UTF16_4 {}
+    UCS2 {}
+    ESC {}
+    ESCPREF {}
+    ["] {}
+    UTF8 {}
+    ANY {}
+    UTF16_1 {}
+    UTF16_2 {}
+    UTF16_4 {}
+    UCS2 {}
+    ESCPREF {}
+    ["] => JS {}
+    ANY {}
+    <*>ANY {}
+*/
diff --git a/src/manual/features/dot/php_json_dot.png b/src/manual/features/dot/php_json_dot.png
new file mode 100644
index 00000000..6374bcff
Binary files /dev/null and b/src/manual/features/dot/php_json_dot.png differ
diff --git a/src/manual/features/dot/php_json_neato.png b/src/manual/features/dot/php_json_neato.png
new file mode 100644
index 00000000..b627a2e8
Binary files /dev/null and b/src/manual/features/dot/php_json_neato.png differ
diff --git a/src/manual/features/dot/utf8_any.png b/src/manual/features/dot/utf8_any.png
new file mode 100644
index 00000000..7e5a41b0
Binary files /dev/null and b/src/manual/features/dot/utf8_any.png differ
diff --git a/src/manual/features/dot/utf8_any.re b/src/manual/features/dot/utf8_any.re
new file mode 100644
index 00000000..96668a86
--- /dev/null
+++ b/src/manual/features/dot/utf8_any.re
@@ -0,0 +1,4 @@
+/*!re2c
+    * {}
+    [^] {}
+*/
diff --git a/src/manual/features/encodings/encodings.rst b/src/manual/features/encodings/encodings.rst
new file mode 100644
index 00000000..7e59e3d6
--- /dev/null
+++ b/src/manual/features/encodings/encodings.rst
@@ -0,0 +1,61 @@
+Encodings
+---------
+
+.. include:: ../home.rst
+
+``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
+UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
+See also the inplace configuration ``re2c:flags``.
+
+The following concepts should be clarified when talking about encodings.
+A *code point* is an abstract number which represents a single encoding
+symbol. A *code unit* is the smallest unit of memory which is used in the
+encoded text (it corresponds to one character in the input stream). One
+or more code units may be needed to represent a single code point,
+depending on the encoding. In a *fixed-length* encoding, each code point
+is represented with an equal number of code units. In a *variable-length*
+encoding, different code points can be represented with different numbers
+of code units.
+
+* ASCII is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly one
+  1-byte code unit, which has the same value as the code point. The size of
+  ``YYCTYPE`` must be 1 byte.
+
+* EBCDIC is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly
+  one 1-byte code unit, which has the same value as the code point. The size
+  of ``YYCTYPE`` must be 1 byte.
+
+* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
+  code points, from 0 to 0xFFFF. One code point is represented with
+  exactly one 2-byte code unit, which has the same value as the code
+  point. The size of ``YYCTYPE`` must be 2 bytes.
+
+* UTF-16 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with one or two 2-byte code units. The size of
+  ``YYCTYPE`` must be 2 bytes.
+
+* UTF-32 is a fixed-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with exactly one 4-byte code unit. The size of
+  ``YYCTYPE`` must be 4 bytes.
+
+* UTF-8 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with a sequence of one, two, three or four
+  1-byte code units. The size of ``YYCTYPE`` must be 1 byte.
+
+In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
+valid Unicode code points; any encoded sequence of code units that
+would map to Unicode code points in the range 0xD800-0xDFFF is
+ill-formed. The user can control how ``re2c`` treats such ill-formed
+sequences with the ``--encoding-policy `` flag.
+
+For some encodings, there are code units that never occur in a valid
+encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner must
+check for invalid input, the only reliable way to do so is to use the default
+rule ``*``. Note that the full range rule ``[^]`` won't catch invalid code units when a variable-length encoding is used
+(``[^]`` means "all valid code points", while the default rule ``*`` means "all possible code units").
+
diff --git a/src/manual/features/features.rst b/src/manual/features/features.rst
index baa38db0..2e682650 100644
--- a/src/manual/features/features.rst
+++ b/src/manual/features/features.rst
@@ -3,196 +3,14 @@ Features
 ========
 
 .. include:: ../home.rst
 .. include:: ../../contents.rst
 
-Conditions
------------
+★
 
-You can preceed regular expressions with a list of condition names when
-using the ``-c`` switch. In this case ``re2c`` generates scanner blocks for
-each conditon. Where each of the generated blocks has its own
-precondition. The precondition is given by the interface define
-``YYGETCONDITON()`` and must be of type ``YYCONDTYPE``.
-
-There are two special rule types. First, the rules of the condition ``<*>``
-are merged to all conditions (note that they have lower priority than
-other rules of that condition). And second the empty condition list
-allows to provide a code block that does not have a scanner part.
-Meaning it does not allow any regular expression. The condition value
-referring to this special block is always the one with the enumeration
-value 0. This way the code of this special rule can be used to
-initialize a scanner. It is in no way necessary to have these rules: but
-sometimes it is helpful to have a dedicated uninitialized condition
-state.
-
-Non empty rules allow to specify the new condition, which makes them
-transition rules. Besides generating calls for the define
-``YYSETCONDTITION`` no other special code is generated.
-
-There is another kind of special rules that allow to prepend code to any
-code block of all rules of a certain set of conditions or to all code
-blocks to all rules. This can be helpful when some operation is common
-among rules. For instance this can be used to store the length of the
-scanned string. These special setup rules start with an exclamation mark
-followed by either a list of conditions ```` or a star
-````. When ``re2c`` generates the code for a rule whose state does not have a
-setup rule and a star'd setup rule is present, than that code will be
-used as setup code.
-
-State
------
-
-When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
-store its current state, return to the caller, and later resume
-operations exactly where it left off.
-
-The default operation of ``re2c`` is a
-"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
-the parsing loop, and that may not always be convenient.
-
-Typically, if there is a preprocessor ahead of the scanner in the
-stream, or for that matter any other procedural source of data, the
-scanner cannot "ask" for more data unless both scanner and source
-live in a separate threads.
-
-The ``-f`` flag is useful for just this situation: it lets users design
-scanners that work in a "push" model, i.e. where data is fed to the
-scanner chunk by chunk. When the scanner runs out of data to consume, it
-just stores its state, and return to the caller. When more input data is
-fed to the scanner, it resumes operations exactly where it left off.
-
-Changes needed compared to the "pull" model:
-
-* User has to supply macros ``YYSETSTATE ()`` and ``YYGETSTATE (state)``.
-
-* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``. So the
-  user has to declare these. Also the user has to save and restore these.
-  In the example ``examples/push_model/push.re`` these are declared as
-  fields of the (C++) class of which the scanner is a method, so they do
-  not need to be saved/restored explicitly. For C they could e.g. be made
-  macros that select fields from a structure passed in as parameter.
-  Alternatively, they could be declared as local variables, saved with
-  ``YYFILL (n)`` when it decides to return and restored at entry to the
-  function. Also, it could be more efficient to save the state from
-  ``YYFILL (n)`` because ``YYSETSTATE (state)`` is called unconditionally.
-  ``YYFILL (n)`` however does not get ``state`` as parameter, so we would have
-  to store state in a local variable by ``YYSETSTATE (state)``.
-
-* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
-
-* Modify caller to recognise if more input is needed and respond appropriately.
-
-* The generated code will contain a switch block that is used to
-  restores the last state by jumping behind the corrspoding ``YYFILL (n)``
-  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
-  block. It is possible to trigger generation of the ``YYGETSTATE ()``
-  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
-  wrapped inside a loop.
-
-Please see ``examples/push_model/push.re`` for "push" model scanner. The
-generated code can be tweaked using inplace configurations ``state:abort``
-and ``state:nextlabel``.
-
-Reuse
------
-
-Reuse mode is controlled by ``-r --reusable`` option.
-Allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
-In this mode no ``/*!re2c */`` block and exactly one ``/*!rules:re2c */`` must be present.
-The rules are being saved and used by every ``/*!use:re2c */`` block that follows.
-These blocks can contain inplace configurations, especially ``re2c:flags:e``,
-``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
-That way it is possible to create the same scanner multiple times for
-different character types, different input mechanisms or different output mechanisms.
-The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
-to the set of rules in ``/*!rules:re2c */``.
-
-Encodings
-----------
-
-``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
-UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
-See also inplace configuration ``re2c:flags``.
-
-The following concepts should be clarified when talking about encoding.
-*Code point* is an abstract number, which represents single encoding
-symbol. *Code unit* is the smallest unit of memory, which is used in the
-encoded text (it corresponds to one character in the input stream). One
-or more code units can be needed to represent a single code point,
-depending on the encoding. In *fixed-length* encoding, each code point
-is represented with equal number of code units. In *variable-length*
-encoding, different code points can be represented with different number
-of code units.
-
-* ASCII is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly one
-  1-byte code unit, which has the same value as the code point. Size of
-  ``YYCTYPE`` must be 1 byte.
-
-* EBCDIC is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly
-  one 1-byte code unit, which has the same value as the code point. Size
-  of ``YYCTYPE`` must be 1 byte.
-
-* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
-  code points, from 0 to 0xFFFF. One code point is represented with
-  exactly one 2-byte code unit, which has the same value as the code
-  point. Size of ``YYCTYPE`` must be 2 bytes.
-
-* UTF-16 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with one or two 2-byte code units. Size of
-  ``YYCTYPE`` must be 2 bytes.
-
-* UTF-32 is a fixed-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with exactly one 4-byte code unit. Size of
-  ``YYCTYPE`` must be 4 bytes.
-
-* UTF-8 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with sequence of one, two, three or four
-  1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
-
-In Unicode, values from range 0xD800 to 0xDFFF (surrogates) are not
-valid Unicode code points, any encoded sequence of code units, that
-would map to Unicode code points in the range 0xD800-0xDFFF, is
-ill-formed. The user can control how ``re2c`` treats such ill-formed
-sequences with ``--encoding-policy `` flag.
-
-For some encodings, there are code units, that never occur in valid
-encoded stream (e.g. 0xFF byte in UTF-8). If the generated scanner must
-check for invalid input, the only true way to do so is to use default
-rule ``*``. Note, that full range rule ``[^]`` won't catch invalid code units when variable-length encoding is used
-(``[^]`` means "all valid code points", while default rule ``*`` means "all possible code units").
-
-Generic interface
------------------
-
-``re2c`` usually operates on input using pointer-like primitives
-``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
-
-Generic input API (enabled with ``--input custom`` switch) allows to
-customize input operations. In this mode, ``re2c`` will express all
-operations on input in terms of the following primitives:
-
-    +---------------------+-----------------------------------------------------+
-    | ``YYPEEK ()``       | get current input character                         |
-    +---------------------+-----------------------------------------------------+
-    | ``YYSKIP ()``       | advance to the next character                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUP ()``     | backup current input position                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORE ()``    | restore current input position                      |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
-    +---------------------+-----------------------------------------------------+
-    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
-    +---------------------+-----------------------------------------------------+
-
-This `article `_
-has more details, and you can find some usage examples
-`here `_.
+* `Conditions `_
+* `State `_
+* `Reuse `_
+* `Encodings `_
+* `Generic API `_
+* `Skeleton `_
+* `Dot `_
diff --git a/src/manual/features/generic_api/generic_api.rst b/src/manual/features/generic_api/generic_api.rst
new file mode 100644
index 00000000..930ee835
--- /dev/null
+++ b/src/manual/features/generic_api/generic_api.rst
@@ -0,0 +1,32 @@
+Generic API
+-----------
+
+.. include:: ../home.rst
+
+``re2c`` usually operates on input using pointer-like primitives
+``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
+
+The generic input API (enabled with the ``--input custom`` switch) allows one to
+customize input operations. In this mode, ``re2c`` will express all
+operations on input in terms of the following primitives:
+
+    +---------------------+-----------------------------------------------------+
+    | ``YYPEEK ()``       | get current input character                         |
+    +---------------------+-----------------------------------------------------+
+    | ``YYSKIP ()``       | advance to the next character                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUP ()``     | backup current input position                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORE ()``    | restore current input position                      |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
+    +---------------------+-----------------------------------------------------+
+    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
+    +---------------------+-----------------------------------------------------+
+
+This `article `_
+has more details, and you can find some usage examples
+`here `_.
+
diff --git a/src/manual/features/home.rst b/src/manual/features/home.rst
new file mode 100644
index 00000000..ba039a4b
--- /dev/null
+++ b/src/manual/features/home.rst
@@ -0,0 +1,4 @@
+.. |[manual_features_home]| replace:: [home]
+.. _[manual_features_home]: ../../../index.html
+.. header:: |[manual_features_home]|_ `[Manual] <../../manual.html>`_ `[Features] <../features.html>`_
+.. footer:: |[manual_features_home]|_ `[Manual] <../../manual.html>`_ `[Features] <../features.html>`_
diff --git a/src/manual/features/reuse/reuse.rst b/src/manual/features/reuse/reuse.rst
new file mode 100644
index 00000000..d9c1c561
--- /dev/null
+++ b/src/manual/features/reuse/reuse.rst
@@ -0,0 +1,16 @@
+Reuse
+-----
+
+.. include:: ../home.rst
+
+Reuse mode is controlled by the ``-r, --reusable`` option.
+It allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
+In this mode, no ``/*!re2c */`` block may be present, and exactly one ``/*!rules:re2c */`` block must be present.
+The rules are saved and used by every ``/*!use:re2c */`` block that follows.
+These blocks can contain inplace configurations, especially ``re2c:flags:e``,
+``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
+That way it is possible to create the same scanner multiple times for
+different character types, different input mechanisms or different output mechanisms.
+The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
+to the set of rules in ``/*!rules:re2c */``.
+
diff --git a/src/manual/features/skeleton/skeleton.rst b/src/manual/features/skeleton/skeleton.rst
new file mode 100644
index 00000000..103383a4
--- /dev/null
+++ b/src/manual/features/skeleton/skeleton.rst
@@ -0,0 +1,7 @@
+Skeleton
+--------
+
+.. include:: ../home.rst
+
+
+
diff --git a/src/manual/features/state/state.rst b/src/manual/features/state/state.rst
new file mode 100644
index 00000000..b29ad2bc
--- /dev/null
+++ b/src/manual/features/state/state.rst
@@ -0,0 +1,56 @@
+State
+-----
+
+.. include:: ../home.rst
+
+When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
+store its current state, return to the caller, and later resume
+operations exactly where it left off.
+
+The default operation of ``re2c`` is a
+"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner" of
+the parsing loop, and that may not always be convenient.
+
+Typically, if there is a preprocessor ahead of the scanner in the
+stream, or for that matter any other procedural source of data, the
+scanner cannot "ask" for more data unless both scanner and source
+live in separate threads.
+
+The ``-f`` flag is useful for just this situation: it lets users design
+scanners that work in a "push" model, i.e. where data is fed to the
+scanner chunk by chunk. When the scanner runs out of data to consume, it
+just stores its state and returns to the caller. When more input data is
+fed to the scanner, it resumes operations exactly where it left off.
+
+Changes needed compared to the "pull" model:
+
+* The user has to supply macros ``YYSETSTATE (state)`` and ``YYGETSTATE ()``.
+
+* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``, so the
+  user has to declare these, and also has to save and restore them.
+  In the example ``examples/push_model/push.re`` these are declared as
+  fields of the (C++) class of which the scanner is a method, so they do
+  not need to be saved/restored explicitly. For C they could e.g. be made
+  macros that select fields from a structure passed in as a parameter.
+  Alternatively, they could be declared as local variables, saved by
+  ``YYFILL (n)`` when it decides to return and restored at entry to the
+  function. Also, it could be more efficient to save the state from
+  ``YYFILL (n)``, because ``YYSETSTATE (state)`` is called unconditionally.
+  However, ``YYFILL (n)`` does not get ``state`` as a parameter, so we would have
+  to store the state in a local variable via ``YYSETSTATE (state)``.
+
+* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
+
+* Modify the caller to recognise if more input is needed and respond appropriately.
+
+* The generated code will contain a switch block that is used to
+  restore the last state by jumping behind the corresponding ``YYFILL (n)``
+  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
+  block. It is possible to trigger generation of the ``YYGETSTATE ()``
+  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
+  wrapped inside a loop.
+
+Please see ``examples/push_model/push.re`` for a "push" model scanner. The
+generated code can be tweaked using the inplace configurations ``state:abort``
+and ``state:nextlabel``.
+
diff --git a/src/manual/warnings/condition_order/wcondition_order.rst b/src/manual/warnings/condition_order/wcondition_order.rst
index 18898307..26c29fe2 100644
--- a/src/manual/warnings/condition_order/wcondition_order.rst
+++ b/src/manual/warnings/condition_order/wcondition_order.rst
@@ -1,5 +1,5 @@
 [-Wcondition-order]
---------------------------
+-------------------
 
 .. include:: ../home.rst
 .. include:: ../../../contents.rst
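The encodings.rst text in this patch separates code units from code points and notes that some byte values (such as 0xFF) and the encoded surrogates 0xD800-0xDFFF are ill-formed in UTF-8. The C sketch below is only an illustration and is not code from re2c. It checks whether a buffer starts with one well-formed UTF-8 sequence, using the byte ranges of the ``UTF8_1`` ... ``UTF8_4`` definitions in php_json.re. The function name ``utf8_seq_len`` is hypothetical. The function returns the number of code units (bytes) that encode one code point, or 0 if the input does not start with a well-formed sequence. Examples of such input are a lone 0xFF byte, the overlong pair C0 80, or the surrogate encoding ED A0 80.

```c
#include <assert.h>
#include <stddef.h>

/* True if byte b lies in the inclusive range [lo, hi]. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* Hypothetical helper: length in code units (bytes) of the well-formed
 * UTF-8 sequence at the start of s[0..n), or 0 if there is none.
 * The ranges mirror UTF8_1 .. UTF8_4 from php_json.re. */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n == 0) return 0;

    unsigned char b = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
    size_t len;

    if (b <= 0x7F) return 1;                        /* UTF8_1 */
    else if (in_range(b, 0xC2, 0xDF)) len = 2;      /* UTF8_2 */
    else if (b == 0xE0) { len = 3; lo = 0xA0; }     /* UTF8_3A: no overlongs */
    else if (in_range(b, 0xE1, 0xEC)) len = 3;      /* UTF8_3B */
    else if (b == 0xED) { len = 3; hi = 0x9F; }     /* UTF8_3C: no surrogates */
    else if (in_range(b, 0xEE, 0xEF)) len = 3;      /* UTF8_3D */
    else if (b == 0xF0) { len = 4; lo = 0x90; }     /* UTF8_4A: no overlongs */
    else if (in_range(b, 0xF1, 0xF3)) len = 4;      /* UTF8_4B */
    else if (b == 0xF4) { len = 4; hi = 0x8F; }     /* UTF8_4C: <= 0x10FFFF */
    else return 0; /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

    if (n < len || !in_range(s[1], lo, hi)) return 0;
    for (size_t i = 2; i < len; i++)
        if (!in_range(s[i], 0x80, 0xBF)) return 0;  /* UTF8T continuation */
    return len;
}
```

A sequence rejected here is one that ``[^]`` does not match under ``-8``. As the encodings text says, only the default rule ``*`` would catch such a sequence in a generated scanner.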
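The primitives table in generic_api.rst names the operations but does not show them in use. The sketch below gives one hypothetical way to define five of them over a plain in-memory buffer: ``YYPEEK``, ``YYSKIP``, ``YYBACKUP``, ``YYRESTORE`` and ``YYLESSTHAN``. It also includes a hand-written matcher for ``"ab" | "abcd"`` that reads the input only through those macros. The matcher backs up and restores its position roughly the way a longest-match scanner does. This is not re2c output. ``struct input``, its fields and ``match_ab_abcd`` are illustrative assumptions. The trailing-context pair ``YYBACKUPCTX``/``YYRESTORECTX`` is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state; the field names are illustrative only. */
struct input {
    const char *cur; /* current position */
    const char *lim; /* one past the last character */
    const char *mar; /* position saved by YYBACKUP () */
};

/* One possible set of definitions over a plain buffer. Each macro
 * refers to a variable named `in` in the enclosing function. */
#define YYPEEK()      (*in->cur)
#define YYSKIP()      (++in->cur)
#define YYBACKUP()    (in->mar = in->cur)
#define YYRESTORE()   (in->cur = in->mar)
#define YYLESSTHAN(n) ((size_t)(in->lim - in->cur) < (size_t)(n))

/* Hand-written matcher for "ab" | "abcd" (longest match wins) that reads
 * the input only through the primitives above. Returns the match length,
 * or 0 (with the position reset) if neither alternative matches. */
static size_t match_ab_abcd(struct input *in)
{
    assert(in != NULL && in->cur <= in->lim);
    const char *start = in->cur;

    if (YYLESSTHAN(2) || YYPEEK() != 'a') goto fail;
    YYSKIP();
    if (YYPEEK() != 'b') goto fail;
    YYSKIP();
    YYBACKUP();                     /* "ab" matched: remember this position */
    if (!YYLESSTHAN(2) && YYPEEK() == 'c') {
        YYSKIP();
        if (YYPEEK() == 'd') {
            YYSKIP();               /* "abcd" matched */
            return (size_t)(in->cur - start);
        }
    }
    YYRESTORE();                    /* fall back to the shorter match "ab" */
    return (size_t)(in->cur - start);

fail:
    in->cur = start;
    return 0;
}
```

Every input access goes through the macros. The same matcher could therefore run over a different input source if only these definitions changed, which is the kind of flexibility the generic API is meant to offer.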