--- /dev/null
+=====
+About
+=====
+
+.. include:: ../home.rst
+.. include:: ../contents.rst
+
+Authors
+-------
+
+Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
+and described in the research article
+`"RE2C: a more versatile scanner generator" <1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf>`_
+by Peter Bumbulis, Donald D. Cowan, 1994,
+ACM Letters on Programming Languages and Systems (LOPLAS).
+
+Since then many people have contributed to re2c:
+
+* Brian Young (bayoung@acm.org)
+* Dan Nuffer (nuffer@users.sourceforge.net)
+* Marcus Boerger (helly@users.sourceforge.net)
+* Hartmut Kaiser (hkaiser@users.sourceforge.net)
+* Emmanuel Mogenet (mgix@mgix.com)
+* Ulya Trofimovich (skvadrik@gmail.com)
+
+Let me know if I missed someone!
+
+License
+-------
+
+re2c is distributed with no warranty whatever.
+The code is certain to contain errors.
+Neither the author nor any contributor takes responsibility for any consequences of its use.
+re2c is in the public domain. The data structures and algorithms used
+in re2c are all either taken from documents available to the general
+public or are inventions of the author. Programs generated by re2c may
+be distributed freely. re2c itself may be distributed freely, in source
+or binary, unchanged or modified. Distributors may charge whatever fees
+they can obtain for re2c. If you do make use of re2c, or incorporate it
+into a larger project, an acknowledgement somewhere (documentation,
+research report, etc.) would be appreciated.
+
+Version
+-------
+
+This page describes ``re2c`` version 0.14.1.dev, package date 15 Oct 2015.
+
--- /dev/null
+.. contents:: ★
+ :backlinks: none
+ :depth: 2
color: #557799;
}
h1 {
- border-bottom: groove black;
+ border-bottom: groove gray;
}
h1.title {
font-size: 2.5em;
+++ /dev/null
-
-All examples are written in C++-98.
-`Do let me know <skvadik@gmail.com>`_ if you notice any obvious lies and errors.
-You can find more examples in subdirectory ``examples`` of the ``re2c`` distribution.
+++ /dev/null
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
- :backlinks: none
- :depth: 2
-
-.. include:: example_intro.rst
-.. include:: example_01.rst
-.. include:: example_02.rst
-.. include:: example_03.rst
-.. include:: example_04.rst
-.. include:: example_05.rst
-.. include:: example_06.rst
-.. include:: example_07.rst
-
Recognizing integers: the sentinel method
-----------------------------------------
+.. include:: home.rst
+
This example is very simple, yet practical.
We assume that the input is small (fits in one continuous piece of memory).
We also assume that some characters never occur in well-formed input (but may occur in ill-formed input).
binary, octal, decimal and hexadecimal integer literals.
The numbers are not *parsed* (their numeric value is not retrieved), they are merely *recognized*.
-`[01_recognizing_integers.re] <examples/01_recognizing_integers.re>`_
+`[01_recognizing_integers.re] <01_recognizing_integers.re>`_
-.. include:: examples/01_recognizing_integers.re
+.. include:: 01_recognizing_integers.re
:code: cpp
:number-lines:
Recognizing strings: the need for YYMAXFILL
-------------------------------------------
+.. include:: home.rst
+
This example is about recognizing strings.
Strings (in generic sense) are different from other kinds of lexemes: they can contain *arbitrary* characters.
It makes them a way more difficult to lex: unlike `Recognizing integers: the sentinel method <example_01.html>`_ example,
The length of padding depends on the maximal argument to ``YYFILL``
(this value is called ``YYMAXFILL`` and can be generated using ``/*!max:re2c*/`` directive).
-`[02_recognizing_strings.re] <examples/02_recognizing_strings.re>`_
+`[02_recognizing_strings.re] <02_recognizing_strings.re>`_
-.. include:: examples/02_recognizing_strings.re
+.. include:: 02_recognizing_strings.re
:code: cpp
:number-lines:
Arbitrary large input and YYFILL
--------------------------------
+.. include:: home.rst
+
In this example we suppose that input cannot be mapped in memory at once:
either it's too large or its size cannot be determined in advance.
The usual thing to do in such case is to allocate a buffer and lex input in chunks that fit into buffer.
Our example program reads ``stdin`` in chunks of 16 bytes (in real word buffer size is usually ~4Kb)
and tries to lex numbers separated by newlines.
-`[03_arbitrary_large_input.re] <examples/03_arbitrary_large_input.re>`_
+`[03_arbitrary_large_input.re] <03_arbitrary_large_input.re>`_
-.. include:: examples/03_arbitrary_large_input.re
+.. include:: 03_arbitrary_large_input.re
:code: cpp
:number-lines:
Parsing integers (multiple re2c blocks)
---------------------------------------
+.. include:: home.rst
+
This example is based on `Recognizing integers: the sentinel method <example_01.html>`_ example,
only now integer literals are parsed rather than simply recognized.
Parsing integers is simple: one can easily do it by hand.
However, re2c-generated code *does* look like a simple handwritten parser:
a couple of dereferences and conditional jumps. No overhead. ``:)``
-`[04_parsing_integers_blocks.re] <examples/04_parsing_integers_blocks.re>`_
+`[04_parsing_integers_blocks.re] <04_parsing_integers_blocks.re>`_
-.. include:: examples/04_parsing_integers_blocks.re
+.. include:: 04_parsing_integers_blocks.re
:code: cpp
:number-lines:
Parsing integers (conditions)
-----------------------------
+.. include:: home.rst
+
This example does exactly the same as `Parsing integers (multiple re2c blocks) <example_04.html>`_ example,
but in a slightly different manner: it uses re2c conditions instead of blocks.
Conditions allow to encode multiple interconnected lexers within a single re2c block.
-`[05_parsing_integers_conditions.re] <examples/05_parsing_integers_conditions.re>`_
+`[05_parsing_integers_conditions.re] <05_parsing_integers_conditions.re>`_
-.. include:: examples/05_parsing_integers_conditions.re
+.. include:: 05_parsing_integers_conditions.re
:code: cpp
:number-lines:
Braille patterns (encodings)
----------------------------
+.. include:: home.rst
+
This example is about encoding support in re2c.
It's a partial decoder from Grade-1 (uncontracted) Unicode English Braille to plain English.
The input may be encoded in UTF-8, UTF-16, UTF-32 or UCS-2:
So. The hardest part is to get some input.
Here is a message out of the void:
-.. include:: examples/06_braille.utf8.txt
+.. include:: 06_braille.utf8.txt
-It appears to be UTF-8 encoded `[06_braille.utf8.txt] <examples/06_braille.utf8.txt.html>`_.
+It appears to be UTF-8 encoded `[06_braille.utf8.txt] <06_braille.utf8.txt.html>`_.
Convert it into UTF-16, UTF-32 or UCS-2:
.. code-block:: bash
Grade-2 Braille allows contractions; they obey complex rules (like those of a natural language)
and are much harder to implement.
-`[06_braille.re] <examples/06_braille.re>`_
+`[06_braille.re] <06_braille.re>`_
-.. include:: examples/06_braille.re
+.. include:: 06_braille.re
:code: cpp
:number-lines:
C++98 lexer
-----------
+.. include:: home.rst
+
This is an example of a big real-world re2c program: C++98 lexer.
It confirms to the C++98 standard (except for a couple of hacks to simulate preprocessor).
All nontrivial lexemes (integers, floating-point constants, strings and character literals)
Some additional checks described in standard (e.g. overflows in integer literals) are also done.
In fact, C++ is an easy language to lex: unlike many other languages, lexer can proceed without feedback from parser.
-`[07_c++98.re] <examples/07_c++98.re>`_
+`[07_c++98.re] <07_c++98.re>`_
-.. include:: examples/07_c++98.re
+.. include:: 07_c++98.re
:code: cpp
:number-lines:
--- /dev/null
+========
+Examples
+========
+
+.. include:: ../home.rst
+
+★
+
+* `Recognizing integers: the sentinel method <example_01.html>`_
+* `Recognizing strings: the need for YYMAXFILL <example_02.html>`_
+* `Arbitrary large input and YYFILL <example_03.html>`_
+* `Parsing integers (multiple re2c blocks) <example_04.html>`_
+* `Parsing integers (conditions) <example_05.html>`_
+* `Braille patterns (encodings) <example_06.html>`_
+* `C++98 lexer <example_07.html>`_
+
+All examples are written in C++98.
+`Do let me know <skvadrik@gmail.com>`_ if you notice any obvious lies and errors.
+You can find more examples in the ``examples`` subdirectory of the ``re2c`` distribution.
+
--- /dev/null
+.. |[examples_home]| replace:: [home]
+.. _[examples_home]: ../index.html
+.. header:: |[examples_home]|_ `[Examples] <examples.html>`_
+.. footer:: |[examples_home]|_ `[Examples] <examples.html>`_
--- /dev/null
+.. header:: `[home] <../index.html>`_
+.. footer:: `[home] <../index.html>`_
-
====
re2c
====
--------------------------------------------------------------------------------
-★ `Install <install.html>`_
+★ `About <about/about.html>`_
+
+★ `Install <install/install.html>`_
-★ `Manual <manual.html>`_
+★ `Manual <manual/manual.html>`_
-★ `Examples <examples.html>`_
+★ `Examples <examples/examples.html>`_
-★ `News <news.html>`_
+★ `News <news/news.html>`_
--------------------------------------------------------------------------------
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
- :backlinks: none
- :depth: 2
+=======
+Install
+=======
+
+.. include:: ../home.rst
+.. include:: ../contents.rst
Download
========
--- /dev/null
+========
+Features
+========
+
+.. include:: ../home.rst
+.. include:: ../../contents.rst
+
+Conditions
+----------
+
+You can precede regular expressions with a list of condition names when
+using the ``-c`` switch. In this case ``re2c`` generates a scanner block for
+each condition, and each of the generated blocks has its own
+precondition. The precondition is given by the interface define
+``YYGETCONDITION()`` and must be of type ``YYCONDTYPE``.
+
+There are two special rule types. First, the rules of the condition ``<*>``
+are merged into all conditions (note that they have lower priority than the
+other rules of that condition). Second, an empty condition list
+allows you to provide a code block that does not have a scanner part,
+meaning that it does not accept any regular expression. The condition value
+referring to this special block is always the one with the enumeration
+value 0. This way the code of this special rule can be used to
+initialize a scanner. These rules are by no means required, but
+sometimes it is helpful to have a dedicated uninitialized condition
+state.
+
+Non-empty rules can specify a new condition, which makes them
+transition rules. Apart from generating calls to the ``YYSETCONDITION``
+define, no other special code is generated.
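The condition mechanism can be pictured with a small hand-written C++ sketch. This is not code produced by ``re2c``. The enum names ``yycINIT`` and ``yycCOMMENT``, the ``Lexer`` struct and the macro bodies are assumptions made for illustration. Each loop iteration dispatches on ``YYGETCONDITION()``, so every condition behaves like a separate scanner block. A transition rule amounts to a ``YYSETCONDITION`` call before the next match.

```cpp
#include <cstring>

// Hypothetical condition type and names: with -c the user supplies
// YYCONDTYPE and the condition accessors.
enum YYCONDTYPE { yycINIT, yycCOMMENT };

struct Lexer {
    const char *cur;  // current input position
    YYCONDTYPE cond;  // current condition (precondition of the next match)
};

// User-supplied condition accessors (illustrative bodies only).
#define YYGETCONDITION()  (lx->cond)
#define YYSETCONDITION(c) (lx->cond = (c))

// Counts the characters that lie outside /* ... */ comments.
int count_code_chars(const char *s) {
    Lexer state = { s, yycINIT };
    Lexer *lx = &state;
    int count = 0;
    while (*lx->cur != '\0') {
        switch (YYGETCONDITION()) {
        case yycINIT:
            if (std::strncmp(lx->cur, "/*", 2) == 0) {
                lx->cur += 2;
                YYSETCONDITION(yycCOMMENT); // transition rule: INIT -> COMMENT
            } else {
                ++lx->cur;
                ++count;                    // ordinary character outside comments
            }
            break;
        case yycCOMMENT:
            if (std::strncmp(lx->cur, "*/", 2) == 0) {
                lx->cur += 2;
                YYSETCONDITION(yycINIT);    // transition rule: COMMENT -> INIT
            } else {
                ++lx->cur;                  // skip comment text
            }
            break;
        }
    }
    return count;
}
```

For example, ``count_code_chars("a/*bc*/d")`` counts only ``a`` and ``d`` and returns 2.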
+
+There is another kind of special rule that allows you to prepend code to
+the code blocks of all rules of a certain set of conditions, or to the code
+blocks of all rules. This can be helpful when some operation is common
+among rules; for instance, it can be used to store the length of the
+scanned string. These special setup rules start with an exclamation mark
+followed by either a list of conditions ``<! condition, ... >`` or a star
+``<!*>``. When ``re2c`` generates the code for a rule whose state does not
+have a setup rule and a starred setup rule is present, then that code is
+used as setup code.
+
+State
+-----
+
+When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
+store its current state, return to the caller, and later resume
+operations exactly where it left off.
+
+The default operation of ``re2c`` is a "pull" model, where the scanner asks
+for extra input whenever it needs it. However, this mode of operation
+assumes that the scanner is the "owner" of the parsing loop, and that may
+not always be convenient.
+
+Typically, if there is a preprocessor ahead of the scanner in the
+stream, or for that matter any other procedural source of data, the
+scanner cannot "ask" for more data unless both scanner and source
+live in separate threads.
+
+The ``-f`` flag is useful for just this situation: it lets users design
+scanners that work in a "push" model, i.e. where data is fed to the
+scanner chunk by chunk. When the scanner runs out of data to consume, it
+just stores its state and returns to the caller. When more input data is
+fed to the scanner, it resumes operations exactly where it left off.
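Here is a minimal hand-written C++ sketch of the push-model idea. It assumes a toy token set of decimal numbers, where any non-digit character ends a number. It is not what ``re2c -f`` generates, and the ``PushScanner`` type and its members are hypothetical. The point is that all scanner state lives in an object, so ``push`` can return as soon as a chunk is exhausted and resume with the next chunk.

```cpp
#include <cstddef>
#include <vector>

// Hand-written push-model scanner (not re2c output). The whole scanner
// state is stored in the object, playing the role that YYSETSTATE /
// YYGETSTATE play for a scanner generated with -f.
struct PushScanner {
    enum State { START, IN_NUMBER };
    State state;                         // where the scanner left off
    unsigned long value;                 // partial number carried across chunks
    std::vector<unsigned long> numbers;  // completed tokens

    PushScanner() : state(START), value(0) {}

    // Feed one chunk of input; returns to the caller when it is used up.
    void push(const char *chunk, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            char c = chunk[i];
            if (c >= '0' && c <= '9') {
                // Start a new number or continue the current one.
                value = (state == IN_NUMBER ? value * 10 : 0) + (c - '0');
                state = IN_NUMBER;
            } else if (state == IN_NUMBER) {
                // Any non-digit ends the current number.
                numbers.push_back(value);
                state = START;
            }
        }
        // Out of data: simply return; state and value are kept for the next call.
    }
};
```

Feeding ``"12"`` and then ``"3\n45\n"`` yields the numbers 123 and 45. The digit ``3`` continues the number started in the first chunk.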
+
+Changes needed compared to the "pull" model:
+
+* The user has to supply the macros ``YYSETSTATE (state)`` and ``YYGETSTATE ()``.
+
+* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``, so the
+  user has to declare them, and also has to save and restore them.
+  In the example ``examples/push_model/push.re`` they are declared as
+  fields of the (C++) class of which the scanner is a method, so they do
+  not need to be saved/restored explicitly. In C they could, for example, be
+  made macros that select fields from a structure passed in as a parameter.
+  Alternatively, they could be declared as local variables, saved by
+  ``YYFILL (n)`` when it decides to return, and restored at entry to the
+  function. It could also be more efficient to save the state from
+  ``YYFILL (n)``, because ``YYSETSTATE (state)`` is called unconditionally.
+  However, ``YYFILL (n)`` does not get ``state`` as a parameter, so we would
+  have to store the state in a local variable via ``YYSETSTATE (state)``.
+
+* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
+
+* Modify the caller to recognise when more input is needed and respond appropriately.
+
+* The generated code will contain a switch block that is used to
+  restore the last state by jumping behind the corresponding ``YYFILL (n)``
+  call. This code is automatically generated in the epilog of the first
+  ``/*!re2c */`` block. It is possible to trigger generation of the
+  ``YYGETSTATE ()`` block earlier by placing a ``/*!getstate:re2c*/``
+  comment. This is especially useful when the scanner code should be
+  wrapped inside a loop.
+
+Please see ``examples/push_model/push.re`` for a "push" model scanner. The
+generated code can be tweaked using the inplace configurations ``state:abort``
+and ``state:nextlabel``.
+
+Reuse
+-----
+
+Reuse mode is controlled by the ``-r`` (``--reusable``) option.
+It allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
+In this mode, no ``/*!re2c */`` block may be present, and exactly one ``/*!rules:re2c */`` block must be present.
+The rules are saved and used by every ``/*!use:re2c */`` block that follows.
+These blocks can contain inplace configurations, especially ``re2c:flags:e``,
+``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
+That way it is possible to create the same scanner multiple times for
+different character types, different input mechanisms or different output mechanisms.
+The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
+to the set of rules in ``/*!rules:re2c */``.
+
+Encodings
+---------
+
+``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
+UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
+See also inplace configuration ``re2c:flags``.
+
+The following concepts should be clarified when talking about encodings.
+A *code point* is an abstract number which represents a single encoding
+symbol. A *code unit* is the smallest unit of memory used in the
+encoded text (it corresponds to one character in the input stream). One
+or more code units may be needed to represent a single code point,
+depending on the encoding. In a *fixed-length* encoding, each code point
+is represented with an equal number of code units. In a *variable-length*
+encoding, different code points can be represented with different numbers
+of code units.
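The following C++ sketch illustrates the difference between code points and code units. It is a hypothetical helper, not part of ``re2c``, and it computes how many 1-byte code units UTF-8 needs for a given code point. In a fixed-length encoding such as UTF-32 the answer would always be one.

```cpp
// Number of 1-byte UTF-8 code units needed for code point cp:
// UTF-8 is variable-length, so the result ranges from 1 to 4.
// Returns 0 for values above 0x10FFFF (surrogates are not checked here).
int utf8_code_units(unsigned long cp) {
    if (cp <= 0x7F)     return 1;  // ASCII range
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;                      // outside the Unicode code space
}
```

For instance, ``A`` (0x41) takes one code unit, while U+20AC (the euro sign) takes three.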
+
+* ASCII is a fixed-length encoding. Its code space includes 0x100
+ code points, from 0 to 0xFF. One code point is represented with exactly one
+ 1-byte code unit, which has the same value as the code point. Size of
+ ``YYCTYPE`` must be 1 byte.
+
+* EBCDIC is a fixed-length encoding. Its code space includes 0x100
+ code points, from 0 to 0xFF. One code point is represented with exactly
+ one 1-byte code unit, which has the same value as the code point. Size
+ of ``YYCTYPE`` must be 1 byte.
+
+* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
+ code points, from 0 to 0xFFFF. One code point is represented with
+ exactly one 2-byte code unit, which has the same value as the code
+ point. Size of ``YYCTYPE`` must be 2 bytes.
+
+* UTF-16 is a variable-length encoding. Its code space includes all
+ Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+ code point is represented with one or two 2-byte code units. Size of
+ ``YYCTYPE`` must be 2 bytes.
+
+* UTF-32 is a fixed-length encoding. Its code space includes all
+ Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+ code point is represented with exactly one 4-byte code unit. Size of
+ ``YYCTYPE`` must be 4 bytes.
+
+* UTF-8 is a variable-length encoding. Its code space includes all
+ Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with a sequence of one, two, three or four
+ 1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
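A minimal encoder sketch shows why UTF-16 is variable-length. ``utf16_encode`` is a hypothetical name, and ``unsigned short`` is assumed to be 16 bits wide. Code points up to 0xFFFF take one code unit. The rest are split into a surrogate pair of two code units, which is something UCS-2 cannot express.

```cpp
#include <cstddef>

// Encodes code point cp as UTF-16 into out[0..1] and returns the number
// of 2-byte code units written (1 or 2), or 0 if cp is a surrogate value
// (0xD800-0xDFFF) or lies above 0x10FFFF.
std::size_t utf16_encode(unsigned long cp, unsigned short out[2]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // not encodable
    if (cp <= 0xFFFF) {
        out[0] = static_cast<unsigned short>(cp); // one code unit
        return 1;
    }
    if (cp > 0x10FFFF) return 0;                  // outside Unicode
    cp -= 0x10000;                                // 20 bits remain
    out[0] = static_cast<unsigned short>(0xD800 + (cp >> 10));   // high surrogate
    out[1] = static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)); // low surrogate
    return 2;
}
```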
+
+In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
+valid Unicode code points, so any encoded sequence of code units that
+would map to Unicode code points in the range 0xD800-0xDFFF is
+ill-formed. The user can control how ``re2c`` treats such ill-formed
+sequences with the ``--encoding-policy <policy>`` flag (see
+`Options <../options/options_list.html>`_ for a full explanation).
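The following C++ sketch (hypothetical helper names, not part of ``re2c``) shows how such an ill-formed sequence arises. The 3-byte UTF-8 sequence 0xED 0xA0 0x80 follows the usual 3-byte bit layout. However, it decodes to 0xD800, which falls outside the valid ranges listed above.

```cpp
// True if cp lies in the ranges 0-0xD7FF or 0xE000-0x10FFFF,
// i.e. it is neither a surrogate nor above 0x10FFFF.
bool is_valid_code_point(unsigned long cp) {
    return cp <= 0xD7FF || (cp >= 0xE000 && cp <= 0x10FFFF);
}

// Decodes a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
// WITHOUT validating it, so the result can be checked separately.
unsigned long decode_utf8_3(const unsigned char b[3]) {
    return ((b[0] & 0x0FUL) << 12) | ((b[1] & 0x3FUL) << 6) | (b[2] & 0x3FUL);
}
```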
+
+For some encodings, there are code units that never occur in a valid
+encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner must
+check for invalid input, the only reliable way to do so is to use the default
+rule ``*``. Note that the full range rule ``[^]`` won't catch invalid code units
+when a variable-length encoding is used (``[^]`` means "all valid code points",
+while the default rule ``*`` means "all possible code units").
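For UTF-8, the bytes that never occur are 0xC0, 0xC1 and 0xF5 through 0xFF. A hypothetical C++ helper (not something ``re2c`` generates) makes the check explicit. In a generated scanner such bytes would only be matched by the default rule ``*``, never by ``[^]``.

```cpp
// True for byte values that can never appear in well-formed UTF-8:
// 0xC0 and 0xC1 could only start overlong 2-byte sequences, and
// 0xF5-0xFF would start sequences above 0x10FFFF or are unused.
bool never_in_utf8(unsigned char b) {
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}
```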
+
+Generic interface
+-----------------
+
+``re2c`` usually operates on input using pointer-like primitives
+``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
+
+The generic input API (enabled with the ``--input custom`` switch) allows you to
+customize input operations. In this mode, ``re2c`` expresses all
+operations on input in terms of the following primitives:
+
+ +---------------------+-----------------------------------------------------+
+ | ``YYPEEK ()`` | get current input character |
+ +---------------------+-----------------------------------------------------+
+ | ``YYSKIP ()`` | advance to the next character |
+ +---------------------+-----------------------------------------------------+
+ | ``YYBACKUP ()`` | backup current input position |
+ +---------------------+-----------------------------------------------------+
+ | ``YYBACKUPCTX ()`` | backup current input position for trailing context |
+ +---------------------+-----------------------------------------------------+
+ | ``YYRESTORE ()`` | restore current input position |
+ +---------------------+-----------------------------------------------------+
+ | ``YYRESTORECTX ()`` | restore current input position for trailing context |
+ +---------------------+-----------------------------------------------------+
+ | ``YYLESSTHAN (n)`` | check if less than ``n`` input characters are left |
+ +---------------------+-----------------------------------------------------+
+
+This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
+has more details, and you can find some usage examples
+`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
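Here is a minimal C++ sketch of what the primitives mean. It assumes they are defined over plain pointers in a hypothetical ``Input`` struct. The macro bodies are illustrative, not prescribed by ``re2c``, and the trailing-context primitives are omitted. The hand-written matcher for ``"abc" | "a"`` uses only these primitives and shows why ``YYBACKUP ()`` and ``YYRESTORE ()`` are needed. When a partial match of the longer alternative fails, the input position must return to the end of the shorter one.

```cpp
// Hypothetical input state for this sketch.
struct Input {
    const char *cursor;  // current position
    const char *marker;  // position saved by YYBACKUP ()
    const char *limit;   // one past the last input character
};

// One possible definition of the primitives over plain pointers.
#define YYPEEK()      (*in->cursor)
#define YYSKIP()      (++in->cursor)
#define YYBACKUP()    (in->marker = in->cursor)
#define YYRESTORE()   (in->cursor = in->marker)
#define YYLESSTHAN(n) ((in->limit - in->cursor) < (n))

// Longest match of "abc" | "a"; returns the match length (0 if none).
int match_abc_or_a(Input *in) {
    if (YYLESSTHAN(1) || YYPEEK() != 'a') return 0;
    YYSKIP();
    YYBACKUP();                       // "a" has matched: remember this position
    if (!YYLESSTHAN(2)) {             // two more characters are available
        if (YYPEEK() == 'b') {
            YYSKIP();
            if (YYPEEK() == 'c') {
                YYSKIP();
                return 3;             // longest match "abc"
            }
        }
    }
    YYRESTORE();                      // "abc" failed: back to just after "a"
    return 1;
}
```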
+
--- /dev/null
+.. |[manual_home]| replace:: [home]
+.. _[manual_home]: ../../index.html
+.. header:: |[manual_home]|_ `[Manual] <../manual.html>`_
+.. footer:: |[manual_home]|_ `[Manual] <../manual.html>`_
--- /dev/null
+======
+Manual
+======
+
+.. include:: ../home.rst
+
+★
+
+* `Options <options/options_list.html>`_
+* `Warnings <warnings/warnings.html>`_
+* `Syntax <syntax/syntax.html>`_
+* `Features <features/features.html>`_
+
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
- :backlinks: none
- :depth: 2
-
-About
-=====
-
-Authors
--------
-
-Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
-and described in research article
-`"RE2C: a more versatile scanner generator" <1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf>`_
-by Peter Bumbulis, Donald D. Cowan, 1994,
-ACM Letters on Programming Languages and Systems (LOPLAS).
-
-Since then many people have contributed to re2c:
-
-* Brian Young bayoung@acm.org
-* Dan Nuffer nuffer@users.sourceforge.net
-* Marcus Boerger helly@users.sourceforge.net
-* Hartmut Kaiser hkaiser@users.sourceforge.net
-* Emmanuel Mogenet mgix@mgix.com
-* Ulya Trofimovich skvadrik@gmail.com
-
-Let me know if I missed someone!
-
-License
--------
-
-re2c is distributed with no warranty whatever.
-The code is certain to contain errors.
-Neither the author nor any contributor takes responsibility for any consequences of its use.
-re2c is in the public domain. The data structures and algorithms used
-in re2c are all either taken from documents available to the general
-public or are inventions of the author. Programs generated by re2c may
-be distributed freely. re2c itself may be distributed freely, in source
-or binary, unchanged or modified. Distributors may charge whatever fees
-they can obtain for re2c. If you do make use of re2c, or incorporate it into a larger project an
-acknowledgement somewhere (documentation, research report, etc.) would
-be appreciated.
-
-Version
--------
-
-This page describes ``re2c`` version 0.14.1.dev, package date 15 Oct 2015.
-
-Run
-===
-
-Synopsis
---------
-
-``re2c [OPTIONS] FILE [OPTIONS]``
-
-Options
--------
-
-.. include:: options/options_list.rst
+======
+Syntax
+======
-Warnings
---------
+.. include:: ../home.rst
+.. include:: ../../contents.rst
-.. include:: options/warnings_list.rst
-Syntax
-======
Code for ``re2c`` consists of a set of `rules`_, `definitions`_ and
`configurations`_.
generated code will contain both ``YYSETSTATE (s)`` and ``YYGETSTATE`` even
if ``YYFILL (n)`` is being disabled.
-Features
-========
-
-Conditions
-----------
-
-You can preceed regular expressions with a list of condition names when
-using the ``-c`` switch. In this case ``re2c`` generates scanner blocks for
-each conditon. Where each of the generated blocks has its own
-precondition. The precondition is given by the interface define
-``YYGETCONDITON()`` and must be of type ``YYCONDTYPE``.
-
-There are two special rule types. First, the rules of the condition ``<*>``
-are merged to all conditions (note that they have lower priority than
-other rules of that condition). And second the empty condition list
-allows to provide a code block that does not have a scanner part.
-Meaning it does not allow any regular expression. The condition value
-referring to this special block is always the one with the enumeration
-value 0. This way the code of this special rule can be used to
-initialize a scanner. It is in no way necessary to have these rules: but
-sometimes it is helpful to have a dedicated uninitialized condition
-state.
-
-Non empty rules allow to specify the new condition, which makes them
-transition rules. Besides generating calls for the define
-``YYSETCONDTITION`` no other special code is generated.
-
-There is another kind of special rules that allow to prepend code to any
-code block of all rules of a certain set of conditions or to all code
-blocks to all rules. This can be helpful when some operation is common
-among rules. For instance this can be used to store the length of the
-scanned string. These special setup rules start with an exclamation mark
-followed by either a list of conditions ``<! condition, ... >`` or a star
-``<!*>``. When ``re2c`` generates the code for a rule whose state does not have a
-setup rule and a star'd setup rule is present, than that code will be
-used as setup code.
-
-State
------
-
-When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
-store its current state, return to the caller, and later resume
-operations exactly where it left off.
-
-The default operation of ``re2c`` is a
-"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
-the parsing loop, and that may not always be convenient.
-
-Typically, if there is a preprocessor ahead of the scanner in the
-stream, or for that matter any other procedural source of data, the
-scanner cannot "ask" for more data unless both scanner and source
-live in a separate threads.
-
-The ``-f`` flag is useful for just this situation: it lets users design
-scanners that work in a "push" model, i.e. where data is fed to the
-scanner chunk by chunk. When the scanner runs out of data to consume, it
-just stores its state, and return to the caller. When more input data is
-fed to the scanner, it resumes operations exactly where it left off.
-
-Changes needed compared to the "pull" model:
-
-* User has to supply macros ``YYSETSTATE ()`` and ``YYGETSTATE (state)``.
-
-* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``. So the
- user has to declare these. Also the user has to save and restore these.
- In the example ``examples/push_model/push.re`` these are declared as
- fields of the (C++) class of which the scanner is a method, so they do
- not need to be saved/restored explicitly. For C they could e.g. be made
- macros that select fields from a structure passed in as parameter.
- Alternatively, they could be declared as local variables, saved with
- ``YYFILL (n)`` when it decides to return and restored at entry to the
- function. Also, it could be more efficient to save the state from
- ``YYFILL (n)`` because ``YYSETSTATE (state)`` is called unconditionally.
- ``YYFILL (n)`` however does not get ``state`` as parameter, so we would have
- to store state in a local variable by ``YYSETSTATE (state)``.
-
-* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
-
-* Modify caller to recognise if more input is needed and respond appropriately.
-
-* The generated code will contain a switch block that is used to
- restores the last state by jumping behind the corrspoding ``YYFILL (n)``
- call. This code is automatically generated in the epilog of the first ``/*!re2c */``
- block. It is possible to trigger generation of the ``YYGETSTATE ()``
- block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
- wrapped inside a loop.
-
-Please see ``examples/push_model/push.re`` for "push" model scanner. The
-generated code can be tweaked using inplace configurations ``state:abort``
-and ``state:nextlabel``.
-
-Reuse
------
-
-Reuse mode is controlled by ``-r --reusable`` option.
-Allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
-In this mode no ``/*!re2c */`` block and exactly one ``/*!rules:re2c */`` must be present.
-The rules are being saved and used by every ``/*!use:re2c */`` block that follows.
-These blocks can contain inplace configurations, especially ``re2c:flags:e``,
-``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
-That way it is possible to create the same scanner multiple times for
-different character types, different input mechanisms or different output mechanisms.
-The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
-to the set of rules in ``/*!rules:re2c */``.
-
-Encodings
----------
-
-``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
-UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
-See also inplace configuration ``re2c:flags``.
-
-The following concepts should be clarified when talking about encoding.
-*Code point* is an abstract number, which represents single encoding
-symbol. *Code unit* is the smallest unit of memory, which is used in the
-encoded text (it corresponds to one character in the input stream). One
-or more code units can be needed to represent a single code point,
-depending on the encoding. In *fixed-length* encoding, each code point
-is represented with equal number of code units. In *variable-length*
-encoding, different code points can be represented with different number
-of code units.
-
-* ASCII is a fixed-length encoding. Its code space includes 0x100
- code points, from 0 to 0xFF. One code point is represented with exactly one
- 1-byte code unit, which has the same value as the code point. Size of
- ``YYCTYPE`` must be 1 byte.
-
-* EBCDIC is a fixed-length encoding. Its code space includes 0x100
- code points, from 0 to 0xFF. One code point is represented with exactly
- one 1-byte code unit, which has the same value as the code point. Size
- of ``YYCTYPE`` must be 1 byte.
-
-* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
- code points, from 0 to 0xFFFF. One code point is represented with
- exactly one 2-byte code unit, which has the same value as the code
- point. Size of ``YYCTYPE`` must be 2 bytes.
-
-* UTF-16 is a variable-length encoding. Its code space includes all
- Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
- code point is represented with one or two 2-byte code units. Size of
- ``YYCTYPE`` must be 2 bytes.
-
-* UTF-32 is a fixed-length encoding. Its code space includes all
- Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
- code point is represented with exactly one 4-byte code unit. Size of
- ``YYCTYPE`` must be 4 bytes.
-
-* UTF-8 is a variable-length encoding. Its code space includes all
- Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
- code point is represented with sequence of one, two, three or four
- 1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
-
-In Unicode, values from range 0xD800 to 0xDFFF (surrogates) are not
-valid Unicode code points, any encoded sequence of code units, that
-would map to Unicode code points in the range 0xD800-0xDFFF, is
-ill-formed. The user can control how ``re2c`` treats such ill-formed
-sequences with ``--encoding-policy <policy>`` flag (see `Options`_
-for full explanation).
-
-For some encodings, there are code units, that never occur in valid
-encoded stream (e.g. 0xFF byte in UTF-8). If the generated scanner must
-check for invalid input, the only true way to do so is to use default
-rule ``*``. Note, that full range rule ``[^]`` won't catch invalid code units when variable-length encoding is used
-(``[^]`` means "all valid code points", while default rule ``*`` means "all possible code units").
-
-Generic interface
------------------
-
-``re2c`` usually operates on input using pointer-like primitives
-``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
-
-Generic input API (enabled with ``--input custom`` switch) allows to
-customize input operations. In this mode, ``re2c`` will express all
-operations on input in terms of the following primitives:
-
- +---------------------+-----------------------------------------------------+
- | ``YYPEEK ()`` | get current input character |
- +---------------------+-----------------------------------------------------+
- | ``YYSKIP ()`` | advance to the next character |
- +---------------------+-----------------------------------------------------+
- | ``YYBACKUP ()`` | backup current input position |
- +---------------------+-----------------------------------------------------+
- | ``YYBACKUPCTX ()`` | backup current input position for trailing context |
- +---------------------+-----------------------------------------------------+
- | ``YYRESTORE ()`` | restore current input position |
- +---------------------+-----------------------------------------------------+
- | ``YYRESTORECTX ()`` | restore current input position for trailing context |
- +---------------------+-----------------------------------------------------+
- | ``YYLESSTHAN (n)`` | check if less than ``n`` input characters are left |
- +---------------------+-----------------------------------------------------+
-
-This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
-has more details, and you can find some usage examples
-`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
-
-Examples
-========
-
-.. include:: example_intro.rst
-.. include:: example_01.rst
-.. include:: example_02.rst
-.. include:: example_03.rst
-.. include:: example_04.rst
-.. include:: example_05.rst
-.. include:: example_06.rst
-.. include:: example_07.rst
-
-Changelog
-=========
-
-.. include:: changelog.rst
--- /dev/null
+========
+Warnings
+========
+
+.. include:: ../home.rst
+.. include:: ../../contents.rst
+
+.. include:: wundefined_control_flow.rst
+++ /dev/null
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
- :backlinks: none
- :depth: 2
+=========
+Changelog
+=========
+
+.. include:: home.rst
+
* 2015-02-23: 0.14
- Added generic input API (#21 "Support to configure how re2c code interfaced with the symbol buffer?")
- fixed #46 "re2c generates an infinite loop, depends on existence of previous parser"
--- /dev/null
+.. |[news_home]| replace:: [home]
+.. _[news_home]: ../index.html
+.. header:: |[news_home]|_ `[News] <news.html>`_
+.. footer:: |[news_home]|_ `[News] <news.html>`_
--- /dev/null
+====
+News
+====
+
+.. include:: ../home.rst
+
+* `Changelog <changelog.html>`_
+