Restructured files and links.

author Ulya Trofimovich <skvadrik@gmail.com>
Thu, 5 Nov 2015 12:42:37 +0000 (12:42 +0000)
committer Ulya Trofimovich <skvadrik@gmail.com>
Thu, 5 Nov 2015 12:42:37 +0000 (12:42 +0000)
diff --git a/src/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf b/src/about/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf
similarity index 100%
rename from src/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf
rename to src/about/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf
diff --git a/src/about/about.rst b/src/about/about.rst
new file mode 100644
index 0000000..fbe2f96
--- /dev/null
+++ b/src/about/about.rst
@@ -0,0 +1,47 @@
+=====
+About
+=====
+
+.. include:: ../home.rst
+.. include:: ../contents.rst
+
+Authors
+-------
+
+Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
+and described in the research article
+`"RE2C: a more versatile scanner generator" <1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf>`_
+by Peter Bumbulis and Donald D. Cowan, 1994,
+ACM Letters on Programming Languages and Systems (LOPLAS).
+
+Since then many people have contributed to re2c:
+
+* Brian Young      bayoung@acm.org
+* Dan Nuffer       nuffer@users.sourceforge.net
+* Marcus Boerger   helly@users.sourceforge.net
+* Hartmut Kaiser   hkaiser@users.sourceforge.net
+* Emmanuel Mogenet mgix@mgix.com
+* Ulya Trofimovich skvadrik@gmail.com
+
+Let me know if I missed someone!
+
+License
+-------
+
+re2c is distributed with no warranty whatever.
+The code is certain to contain errors.
+Neither the author nor any contributor takes responsibility for any consequences of its use.
+re2c is in the public domain. The data structures and algorithms used
+in re2c are all either taken from documents available to the general
+public or are inventions of the author. Programs generated by re2c may
+be distributed freely. re2c itself may be distributed freely, in source
+or binary, unchanged or modified. Distributors may charge whatever fees
+they can obtain for re2c. If you do make use of re2c, or incorporate it into a larger project, an
+acknowledgement somewhere (documentation, research report, etc.) would
+be appreciated.
+
+Version
+-------
+
+This page describes ``re2c`` version 0.14.1.dev, package date 15 Oct 2015.
+
diff --git a/src/contents.rst b/src/contents.rst
new file mode 100644
index 0000000..f187f63
--- /dev/null
+++ b/src/contents.rst
@@ -0,0 +1,3 @@
+.. contents:: ★
+    :backlinks: none
+    :depth: 2
diff --git a/src/css/default.css b/src/css/default.css
index f6dd9517e6bdd2839340e49f6aeef16b584c75b1..3a3c01ceaefc3b903e3bf705d95641f0cb3e7af7 100644
--- a/src/css/default.css
+++ b/src/css/default.css
@@ -11,7 +11,7 @@ h1, h2, h3, h4, h5, h6 {
      color: #557799;
  }
  h1 {
-    border-bottom: groove black;
+    border-bottom: groove gray;
  }
  h1.title {
      font-size: 2.5em;
diff --git a/src/example_intro.rst b/src/example_intro.rst
deleted file mode 100644
index a8755bc..0000000
--- a/src/example_intro.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-
-All examples are written in C++-98.
-`Do let me know <skvadik@gmail.com>`_ if you notice any obvious lies and errors.
-You can find more examples in subdirectory ``examples`` of the ``re2c`` distribution.
diff --git a/src/examples.rst b/src/examples.rst
deleted file mode 100644
index e47f013..0000000
--- a/src/examples.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
-    :backlinks: none
-    :depth: 2
-
-.. include:: example_intro.rst
-.. include:: example_01.rst
-.. include:: example_02.rst
-.. include:: example_03.rst
-.. include:: example_04.rst
-.. include:: example_05.rst
-.. include:: example_06.rst
-.. include:: example_07.rst
-
diff --git a/src/example_01.rst b/src/examples/example_01.rst
similarity index 95%
rename from src/example_01.rst
rename to src/examples/example_01.rst
index 54da9230d8f22fbacbc8a886ec7061979ba1b90d..a13e0bbe49bc6821c85408e9f97680804630bce7 100644
--- a/src/example_01.rst
+++ b/src/examples/example_01.rst
@@ -1,6 +1,8 @@
  Recognizing integers: the sentinel method
  -----------------------------------------
  
+.. include:: home.rst
+
  This example is very simple, yet practical.
  We assume that the input is small (fits in one continuous piece of memory).
  We also assume that some characters never occur in well-formed input (but may occur in ill-formed input).
@@ -12,9 +14,9 @@ and tries to match each argument against one of the four patterns:
  binary, octal, decimal and hexadecimal integer literals.
  The numbers are not *parsed* (their numeric value is not retrieved), they are merely *recognized*.
  
-`[01_recognizing_integers.re] <examples/01_recognizing_integers.re>`_
+`[01_recognizing_integers.re] <01_recognizing_integers.re>`_
  
-.. include:: examples/01_recognizing_integers.re
+.. include:: 01_recognizing_integers.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_02.rst b/src/examples/example_02.rst
similarity index 97%
rename from src/example_02.rst
rename to src/examples/example_02.rst
index 40ccf6f1f2ca9103f006f054be2bb777d5fa84ff..66501cf2305a28d6f82e68d46433e84ceccf1ca0 100644
--- a/src/example_02.rst
+++ b/src/examples/example_02.rst
@@ -1,6 +1,8 @@
  Recognizing strings: the need for YYMAXFILL
  -------------------------------------------
  
+.. include:: home.rst
+
  This example is about recognizing strings.
  Strings (in generic sense) are different from other kinds of lexemes: they can contain *arbitrary* characters.
  This makes them much more difficult to lex: unlike the `Recognizing integers: the sentinel method <example_01.html>`_ example,
@@ -38,9 +40,9 @@ Common hack is to pad input with a few fake characters that **do not form a vali
  The length of padding depends on the maximal argument to ``YYFILL``
  (this value is called ``YYMAXFILL`` and can be generated using ``/*!max:re2c*/`` directive).
  
-`[02_recognizing_strings.re] <examples/02_recognizing_strings.re>`_
+`[02_recognizing_strings.re] <02_recognizing_strings.re>`_
  
-.. include:: examples/02_recognizing_strings.re
+.. include:: 02_recognizing_strings.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_03.rst b/src/examples/example_03.rst
similarity index 97%
rename from src/example_03.rst
rename to src/examples/example_03.rst
index bdfeb077930d079f38d032e415611a4849bc6883..4e8e33625c42d16a11d64ba3fd66ca5d37853af0 100644
--- a/src/example_03.rst
+++ b/src/examples/example_03.rst
@@ -1,6 +1,8 @@
  Arbitrary large input and YYFILL
  --------------------------------
  
+.. include:: home.rst
+
  In this example we suppose that input cannot be mapped in memory at once:
  either it's too large or its size cannot be determined in advance.
  The usual thing to do in such a case is to allocate a buffer and lex input in chunks that fit into the buffer.
@@ -70,9 +72,9 @@ It can be used as boundary in ``YYFILL``.
  Our example program reads ``stdin`` in chunks of 16 bytes (in the real world, the buffer size is usually ~4 KB)
  and tries to lex numbers separated by newlines.
  
-`[03_arbitrary_large_input.re] <examples/03_arbitrary_large_input.re>`_
+`[03_arbitrary_large_input.re] <03_arbitrary_large_input.re>`_
  
-.. include:: examples/03_arbitrary_large_input.re
+.. include:: 03_arbitrary_large_input.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_04.rst b/src/examples/example_04.rst
similarity index 90%
rename from src/example_04.rst
rename to src/examples/example_04.rst
index 2b1e93393cf7d93c46025af7e909ad632adbda25..be0e0fec70a2b484d88fa06c18e9fa444f0c1aad 100644
--- a/src/example_04.rst
+++ b/src/examples/example_04.rst
@@ -1,15 +1,17 @@
  Parsing integers (multiple re2c blocks)
  ---------------------------------------
  
+.. include:: home.rst
+
  This example is based on `Recognizing integers: the sentinel method <example_01.html>`_ example,
  only now integer literals are parsed rather than simply recognized.
  Parsing integers is simple: one can easily do it by hand.
  However, re2c-generated code *does* look like a simple handwritten parser:
  a couple of dereferences and conditional jumps. No overhead. ``:)``
  
-`[04_parsing_integers_blocks.re] <examples/04_parsing_integers_blocks.re>`_
+`[04_parsing_integers_blocks.re] <04_parsing_integers_blocks.re>`_
  
-.. include:: examples/04_parsing_integers_blocks.re
+.. include:: 04_parsing_integers_blocks.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_05.rst b/src/examples/example_05.rst
similarity index 92%
rename from src/example_05.rst
rename to src/examples/example_05.rst
index decd5485a2999b3b5120b13807145b6217b4f536..8738d75c0c9504a6b9c91ef42185f27fc84e3e47 100644
--- a/src/example_05.rst
+++ b/src/examples/example_05.rst
@@ -1,13 +1,15 @@
  Parsing integers (conditions)
  -----------------------------
  
+.. include:: home.rst
+
  This example does exactly the same as `Parsing integers (multiple re2c blocks) <example_04.html>`_ example,
  but in a slightly different manner: it uses re2c conditions instead of blocks.
  Conditions allow one to encode multiple interconnected lexers within a single re2c block.
  
-`[05_parsing_integers_conditions.re] <examples/05_parsing_integers_conditions.re>`_
+`[05_parsing_integers_conditions.re] <05_parsing_integers_conditions.re>`_
  
-.. include:: examples/05_parsing_integers_conditions.re
+.. include:: 05_parsing_integers_conditions.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_06.rst b/src/examples/example_06.rst
similarity index 92%
rename from src/example_06.rst
rename to src/examples/example_06.rst
index 55b47776ccef2a72e40ed522cae5c30db95f0e59..9098a0674be9c9205852b3eea59640b9842963a3 100644
--- a/src/example_06.rst
+++ b/src/examples/example_06.rst
@@ -1,6 +1,8 @@
  Braille patterns (encodings)
  ----------------------------
  
+.. include:: home.rst
+
  This example is about encoding support in re2c.
  It's a partial decoder from Grade-1 (uncontracted) Unicode English Braille to plain English.
  The input may be encoded in UTF-8, UTF-16, UTF-32 or UCS-2:
@@ -10,9 +12,9 @@ We use ``-r`` option to reuse the same block of re2c rules with different encodi
  So. The hardest part is to get some input.
  Here is a message out of the void:
  
-.. include:: examples/06_braille.utf8.txt
+.. include:: 06_braille.utf8.txt
  
-It appears to be UTF-8 encoded `[06_braille.utf8.txt] <examples/06_braille.utf8.txt.html>`_.
+It appears to be UTF-8 encoded `[06_braille.utf8.txt] <06_braille.utf8.txt.html>`_.
  Convert it into UTF-16, UTF-32 or UCS-2:
  
  .. code-block:: bash
@@ -30,9 +32,9 @@ and some other, which we omit for simplicity (as well as a few ambiguous punctua
  Grade-2 Braille allows contractions; they obey complex rules (like those of a natural language)
  and are much harder to implement.
  
-`[06_braille.re] <examples/06_braille.re>`_
+`[06_braille.re] <06_braille.re>`_
  
-.. include:: examples/06_braille.re
+.. include:: 06_braille.re
      :code: cpp
      :number-lines:
  
diff --git a/src/example_07.rst b/src/examples/example_07.rst
similarity index 98%
rename from src/example_07.rst
rename to src/examples/example_07.rst
index 47dbdaf5343bc4cf7ef04186a80a934ce5052d7c..8b1686ad8f0573f5a5207370d33fd92ebac22ed7 100644
--- a/src/example_07.rst
+++ b/src/examples/example_07.rst
@@ -1,6 +1,8 @@
  C++98 lexer
  -----------
  
+.. include:: home.rst
+
  This is an example of a big real-world re2c program: C++98 lexer.
  It conforms to the C++98 standard (except for a couple of hacks to simulate the preprocessor).
  All nontrivial lexemes (integers, floating-point constants, strings and character literals)
@@ -8,9 +10,9 @@ are parsed (not only recognized): numeric literals are converted to numbers, str
  Some additional checks described in standard (e.g. overflows in integer literals) are also done.
  In fact, C++ is an easy language to lex: unlike many other languages, lexer can proceed without feedback from parser.
  
-`[07_c++98.re] <examples/07_c++98.re>`_
+`[07_c++98.re] <07_c++98.re>`_
  
-.. include:: examples/07_c++98.re
+.. include:: 07_c++98.re
      :code: cpp
      :number-lines:
  
diff --git a/src/examples/examples.rst b/src/examples/examples.rst
new file mode 100644
index 0000000..720e70b
--- /dev/null
+++ b/src/examples/examples.rst
@@ -0,0 +1,20 @@
+========
+Examples
+========
+
+.. include:: ../home.rst
+
+★
+
+* `Recognizing integers: the sentinel method   <example_01.html>`_
+* `Recognizing strings: the need for YYMAXFILL <example_02.html>`_
+* `Arbitrary large input and YYFILL            <example_03.html>`_
+* `Parsing integers (multiple re2c blocks)     <example_04.html>`_
+* `Parsing integers (conditions)               <example_05.html>`_
+* `Braille patterns (encodings)                <example_06.html>`_
+* `C++98 lexer                                 <example_07.html>`_
+
+All examples are written in C++98.
+`Do let me know <mailto:skvadrik@gmail.com>`_ if you notice any obvious lies and errors.
+You can find more examples in subdirectory ``examples`` of the ``re2c`` distribution.
+
diff --git a/src/examples/home.rst b/src/examples/home.rst
new file mode 100644
index 0000000..da92baf
--- /dev/null
+++ b/src/examples/home.rst
@@ -0,0 +1,4 @@
+.. |[examples_home]| replace:: [home]
+.. _[examples_home]: ../index.html
+.. header:: |[examples_home]|_ `[Examples] <examples.html>`_
+.. footer:: |[examples_home]|_ `[Examples] <examples.html>`_
diff --git a/src/home.rst b/src/home.rst
new file mode 100644
index 0000000..eccd374
--- /dev/null
+++ b/src/home.rst
@@ -0,0 +1,2 @@
+.. header:: `[home] <../index.html>`_
+.. footer:: `[home] <../index.html>`_
diff --git a/src/index.rst b/src/index.rst
index 93e4bfcfb1af3c47bd44d7e8d97d900a24007954..57eb91e97b47a790f7f34c5feccd6836566ea904 100644
--- a/src/index.rst
+++ b/src/index.rst
@@ -1,4 +1,3 @@
-
  ====
  re2c
  ====
@@ -9,13 +8,15 @@ and flexible (easy to embed into existing environment).
  
  --------------------------------------------------------------------------------
  
-★ `Install <install.html>`_
+★ `About    <about/about.html>`_
+
+★ `Install  <install/install.html>`_
  
-★ `Manual <manual.html>`_
+★ `Manual   <manual/manual.html>`_
  
-★ `Examples <examples.html>`_
+★ `Examples <examples/examples.html>`_
  
-★ `News <news.html>`_
+★ `News     <news/news.html>`_
  
  --------------------------------------------------------------------------------
  
diff --git a/src/install.rst b/src/install/install.rst
similarity index 96%
rename from src/install.rst
rename to src/install/install.rst
index e9be30accc6a378d1afc4bf05ced362b4152007d..a48d6648121ff5b01e64a7335323a11524120975 100644
--- a/src/install.rst
+++ b/src/install/install.rst
@@ -1,8 +1,9 @@
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
-    :backlinks: none
-    :depth: 2
+=======
+Install
+=======
+
+.. include:: ../home.rst
+.. include:: ../contents.rst
  
  Download
  ========
diff --git a/src/manual/features/features.rst b/src/manual/features/features.rst
new file mode 100644
index 0000000..15e7b13
--- /dev/null
+++ b/src/manual/features/features.rst
@@ -0,0 +1,199 @@
+========
+Features
+========
+
+.. include:: ../home.rst
+.. include:: ../../contents.rst
+
+Conditions
+----------
+
+You can precede regular expressions with a list of condition names when
+using the ``-c`` switch. In this case ``re2c`` generates a scanner block for
+each condition, and each of the generated blocks has its own
+precondition. The precondition is given by the interface define
+``YYGETCONDITION()`` and must be of type ``YYCONDTYPE``.
+
+There are two special rule types. First, the rules of the condition ``<*>``
+are merged into all conditions (note that they have lower priority than
+other rules of that condition). Second, the empty condition list
+allows one to provide a code block that does not have a scanner part,
+meaning it does not allow any regular expression. The condition value
+referring to this special block is always the one with the enumeration
+value 0. This way the code of this special rule can be used to
+initialize a scanner. It is in no way necessary to have these rules, but
+sometimes it is helpful to have a dedicated uninitialized condition
+state.
+
+Non-empty rules allow one to specify the new condition, which makes them
+transition rules. Besides generating calls for the define
+``YYSETCONDITION``, no other special code is generated.
+
+There is another kind of special rule that allows one to prepend code to any
+code block of all rules of a certain set of conditions, or to all code
+blocks of all rules. This can be helpful when some operation is common
+among rules. For instance, this can be used to store the length of the
+scanned string. These special setup rules start with an exclamation mark
+followed by either a list of conditions ``<! condition, ... >`` or a star
+``<!*>``. When ``re2c`` generates the code for a rule whose state does not have a
+setup rule and a starred setup rule is present, then that code will be
+used as setup code.
+
+State
+-----
+
+When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
+store its current state, return to the caller, and later resume
+operations exactly where it left off.
+
+The default operation of ``re2c`` is a
+"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
+of the parsing loop, and that may not always be convenient.
+
+Typically, if there is a preprocessor ahead of the scanner in the
+stream, or for that matter any other procedural source of data, the
+scanner cannot "ask" for more data unless both scanner and source
+live in separate threads.
+
+The ``-f`` flag is useful for just this situation: it lets users design
+scanners that work in a "push" model, i.e. where data is fed to the
+scanner chunk by chunk. When the scanner runs out of data to consume, it
+just stores its state and returns to the caller. When more input data is
+fed to the scanner, it resumes operations exactly where it left off.
+
+Changes needed compared to the "pull" model:
+
+* The user has to supply macros ``YYSETSTATE (state)`` and ``YYGETSTATE ()``.
+
+* The ``-f`` option inhibits declaration of ``yych`` and ``yyaccept``, so the
+  user has to declare them, and also to save and restore them.
+  In the example ``examples/push_model/push.re`` they are declared as
+  fields of the (C++) class of which the scanner is a method, so they do
+  not need to be saved/restored explicitly. In C they could, e.g., be made
+  macros that select fields from a structure passed in as a parameter.
+  Alternatively, they could be declared as local variables, saved by
+  ``YYFILL (n)`` when it decides to return, and restored on entry to the
+  function. Also, it could be more efficient to save the state from
+  ``YYFILL (n)``, because ``YYSETSTATE (state)`` is called unconditionally.
+  However, ``YYFILL (n)`` does not get ``state`` as a parameter, so we would have
+  to store the state in a local variable via ``YYSETSTATE (state)``.
+
+* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
+
+* Modify caller to recognise if more input is needed and respond appropriately.
+
+* The generated code will contain a switch block that is used to
+  restore the last state by jumping behind the corresponding ``YYFILL (n)``
+  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
+  block. It is possible to trigger generation of the ``YYGETSTATE ()``
+  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially useful when the scanner code should be
+  wrapped inside a loop.
+
+Please see ``examples/push_model/push.re`` for a "push" model scanner. The
+generated code can be tweaked using inplace configurations ``state:abort``
+and ``state:nextlabel``.
+
+Reuse
+-----
+
+Reuse mode is controlled by the ``-r`` (``--reusable``) option.
+It allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
+In this mode, no ``/*!re2c */`` block may be present, and exactly one ``/*!rules:re2c */`` block must be present.
+The rules are saved and used by every ``/*!use:re2c */`` block that follows.
+These blocks can contain inplace configurations, especially ``re2c:flags:e``,
+``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
+That way it is possible to create the same scanner multiple times for
+different character types, different input mechanisms or different output mechanisms.
+The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
+to the set of rules in ``/*!rules:re2c */``.
+
+Encodings
+---------
+
+``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
+UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
+See also inplace configuration ``re2c:flags``.
+
+The following concepts should be clarified when talking about encodings.
+A *code point* is an abstract number that represents a single encoding
+symbol. A *code unit* is the smallest unit of memory that is used in the
+encoded text (it corresponds to one character in the input stream). One
+or more code units may be needed to represent a single code point,
+depending on the encoding. In a *fixed-length* encoding, each code point
+is represented with an equal number of code units. In a *variable-length*
+encoding, different code points can be represented with different numbers
+of code units.
+
+* ASCII is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly one
+  1-byte code unit, which has the same value as the code point. Size of
+  ``YYCTYPE`` must be 1 byte.
+
+* EBCDIC is a fixed-length encoding. Its code space includes 0x100
+  code points, from 0 to 0xFF. One code point is represented with exactly
+  one 1-byte code unit, which has the same value as the code point. Size
+  of ``YYCTYPE`` must be 1 byte.
+
+* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
+  code points, from 0 to 0xFFFF. One code point is represented with
+  exactly one 2-byte code unit, which has the same value as the code
+  point. Size of ``YYCTYPE`` must be 2 bytes.
+
+* UTF-16 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with one or two 2-byte code units. Size of
+  ``YYCTYPE`` must be 2 bytes.
+
+* UTF-32 is a fixed-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with exactly one 4-byte code unit. Size of
+  ``YYCTYPE`` must be 4 bytes.
+
+* UTF-8 is a variable-length encoding. Its code space includes all
+  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
+  code point is represented with a sequence of one, two, three or four
+  1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
+
+In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
+valid Unicode code points; any encoded sequence of code units that
+would map to Unicode code points in the range 0xD800-0xDFFF is
+ill-formed. The user can control how ``re2c`` treats such ill-formed
+sequences with the ``--encoding-policy <policy>`` flag (see `Options <../options/options_list.html>`_
+for a full explanation).
+
+For some encodings, there are code units that never occur in a valid
+encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner must
+check for invalid input, the only reliable way to do so is to use the default
+rule ``*``. Note that the full range rule ``[^]`` won't catch invalid code units when a variable-length encoding is used
+(``[^]`` means "all valid code points", while the default rule ``*`` means "all possible code units").
+
+Generic interface
+-----------------
+
+``re2c`` usually operates on input using pointer-like primitives
+``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
+
+The generic input API (enabled with the ``--input custom`` switch) allows one to
+customize input operations. In this mode, ``re2c`` will express all
+operations on input in terms of the following primitives:
+
+    +---------------------+-----------------------------------------------------+
+    | ``YYPEEK ()``       | get current input character                         |
+    +---------------------+-----------------------------------------------------+
+    | ``YYSKIP ()``       | advance to the next character                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUP ()``     | backup current input position                       |
+    +---------------------+-----------------------------------------------------+
+    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORE ()``    | restore current input position                      |
+    +---------------------+-----------------------------------------------------+
+    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
+    +---------------------+-----------------------------------------------------+
+    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
+    +---------------------+-----------------------------------------------------+
+
+This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
+has more details, and you can find some usage examples
+`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
+
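The *Encodings* section of ``features.rst`` above distinguishes code points from code units. A minimal hand-written C++98 sketch makes the difference concrete; it is not part of ``re2c``, and ``encode_utf8`` and ``encode_utf16`` are hypothetical names. Each function encodes one valid code point and returns its code units.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: encode one code point as UTF-8
// (variable-length, 1-byte code units).
std::vector<unsigned char> encode_utf8(unsigned long cp)
{
    // Surrogates (0xD800-0xDFFF) and values above 0x10FFFF are not code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<unsigned char> units;
    if (cp < 0x80) {                      // 1 unit:  0xxxxxxx
        units.push_back((unsigned char)cp);
    } else if (cp < 0x800) {              // 2 units: 110xxxxx 10xxxxxx
        units.push_back((unsigned char)(0xC0 | (cp >> 6)));
        units.push_back((unsigned char)(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {            // 3 units: 1110xxxx 10xxxxxx 10xxxxxx
        units.push_back((unsigned char)(0xE0 | (cp >> 12)));
        units.push_back((unsigned char)(0x80 | ((cp >> 6) & 0x3F)));
        units.push_back((unsigned char)(0x80 | (cp & 0x3F)));
    } else {                              // 4 units: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        units.push_back((unsigned char)(0xF0 | (cp >> 18)));
        units.push_back((unsigned char)(0x80 | ((cp >> 12) & 0x3F)));
        units.push_back((unsigned char)(0x80 | ((cp >> 6) & 0x3F)));
        units.push_back((unsigned char)(0x80 | (cp & 0x3F)));
    }
    return units;
}

// Hypothetical helper: encode one code point as UTF-16
// (variable-length, 2-byte code units).
std::vector<unsigned short> encode_utf16(unsigned long cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<unsigned short> units;
    if (cp < 0x10000) {                   // 1 unit with the same value as the code point
        units.push_back((unsigned short)cp);
    } else {                              // 2 units: a surrogate pair
        cp -= 0x10000;
        units.push_back((unsigned short)(0xD800 | (cp >> 10)));
        units.push_back((unsigned short)(0xDC00 | (cp & 0x3FF)));
    }
    return units;
}
```

For example, U+20AC (€) takes three UTF-8 code units (0xE2 0x82 0xAC) but a single UTF-16 code unit, while U+1F600 takes four UTF-8 code units and two UTF-16 code units (the surrogate pair 0xD83D 0xDE00). In UTF-32 every code point would take exactly one 4-byte unit.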
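The *Generic interface* section of ``features.rst`` above lists the primitives that ``re2c`` uses under ``--input custom``, but leaves their definitions to the user. Below is a minimal sketch under stated assumptions. The ``Input`` struct and the macro bodies are hypothetical user definitions. ``lex`` is hand-written in the style of generated code and is not actual ``re2c`` output. It recognizes ``0`` or ``0x`` followed by hex digits, and uses ``YYBACKUP ()``/``YYRESTORE ()`` to fall back to ``0`` when ``0x`` is not followed by a hex digit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical input object; re2c does not define it, the user does.
struct Input {
    const char *str;     // input buffer
    std::size_t len;     // number of characters in the buffer
    std::size_t cur;     // current position (the role of YYCURSOR)
    std::size_t marker;  // backup position (the role of YYMARKER)
};

Input make_input(const char *s)
{
    assert(s != 0);
    Input in = { s, std::strlen(s), 0, 0 };
    return in;
}

// Illustrative user definitions of the --input custom primitives.
// YYPEEK () yields '\0' past the end, so it never reads out of bounds.
#define YYPEEK()      (in.cur < in.len ? in.str[in.cur] : '\0')
#define YYSKIP()      (++in.cur)
#define YYBACKUP()    (in.marker = in.cur)
#define YYRESTORE()   (in.cur = in.marker)
#define YYLESSTHAN(n) (in.len - in.cur < (std::size_t)(n))

enum Token { TOK_NONE, TOK_ZERO, TOK_HEX };

static bool is_hex(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Hand-written matcher that touches the input only through the primitives.
// Recognizes "0", or "0x" followed by one or more hex digits.
Token lex(Input &in)
{
    if (YYLESSTHAN(1)) return TOK_NONE;   // no input left
    if (YYPEEK() != '0') return TOK_NONE;
    YYSKIP();
    YYBACKUP();                           // "0" already matches: remember this point
    if (YYPEEK() != 'x') return TOK_ZERO;
    YYSKIP();
    if (!is_hex(YYPEEK())) {
        YYRESTORE();                      // "0x" alone is not a token: fall back to "0"
        return TOK_ZERO;
    }
    do {
        YYSKIP();
    } while (is_hex(YYPEEK()));
    return TOK_HEX;
}
```

For the input ``"0xz"`` the matcher returns ``TOK_ZERO`` with the position restored to 1, just after the ``0``. The trailing-context pair ``YYBACKUPCTX ()``/``YYRESTORECTX ()`` is omitted because no rule here uses trailing context. Since the matcher only goes through these macros, reading from a different source would mean changing only the macro bodies.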
diff --git a/src/manual/home.rst b/src/manual/home.rst
new file mode 100644
index 0000000..50b01e9
--- /dev/null
+++ b/src/manual/home.rst
@@ -0,0 +1,4 @@
+.. |[manual_home]| replace:: [home]
+.. _[manual_home]: ../../index.html
+.. header:: |[manual_home]|_ `[Manual] <../manual.html>`_
+.. footer:: |[manual_home]|_ `[Manual] <../manual.html>`_
diff --git a/src/manual/manual.rst b/src/manual/manual.rst
new file mode 100644
index 0000000..c2863d6
--- /dev/null
+++ b/src/manual/manual.rst
@@ -0,0 +1,13 @@
+======
+Manual
+======
+
+.. include:: ../home.rst
+
+★
+
+* `Options   <options/options_list.html>`_
+* `Warnings  <warnings/warnings.html>`_
+* `Syntax    <syntax/syntax.html>`_
+* `Features  <features/features.html>`_
+
diff --git a/src/options/options_list.rst b/src/manual/options/options_list.rst
similarity index 100%
rename from src/options/options_list.rst
rename to src/manual/options/options_list.rst
diff --git a/src/manual.rst b/src/manual/syntax/syntax.rst
similarity index 63%
rename from src/manual.rst
rename to src/manual/syntax/syntax.rst
index bac642b1b758c1659c04779f04623e41b67c5d1a..edcd63b30d4e8b10a309a2a942513f66c21e09bb 100644
--- a/src/manual.rst
+++ b/src/manual/syntax/syntax.rst
@@ -1,72 +1,11 @@
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
-    :backlinks: none
-    :depth: 2
-
-About
-=====
-
-Authors
--------
-
-Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
-and described in research article
-`"RE2C: a more versatile scanner generator" <1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf>`_
-by Peter Bumbulis, Donald D. Cowan, 1994,
-ACM Letters on Programming Languages and Systems (LOPLAS).
-
-Since then many people have contributed to re2c:
-
-* Brian Young      bayoung@acm.org
-* Dan Nuffer       nuffer@users.sourceforge.net
-* Marcus Boerger   helly@users.sourceforge.net
-* Hartmut Kaiser   hkaiser@users.sourceforge.net
-* Emmanuel Mogenet mgix@mgix.com
-* Ulya Trofimovich skvadrik@gmail.com
-
-Let me know if I missed someone!
-
-License
--------
-
-re2c is distributed with no warranty whatever.
-The code is certain to contain errors.
-Neither the author nor any contributor takes responsibility for any consequences of its use.
-re2c is in the public domain. The data structures and algorithms used
-in re2c are all either taken from documents available to the general
-public or are inventions of the author. Programs generated by re2c may
-be distributed freely. re2c itself may be distributed freely, in source
-or binary, unchanged or modified. Distributors may charge whatever fees
-they can obtain for re2c. If you do make use of re2c, or incorporate it into a larger project an
-acknowledgement somewhere (documentation, research report, etc.) would
-be appreciated.
-
-Version
--------
-
-This page describes ``re2c`` version 0.14.1.dev, package date 15 Oct 2015.
-
-Run
-===
-
-Synopsis
---------
-
-``re2c [OPTIONS] FILE [OPTIONS]``
-
-Options
--------
-
-.. include:: options/options_list.rst
+======
+Syntax
+======
  
-Warnings
---------
+.. include:: ../home.rst
+.. include:: ../../contents.rst
  
-.. include:: options/warnings_list.rst
  
-Syntax
-======
  
  Code for ``re2c`` consists of a set of `rules`_, `definitions`_ and
  `configurations`_.
@@ -607,214 +546,3 @@ depends on a particular use case.
      generated code will contain both ``YYSETSTATE (s)`` and ``YYGETSTATE`` even
      if ``YYFILL (n)`` is being disabled.
  
-Features
-========
-
-Conditions
-----------
-
-You can preceed regular expressions with a list of condition names when
-using the ``-c`` switch. In this case ``re2c`` generates scanner blocks for
-each conditon. Where each of the generated blocks has its own
-precondition. The precondition is given by the interface define
-``YYGETCONDITON()`` and must be of type ``YYCONDTYPE``.
-
-There are two special rule types. First, the rules of the condition ``<*>``
-are merged to all conditions (note that they have lower priority than
-other rules of that condition). And second the empty condition list
-allows to provide a code block that does not have a scanner part.
-Meaning it does not allow any regular expression. The condition value
-referring to this special block is always the one with the enumeration
-value 0. This way the code of this special rule can be used to
-initialize a scanner. It is in no way necessary to have these rules: but
-sometimes it is helpful to have a dedicated uninitialized condition
-state.
-
-Non empty rules allow to specify the new condition, which makes them
-transition rules. Besides generating calls for the define
-``YYSETCONDTITION`` no other special code is generated.
-
-There is another kind of special rules that allow to prepend code to any
-code block of all rules of a certain set of conditions or to all code
-blocks to all rules. This can be helpful when some operation is common
-among rules. For instance this can be used to store the length of the
-scanned string. These special setup rules start with an exclamation mark
-followed by either a list of conditions ``<! condition, ... >`` or a star
-``<!*>``. When ``re2c`` generates the code for a rule whose state does not have a
-setup rule and a star'd setup rule is present, than that code will be
-used as setup code.
-
-State
------
-
-When the ``-f`` flag is specified, ``re2c`` generates a scanner that can
-store its current state, return to the caller, and later resume
-operations exactly where it left off.
-
-The default operation of ``re2c`` is a
-"pull" model, where the scanner asks for extra input whenever it needs it. However, this mode of operation assumes that the scanner is the "owner"
-the parsing loop, and that may not always be convenient.
-
-Typically, if there is a preprocessor ahead of the scanner in the
-stream, or for that matter any other procedural source of data, the
-scanner cannot "ask" for more data unless both scanner and source
-live in separate threads.
-
-The ``-f`` flag is useful for just this situation: it lets users design
-scanners that work in a "push" model, i.e. where data is fed to the
-scanner chunk by chunk. When the scanner runs out of data to consume, it
-just stores its state and returns to the caller. When more input data is
-fed to the scanner, it resumes operations exactly where it left off.
-
-Changes needed compared to the "pull" model:
-
-* The user has to supply the macros ``YYSETSTATE (state)`` and ``YYGETSTATE ()``.
-
-* The ``-f`` option inhibits the declaration of ``yych`` and ``yyaccept``, so
-  the user has to declare them, and also save and restore them.
-  In the example ``examples/push_model/push.re`` they are declared as
-  fields of the (C++) class of which the scanner is a method, so they do
-  not need to be saved and restored explicitly. In C they could, for example,
-  be made macros that select fields from a structure passed in as a parameter.
-  Alternatively, they could be declared as local variables, saved in
-  ``YYFILL (n)`` when it decides to return, and restored on entry to the
-  function. It could also be more efficient to save the state from
-  ``YYFILL (n)``, because ``YYSETSTATE (state)`` is called unconditionally.
-  However, ``YYFILL (n)`` does not get ``state`` as a parameter, so
-  ``YYSETSTATE (state)`` would have to store the state in a local variable.
-
-* Modify ``YYFILL (n)`` to return (from the function calling it) if more input is needed.
-
-* Modify the caller to recognise if more input is needed and respond appropriately.
-
-* The generated code will contain a switch block that is used to
-  restore the last state by jumping behind the corresponding ``YYFILL (n)``
-  call. This code is automatically generated in the epilog of the first ``/*!re2c */``
-  block. It is possible to trigger generation of the ``YYGETSTATE ()``
-  block earlier by placing a ``/*!getstate:re2c*/`` comment. This is especially
-  useful when the scanner code should be wrapped inside a loop.
-
-Please see ``examples/push_model/push.re`` for a "push" model scanner. The
-generated code can be tweaked using inplace configurations ``state:abort``
-and ``state:nextlabel``.
-
-Reuse
------
-
-Reuse mode is controlled by the ``-r`` (``--reusable``) option.
-It allows reuse of scanner definitions with ``/*!use:re2c */`` after ``/*!rules:re2c */``.
-In this mode, no ``/*!re2c */`` block may be present, and exactly one ``/*!rules:re2c */`` block must be present.
-The rules are saved and used by every ``/*!use:re2c */`` block that follows.
-These blocks can contain inplace configurations, especially ``re2c:flags:e``,
-``re2c:flags:w``, ``re2c:flags:x``, ``re2c:flags:u`` and ``re2c:flags:8``.
-That way it is possible to create the same scanner multiple times for
-different character types, different input mechanisms or different output mechanisms.
-The ``/*!use:re2c */`` blocks can also contain additional rules that will be appended
-to the set of rules in ``/*!rules:re2c */``.
-
-Encodings
----------
-
-``re2c`` supports the following encodings: ASCII (default), EBCDIC (``-e``),
-UCS-2 (``-w``), UTF-16 (``-x``), UTF-32 (``-u``) and UTF-8 (``-8``).
-See also inplace configuration ``re2c:flags``.
-
-The following concepts should be clarified when talking about encodings.
-A *code point* is an abstract number that represents a single encoding
-symbol. A *code unit* is the smallest unit of memory used in the
-encoded text (it corresponds to one character in the input stream). One
-or more code units can be needed to represent a single code point,
-depending on the encoding. In a *fixed-length* encoding, each code point
-is represented with an equal number of code units. In a *variable-length*
-encoding, different code points can be represented with different numbers
-of code units.
-
-* ASCII is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly one
-  1-byte code unit, which has the same value as the code point. Size of
-  ``YYCTYPE`` must be 1 byte.
-
-* EBCDIC is a fixed-length encoding. Its code space includes 0x100
-  code points, from 0 to 0xFF. One code point is represented with exactly
-  one 1-byte code unit, which has the same value as the code point. Size
-  of ``YYCTYPE`` must be 1 byte.
-
-* UCS-2 is a fixed-length encoding. Its code space includes 0x10000
-  code points, from 0 to 0xFFFF. One code point is represented with
-  exactly one 2-byte code unit, which has the same value as the code
-  point. Size of ``YYCTYPE`` must be 2 bytes.
-
-* UTF-16 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with one or two 2-byte code units. Size of
-  ``YYCTYPE`` must be 2 bytes.
-
-* UTF-32 is a fixed-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with exactly one 4-byte code unit. Size of
-  ``YYCTYPE`` must be 4 bytes.
-
-* UTF-8 is a variable-length encoding. Its code space includes all
-  Unicode code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
-  code point is represented with a sequence of one, two, three or four
-  1-byte code units. Size of ``YYCTYPE`` must be 1 byte.
-
-In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
-valid Unicode code points, so any encoded sequence of code units that
-would map to Unicode code points in the range 0xD800-0xDFFF is
-ill-formed. The user can control how ``re2c`` treats such ill-formed
-sequences with the ``--encoding-policy <policy>`` flag (see `Options`_
-for a full explanation).
-
-For some encodings, there are code units that never occur in a valid
-encoded stream (e.g. the byte 0xFF in UTF-8). If the generated scanner must
-check for invalid input, the only reliable way to do so is to use the default
-rule ``*``. Note that the full-range rule ``[^]`` won't catch invalid code units
-when a variable-length encoding is used (``[^]`` means "all valid code points",
-while the default rule ``*`` means "all possible code units").
-
-Generic interface
------------------
-
-``re2c`` usually operates on input using pointer-like primitives
-``YYCURSOR``, ``YYMARKER``, ``YYCTXMARKER`` and ``YYLIMIT``.
-
-The generic input API (enabled with the ``--input custom`` switch) allows
-customizing input operations. In this mode, ``re2c`` expresses all
-operations on input in terms of the following primitives:
-
-    +---------------------+-----------------------------------------------------+
-    | ``YYPEEK ()``       | get current input character                         |
-    +---------------------+-----------------------------------------------------+
-    | ``YYSKIP ()``       | advance to the next character                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUP ()``     | backup current input position                       |
-    +---------------------+-----------------------------------------------------+
-    | ``YYBACKUPCTX ()``  | backup current input position for trailing context  |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORE ()``    | restore current input position                      |
-    +---------------------+-----------------------------------------------------+
-    | ``YYRESTORECTX ()`` | restore current input position for trailing context |
-    +---------------------+-----------------------------------------------------+
-    | ``YYLESSTHAN (n)``  | check if less than ``n`` input characters are left  |
-    +---------------------+-----------------------------------------------------+
-
-This `article <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html>`_
-has more details, and you can find some usage examples
-`here <http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html>`_.
-
-Examples
-========
-
-.. include:: example_intro.rst
-.. include:: example_01.rst
-.. include:: example_02.rst
-.. include:: example_03.rst
-.. include:: example_04.rst
-.. include:: example_05.rst
-.. include:: example_06.rst
-.. include:: example_07.rst
-
-Changelog
-=========
-
-.. include:: changelog.rst
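The generic input primitives listed in the removed "Generic interface" text can be illustrated with a minimal C sketch. It assumes a hypothetical ``input_t`` structure (not part of ``re2c``) and one possible definition of ``YYPEEK ()``, ``YYSKIP ()``, ``YYBACKUP ()``, ``YYRESTORE ()`` and ``YYLESSTHAN (n)`` over it. The ``scan`` function is hand-written in the style of generated code for two rules, ``"a"`` and ``"abc"``; it is not actual ``re2c`` output. It shows why the backup/restore pair is needed: after matching ``a`` the scanner speculatively consumes ``b`` and must rewind if ``c`` does not follow.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input object (an assumption; re2c does not define it). */
typedef struct {
    const char *buf;  /* input text */
    size_t len;       /* number of characters in buf */
    size_t pos;       /* current position: the role YYCURSOR plays by default */
    size_t mark;      /* saved position: the role YYMARKER plays by default */
} input_t;

/* One possible definition of the primitives. They refer to a pointer
 * named `in` that must be in scope wherever they are used. */
#define YYPEEK()      (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()      (++in->pos)
#define YYBACKUP()    (in->mark = in->pos)
#define YYRESTORE()   (in->pos = in->mark)
#define YYLESSTHAN(n) (in->len - in->pos < (size_t)(n))

/* Hand-written matcher for the rules "a" and "abc" (longest match wins).
 * Returns the length of the matched token, or 0 if neither rule matches. */
static size_t scan(input_t *in)
{
    size_t start = in->pos;

    if (YYLESSTHAN(1) || YYPEEK() != 'a') return 0;
    YYSKIP();
    YYBACKUP();                      /* rule "a" has matched: save position */
    if (YYPEEK() == 'b') {
        YYSKIP();                    /* speculatively consume 'b' */
        if (YYPEEK() == 'c') {
            YYSKIP();
            return in->pos - start;  /* rule "abc" has matched */
        }
    }
    YYRESTORE();                     /* rewind to the end of "a" */
    return in->pos - start;
}

/* Convenience wrapper: scan one token from the start of a C string. */
static size_t scan_string(const char *s)
{
    input_t input = { s, strlen(s), 0, 0 };
    return scan(&input);
}
```

With these definitions, ``scan_string ("abc")`` returns 3, while ``scan_string ("abx")`` returns 1 because the speculatively consumed ``b`` is undone by ``YYRESTORE ()``. The trailing-context primitives ``YYBACKUPCTX ()`` and ``YYRESTORECTX ()`` are not needed for these two rules and are omitted.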
diff --git a/src/manual/warnings/warnings.rst b/src/manual/warnings/warnings.rst
new file mode 100644
index 0000000..7898f8d
--- /dev/null
+++ b/src/manual/warnings/warnings.rst
@@ -0,0 +1,8 @@
+========
+Warnings
+========
+
+.. include:: ../home.rst
+.. include:: ../../contents.rst
+
+.. include:: wundefined_control_flow.rst
diff --git a/src/options/warnings_list.rst b/src/manual/warnings/warnings_list.rst
similarity index 100%
rename from src/options/warnings_list.rst
rename to src/manual/warnings/warnings_list.rst
diff --git a/src/news.rst b/src/news.rst
deleted file mode 100644
index 8d89951..0000000
--- a/src/news.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-.. header:: `[home] <index.html>`_
-.. footer:: `[home] <index.html>`_
-.. contents:: ★
-    :backlinks: none
-    :depth: 2
diff --git a/src/changelog.rst b/src/news/changelog.rst
similarity index 99%
rename from src/changelog.rst
rename to src/news/changelog.rst
index 9f3b8619d705620d52d0167bd8235d5fef2b311c..eb3e6ff5d3d14ce1c5aed561171c496a7a3c5bbc 100644
--- a/src/changelog.rst
+++ b/src/news/changelog.rst
@@ -1,3 +1,9 @@
+=========
+Changelog
+=========
+
+.. include:: home.rst
+
  * 2015-02-23: 0.14
      - Added generic input API (#21 "Support to configure how re2c code interfaced with the symbol buffer?")
      - fixed #46 "re2c generates an infinite loop, depends on existence of previous parser"
diff --git a/src/news/home.rst b/src/news/home.rst
new file mode 100644
index 0000000..c5752d2
--- /dev/null
+++ b/src/news/home.rst
@@ -0,0 +1,4 @@
+.. |[news_home]| replace:: [home]
+.. _[news_home]: ../index.html
+.. header:: |[news_home]|_ `[News] <news.html>`_
+.. footer:: |[news_home]|_ `[News] <news.html>`_
diff --git a/src/news/news.rst b/src/news/news.rst
new file mode 100644
index 0000000..3a0564c
--- /dev/null
+++ b/src/news/news.rst
@@ -0,0 +1,8 @@
+====
+News
+====
+
+.. include:: ../home.rst
+
+* `Changelog <changelog.html>`_
+
src/about/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf	[moved from src/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf with 100% similarity]
src/about/about.rst	[new file with mode: 0644]
src/contents.rst	[new file with mode: 0644]
src/css/default.css
src/example_intro.rst	[deleted file]
src/examples.rst	[deleted file]
src/examples/example_01.rst	[moved from src/example_01.rst with 95% similarity]
src/examples/example_02.rst	[moved from src/example_02.rst with 97% similarity]
src/examples/example_03.rst	[moved from src/example_03.rst with 97% similarity]
src/examples/example_04.rst	[moved from src/example_04.rst with 90% similarity]
src/examples/example_05.rst	[moved from src/example_05.rst with 92% similarity]
src/examples/example_06.rst	[moved from src/example_06.rst with 92% similarity]
src/examples/example_07.rst	[moved from src/example_07.rst with 98% similarity]
src/examples/examples.rst	[new file with mode: 0644]
src/examples/home.rst	[new file with mode: 0644]
src/home.rst	[new file with mode: 0644]
src/index.rst
src/install/install.rst	[moved from src/install.rst with 96% similarity]
src/manual/features/features.rst	[new file with mode: 0644]
src/manual/home.rst	[new file with mode: 0644]
src/manual/manual.rst	[new file with mode: 0644]
src/manual/options/options_list.rst	[moved from src/options/options_list.rst with 100% similarity]
src/manual/syntax/syntax.rst	[moved from src/manual.rst with 63% similarity]
src/manual/warnings/warnings.rst	[new file with mode: 0644]
src/manual/warnings/warnings_list.rst	[moved from src/options/warnings_list.rst with 100% similarity]
src/news.rst	[deleted file]
src/news/changelog.rst	[moved from src/changelog.rst with 99% similarity]
src/news/home.rst	[new file with mode: 0644]
src/news/news.rst	[new file with mode: 0644]