From: Ulya Trofimovich Date: Fri, 11 Aug 2017 22:52:48 +0000 (+0100) Subject: Release notes for 1.0. X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=ef17908a05f017f345829344bc97fde6679e9813;p=re2c Release notes for 1.0. --- diff --git a/src/feed/feed/atom.xml b/src/feed/feed/atom.xml index 91882205..1dc2b2a4 100644 --- a/src/feed/feed/atom.xml +++ b/src/feed/feed/atom.xml @@ -86,4 +86,20 @@ + + Release 1.0 + + http://re2c.org/news/release_notes/1_0.html + 2017-08-11T00:00:00Z + +
+ Release re2c-1.0! (release notes). +
+
+ + Ulya Trofimovich + skvadrik@gmail.com + +
+ diff --git a/src/install/install.rst b/src/install/install.rst index 3e7f151b..b414b58c 100644 --- a/src/install/install.rst +++ b/src/install/install.rst @@ -11,7 +11,7 @@ Download Tarballs -------- -* `re2c-1.0.tar.gz `_ +* `re2c-1.0.1.tar.gz `_ * `re2c-0.16.tar.gz `_ * `re2c-0.15.3.tar.gz `_ * `re2c-0.14.3.tar.gz `_ diff --git a/src/news/changelog/1_0.rst b/src/news/changelog/1_0.rst index 6f8295e7..e894a9d6 100644 --- a/src/news/changelog/1_0.rst +++ b/src/news/changelog/1_0.rst @@ -2,98 +2,14 @@ 1.0x ---- +1.0.1 (2017-08-11) +~~~~~~~~~~~~~~~~~~ + +- Fixed bug #193 1.0 build failure on macOS: error: calling a private constructor of class 're2c::Rule' +- Added paper "Tagged Deterministic Finite Automata with Lookahead" to distribution 1.0 (2017-08-11) ~~~~~~~~~~~~~~~~ -- Added options: - + ``-P --posix-captures`` (POSIX-compliant capturing groups) - + ``-T --tags`` (standalone tags with leftmost greedy disambiguation) - + ``--no-lookahead`` - + ``--no-optimize-tags`` - + ``--eager-skip`` - + ``--dump-nfa`` - + ``--dump-dfa-raw`` - + ``--dump-dfa-det`` - + ``--dump-dfa-tagopt`` - + ``--dump-dfa-min`` - + ``--dump-adfa`` -- Added new syntax: - + ``@`` - + ``#`` -- Added new directives: - + ``/*!stags:re2c ... */`` - + ``/*!mtags:re2c ... */`` - + ``/*!maxnmatch:re2c ... */`` -- Added new API: - + ``YYSTAGN (t)`` - + ``YYSTAGP (t)`` - + ``YYMTAGN (t)`` - + ``YYMTAGP (t)`` - + ``YYRESTORETAG (t)`` - + ``YYMAXNMATCH`` - + ``yynmatch`` - + ``yypmatch`` -- Added inplace confgurations: - + ``re2c:define:YYSTAGN`` - + ``re2c:define:YYSTAGP`` - + ``re2c:define:YYMTAGN`` - + ``re2c:define:YYMTAGP`` - + ``re2c:define:YYRESTORETAG`` - + ``re2c:flags:8`` or ``re2c:flags:utf-8```` - + ``re2c:flags:b`` or ``re2c:flags:bit-vectors`` - + ``re2c:flags:case-insensitive`` - + ``re2c:flags:case-inverted`` - + ``re2c:flags:d`` or ``re2c:flags:debug-output`` - + ``re2c:flags:dfa-minimization`` - + ``re2c:flags:eager-skip`` - + ``re2c:flags:e`` or ``re2c:flags:ecb`` - + ``re2c:flags:empty-class`` - + ``re2c:flags:encoding-policy`` - + ``re2c:flags:g`` or ``re2c:flags:computed-gotos`` - + ``re2c:flags:i`` or ``re2c:flags:no-debug-info`` - + ``re2c:flags:input`` - + ``re2c:flags:lookahead`` - + ``re2c:flags:optimize-tags`` - + ``re2c:flags:P`` or ``re2c:flags:posix-captures`` - + ``re2c:flags:s`` or ``re2c:flags:nested-ifs`` - + ``re2c:flags:T`` or ``re2c:flags:tags`` - + ``re2c:flags:u`` or ``re2c:flags:unicode`` - + ``re2c:flags:w`` or ``re2c:flags:wide-chars`` - + ``re2c:flags:x`` or ``re2c:flags:utf-16`` - + ``re2c:tags:expression`` - + ``re2c:tags:prefix`` -- Added warning ``-Wnondeterministic-tags`` -- Added fuzz-testing scripts -- Added paper "Tagged Deterministic Finite Automata with Lookahead" -- Fixed bugs: - + #121 "trailing contexts are fundamentally broken" - + #135 "In installation ``make check`` give syntax error" - + #137 "run_tests.sh fail when running configure script with absolute path" - + #138 "website improvement" - + #141 "Tests under Windows" - + #142 "segvault with null terminated input" - + #145 "Values for enum YYCONDTYPE are not generated when default rules with conditions are used" - + #147 "Please add symbol name to "can't find symbol" error message" - + #152 "Line number in #line directive after enum YYCONDTYPE is 0-based" - + #156 "Build with Visual Studio 14 2015: symbol name conflict" - + #158 "Inconsistent forward declaration of struct/class vs definition" - + #160 "Open text files with "wb" causes issues on Windows" - + #162 "Reading files with "rb" causes issues in Windows" - + #165 "Trailing context consumed if initial expression matches it" - + #176 "re2c help message is too wide for most terminals" - + #184 "Small documentation issue" - + #186 "Difference operator sometimes doesn't work with utf-8" -- Merged pull requests: - + #131 "Use bash-specific ``[[`` builtin" - + #136 "Added basic support for travis-ci.org integration" - + #171 "Typo fix" - + #172 "Grammar fixes in the docs" - + #173 "Grammar fixes in the manpage" - + #174 "more documentation fixes" - + #175 "more manpage fixes" - + #177 "sync --help output w/ manpage" - + #178 "Moves rts used in the manpage to master" - + #179 "compose manpage out of rsts from gh-pages-gen" - + #189 "Typo fix and small grammatical change" - + #191 "Makefile.am: create target directory before writing into it" +.. include:: 1_0_list.rst + diff --git a/src/news/changelog/1_0_list.rst b/src/news/changelog/1_0_list.rst new file mode 100644 index 00000000..4c82162f --- /dev/null +++ b/src/news/changelog/1_0_list.rst @@ -0,0 +1,91 @@ +- Added options: + + ``-P --posix-captures`` (POSIX-compliant capturing groups) + + ``-T --tags`` (standalone tags with leftmost greedy disambiguation) + + ``--no-lookahead`` + + ``--no-optimize-tags`` + + ``--eager-skip`` + + ``--dump-nfa`` + + ``--dump-dfa-raw`` + + ``--dump-dfa-det`` + + ``--dump-dfa-tagopt`` + + ``--dump-dfa-min`` + + ``--dump-adfa`` +- Added new syntax: + + ``@`` + + ``#`` +- Added new directives: + + ``/*!stags:re2c ... */`` + + ``/*!mtags:re2c ... */`` + + ``/*!maxnmatch:re2c ... */`` +- Added new API: + + ``YYSTAGN (t)`` + + ``YYSTAGP (t)`` + + ``YYMTAGN (t)`` + + ``YYMTAGP (t)`` + + ``YYRESTORETAG (t)`` + + ``YYMAXNMATCH`` + + ``yynmatch`` + + ``yypmatch`` +- Added inplace confgurations: + + ``re2c:define:YYSTAGN`` + + ``re2c:define:YYSTAGP`` + + ``re2c:define:YYMTAGN`` + + ``re2c:define:YYMTAGP`` + + ``re2c:define:YYRESTORETAG`` + + ``re2c:flags:8`` or ``re2c:flags:utf-8```` + + ``re2c:flags:b`` or ``re2c:flags:bit-vectors`` + + ``re2c:flags:case-insensitive`` + + ``re2c:flags:case-inverted`` + + ``re2c:flags:d`` or ``re2c:flags:debug-output`` + + ``re2c:flags:dfa-minimization`` + + ``re2c:flags:eager-skip`` + + ``re2c:flags:e`` or ``re2c:flags:ecb`` + + ``re2c:flags:empty-class`` + + ``re2c:flags:encoding-policy`` + + ``re2c:flags:g`` or ``re2c:flags:computed-gotos`` + + ``re2c:flags:i`` or ``re2c:flags:no-debug-info`` + + ``re2c:flags:input`` + + ``re2c:flags:lookahead`` + + ``re2c:flags:optimize-tags`` + + ``re2c:flags:P`` or ``re2c:flags:posix-captures`` + + ``re2c:flags:s`` or ``re2c:flags:nested-ifs`` + + ``re2c:flags:T`` or ``re2c:flags:tags`` + + ``re2c:flags:u`` or ``re2c:flags:unicode`` + + ``re2c:flags:w`` or ``re2c:flags:wide-chars`` + + ``re2c:flags:x`` or ``re2c:flags:utf-16`` + + ``re2c:tags:expression`` + + ``re2c:tags:prefix`` +- Added warning ``-Wnondeterministic-tags`` +- Added fuzz-testing scripts +- Added paper "Tagged Deterministic Finite Automata with Lookahead" +- Fixed bugs: + + #121 "trailing contexts are fundamentally broken" + + #135 "In installation ``make check`` give syntax error" + + #137 "run_tests.sh fail when running configure script with absolute path" + + #138 "website improvement" + + #141 "Tests under Windows" + + #142 "segvault with null terminated input" + + #145 "Values for enum YYCONDTYPE are not generated when default rules with conditions are used" + + #147 "Please add symbol name to "can't find symbol" error message" + + #152 "Line number in #line directive after enum YYCONDTYPE is 0-based" + + #156 "Build with Visual Studio 14 2015: symbol name conflict" + + #158 "Inconsistent forward declaration of struct/class vs definition" + + #160 "Open text files with "wb" causes issues on Windows" + + #162 "Reading files with "rb" causes issues in Windows" + + #165 "Trailing context consumed if initial expression matches it" + + #176 "re2c help message is too wide for most terminals" + + #184 "Small documentation issue" + + #186 "Difference operator sometimes doesn't work with utf-8" +- Merged pull requests: + + #131 "Use bash-specific ``[[`` builtin" + + #136 "Added basic support for travis-ci.org integration" + + #171 "Typo fix" + + #172 "Grammar fixes in the docs" + + #173 "Grammar fixes in the manpage" + + #174 "more documentation fixes" + + #175 "more manpage fixes" + + #177 "sync --help output w/ manpage" + + #178 "Moves rts used in the manpage to master" + + #179 "compose manpage out of rsts from gh-pages-gen" + + #189 "Typo fix and small grammatical change" + + #191 "Makefile.am: create target directory before writing into it" diff --git a/src/news/news.rst b/src/news/news.rst index a562358e..70eec632 100644 --- a/src/news/news.rst +++ b/src/news/news.rst @@ -5,6 +5,7 @@ News .. toctree:: :maxdepth: 1 + Release 1.0 Release 0.16 Release 0.15.3 Release 0.15.2 diff --git a/src/news/release_notes/1_0.rst b/src/news/release_notes/1_0.rst new file mode 100644 index 00000000..c8d61e7c --- /dev/null +++ b/src/news/release_notes/1_0.rst @@ -0,0 +1,97 @@ +=========== +Release 1.0 +=========== + +*by Ulya Trofimovich* + +This release is an important milestone in the history of re2c: it adds *submatch extraction*. +It is a long-awaited feature; even as Peter Bumbulis released the first version of re2c back in 1993, +he mentioned submatch extraction in his paper together with Donald D. Cowan +:download:`"RE2C: a more versatile scanner generator" <../../about/1994_bumbulis_cowan_re2c_a_more_versatile_scanner_generator.pdf>` +as a useful feature that soon should be added to re2c. +It seemed back then that submatch extraction is relatively easy to implement; +like many other useful extensions, it was deemed to be no more than syntactic sugar on top of canonical regular expressions. + +As it turned out, this is not so: +the problem of *parsing* with regular expressions is inherently more complex than *recognition*, +and submatch extraction cannot be implemented using traditional DFA-based approach. +Many authors studied this subject and developed algorithms suitable for their particular settings and problem domains. +Some of them used NFA-based approach, +others suggested algorithms based on multiple DFA or multiple passes over a single DFA. +For re2c most of these algorithms are too heavyweight: +they require complication of the underlying computational model and incur overhead even in simple cases. + +However, one algorithm is outstanding in this respect: +it is the algorithm invented by Ville Laurikari in 2000 and described in his paper +`"NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions" +`_. +Laurikari algorithm is based on a single DFA extended with a fixed number of *registers*; it makes one pass over the input string, +works in linear time and the required space does not depend on the input length. +But what is most important, the added complexity depends on submatch detalization: +on submatch-free regular expressions Laurikari automaton reduces to a simple DFA. + +As I started working on the implementation of submatch extraction in re2c (in early 2016), +at first I didn't appreciate the elegance of Laurikari approach; it still seemed overly complicated for re2c. +I studied many papers and tried to invent an algorithm that would support only *partial* submatch extraction, +but the supported cases would be handled very efficiently. +As I developed the algorithm, I came to realize that it was in fact but a special case of Laurikari algorithm. +Naturally, I turned back to the original paper and started to adapt my implementation. +To my surprise, I discovered a strange deficiency in Laurikari algorithm compared to my early implementation: +when recording submatches, Laurikari automata totally ignore the lookahead symbol +and perform a lot of redundant operations that could be easily avoided. +It was only logical to fix Laurikari algorithm; +but as I worked on it, a couple of other issues showed up, +and so it happened that I ended up writing a whole paper on the subject: +:download:`"Tagged Deterministic Finite Automata with Lookahead" <../../about/2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf>`. + +Submatch extraction in re2c comes in two flavors: +POSIX-compliant capturing groups (with true POSIX semantics) +and standalone *tags* that use leftmost greedy disambiguation. +Both features require a number of new API primitives and configurations; +and they are barely documented yet (though I managed to add a `section <../../manual/features/submatch/submatch.html>`_ in the manual). +The paper cited above has some interesting benchmark results; +articles and examples will follow in the nearest future. +As usually, you may expect a number of small bug-fix releases in the 1.0.x series. + +The following people contributed to this release: + +* Petr Skocik tidied up re2c documentation: fixed ubiquitous spelling errors + and established synchronization system between the online documentation, + manpage and help message displayed by re2c with ``--help`` option. + +* Sergei Trofimovich wrote a *fuzzer* that generates random regular expressions + and checks them on input data generated with re2c ``--skeleton`` option. + This fuzzer is most helpful; it has already found tens of bugs in the early implementation of submatch extraction. + Sergei also helped me with testing re2c on numerous platforms and computer architectures. + +* Abs62, + asmwarrior, + Ben Smith, + CRCinAU, + Derick Rethans, + Dimitri John Ledkov, + Eldar Zakirov, + jcfp, + Jean-Claude Wippler, + Jeff Trull, + Jérôme Dumesnil, + Jesse Buesking, + Julian Andres Klode, + Paulo Custodio, + Perry E. Metzger, + philippschaefer, + Ross Burton, + Rui Maciel, + Ryan Mast, + Samuel006, + sirzooro, + Tim Kelly + --- all these people contributed bug reports, patches and feedback. Thank you! + +* Many thanks to the early user of the new features João Paredes for his patience and encouraging comments on the subject. + +Below is a coarse list of all changes (a raw excerpt from `changelog <../changelog/changelog.html>`_): + +.. include:: ../changelog/1_0_list.rst + +