Ulya Trofimovich [Tue, 13 Nov 2018 23:42:11 +0000 (23:42 +0000)]
Fixed a couple of lexer/parser errors in flex mode (-F option).
This fixes bug #229: re2c option -F (flex syntax) broken,
reported by Robert van Engelen.
A well-formed example that caused syntax error (flex-style raw literal
followed by one or more spaces and a curly brace):
/*!re2c
a {}
*/
The faulty behaviour goes back as far as re2c-0.13.6 (and supposedly
before that): in flex mode, raw literal may occur in various contexts
both as a regexp (string literal) and an identifier (named definition,
condiiton name). RE2C uses lookahead to infer the context and determine
the appropriate type of lexer token, but it missed some cases.
The fix has two sides. First, if reduces the number of contexts where
the general lexer may encounter raw literal (by using specialized lexers
for condition lists <x,y,...,z> and condition goto => and :=>). Second,
it fixes the lookahead regexps used for context inference.
Also added a bunch of tests (generated by a script).
Ulya Trofimovich [Mon, 29 Oct 2018 23:00:50 +0000 (23:00 +0000)]
Fixed out of bounds read in lexer.
The error was caused by assuming that a sequence of zeroes (used for
padding in YYFILL) cannot form a valid lexeme suffix. This is not the
case with strings, as they may contain arbitrary characters. The fix
is to manually loop over string characters in lexer, stopping at each
zero to check if it's the end of input.
Found by american fuzzy lop (thanks to Henri Salo).
src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug
On g++-3.4.6 re2c tests SIGSEGVed due to use of uninitialized data:
```
$ valgrind ... ./re2c -8 a.re -o foo.c
Conditional jump or move depends on uninitialised value(s)
at 0x432F23: re2c::tcpool_t::insert(re2c::tcmd_t const*) (tcmd.cc:202)
by 0x421FDA: re2c::freeze_tags(re2c::dfa_t&) (freeze.cc:45)
by 0x43A7FF: re2c::ast_to_dfa(re2c::spec_t const&, re2c::Output&) (compile.cc:88)
by 0x43B052: push_back (stl_iterator.h:614)
by 0x43B052: re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) (???:0)
by 0x449D29: main (main.cc:31)
Uninitialised value was created by a heap allocation
at 0x403252F: operator new[](unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x42FC9E: re2c::find_state(re2c::determ_context_t&) (dfa.h:37)
by 0x429BD9: re2c::dfa_t::dfa_t(re2c::nfa_t const&, re2c::opt_t const*, std::string const&, re2c::Warn&) (determinization.cc:56)
by 0x43A76C: re2c::ast_to_dfa(re2c::spec_t const&, re2c::Output&) (compile.cc:69)
by 0x43B052: push_back (stl_iterator.h:614)
by 0x43B052: re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) (???:0)
by 0x449D29: main (main.cc:31)
```
the problem here arose in default array constructor:
```c++
explicit dfa_state_t(size_t nchars)
: // ...
, tcmd(new tcmd_t*[nchars + 2]()) // +2 for final and fallback epsilon-transitions
// ...
```
g++-3.4.6 can't figure out zero-initialization rule (likely a gcc bug).
The change uses non-initializing new[] and memset() instead.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Fixed bug #215 "A memory read overrun issue in s_to_n32_unsafe.cc".
The error was in the code of the test itself: the special case of zero
wasn't handled correctrly by the function that prepares input data for
the test. As a result, zero-length input string was passed to the test,
which is unexpected: the tested function is an "unsafe" one (as the
name suggests) and is meant to be used on an already validated input.
vernum: move version-string-to-vernum converter to a separate helper
No functional change. While at it added tests
to cover past failures:
- "1.1": https://github.com/skvadrik/re2c/issues/211
- "0.14": https://sourceforge.net/p/re2c/bugs/55/
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Don't move the closing tag of POSIX capture group out of the enclosing iteration.
RE2C used to perform the following optimization: when a POSIX capture is
under iteration, we only need to get tag values of the last iteration
(according to the POSIX standard). Therefore we can move the closing tag
out of loop.
This commit removes this optimization (as part of the effort to switch
from Kuklewicz POSIX disambiguation algorthm to Okui algorithm).
In other words, for RE (x)* re2c used to generate this "optimized" IRE:
1 (3 x)* 4 2
and now it generates the "canonical" IRE:
1 (3 x 4)* 2
Updated tests for '--posix-captures' that have been affected by the change.
Updated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).
Also, depth-first traversal was done in a slightly incorrect way:
we checked outgoing nodes for admissibility and pushed the corresponding
child states on stack all at once. This is not the same as checking
the first child and recursing into it, then checking the next child,
..., and so on (because we might discover the second child while exploring
the first, and admissiblitiy check for the second child *after* that
might yield false, while *before* exploring the first child it yielded
true).
Pick the shortest available path suffix when generating skeleton path cover.
This also fixes a error in the generation process: sometimes in case
of loops the current node's suffix was set before all of its children
were processed.
Updated test results (in some cases .input files became larger because
of the above fix, in some cases they became smaller because we now pick
the shortest suffix).
Added new test; this one was found by slyfox's fuzzer and revealed the
above bug.
Changed the name of a local variable in the test to avoid collision with skeleton names.
Before tags were added to re2c, skeleton programs only used a limited
number of predefined names, such as 'yych', 'yystate', etc. With tags,
however, this is no longer true as tags may have any names. So now we need
to be more cautios when picking names for sekleton variables.
This patch is only a workaround to make all tests pass; the real solution
requires inventing a good naming scheme for skeleton programs and
regenerating all skeleton test results.
Fixed error in calculation of maximal skeleton path length.
The error was found by slyfox's fuzzer (a randomly-generated skeleton test).
The bug in the code was, apparently, too early modification of the state's
estimated maximal distance to the end states: the distance was set before
all of the state's children were processed, which resulted in aborting the
accumulation of distance from the remaining children, and, as a consequence,
shorter than necessary max distance for the root itself.