re2c used a complex and slow algorithm to split a charset into
disjoint character ranges. This commit replaces the old algorithm with
a new one (much simpler and quicker).
The re2c test suite now runs 2x faster thanks to the speedup in Unicode tests.
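For illustration only (this is not re2c's actual code; the 'Range' type and
the 'split_disjoint' name are hypothetical, and C++11 is assumed), one simple
way to split possibly overlapping half-open ranges into disjoint pieces is to
sort all range bounds and cut at each of them:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open range [lo, hi) of code points.
struct Range {
    uint32_t lo;
    uint32_t hi;
};

// Splits possibly overlapping ranges into disjoint pieces, so that every
// input range is exactly the union of some of the returned pieces.
std::vector<Range> split_disjoint(const std::vector<Range> &ranges)
{
    // Every range bound is a potential cut point.
    std::vector<uint32_t> cuts;
    for (const Range &r : ranges) {
        assert(r.lo < r.hi);
        cuts.push_back(r.lo);
        cuts.push_back(r.hi);
    }
    std::sort(cuts.begin(), cuts.end());
    cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

    std::vector<Range> result;
    for (std::size_t i = 0; i + 1 < cuts.size(); ++i) {
        const Range piece = {cuts[i], cuts[i + 1]};
        // Keep the piece only if at least one input range covers it
        // (pieces that fall into gaps between ranges are dropped).
        for (const Range &r : ranges) {
            if (r.lo <= piece.lo && piece.hi <= r.hi) {
                result.push_back(piece);
                break;
            }
        }
    }
    return result;
}
```

The inner loop makes this sketch quadratic in the worst case; it is meant to
show the idea, not to be fast.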
Fixed '#include's (applied most of 'include-what-you-use' suggestions).
The worst dependency which 'include-what-you-use' fails to see
(and rightly so) is 'src/parse/lex.re' -> 'src/parse/parser.h'.
This dependency is caused by '#include "y.tab.h"' in 'src/parse/lex.re'.
Another ubiquitous issue is 'src/util/c99_stdint.h' ('include-what-you-use'
suggests replacing it with '<stdint.h>').
There are also a couple of other dependencies that 'include-what-you-use' fails to see.
Ulya Trofimovich [Mon, 30 Nov 2015 22:50:23 +0000 (22:50 +0000)]
Renamed tests that contained uppercase letters in file extension.
We use file extensions to encode re2c options.
Some (short) options are uppercase letters: e.g. '-D', '-F', '-S'.
There are also short options for the same lowercase letters: '-d', '-f', '-s'.
This can cause filename collisions on platforms with case-insensitive
file extensions (e.g. Windows and OS X).
See bug #125: "[OS X] git reports changes not staged for commit
in newly cloned repository".
Fix: use long versions of options whose short form is an uppercase letter.
Disallowed uppercase options in 'run_tests.sh'.
The problem with pattern ordering first emerged on FreeBSD-10.2
(I was able to reproduce it with 'CXXFLAGS=-fsanitize=address').
Some tests failed because patterns reported by '-Wundefined-control-flow'
were sorted in a different order than expected. This is because
pattern ordering was inconsistent: patterns were compared by length only
(which does not order patterns of equal length). Now the first ordering
criterion is length, and the second criterion is lexicographical order.
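A minimal sketch of such a two-level comparison (the 'pattern_t' type and the
'pattern_less' name are hypothetical, not taken from re2c sources):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical representation of a pattern: a sequence of code units.
typedef std::vector<uint32_t> pattern_t;

// Strict weak ordering: shorter patterns first, patterns of equal
// length are ordered lexicographically.
bool pattern_less(const pattern_t &a, const pattern_t &b)
{
    if (a.size() != b.size()) {
        return a.size() < b.size();
    }
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}
```

Passing 'pattern_less' to 'std::sort' gives the same order on every platform,
because no two distinct patterns compare as equivalent.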
This commit reduces the amount of memory consumed by '-Wundefined-control-flow':
re2c no longer allocates vectors on the stack during depth-first search of the skeleton.
This commit also reduces the limit of memory for '-Wundefined-control-flow'
(64Mb edges -> 1Kb edges). Real-world programs rarely need that much.
The limit was so high to accommodate a few artificial tests (with a lower
limit these tests cannot find the shortest patterns).
This commit also removes the upper bound for the number of faulty patterns
reported by '-Wundefined-control-flow'. This bound was needed by the
artificial tests mentioned above: they produce lots of patterns.
Now these tests are limited to 1Kb of edges anyway.
Note that 1Kb limit is checked after each new pattern is added, so that
at least one pattern will fit in (even if it takes more than 1Kb).
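The "add first, check afterwards" order can be sketched as follows (the names
'MAX_EDGES' and 'count_kept' are hypothetical, and treating the limit as
exceeded only when the total is strictly greater than it is an assumption):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical limit on the total number of edges in all stored patterns.
static const std::size_t MAX_EDGES = 1024;

// Takes the lengths (in edges) of candidate patterns in the order they are
// found and returns how many of them are kept before the limit stops the
// search.
std::size_t count_kept(const std::vector<std::size_t> &lengths)
{
    std::size_t kept = 0;
    std::size_t edges = 0;
    for (std::size_t len : lengths) {
        // The pattern is added unconditionally ...
        edges += len;
        ++kept;
        // ... and only then is the limit checked, so the first pattern
        // always fits, however long it is.
        if (edges > MAX_EDGES) {
            break;
        }
    }
    return kept;
}
```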
Ulya Trofimovich [Tue, 24 Nov 2015 17:51:25 +0000 (17:51 +0000)]
Skeleton data generation: suffix should be multipath as well as prefix.
Prefix of current path under construction is a multipath, because prefix
arcs have not been covered yet. Suffix can be a simple path (that is, a
multipath of width 1), because all alternative suffix arcs have already
been covered.
prefix suffix
_________ _________
... \ /
--------- o
_________/
But nothing prevents us from alternating suffix arcs also, as long as
suffix remains a single multipath:
The resulting path's width is the maximum of prefix and suffix width
(hence the growth in size of those tests in which suffix is wider
than prefix), but it only makes a small difference. And the generated
paths are more "variable".
Ulya Trofimovich [Tue, 24 Nov 2015 16:36:14 +0000 (16:36 +0000)]
Skeleton data generation: cover all edges in 1-byte range (not only range bounds).
If code units occupy 1 byte, then the generated path cover covers
*all* edges in the original DFA. If the size of code unit exceeds 1 byte,
then only some ~0x100 (or less) range values will be chosen
(including range bounds).
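The commit message does not say how the values are chosen. As one possible
reading, assuming evenly spaced values (the 'sample_range' name and the
spacing strategy are assumptions, not re2c's documented behaviour):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Picks at most 0x100 values from the inclusive range [lo, hi].
// Both bounds are always included. Narrow ranges are covered completely.
std::vector<uint32_t> sample_range(uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    const uint64_t max_samples = 0x100;
    // 64-bit arithmetic avoids overflow when the range spans all of uint32_t.
    const uint64_t width = uint64_t(hi) - lo + 1;

    std::vector<uint32_t> values;
    if (width <= max_samples) {
        for (uint64_t v = lo; v <= hi; ++v) {
            values.push_back(uint32_t(v));
        }
        return values;
    }
    // Round the step up, so that at most (max_samples - 1) values are taken
    // before the upper bound is appended.
    const uint64_t step = (width - 1 + max_samples - 2) / (max_samples - 1);
    for (uint64_t v = lo; v < hi; v += step) {
        values.push_back(uint32_t(v));
    }
    values.push_back(hi);
    return values;
}
```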
Ulya Trofimovich [Sun, 29 Nov 2015 11:38:04 +0000 (11:38 +0000)]
Removed obsolete '__STDC_LIMIT_MACROS' and '__STDC_CONSTANT_MACROS' defines.
These defines were necessary to enable numeric limits definitions
(such as 'UINT32_MAX') in our local version of 'stdint.h' (which is
used on platforms that don't have system header 'stdint.h').
Ulya Trofimovich [Sun, 29 Nov 2015 11:24:48 +0000 (11:24 +0000)]
Fixed [-Wconversion] warning.
Warning was introduced in commit b237daed2095c1e138761fb94a01d53ba2c80c95:
compiler fails to recognize (or deliberately chooses not to recognize)
'std::numeric_limits<...>::max()' as a special constant.
Ulya Trofimovich [Sat, 28 Nov 2015 17:31:56 +0000 (17:31 +0000)]
Fixed crashes of 'ostream& operator<< (ostream& os, const char* s)' on NULL.
Crashes observed on platforms OS X (clang-7.0.0) and FreeBSD-10.2 (clang-3.4).
First reported in bug #122 "clang does not compile re2c 0.15.x".
What caused NULL passed to 'operator <<': re2c always generates content of
header file (regardless of '-t --type-header' option), but the content is
dumped to file (and header filename initialized to non-NULL) only if the
option was enabled.
Fix: always initialize header filename to non-NULL string.
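Passing a null 'const char*' to 'operator<<' is undefined behaviour in C++,
which is why it crashes with some standard libraries and not with others.
A minimal sketch of the fix, assuming a hypothetical 'Options' struct (the
real re2c data structures are not shown in this log):

```cpp
#include <sstream>
#include <string>

// Hypothetical options holder: the header filename defaults to an empty
// string instead of NULL, so it is always safe to print.
struct Options {
    const char *header_file;
    Options() : header_file("") {}
};

// Prints the header filename into a string; safe even if the
// '-t --type-header' option was never given.
std::string describe(const Options &opts)
{
    std::ostringstream os;
    os << "header: '" << opts.header_file << "'";
    return os.str();
}
```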
Ulya Trofimovich [Sat, 21 Nov 2015 20:03:10 +0000 (20:03 +0000)]
Merge branch 'master' into simplified_codegen.
* master:
Updated version to 0.14.4.dev
Release 0.14.3.
Added simple test for yacc-style brackets (see patch #27)
Fixed '#27 re2c crashes reading files containing %{ %}' (patch by Rui)
Makefile.am: dropped distfiles for MSVC (they are broken anyway)
Added full another test for bug #57.
Updated version to 0.14.3.dev
Release 0.14.2.
Fixed bug #57: Wrong result only if another rule is present
Updated version to 0.14.2.dev
Release 0.14.1.
Pad version with '0' instead of nulls
Ulya Trofimovich [Mon, 16 Nov 2015 14:10:49 +0000 (14:10 +0000)]
Skeleton: disregard default rule when estimating maximum rule size (in bytes).
Default rule '*' (not to be confused with 'none' rule) used to have
a normal number just like other rules. Now that re2c has to distinguish
the default rule from other rules (because of [-Wunreachable-rules]),
it reserves a special number (UINT32_MAX - 1) for it.
# the number of missing changed filenames
# equals the number of added changed filenames
[ $diff1_fname -ne $diff2_fname ] && echo "FAIL4: $f1" && exit 1
done
Ulya Trofimovich [Tue, 13 Oct 2015 13:26:33 +0000 (14:26 +0100)]
run_tests.sh: added '--skeleton' option.
With this option the script runs re2c with '--skeleton' and
'-Werror-undefined-control-flow' and, instead of comparing results with
reference test results, it compiles the generated skeleton programs and
runs them. If the C compiler or the binary returns a nonzero exit status, the
script reports an error. Note that cases when re2c failed to generate code are
not considered errors (re2c has lots of test cases for its errors).
Ulya Trofimovich [Mon, 12 Oct 2015 13:12:11 +0000 (14:12 +0100)]
Factored out some common lexing pieces into separate routines.
re2c lacks submatch extraction; it would be much more convenient
to memorize input positions for some parts of regular expressions
than to break each regexp in the middle and move parts to separate blocks.
Submatch extraction is difficult to implement in general, but supporting
submatch in some simple cases (like the case where trailing context is
allowed) would not be so difficult and would be most helpful.
Warns about unreachable rules:
- rules that are shadowed by other rules, e.g. rule '[a]' is shadowed by
'[a] [^]'
- infinite rules that consume infinitely many characters and fail on
YYFILL, e.g. '[^]*'
- rules that contain never-matching link, e.g. '[]' with option
'--empty-class match-none'
default rule '*' should not be reported
Merge default rules on the fly, assign them the same lowest priority.
re2c used to postpone merging default rules because the rank counter could
only assign consecutive ranks to rules, and default rules must have
the lowest priority. Now the rank counter has been modified to return
a special value as the default rule rank.
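A sketch of such a counter (the 'RankCounter' class is hypothetical; it
assumes that a smaller rank means higher priority and uses 'UINT32_MAX - 1'
as the special value, which this log mentions only for rule numbers):

```cpp
#include <cassert>
#include <cstdint>

// Assumed special rank for default rules: larger than any normal rank.
static const uint32_t RANK_DEFAULT = UINT32_MAX - 1;

// Hypothetical rank counter: normal rules get consecutive ranks,
// all default rules share the same lowest-priority rank.
class RankCounter {
public:
    RankCounter() : next_(0) {}

    // Rank for the next normal rule.
    uint32_t next()
    {
        assert(next_ < RANK_DEFAULT);
        return next_++;
    }

    // Rank shared by all default rules; does not advance the counter.
    uint32_t def() const { return RANK_DEFAULT; }

private:
    uint32_t next_;
};
```

Because 'def()' does not depend on how many rules have been seen so far,
default rules can be merged as soon as they are encountered.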
Autogenerated configuration tests: added default rule to each test.
It's not a bunch of unnecessary warnings I want to avoid, it's a bunch of
unnecessary runtime failures in programs generated with '--skeleton'
(failures caused by undefined control flow; re2c recognizes such cases
and the generated program reports a warning before failing).
Trailing contexts are currently broken (overlapping trailing contexts
cannot be tracked with a single 'YYCTXMARKER'). For now, re2c with
'--skeleton' mimics this incorrect behaviour: information about context
is lost by the time DFA is constructed, so skeleton has no way to
figure out the right order of things.
Prior to this commit the trailing context position was backed up
before advancing the input position, and re2c either had to emit
YYCTXMARKER = YYCURSOR + 1;
(with default input API), or
YYRESTORECTX ();
YYSKIP ();
(with custom input API).
The problem is that sometimes the initial state doesn't advance the input position
at all. Now re2c emits the context backup after advancing the input position and it
no longer needs '+1' or 'YYSKIP' hacks. It always backs up the correct position.
'-r' is different from normal mode in two aspects:
- single DFA may be used multiple times (unchanged, we only
need a single copy for skeleton)
- DFA may be generated but not used at all
The changes should be backwards compatible (meaning that old code that
compiled should still compile), but it may add empty statements or statements
with no effect for some configurations, e.g.:
YYSETCONDITION(0);(0);
These changes were necessary to unify re2c behaviour, remove counter-intuitive
cases and make it possible to write comprehensible option descriptions.
In short, the changes are:
- 'naked' triggers generation of argument-in-braces and semicolon;
- 'parameter' triggers generation of argument-in-braces (when applicable,
'naked' has priority over 'parameter');
- argument templates ('@cond', '@state', '@len') don't force other
configurations, and they don't affect argument-in-braces;
Handle all inplace configurations in a uniform way.
This commit removes the check (and error) for overwritten configurations
(like setting 're2c:define:YYCURSOR' twice in the same block).
This check was useful in principle, but it was applied to a somewhat
arbitrarily chosen set of parameters. If in future we feel a need
for such a check, it should respect all options equally and report
a warning rather than an error.
Omit useless 'yyaccept' variable in '--skeleton' programs.
Normally re2c generates a single 'yyaccept' variable for all conditions.
With '--skeleton' re2c handles conditions separately, so each condition
may or may not need its own 'yyaccept'.
Prior to this commit re2c used the same criterion to determine if
'yyaccept' is needed with '--skeleton' as it uses generally: whether
'yyaccept' was used in any of the conditions. Now re2c checks whether 'yyaccept'
was used in this particular condition.
Changed '-Wcondition-order' to warn even if 'YYSETCONDITION' is used.
Tests 'condtype_yysetcondition.c{s,g}.re' show the reason why I changed
how '-Wcondition-order' works in the presence of 'YYSETCONDITION' calls:
programs generated from these tests work differently depending on
condition numbering. Explicit use of condition names cannot guarantee
that these explicit names were generated by re2c (and not hardcoded as
in these examples).