- draw a single arrow for all transitions between two given states
- label all arrows with corresponding character ranges in square
brackets (no "default" label, single characters also appear in
square brackets)
- .dot output became much smaller, so pictures are drawn faster
and generally look better: e.g. it takes ~10x less time to draw
the PHP lexer and the resulting graph is shaped better.
- Use 'open' function instead of checking return status
(one may forget to check the return status, but if one forgets to
open the file, the error will be obvious)
- Introduced separate file type for header.
The header is much simpler than the output: it doesn't need delayed
code fragments and can be generated in the destructor.
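For the dot output change at the top of this list (a single arrow per pair of states, labelled with bracketed character ranges), a minimal sketch of the grouping might look like this. The `Transition` type and `emit_edges` function are hypothetical names chosen for illustration, not re2c's own code. The sketch assumes printable characters only and leaves escaping aside.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transition record: an inclusive character range [lo, hi]
// leading from state 'from' to state 'to'.
struct Transition {
    unsigned from;
    unsigned to;
    unsigned char lo;
    unsigned char hi;
};

// Emits one dot edge per (from, to) pair. Every range, including a
// single character, appears in its own pair of square brackets.
std::string emit_edges(const std::vector<Transition>& ts) {
    std::map<std::pair<unsigned, unsigned>, std::string> labels;
    for (const Transition& t : ts) {
        std::string& label = labels[{t.from, t.to}];
        label += '[';
        label += static_cast<char>(t.lo);
        if (t.hi != t.lo) {
            label += '-';
            label += static_cast<char>(t.hi);
        }
        label += ']';
    }
    std::ostringstream out;
    for (const auto& kv : labels) {
        out << kv.first.first << " -> " << kv.first.second
            << " [label=\"" << kv.second << "\"]\n";
    }
    return out.str();
}
```

Grouping by state pair before printing is what keeps the output small: two states connected by many ranges produce one edge line instead of one line per range.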
Ulya Trofimovich [Thu, 26 Feb 2015 10:48:41 +0000 (10:48 +0000)]
Removed unused enum members.
I was unsure whether delayed generation was also needed for genCondGoto
and genCondTable, so I kept those enum members as a reminder. Now
I know that all conditions are known by the time the re2c block is
parsed and code generation starts.
Ulya Trofimovich [Wed, 25 Feb 2015 23:13:55 +0000 (23:13 +0000)]
One pass.
The second pass was used because some information (which influences
early parts of the generated code, e.g. the enum with condition names
or the YYMAXFILL definition) becomes available only at the end of the
first pass.
I isolated all of these things (I hope) and generate stubs for them,
which are filled later. I restructured output as follows: the whole
output consists of source and header, each of them is a list of
blocks (corresponding to re2c blocks in the source file), and each block
is a list of code fragments (which can be either regular strings
with code or stubs that will be filled later).
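The structure described above can be sketched roughly as follows. All type and member names here (`Fragment`, `Block`, `OutputFile`, `render`) are hypothetical and only illustrate the idea of a stub that is resolved once the late information (here, a YYMAXFILL value) is known. The real implementation may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// A fragment is either ready-made code or a stub to be filled later.
struct Fragment {
    enum Kind { CODE, STUB_MAXFILL } kind;
    std::string text; // used only when kind == CODE
};

// One block per re2c block in the source file.
struct Block {
    std::vector<Fragment> fragments;
};

// An output file (source or header) is a list of blocks.
struct OutputFile {
    std::vector<Block> blocks;

    // Stubs are resolved at render time, when max_fill is finally known,
    // so a single pass over the input is enough.
    std::string render(unsigned max_fill) const {
        std::ostringstream out;
        for (const Block& b : blocks) {
            for (const Fragment& f : b.fragments) {
                if (f.kind == Fragment::CODE) {
                    out << f.text;
                } else {
                    out << "#define YYMAXFILL " << max_fill << "\n";
                }
            }
        }
        return out.str();
    }
};
```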
Ulya Trofimovich [Mon, 23 Feb 2015 13:30:42 +0000 (13:30 +0000)]
Added tests from PHP repository: https://github.com/php/php-src
Test results are almost identical to re2c-0.13.6
(there are a few changes; I believe they are due to
commit 255262b02928d3f38c00dd91952e3253c11c78f1 and
completely harmless).
Ulya Trofimovich [Sun, 18 Jan 2015 14:12:16 +0000 (14:12 +0000)]
Replaced "YYHAS (n)" with "YYEOI (n)".
The actual meaning of this primitive is to check whether
there are not enough characters left in the input stream,
e.g. "(YYLIMIT - YYCURSOR) < n" or whatever else.
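As an illustration, one possible user-side definition that follows the expression quoted above is shown below. The wrapper function `needs_refill` is a hypothetical helper added only to make the sketch self-contained. Users are free to define the primitive differently.

```cpp
#include <cstddef>

// One possible definition: true when fewer than n characters remain
// between the cursor and the limit.
#define YYEOI(n) ((YYLIMIT - YYCURSOR) < static_cast<std::ptrdiff_t>(n))

// Hypothetical helper that supplies YYCURSOR and YYLIMIT as parameters.
bool needs_refill(const char* YYCURSOR, const char* YYLIMIT, std::size_t n) {
    return YYEOI(n);
}
```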
Ulya Trofimovich [Sun, 18 Jan 2015 13:47:51 +0000 (13:47 +0000)]
Added tests for "--input custom".
This implied modifying runtests.sh, as it couldn't handle
test names of the form "basename.--long-switch.re":
it inserted '-' in front of all switches.
Ulya Trofimovich [Tue, 13 Jan 2015 15:30:01 +0000 (15:30 +0000)]
A little cleanup of new input API:
- moved enum and pretty-printing functions to a class
- renamed files 'input.{h,cc}' to 'input_api.{h,cc}'
- for "--input istream": moved input position increment to 'stmt_restorectx'
- main.cc: removed useless include
Double-escape special characters for dot.
Example:
17 -> 18 [label="\n"]
results in an "unlabeled" arrow in the rendered graph, but
17 -> 18 [label="\\n"]
is ok.
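A minimal sketch of such double-escaping is given below. The function name `dot_label` is hypothetical, and the sketch covers only a few control characters. It is not the actual re2c code.

```cpp
#include <string>

// Writes control characters as a doubled backslash followed by a letter,
// so that the dot file contains label="\\n" rather than label="\n".
std::string dot_label(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '\n': out += "\\\\n"; break; // two backslashes, then 'n'
        case '\t': out += "\\\\t"; break;
        case '\r': out += "\\\\r"; break;
        default:   out += c;       break; // other characters pass through
        }
    }
    return out;
}
```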
Ulya Trofimovich [Fri, 22 Aug 2014 20:15:11 +0000 (23:15 +0300)]
Alternation of 'RegExp's should preserve 'ins_access' attribute.
When one builds 'AltOp' from two 'RegExp's, one sometimes has to
break these 'RegExp's into pieces in order to merge their common prefix.
In such cases, if one of the original 'RegExp's has 'ins_access'
set to 'PRIVATE', it is lost (defaults to 'SHARED') after alternation.
This commit fixes Gentoo bug https://bugs.gentoo.org/show_bug.cgi?id=518904.
Fixed compile error for freebsd5 (found by Sergei Trofimovich).
Sample error:
parser.y: In function `void re2c::parse(re2c::Scanner&, std::ostream&, std::ostream*)':
parser.y:564: error: `yyparse' undeclared (first use this function)
When re2c encounters an invalid code point (e.g., a surrogate in Unicode),
it acts according to the current encoding policy:
'fail' - fail with error;
'substitute' - silently substitute the offending code point with
the error code point;
'ignore' - ignore the offending code point, consider it valid.
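The three policies can be sketched as follows. The names `Policy` and `apply_policy` are hypothetical, and the use of U+FFFD as the error code point is an assumption, since the commit message does not name one. Only the surrogate range is treated as invalid here.

```cpp
#include <cstdint>
#include <stdexcept>

enum class Policy { Fail, Substitute, Ignore };

// Assumption: U+FFFD stands in for the "error code point".
const std::uint32_t ERROR_CODE_POINT = 0xFFFD;

// Surrogates (U+D800..U+DFFF) are the only invalid code points in this sketch.
bool is_surrogate(std::uint32_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

// Returns the code point to use, or throws under the 'fail' policy.
std::uint32_t apply_policy(std::uint32_t c, Policy p) {
    if (!is_surrogate(c)) {
        return c;
    }
    switch (p) {
    case Policy::Fail:
        throw std::invalid_argument("invalid code point");
    case Policy::Substitute:
        return ERROR_CODE_POINT;
    case Policy::Ignore:
        return c;
    }
    return c; // unreachable for valid enum values
}
```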
Fail if someone tries to set a non-ASCII encoding
when another non-ASCII encoding is already set.
If the encoding has been set successfully, it is
guaranteed to be valid.