Ulya Trofimovich [Mon, 11 May 2015 09:41:01 +0000 (10:41 +0100)]
Don't rebuild docs by default.
If we configure with "--enable-docs", then docs get rebuilt every
time make is executed, and we'd have to commit these insignificant
changes every time, which is bad.
Ulya Trofimovich [Fri, 8 May 2015 22:09:00 +0000 (23:09 +0100)]
Updated build system.
Major changes:
- Moved all re2c source files to a separate directory. Top source
directory should only contain autotools source files and a few other
files like README.
- Enabled out-of-source builds (and wrote a simple script build.sh
that makes out-of-source build).
- Improved portable variant of <stdint.h> header (src/c99_stdint.h):
now it relies only on a few defines in configure-generated
src/config.h (instead of checking for MSVC version and relying
on MSVC-defined stuff). The implementation follows the C99 standard closely.
- Removed all windows-related build stuff. There was no use keeping it:
it's been broken for a long time and I can't maintain it.
- Removed all RPM stuff: distro maintainers use their own hacks anyway.
Makefile.am is definitely the wrong place to keep such things.
A separate script and .spec file would be a better idea, but again, nobody
uses it.
- Added make target 'bootstrap' to make maintainers' job easier.
- Merged lessons and examples into one.
- Updated README and doc/index.html.
- Run tests in parallel by default.
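A minimal sketch of the configure-driven approach to a portable 32-bit type, assuming config.h provides size macros of the SIZEOF_* form that autoconf's AC_CHECK_SIZEOF generates. The actual defines used by src/c99_stdint.h are not listed in this log; the fallback values and the name sketch_uint32_t are placeholders for illustration.

```cpp
#include <cassert>
#include <climits>

// Assumed configure results. In a real build such defines would come
// from the configure-generated src/config.h; the fallback values below
// are placeholders for this sketch only.
#ifndef SIZEOF_SHORT
#define SIZEOF_SHORT 2
#endif
#ifndef SIZEOF_INT
#define SIZEOF_INT 4
#endif
#ifndef SIZEOF_LONG
#define SIZEOF_LONG 8
#endif

// Choose a 32-bit unsigned type purely from the sizes reported by
// configure, with no compiler-specific checks (e.g. _MSC_VER).
#if SIZEOF_INT == 4
typedef unsigned int sketch_uint32_t;
#elif SIZEOF_LONG == 4
typedef unsigned long sketch_uint32_t;
#elif SIZEOF_SHORT == 4
typedef unsigned short sketch_uint32_t;
#else
#error "no 32-bit unsigned integer type found"
#endif

// Catch wrong configure results at compile time.
static_assert (sizeof (sketch_uint32_t) * CHAR_BIT == 32,
    "sketch_uint32_t must be exactly 32 bits wide");
```

The point of the design is that only configure knows the platform; the header itself contains no knowledge of particular compilers.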
Changes concerning particular files:
- configure.ac:
- removed autoconf version check: developers will need the right
version anyway (otherwise autoconf will refuse to work);
users should use the configure script provided by the package
distribution
- removed GCC version (3 or above) check: it's pretty ancient
and I don't know which features are missing anyway
- removed some useless checks (which resulted in defines in
src/config.h used by no one)
- introduced some new checks (used by src/c99_stdint.h)
- followed the advice of autoupdate
- Makefile.am:
- explicitly prefixed all file names with $(srcdir) or $(builddir)
- removed windows and RPM related rules and targets
- added target 'bootstrap'
- sorted out automake variables
Ulya Trofimovich [Thu, 23 Apr 2015 14:26:56 +0000 (15:26 +0100)]
Continued adding "--skeleton" switch.
Some renaming and cleanup.
Ulya Trofimovich [Wed, 22 Apr 2015 22:24:43 +0000 (23:24 +0100)]
Continued adding "--skeleton" switch.
Simplified 'generate_paths_cover' a bit: the NULL-state case is not
really needed; it's artificial and should be embedded into the
only case that needs it.
Ulya Trofimovich [Wed, 22 Apr 2015 16:54:48 +0000 (17:54 +0100)]
Continued adding "--skeleton" switch.
Check possible overflow immediately.
Ulya Trofimovich [Wed, 22 Apr 2015 16:45:24 +0000 (17:45 +0100)]
Continued adding "--skeleton" switch.
Estimate the size of data to be generated rather than the number
of paths in DFA. Returns the precise size or 'Skeleton::MAX_PATHS',
whichever is less. Estimate data size in both cases: for all paths
and for path cover.
Ulya Trofimovich [Tue, 21 Apr 2015 14:49:15 +0000 (15:49 +0100)]
Continued adding "--skeleton" switch.
Use RAII-style object to auto-decrement visit counter when
leaving local scope.
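A sketch of such a guard, with a hypothetical Node type standing in for a skeleton state and an assumed limit of two visits per path; it is not re2c's actual class.

```cpp
#include <cassert>

// Hypothetical skeleton node; only the visit counter matters here.
struct Node
{
    unsigned visited = 0;
};

// RAII guard: bumps the visit counter on entry and restores it in the
// destructor, so every way out of the scope (including early returns)
// leaves the counter as it was.
class VisitGuard
{
    Node & node;
public:
    explicit VisitGuard (Node & n) : node (n) { ++node.visited; }
    ~VisitGuard () { --node.visited; }
    VisitGuard (const VisitGuard &) = delete;
    VisitGuard & operator = (const VisitGuard &) = delete;
};

// One step of a depth-first traversal (illustrative).
bool visit (Node & n, bool stop_early)
{
    if (n.visited > 1) return false;   // assumed "loop once" limit
    VisitGuard guard (n);
    if (stop_early) return true;       // guard still decrements 'visited'
    // ... recurse into successors here ...
    return false;
}
```

Because the destructor runs on every exit path, a premature return cannot leave a state marked as visited, which is the kind of backtracking bug fixed in the 20 Apr 12:43 entry.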
Ulya Trofimovich [Mon, 20 Apr 2015 22:46:29 +0000 (23:46 +0100)]
Continued adding "--skeleton" switch.
Improved readability of overflow checks when counting the number
of paths.
Ulya Trofimovich [Mon, 20 Apr 2015 22:38:32 +0000 (23:38 +0100)]
Added portable version of <stdint.h>, converted the whole project.
Ulya Trofimovich [Mon, 20 Apr 2015 16:21:48 +0000 (17:21 +0100)]
Continued adding "--skeleton" switch.
Generate all possible paths (with respect to range lower/upper
bounds) in case their total amount is less than
'Skeleton::PATHS_OVERFLOW'.
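A minimal sketch of all-paths enumeration under stated assumptions: the structures are illustrative (not re2c's Skeleton classes), arrows are inclusive ranges, leaves are states with no arrows (as described in the 16 Apr 13:46 entry), and "follow each cycle at most once" is modelled as "a state may occur at most twice on the current path". The PATHS_OVERFLOW cutoff is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative skeleton structures. An arrow covers the inclusive range
// [lower, upper]. A state with no arrows is a leaf: the default state
// if it has no rule (rule < 0), a final state otherwise.
struct Arrow { uint32_t lower, upper; std::size_t to; };
struct SkelState
{
    int rule = -1;
    std::vector<Arrow> arrows;
    unsigned visited = 0;
};

typedef std::vector<uint32_t> Path;

// Depth-first enumeration of all paths, taking only the lower and the
// upper bound of each range. A state may occur on the current path at
// most twice, so each cycle is followed at most once (an assumption).
void gen_all (std::vector<SkelState> & states, std::size_t s,
              Path & prefix, std::vector<Path> & out)
{
    SkelState & st = states[s];
    if (st.arrows.empty ()) { out.push_back (prefix); return; }
    if (st.visited >= 2) return;
    ++st.visited;
    for (std::size_t i = 0; i < st.arrows.size (); ++i)
    {
        const Arrow a = st.arrows[i];
        const uint32_t bounds[] = {a.lower, a.upper};
        const std::size_t n = a.lower == a.upper ? 1 : 2;
        for (std::size_t j = 0; j < n; ++j)
        {
            prefix.push_back (bounds[j]);
            gen_all (states, a.to, prefix, out);
            prefix.pop_back ();
        }
    }
    --st.visited;
}
```

The number of paths grows multiplicatively with the number of branching states, which is why a cap on the total is needed before enumerating.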
Ulya Trofimovich [Mon, 20 Apr 2015 14:03:57 +0000 (15:03 +0100)]
Continued adding "--skeleton" switch.
Be even more careful not to overflow the path counter: check for
possible multiplication overflow.
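One way to write such checks, assuming 32-bit unsigned counters and a made-up value for the limit: test before multiplying or adding, and saturate at the limit instead of wrapping around. Saturation also matches the "precise size or 'Skeleton::MAX_PATHS', whichever is less" behaviour described in the 22 Apr 16:45 entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cap standing in for 'Skeleton::MAX_PATHS' (the real
// value is not given in the log).
const uint32_t MAX_PATHS = 1024 * 1024;

// a * b, saturated at MAX_PATHS. The test happens before multiplying:
// for b > 0, a * b > MAX_PATHS exactly when a > MAX_PATHS / b (integer
// division), so the product below can never wrap around.
uint32_t mul_sat (uint32_t a, uint32_t b)
{
    if (a == 0 || b == 0) return 0;
    if (a > MAX_PATHS / b) return MAX_PATHS;
    return a * b;
}

// a + b, saturated at MAX_PATHS, again without wrapping around.
uint32_t add_sat (uint32_t a, uint32_t b)
{
    if (a >= MAX_PATHS || b >= MAX_PATHS - a) return MAX_PATHS;
    return a + b;
}
```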
Ulya Trofimovich [Mon, 20 Apr 2015 12:43:31 +0000 (13:43 +0100)]
Continued adding "--skeleton" switch.
Fixed backtracking error introduced by previous commit: shouldn't
return prematurely, need to unmark state as visited first.
Some renaming.
Ulya Trofimovich [Mon, 20 Apr 2015 12:26:25 +0000 (13:26 +0100)]
Continued adding "--skeleton" switch.
Give up counting data when a certain limit is reached
('Skeleton::MAX_PATHS').
Ulya Trofimovich [Sun, 19 Apr 2015 22:11:32 +0000 (23:11 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Sun, 19 Apr 2015 21:38:50 +0000 (22:38 +0100)]
Continued adding "--skeleton" switch.
Keep all generated paths in memory (we are careful not to generate
too many of them, and we keep the second part of data (positions) in
memory anyway).
Ulya Trofimovich [Sun, 19 Apr 2015 21:05:46 +0000 (22:05 +0100)]
Continued adding "--skeleton" switch.
Simplified traversal of outgoing arrows when extending prefixes.
Instead of adding outgoing arrow sets (which was erroneous anyway:
they should remain unmodified for another recursion entry), simply
wrap the iterator and keep iterating until both ingoing and outgoing
arrows are counted.
Ulya Trofimovich [Fri, 17 Apr 2015 17:51:29 +0000 (18:51 +0100)]
Continued adding "--skeleton" switch.
Simplified construction of prefixes.
Ulya Trofimovich [Fri, 17 Apr 2015 16:15:31 +0000 (17:15 +0100)]
Continued adding "--skeleton" switch.
Renamed: 'Prefix' -> 'Path'.
Ulya Trofimovich [Fri, 17 Apr 2015 16:00:23 +0000 (17:00 +0100)]
Continued adding "--skeleton" switch.
More code cleanup: pre-initialize paths for final states and
default state, so that NULL can be used as a terminating condition
for path-generating function. Fixed memleaks.
Ulya Trofimovich [Thu, 16 Apr 2015 15:38:46 +0000 (16:38 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Thu, 16 Apr 2015 13:46:03 +0000 (14:46 +0100)]
Continued adding "--skeleton" switch.
Tried to simplify data generation a bit: construct skeleton states
so that there are no links to the NULL state. Default state is
represented with a state that has no rule and zero outgoing
arrows. Final states are represented with a state that has a rule,
but zero outgoing labels.
Ulya Trofimovich [Wed, 15 Apr 2015 14:09:45 +0000 (15:09 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Wed, 15 Apr 2015 14:02:23 +0000 (15:02 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Wed, 15 Apr 2015 12:40:19 +0000 (13:40 +0100)]
Continued adding "--skeleton" switch.
Reduced the amount of generated data (no exponential growth now).
Generate enough paths to cover all DFA transitions (well, not
exactly all: only upper and lower bounds of a range). Respect
cycles (loop once).
Ulya Trofimovich [Wed, 8 Apr 2015 15:43:16 +0000 (16:43 +0100)]
Code cleanup.
Ulya Trofimovich [Wed, 8 Apr 2015 11:58:59 +0000 (12:58 +0100)]
Continued adding "--skeleton" switch.
Use a separate DFA-like structure for DFA skeleton. This allows us
to optimize skeleton states for fast traversal (group transitions
by destination state) and avoid messing with the original DFA.
Added 'count' method to estimate the amount of data that will
be generated (it worked too slowly on the original DFA with
ungrouped transitions).
Ulya Trofimovich [Tue, 7 Apr 2015 16:01:46 +0000 (17:01 +0100)]
Continued adding "--skeleton" switch.
Output input data to a separate file (otherwise we'd have to
keep all generated data in memory, because output has a complex
structure and cannot be written to file until it's fully
generated).
This reduces memory usage significantly (so that no memory
consumption problems remain with "--skeleton" switch). However,
on some files re2c generates too much data, e.g. case-insensitive
strings:
/*!re2c
'
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' {}
*/
Exponential growth is a bad thing; must deal with it somehow.
Time grows exponentially as well, of course.
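For scale, assuming each case-insensitive letter contributes two single-character alternatives (lower and upper case) and every combination ends up as a separate input string, a string of n letters produces N(n) strings:

```latex
N(n) = 2^{n}, \qquad N(30) = 2^{30} = 1\,073\,741\,824 \approx 1.07 \times 10^{9}
```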
Ulya Trofimovich [Mon, 6 Apr 2015 21:11:05 +0000 (22:11 +0100)]
Continued adding "--skeleton" switch.
Check for rule match in final states.
Ulya Trofimovich [Mon, 6 Apr 2015 19:40:43 +0000 (20:40 +0100)]
Continued adding "--skeleton" switch.
Generate check for input position in final states: first, check
that the input position equals the expected one; second, advance
the input position to the beginning of the next DFA path (this may be
necessary when the generated lexer rolls back: it should resume
with a new DFA path rather than with some default characters remaining
after rollback).
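A sketch of that final-state check as a plain function, assuming (hypothetically) that each generated path is described by its start offset in the concatenated input, its full length, and the number of characters the lexer is expected to consume:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-path record: where the path starts in the
// concatenated input buffer, its full length, and how many characters
// the lexer should consume (it may roll back before the end).
struct PathInfo { std::size_t start, length, consumed; };

// Check performed in a final state: the cursor must be exactly where
// the expected match ends; then it is moved to the beginning of the
// next path, skipping leftover characters the lexer rolled back over.
// On failure the cursor is left untouched.
bool check_and_advance (std::size_t & cursor, const PathInfo & p)
{
    if (cursor != p.start + p.consumed) return false;
    cursor = p.start + p.length;
    return true;
}
```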
Ulya Trofimovich [Mon, 6 Apr 2015 19:15:29 +0000 (20:15 +0100)]
Continued adding "--skeleton" switch.
Fixed range lower bound tracking when generating input strings
from DFA.
Ulya Trofimovich [Mon, 6 Apr 2015 18:08:02 +0000 (19:08 +0100)]
Continued adding "--skeleton" switch.
When generating DFA paths, respect range lower bound as well as
upper bound.
Ulya Trofimovich [Mon, 6 Apr 2015 17:44:59 +0000 (18:44 +0100)]
Continued adding "--skeleton" switch.
Output generated strings directly to file instead of storing them
in memory.
Ulya Trofimovich [Mon, 6 Apr 2015 15:41:49 +0000 (16:41 +0100)]
Continued adding "--skeleton" switch.
Track exact number of characters to be consumed on each input
string. Note that it's not equal to string length, since many
strings end with characters from default transitions: on such
strings lexer must roll back to the latest backtracking point (that
is, the latest accepting state).
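A sketch of how the expected consumption could be computed for one generated string, assuming the generator records, for every prefix length, whether the DFA state reached is accepting. Both this representation and the zero fallback are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// accepting[i] tells whether the DFA state reached after reading the
// first i characters is accepting, for i = 0 .. length (so the vector
// has length + 1 entries). The lexer rolls back to the latest accepting
// state, so the number of consumed characters is the largest such i.
// Returning 0 when no state accepts is an assumption of this sketch.
std::size_t consumed_chars (const std::vector<bool> & accepting)
{
    for (std::size_t i = accepting.size (); i > 0; --i)
        if (accepting[i - 1]) return i - 1;
    return 0;
}
```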
Ulya Trofimovich [Mon, 6 Apr 2015 04:11:56 +0000 (05:11 +0100)]
Continued adding "--skeleton" switch.
Moved input-string-generation-from-DFA algorithm earlier, to the
point when DFA has just been constructed and contains only 'Match'
states and starting state (no saving, accepting or split states).
This simplifies string generation: if destination state is NULL,
it is either next to final state (if all spans lead to it), or
default state (if some spans lead to other states).
Ulya Trofimovich [Sun, 5 Apr 2015 14:39:29 +0000 (15:39 +0100)]
Removed useless type of DFA state.
'Enter' was encapsulated by 'Initial' and never used on its own.
Ulya Trofimovich [Sun, 5 Apr 2015 11:43:11 +0000 (12:43 +0100)]
Continued adding "--skeleton" switch.
Fixed backtracking in deriving-input-data-from-DFA algorithm.
Ulya Trofimovich [Fri, 3 Apr 2015 21:40:44 +0000 (22:40 +0100)]
Determine action type in a simpler way.
Ulya Trofimovich [Fri, 3 Apr 2015 21:30:13 +0000 (22:30 +0100)]
Continued adding "--skeleton" switch.
Tried to sort out various DFA states and how they influence code
generation. Added explicit type field to all actions.
Ulya Trofimovich [Fri, 3 Apr 2015 21:23:36 +0000 (22:23 +0100)]
Removed useless NULL check (all NULL destination states handled in DFA::prepare).
Ulya Trofimovich [Thu, 2 Apr 2015 14:31:14 +0000 (15:31 +0100)]
Continued adding "--skeleton" switch.
A naive attempt to generate input strings from DFA. The problem is,
if the number of spans equals 1, it's hard to determine whether
it's some kind of a 'transit' state or a normal state with just one
span. All states have spans, but do they use them? The situation is
further complicated with 'readCh' which makes it hard to trace how
actions influence input operations.
Ulya Trofimovich [Thu, 2 Apr 2015 14:07:26 +0000 (15:07 +0100)]
Omit useless check for NULL: default transitions are handled in DFA::prepare
Ulya Trofimovich [Thu, 2 Apr 2015 14:04:45 +0000 (15:04 +0100)]
Some output should be generated with "--skeleton", but not with "-D"
Ulya Trofimovich [Thu, 2 Apr 2015 14:03:40 +0000 (15:03 +0100)]
Makefile.am: added "-O2" to CXXFLAGS
Ulya Trofimovich [Thu, 2 Apr 2015 14:02:33 +0000 (15:02 +0100)]
Prettified debug output.
Ulya Trofimovich [Mon, 30 Mar 2015 13:41:29 +0000 (14:41 +0100)]
Continued adding "--skeleton" switch.
Generate prolog and epilog in the form of a for-loop. The body
of the loop is the hard-coded DFA. The code in DFA final states
is substituted with "continue" statements.
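A hand-written imitation of that layout (not actual re2c output) for two hypothetical rules, [a-z]+ and a default rule matching any other single character. The token counter in the loop header is only there to make the sketch observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Prolog opens a for-loop, the loop body is a hard-coded DFA, and the
// code of each final state is just "continue".
std::size_t count_tokens (const char * input)
{
    const char * cursor = input;
    const char * const limit = input + std::strlen (input);
    std::size_t tokens = 0;
    for (; cursor < limit; ++tokens)          // prolog
    {
        if (*cursor >= 'a' && *cursor <= 'z') goto yy2;
        ++cursor;
        continue;                             // final state: default rule
    yy2:
        ++cursor;
        if (cursor < limit && *cursor >= 'a' && *cursor <= 'z') goto yy2;
        continue;                             // final state: [a-z]+
    }
    return tokens;                            // epilog
}
```

Since "continue" runs the loop's increment expression, each final state ends one iteration and the DFA restarts on the remaining input.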
Ulya Trofimovich [Fri, 27 Mar 2015 21:35:49 +0000 (21:35 +0000)]
Started adding "--skeleton" switch.
Ulya Trofimovich [Fri, 27 Mar 2015 14:55:11 +0000 (14:55 +0000)]
Testing script: trap SIGINT in child threads.
Now 'Ctrl+C' is handled as in single-threaded script: all threads
stop working.
Ulya Trofimovich [Thu, 26 Mar 2015 16:45:57 +0000 (16:45 +0000)]
A little code cleanup.
Ulya Trofimovich [Thu, 26 Mar 2015 16:42:58 +0000 (16:42 +0000)]
Don't output YYCTXMARKER stuff in .dot mode.
Added test that revealed error.
Ulya Trofimovich [Thu, 26 Mar 2015 16:37:25 +0000 (16:37 +0000)]
Run tests in parallel with "-j<threads>" option.
Ulya Trofimovich [Thu, 26 Mar 2015 16:31:20 +0000 (16:31 +0000)]
Comment
Ulya Trofimovich [Thu, 26 Mar 2015 16:27:21 +0000 (16:27 +0000)]
Added text file 'sf-cheatsheet' to make some notes about sourceforge administration.
Ulya Trofimovich [Thu, 26 Mar 2015 16:26:51 +0000 (16:26 +0000)]
Removed unused function.
Ulya Trofimovich [Wed, 18 Mar 2015 15:01:41 +0000 (15:01 +0000)]
Make 'Go' hierarchy independent of relabelling.
This allows moving 'Go' initialization loop to 'DFA::prepare'
and thus avoiding an ugly check whether it is already initialized (it can
happen in '-r' mode when the same DFA is used multiple times).
Now that we store 'State *' pointers instead of labels in
'CpgotoTable', relabelling won't affect the generated code.
Ulya Trofimovich [Wed, 18 Mar 2015 14:35:19 +0000 (14:35 +0000)]
- Track used labels in a separate traversal of 'Go' graph
(the first part of an effort to reduce codegen to null device)
- Properly destruct 'Go' graph (46 tests failing with '--valgrind'
because of early exiting on errors)
Ulya Trofimovich [Wed, 18 Mar 2015 11:08:44 +0000 (11:08 +0000)]
Abstracted a commonly used constant into a class variable.
Ulya Trofimovich [Wed, 18 Mar 2015 10:55:43 +0000 (10:55 +0000)]
Separated 'Go' stuff: construction and output.
Ulya Trofimovich [Tue, 17 Mar 2015 16:24:46 +0000 (16:24 +0000)]
Removed redundant parameter.
Ulya Trofimovich [Tue, 17 Mar 2015 16:00:52 +0000 (16:00 +0000)]
Split control flow codegen in two phases:
- First, re2c builds a complex structure where it stores
all control flow codegen decisions: nested ifs or switches,
bitmaps or computed gotos, etc.
- Second, this structure is traversed and code is generated.
This differentiation is necessary to compute some statistics
(e.g. used labels) in advance, before code generation.
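A toy version of the two phases, assuming the decision structure is just a list of unconditional jumps (the real 'Go' structure records ifs, switches, bitmaps and computed gotos). One pass over the finished structure collects used labels; code is emitted afterwards, printing only those labels. All names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical decision produced by phase one: "state 'from' jumps
// to state 'to'". Nothing is emitted while building these.
struct Jump { unsigned from, to; };

// Statistics pass over the finished structure: which labels are targets.
std::set<unsigned> used_labels (const std::vector<Jump> & jumps)
{
    std::set<unsigned> used;
    for (const Jump & j : jumps) used.insert (j.to);
    return used;
}

// Phase two: traverse the same structure and generate code, printing
// a label only if some jump targets it.
std::string emit (const std::vector<Jump> & jumps)
{
    const std::set<unsigned> used = used_labels (jumps);
    std::ostringstream o;
    for (const Jump & j : jumps)
    {
        if (used.count (j.from)) o << "yy" << j.from << ":\n";
        o << "\tgoto yy" << j.to << ";\n";
    }
    return o.str ();
}
```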
Ulya Trofimovich [Sun, 15 Mar 2015 15:09:46 +0000 (15:09 +0000)]
Extracted duplicated code to a function.
Ulya Trofimovich [Sun, 15 Mar 2015 14:56:57 +0000 (14:56 +0000)]
Updated test results (changed due to the previous commit).
Ulya Trofimovich [Sun, 15 Mar 2015 14:43:45 +0000 (14:43 +0000)]
Removed YYDEBUG call before switch.
We don't call YYDEBUG in analogous situation (when if/else is used
instead of switch), and it seems redundant anyway.
Ulya Trofimovich [Fri, 13 Mar 2015 18:34:14 +0000 (18:34 +0000)]
Don't segfault if span has zero length.
Ulya Trofimovich [Thu, 12 Mar 2015 21:49:55 +0000 (21:49 +0000)]
Simplified codegen decision between switches/ifs.
All tests pass.
The previous condition made more sense: it was clear that the
author intended to consider some frequent corner cases.
But the condition was very tangled and yet too heuristic,
so I substituted it with a meaningless, but simple one.
I'm planning to simplify it even more later on.
Ulya Trofimovich [Wed, 11 Mar 2015 16:42:08 +0000 (16:42 +0000)]
Removed unreachable condition.
Ulya Trofimovich [Wed, 11 Mar 2015 16:32:56 +0000 (16:32 +0000)]
The number of high (wide) spans does not necessarily depend on code unit size.
Ulya Trofimovich [Wed, 11 Mar 2015 15:46:52 +0000 (15:46 +0000)]
Moved bitmap statistics counting inside of 'Go' class.
Ulya Trofimovich [Wed, 11 Mar 2015 15:02:19 +0000 (15:02 +0000)]
Moved bitmap statistics to 'Go' class.
Ulya Trofimovich [Wed, 11 Mar 2015 14:45:12 +0000 (14:45 +0000)]
Moved methods out of class.
Ulya Trofimovich [Wed, 11 Mar 2015 12:52:32 +0000 (12:52 +0000)]
Removed useless wrappers.
Ulya Trofimovich [Wed, 11 Mar 2015 12:46:58 +0000 (12:46 +0000)]
Pass limited span instead of checking range all the time.
Ulya Trofimovich [Wed, 11 Mar 2015 10:52:05 +0000 (10:52 +0000)]
Simplify handling of high/low spans.
Spans are already ordered, so to get high (wide) spans,
all we need is to find the first span with upper bound >0x100
and point at it.
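A sketch of that lookup with a simplified Span (illustrative fields), assuming 'ub' is an exclusive bound, as the 0x100 barrier in the 10 Mar 21:53 entry suggests. Using std::upper_bound for a binary search is a choice made here; a linear scan for the first such span works as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for re2c's 'Span': 'ub' is the upper bound of the
// character interval (assumed exclusive), 'to' an illustrative target.
struct Span { uint32_t ub; unsigned to; };

// Spans are ordered by 'ub', so the high (wide) spans form a suffix of
// the array. Returns the index of its first element, or spans.size ()
// if there are no high spans.
std::size_t first_high_span (const std::vector<Span> & spans)
{
    const uint32_t barrier = 0x100;
    return static_cast<std::size_t> (
        std::upper_bound (spans.begin (), spans.end (), barrier,
            [] (uint32_t value, const Span & s) { return value < s.ub; })
        - spans.begin ());
}
```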
Ulya Trofimovich [Tue, 10 Mar 2015 21:53:11 +0000 (21:53 +0000)]
Moved wide span barrier to 0x100 rather than 0xFF.
0x100 corresponds to condition 'encoding.szCodeUnit () <= 1',
because field 'ub' in 'Span' means upper bound of char interval.
Ulya Trofimovich [Tue, 10 Mar 2015 21:33:13 +0000 (21:33 +0000)]
Removed useless statistics counter.
Verified with 'assert (dSpans >= lTargets);'
Ulya Trofimovich [Tue, 10 Mar 2015 18:53:31 +0000 (18:53 +0000)]
Replaced class member with a local variable.
Ulya Trofimovich [Tue, 10 Mar 2015 18:47:31 +0000 (18:47 +0000)]
Replaced class member with a local variable.
Ulya Trofimovich [Tue, 10 Mar 2015 18:41:50 +0000 (18:41 +0000)]
Eliminated redundant variable.
Ulya Trofimovich [Tue, 10 Mar 2015 17:07:10 +0000 (17:07 +0000)]
Removed unused variable.
Ulya Trofimovich [Tue, 10 Mar 2015 16:01:15 +0000 (16:01 +0000)]
Removed dead code.
Ulya Trofimovich [Tue, 10 Mar 2015 15:58:29 +0000 (15:58 +0000)]
Extracted functionality common to codegen for the -g and -b flags.
Ulya Trofimovich [Sat, 7 Mar 2015 20:54:08 +0000 (20:54 +0000)]
Removed obsolete condition (makes no sense in single-pass mode).
Ulya Trofimovich [Sat, 7 Mar 2015 20:00:06 +0000 (20:00 +0000)]
Simplified .dot codegen:
- draw a single arrow for all transitions between two given states
- label all arrows with corresponding character ranges in square
brackets (no "default" label, single characters also appear in
square brackets)
- .dot output became much smaller, thus pictures are drawn faster
and generally look better: e.g. it takes ~10x less time to draw
PHP lexer and the resulting graph is shaped better.
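A sketch of the grouping with an illustrative transition type and a label format chosen for this example (hex code points in brackets); re2c's exact .dot output may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative transition: characters in [lb, ub) lead from 'from' to 'to'.
struct Trans { unsigned from, to; uint32_t lb, ub; };

// One arrow per (from, to) pair. Its label lists all ranges of that
// pair in square brackets, single characters included; there is no
// separate "default" label. Hex code points avoid quoting issues.
std::string to_dot (const std::vector<Trans> & ts)
{
    std::map<std::pair<unsigned, unsigned>, std::string> labels;
    for (const Trans & t : ts)
    {
        std::ostringstream r;
        r << "[0x" << std::hex << t.lb;
        if (t.ub - t.lb > 1) r << "-0x" << (t.ub - 1);
        r << ']';
        labels[std::make_pair (t.from, t.to)] += r.str ();
    }
    std::ostringstream o;
    for (const auto & l : labels)
        o << l.first.first << " -> " << l.first.second
          << " [label=\"" << l.second << "\"]\n";
    return o.str ();
}
```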
Ulya Trofimovich [Thu, 5 Mar 2015 16:47:25 +0000 (16:47 +0000)]
Simplified switch generation in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:26:40 +0000 (17:26 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:23:28 +0000 (17:23 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:15:38 +0000 (17:15 +0000)]
Continued separating .dot case from other cases in codegen. Added test.
Ulya Trofimovich [Wed, 4 Mar 2015 15:43:43 +0000 (15:43 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 15:41:10 +0000 (15:41 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 15:16:34 +0000 (15:16 +0000)]
Started to separate .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 13:12:19 +0000 (13:12 +0000)]
Removed dead code.
Used 'if (next) assert(next->label == from->label + 1);'
to ensure that code is really dead (and 'next' is always
the next state to 'from').
Ulya Trofimovich [Mon, 2 Mar 2015 13:54:11 +0000 (13:54 +0000)]
Hid 'OutputFile' structure.
Ulya Trofimovich [Mon, 2 Mar 2015 12:41:53 +0000 (12:41 +0000)]
Code cleanup in input/output:
- Use 'open' function instead of checking return status
(one may forget to check return status, but if one forgets to
open file, the error will be obvious)
- Introduced separate file type for header.
Header is much simpler than output, it doesn't need delayed
code fragments and can be generated in destructor.
Ulya Trofimovich [Fri, 27 Feb 2015 23:42:01 +0000 (23:42 +0000)]
Finally get rid of 'file_info' and 'stream_lc.h'
Ulya Trofimovich [Thu, 26 Feb 2015 13:23:02 +0000 (13:23 +0000)]
Reduce 'file_info' usage (in order to get rid of 'stream_lc.h').
Ulya Trofimovich [Thu, 26 Feb 2015 12:37:07 +0000 (12:37 +0000)]
'token.h' no longer depends on 'file_info'.
Part of the campaign to remove 'stream_lc.h'.
Ulya Trofimovich [Thu, 26 Feb 2015 12:32:55 +0000 (12:32 +0000)]
Now input stream is simple 'FILE *'.
This is the first part of the campaign to remove 'stream_lc.h'.
Ulya Trofimovich [Thu, 26 Feb 2015 11:41:15 +0000 (11:41 +0000)]
Dead code elimination.
Ulya Trofimovich [Thu, 26 Feb 2015 10:48:41 +0000 (10:48 +0000)]
Removed unused enum members.
I was unsure if delayed generation was also needed for genCondGoto
and genCondTable; so I kept those enum members as a reminder. Now
I know that all conditions are known by the moment re2c block is
parsed and code generation starts.
Ulya Trofimovich [Thu, 26 Feb 2015 10:36:16 +0000 (10:36 +0000)]
Moved operator definition to where it belongs.
Ulya Trofimovich [Wed, 25 Feb 2015 23:13:55 +0000 (23:13 +0000)]
One pass.
A second pass was used because some information (which influences
early parts of the generated code, e.g. enum with condition names
or YYMAXFILL definition) becomes available only at the end of first
pass.
I isolated all of these things (I hope) and now generate stubs for them,
which are filled in later. I restructured output as follows: the whole
output consists of source and header, each of them is a list of
blocks (corresponding to re2c blocks in source file), each block
is a list of code fragments (which can be either regular strings
with code or stubs that will be filled later).
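A toy model of that layout, covering only the source file (not the header) and using YYMAXFILL as the single kind of stub; all names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A fragment is either literal code or a stub whose text is known only
// after the whole input has been processed.
struct Fragment
{
    enum Kind { CODE, STUB_MAXFILL };
    Kind kind;
    std::string text;
};
typedef std::vector<Fragment> Block;   // corresponds to one re2c block

struct Output                          // models the source file only
{
    std::vector<Block> blocks;
    unsigned max_fill = 1;

    void new_block () { blocks.push_back (Block ()); }
    void code (const std::string & s)
    { blocks.back ().push_back ({Fragment::CODE, s}); }
    void stub_maxfill ()
    { blocks.back ().push_back ({Fragment::STUB_MAXFILL, ""}); }

    // Called once at the very end: stubs are filled with final values,
    // so a single pass over the input is enough.
    std::string flush () const
    {
        std::ostringstream o;
        for (const Block & b : blocks)
            for (const Fragment & f : b)
                if (f.kind == Fragment::CODE) o << f.text;
                else o << "#define YYMAXFILL " << max_fill << "\n";
        return o.str ();
    }
};
```

Usage: call stub_maxfill () where the definition must appear, keep generating code, update max_fill as later blocks are processed, and call flush () once at the end.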