re2c
9 years agoSeparated code generation for DFA actions and DFA states.
Ulya Trofimovich [Wed, 27 May 2015 12:04:53 +0000 (13:04 +0100)]
Separated code generation for DFA actions and DFA states.

Removes debugging utilities for DFA states (we should either add
debug builds explicitly or throw away temporary debug stuff).

9 years agoGather all label counting in one place prior to code generation.
Ulya Trofimovich [Wed, 27 May 2015 11:52:02 +0000 (12:52 +0100)]
Gather all label counting in one place prior to code generation.

9 years agoMoved start label configuration out of global scope.
Ulya Trofimovich [Wed, 27 May 2015 10:50:55 +0000 (11:50 +0100)]
Moved start label configuration out of global scope.

There are two configurations:
1. re2c:startlabel = <integer>;
2. re2c:startlabel = <string>;
The scope of these configurations is the scope of current re2c block.

9 years agoOutput user-defined start label in the appropriate place.
Ulya Trofimovich [Tue, 26 May 2015 20:34:56 +0000 (21:34 +0100)]
Output user-defined start label in the appropriate place.

Before this commit, given the following example:
    /*!re2c
            re2c:startlabel = "start";
            [^]* {}
    */
re2c would generate the following code:
    {
            YYCTYPE yych;
            goto yy0;
    yy1:
    start:
            ++YYCURSOR;
    yy0:
            if (YYLIMIT <= YYCURSOR) YYFILL(1);
            yych = *YYCURSOR;
            goto yy1;
            {}
    }
where "start:" falsely corresponds to "yy1:" rather than to "yy0:".
(The important property of this example is that DFA has arrows
to initial state.) This commit fixes this behavior:
    {
            YYCTYPE yych;
    start:
            goto yy0;
    yy1:
            ++YYCURSOR;
    yy0:
            if (YYLIMIT <= YYCURSOR) YYFILL(1);
            yych = *YYCURSOR;
            goto yy1;
            {}
    }

9 years agoDon't hide the ugly fact that default state in '-f' mode is always state 0.
Ulya Trofimovich [Tue, 26 May 2015 13:00:01 +0000 (14:00 +0100)]
Don't hide the ugly fact that default state in '-f' mode is always state 0.

In '-f' mode, state dispatch generation can be triggered in two ways.
Both ways use 're2c::OutputFile::insert_state_goto', which generates
state dispatch only if it hasn't been already generated. The two ways
are:
1. Explicitly, using '/*!getstate:re2c*/'. In this case default state
   must be state 0 because it's hardcoded in the invocation of
   're2c::OutputFile::insert_state_goto' in 'src/parse/scanner_lex.re'.
2. Implicitly, in 're2c::DFA::emit'. In this case default state must
   be state 0 because if 'prolog_label' is not 0, it means that it's
   not the first time 're2c::DFA::emit' is called and state dispatch
   has already been generated.

This commit makes it explicit that re2c always uses state 0.

Note:
    Currently in '-f' mode re2c generates one global dispatch for
    the whole file (the enumeration of yyFillLabel's is also global).
    All re2c blocks share the same state dispatch, so in '-f' mode all
    re2c blocks must reside in the same function and must be parts of
    the same lexer (exception: in '-r' mode re2c generates one state
    dispatch per use block).

    This is clearly an ugly limitation: one is forced to put disconnected
    lexers in different files in '-f' mode.

    Now re2c provides conditions as a way to express related blocks,
    so if users used multiple blocks only for unrelated lexers, we
    could safely limit the scope of state dispatch to a single block.
    But conditions can conflict with other re2c features, they are
    a bit broken and I'm pretty sure some users use multiple blocks
    (e.g. I used to do it).

    Thus we cannot just make '-f' generate state dispatch on per-block
    basis (there're some other obstacles: it's not quite clear which
    block the '/*!getstate:re2c*/' directive is related to, etc.).

    I leave the situation 'as is' until better times (when lexer-
    parser loop is fixed and the whole code generation model is more
    robust).

9 years agoClarify which label is relevant to initial state.
Ulya Trofimovich [Tue, 26 May 2015 11:34:57 +0000 (12:34 +0100)]
Clarify which label is relevant to initial state.

(see commit 4ab969deca341820d03ffb979f023256b10c2f92)

9 years agoCompare DFA states rather than labels.
Ulya Trofimovich [Tue, 26 May 2015 11:24:48 +0000 (12:24 +0100)]
Compare DFA states rather than labels.

(see commit 923f4c5559b1f8b43e63d1013d91698e7496e28c)

9 years agoClarify which label is relevant to initial state.
Ulya Trofimovich [Sun, 24 May 2015 20:38:55 +0000 (21:38 +0100)]
Clarify which label is relevant to initial state.

Initial state is different from all other states in that it should
not advance YYCURSOR when it's entered for the first time (when DFA is
entered). But it should advance YYCURSOR like any normal state if
it's entered from any other part of DFA (other DFA states may well
lead to the initial state, e.g. '[^]*').

That's why initial state is split into two parts, each marked by a
separate label: normal state label points to the code that advances
YYCURSOR, and special label points right after that code (skips it).

It so happens that normal state label is equal to special initial
label plus one, but we shouldn't rely on that.

9 years agoClarify that initial state stores 'start_label', not some random label.
Ulya Trofimovich [Sun, 24 May 2015 20:33:18 +0000 (21:33 +0100)]
Clarify that initial state stores 'start_label', not some random label.

9 years agoCompare DFA states rather than labels.
Ulya Trofimovich [Thu, 21 May 2015 13:38:13 +0000 (14:38 +0100)]
Compare DFA states rather than labels.

This is first step to postpone labelling states until all the
code is actually generated (in the form of some structure in
memory) and it's time to pretty-print it to file.

Labels shouldn't be mixed up with states (in particular, they
shouldn't be used as a state identifier).

9 years agoRegenerated documentation (changed by previous commit).
Ulya Trofimovich [Wed, 20 May 2015 09:45:31 +0000 (10:45 +0100)]
Regenerated documentation (changed by previous commit).

9 years agoFinally removed auxiliary code generation pass to 'null device'.
Ulya Trofimovich [Wed, 20 May 2015 08:34:26 +0000 (09:34 +0100)]
Finally removed auxiliary code generation pass to 'null device'.

This pass was used to gather statistics:
    - which labels are used (to avoid 'unused label' warnings from
      C/C++ compiler)
    - in '-f' mode, how many times YYFILL is called (to generate
      dispatch to the appropriate YYFILL call and resume lexing from
      there)

This commit deals with second case: it makes counting YYFILL calls
independent of auxiliary pass. Counting relies on variable
're2c::last_fill_index', which is updated in 're2c::need' function
(note that it is crucial that 're2c::need' is always called prior
to generation of state dispatch).
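
The counting scheme can be pictured roughly as follows. This is only a
sketch: 'FillCounter' and 'emit_dispatch' are made-up names, and the
exact text of the dispatch (including the 'default: goto yy0;' line) is
an assumption, not copied from re2c sources. The point it illustrates is
that the dispatch can only be complete if every YYFILL call site has
already been numbered.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the counter kept in 're2c::last_fill_index'.
struct FillCounter {
    uint32_t last_fill_index;
    FillCounter() : last_fill_index(0) {}

    // Called once per YYFILL call site; returns the index of that site.
    uint32_t next() { return last_fill_index++; }
};

// Builds a dispatch with one case per YYFILL call site counted so far.
// The output format is assumed for illustration.
inline std::string emit_dispatch(const FillCounter &counter) {
    std::ostringstream o;
    o << "switch (YYGETSTATE()) {\n";
    o << "default: goto yy0;\n";
    for (uint32_t i = 0; i < counter.last_fill_index; ++i) {
        o << "case " << i << ": goto yyFillLabel" << i << ";\n";
    }
    o << "}\n";
    return o.str();
}
```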

Also fixed documentation for '-f'.

9 years agoPass only that part of output needed by 're2c::State::emit'.
Ulya Trofimovich [Tue, 19 May 2015 17:53:35 +0000 (18:53 +0100)]
Pass only that part of output needed by 're2c::State::emit'.

9 years agoFixed clang's "warning: declaration shadows a local variable [-Wshadow]"
Ulya Trofimovich [Tue, 19 May 2015 17:45:55 +0000 (18:45 +0100)]
Fixed clang's "warning: declaration shadows a local variable [-Wshadow]"

9 years ago're2c::emit_init' doesn't generate anything useful in .dot mode.
Ulya Trofimovich [Tue, 19 May 2015 17:39:04 +0000 (18:39 +0100)]
're2c::emit_init' doesn't generate anything useful in .dot mode.

9 years agoAnother part of tracking label usage moved out from codegen.
Ulya Trofimovich [Tue, 19 May 2015 17:19:24 +0000 (18:19 +0100)]
Another part of tracking label usage moved out from codegen.

This is part of effort to reduce code generation to 'null device'
(the same code is generated twice only to gather statistics on
label usage). In order to reduce this evil pass, we should be able
to track label usage before code generation.

9 years agoSimplified 're2c::Action' class and its usage.
Ulya Trofimovich [Tue, 19 May 2015 11:53:24 +0000 (12:53 +0100)]
Simplified 're2c::Action' class and its usage.

Replaced inheritance hierarchy with tagged union.
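
For readers unfamiliar with the pattern, a tagged union looks roughly
like the sketch below. 'ActionSketch', its kinds and its fields are
hypothetical and do not reproduce the actual 're2c::Action' definition;
the sketch only shows how one switch on a tag can stand in for virtual
dispatch over a class hierarchy.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch, not the real 're2c::Action'.
struct ActionSketch {
    enum Kind { MATCH, INITIAL, SAVE, ACCEPT, RULE };

    Kind kind;  // the tag: tells which union member is meaningful
    union {
        uint32_t save_index;   // valid only when kind == SAVE
        uint32_t rule_number;  // valid only when kind == RULE
    } info;

    ActionSketch() : kind(MATCH) { info.save_index = 0; }

    void set_save(uint32_t index) { kind = SAVE; info.save_index = index; }
    void set_rule(uint32_t rule)  { kind = RULE; info.rule_number = rule; }
};

// A single switch on the tag replaces a virtual method call.
inline std::string describe(const ActionSketch &a) {
    switch (a.kind) {
        case ActionSketch::MATCH:   return "match";
        case ActionSketch::INITIAL: return "initial";
        case ActionSketch::SAVE:    return "save " + std::to_string(a.info.save_index);
        case ActionSketch::ACCEPT:  return "accept";
        case ActionSketch::RULE:    return "rule " + std::to_string(a.info.rule_number);
    }
    return "unknown";
}
```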

9 years agoFixed clang's "warning: declaration shadows a local variable [-Wshadow]"
Ulya Trofimovich [Mon, 18 May 2015 11:58:04 +0000 (12:58 +0100)]
Fixed clang's "warning: declaration shadows a local variable [-Wshadow]"

9 years agoDon't copy <*> regexps for each condition.
Ulya Trofimovich [Sat, 16 May 2015 11:01:58 +0000 (12:01 +0100)]
Don't copy <*> regexps for each condition.

There're two major things to care about in this situation:
1. <*> rules must have the lowest priority: in order to guarantee
   it, we re-iterate <*> regexps after all other rules have been
   parsed and fix <*> regexps priority.
2. <*> regexps must be compiled to instructions separately for each
   condition: this is guaranteed by assigning them
   're2c::RegExp::PRIVATE' attribute.

Note that 're2c::RuleOp::accept' member stores rule priority.
These priorities don't have to be consecutive, only the right order
must be maintained. Later on in 're2c::DFA::prepare' they result in
consecutive 'yyaccept' values.
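
The step from sparse priorities to consecutive values can be sketched
as below. 'compact_priorities' is a hypothetical helper written for
this illustration; how 're2c::DFA::prepare' actually does it is not
shown here. The only property demonstrated is that relative order is
preserved while gaps are removed.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: maps arbitrary, possibly sparse priorities to
// consecutive values 0, 1, 2, ... while keeping their relative order.
inline std::map<unsigned, unsigned> compact_priorities(const std::vector<unsigned> &priorities) {
    // std::set sorts the priorities and drops duplicates.
    std::set<unsigned> ordered(priorities.begin(), priorities.end());

    std::map<unsigned, unsigned> result;
    unsigned next = 0;
    for (std::set<unsigned>::const_iterator it = ordered.begin(); it != ordered.end(); ++it) {
        result[*it] = next++;
    }
    return result;
}
```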

9 years agoForbid copying of 're2c::Substr'.
Ulya Trofimovich [Thu, 14 May 2015 17:20:13 +0000 (18:20 +0100)]
Forbid copying of 're2c::Substr'.

Removed useless methods (most of them became useless after
're2c::Str' removal) and obsolete autoconf check for 'strndup'.

Removed 'Scanner::token' methods. They used
'Scanner::check_token_length', which was pretty useless:
1. checking for the lower bound should always have succeeded, because
   'Scanner::tok' is always set to buffer start in 'Scanner::fill'
   and if the token was too long, its start will be lost anyway.
2. checking for the upper bound could fail if re2c dev passed some
   trash into it, but any normal function would do so and there is
   no particular reason to have special runtime checks here.
Now substrings are constructed in lexer, where all the lengths
and bounds are easier to verify from lexing context (for re2c dev).

9 years agoRemoved obsolete 're2c::Str' stuff.
Ulya Trofimovich [Thu, 14 May 2015 15:37:54 +0000 (16:37 +0100)]
Removed obsolete 're2c::Str' stuff.

9 years agoPass unquoted strings to parsing functions.
Ulya Trofimovich [Thu, 14 May 2015 12:58:29 +0000 (13:58 +0100)]
Pass unquoted strings to parsing functions.

In most cases re2c accepts single-quoted or double-quoted strings
in regexp specifications, but if flex-like syntax is enabled, then
re2c accepts unquoted strings.

Before this commit, functions that parse strings into regexps
expected quoted strings, and we had to add quotes in case of
flex-like syntax. That was very inconvenient and in fact
unnecessary, since the first thing parsing functions do is get rid
of quotes.

9 years agoNow 're2c::Token' uses 'std::string' instead of 're2c::Str'.
Ulya Trofimovich [Thu, 14 May 2015 12:16:24 +0000 (13:16 +0100)]
Now 're2c::Token' uses 'std::string' instead of 're2c::Str'.

9 years agoSimplified handling of named definitions in parser.
Ulya Trofimovich [Thu, 14 May 2015 11:25:24 +0000 (12:25 +0100)]
Simplified handling of named definitions in parser.

Don't bother with named definitions in lexer, just pass them as
strings to parser. Parser will recognize named definitions, insert
them into symbol table and handle conflicts.

Use simple 'std::map' instead of 're2c::Symbol' class (that hides
symbol table in class static member).

Use 'std::string' instead of 're2c::Str'. Due to bison limitations
we have to pass pointers to strings allocated on the heap and
carefully destroy them. The whole thing is quite error prone, so
maybe I'll make a small slab allocator for parser later on.
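
A symbol table on top of 'std::map' can be as small as the sketch
below. The value type (a plain string) and the name 'define_symbol'
are assumptions made for this illustration, and "conflict" is taken to
mean redefinition of an existing name.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical symbol table: definition name -> definition text.
typedef std::map<std::string, std::string> SymbolTable;

// Inserts a named definition. Returns false (and leaves the table
// unchanged) if the name is already defined, so the caller can report
// the conflict.
inline bool define_symbol(SymbolTable &table, const std::string &name, const std::string &definition) {
    return table.insert(std::make_pair(name, definition)).second;
}
```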

9 years agoFixed mismatched new/delete (found by valgrind).
Ulya Trofimovich [Thu, 14 May 2015 11:21:26 +0000 (12:21 +0100)]
Fixed mismatched new/delete (found by valgrind).

What is allocated with 're2c::allocate<T>' (operator new), should
be freed with 'operator delete', not with 'delete' or 'delete []'.
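
The pairing rule can be illustrated as follows. 'allocate_raw' is an
assumed approximation of what a helper like 're2c::allocate<T>' does
(raw storage via 'operator new'); 'release_raw' is a hypothetical
counterpart added for the example.

```cpp
#include <cstddef>
#include <new>

// Assumed shape of the allocation helper: raw, uninitialized storage
// obtained through 'operator new' (no constructors are run).
template <typename T>
T *allocate_raw(std::size_t count) {
    return static_cast<T *>(operator new(count * sizeof(T)));
}

// Matching release: storage obtained from 'operator new' goes back
// through 'operator delete', not through a 'delete' or 'delete []'
// expression (those pair with 'new' and 'new []' expressions).
template <typename T>
void release_raw(T *p) {
    operator delete(p);
}
```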

9 years agoMoved 're2c::Symbol' class to a separate header and source file.
Ulya Trofimovich [Wed, 13 May 2015 16:34:48 +0000 (17:34 +0100)]
Moved 're2c::Symbol' class to a separate header and source file.

9 years agoReplaced 're2c::Str' with 'std::string'.
Ulya Trofimovich [Wed, 13 May 2015 14:24:35 +0000 (15:24 +0100)]
Replaced 're2c::Str' with 'std::string'.

There's no point in using self-invented strings if one always
converts them to 'std::string' before actually using them.

9 years agoCXXFLAGS: removed -DPEDANTIC, added -Weffc++. Fixed warnings.
Ulya Trofimovich [Wed, 13 May 2015 14:07:01 +0000 (15:07 +0100)]
CXXFLAGS: removed -DPEDANTIC, added -Weffc++. Fixed warnings.

9 years agoSplit 'src/dfa/dfa.h' into parts: DFA states, DFA actions, DFA.
Ulya Trofimovich [Wed, 13 May 2015 10:13:09 +0000 (11:13 +0100)]
Split 'src/dfa/dfa.h' into parts: DFA states, DFA actions, DFA.

9 years agoMoved functions declarations and typedefs to a proper place.
Ulya Trofimovich [Tue, 12 May 2015 16:00:09 +0000 (17:00 +0100)]
Moved functions declarations and typedefs to a proper place.

9 years agoMerged 'src/codegen/translate.cc' into 'src/codegen/print.cc'.
Ulya Trofimovich [Tue, 12 May 2015 11:51:30 +0000 (12:51 +0100)]
Merged 'src/codegen/translate.cc' into 'src/codegen/print.cc'.

9 years agoSplit 'src/codegen/code.cc' into parts.
Ulya Trofimovich [Tue, 12 May 2015 11:26:07 +0000 (12:26 +0100)]
Split 'src/codegen/code.cc' into parts.

First, DFA is built ('re2c::DFA::DFA'), then it must be prepared
for code generation: some states must be split, backtracking points
must be marked, etc. ('re2c::DFA::prepare'), then finally code
can be generated ('re2c::DFA::genCode').

I haven't yet fully decided whether second stage (preparing) is
closer to DFA construction in general (and thus should be moved to
'src/dfa') or to code generation (and should be moved to 'src/codegen').
Since it deals a lot with bitmaps, second variant will suffice for
now. Perhaps later on I'll split preparation into general and
codegen-related parts.

9 years agoRenamed struct to avoid confusion.
Ulya Trofimovich [Tue, 12 May 2015 10:32:11 +0000 (11:32 +0100)]
Renamed struct to avoid confusion.

're2c::BitMap' represents actual bitmaps used for code generation,
while 're2c::GoBitmap' (former 're2c::Bitmap') is a node type in
graph representation of the program right before code generation.

Names used in 're2c::Go' subsystem are really ugly and will be
renamed to something more sensible later on.

9 years agoMoved 're2c::BitMap' methods to a separate source file.
Ulya Trofimovich [Tue, 12 May 2015 10:26:04 +0000 (11:26 +0100)]
Moved 're2c::BitMap' methods to a separate source file.

9 years agoRemoved some useless includes from 'src/codegen/code.cc'.
Ulya Trofimovich [Tue, 12 May 2015 10:05:08 +0000 (11:05 +0100)]
Removed some useless includes from 'src/codegen/code.cc'.

9 years agoMoved lexing functions to proper file.
Ulya Trofimovich [Tue, 12 May 2015 09:57:27 +0000 (10:57 +0100)]
Moved lexing functions to proper file.

For some reason 're2c::Scanner' methods were defined in file
'src/codegen/code.cc'. Moved them to 'src/parser/scanner.cc'
and renamed 'src/parser/scanner.re' to 'src/parser/scanner_lex.re'.

9 years agoRemoved useless include.
Ulya Trofimovich [Mon, 11 May 2015 20:04:34 +0000 (21:04 +0100)]
Removed useless include.

9 years agoReduced "src/codegen/print.h" dependency for some unrelated files.
Ulya Trofimovich [Mon, 11 May 2015 16:46:26 +0000 (17:46 +0100)]
Reduced "src/codegen/print.h" dependency for some unrelated files.

This is the first part of effort to reduce the total number of
interdependencies between different files.

"src/codegen/print.h" contains some pretty-printing functions
mostly used for code generation. There are two other cases where
these functions can also be useful:
    1. debug: it's not yet well developed, just some messy chunks
       of code that are commented out
    2. error messages: character pretty-printing is not actually
       very useful, since error messages mostly contain printable
       characters (pieces of user-supplied input)
From the above I conclude that non-codegen files don't really need
"src/codegen/print.h". If this changes, I'll consider moving
these functions to src/util.

9 years agoImproved source files layout.
Ulya Trofimovich [Mon, 11 May 2015 12:46:26 +0000 (13:46 +0100)]
Improved source files layout.

Created a directory tree to group logically related source files
and clean up top source directory.

9 years agoAdded mingw builds for windows.
Ulya Trofimovich [Mon, 11 May 2015 10:13:34 +0000 (11:13 +0100)]
Added mingw builds for windows.

To enable mingw build, configure with "--host i686-w64-mingw32".
Simple 'make' will then build re2c.exe executable for windows.
Note that 'make bootstrap' will not work: it will try to run
re2c.exe in order to recompile scanner and this will surely fail.

Testing with wine can be done using 'make wtests'.

9 years agoDon't recompile scanner by default.
Ulya Trofimovich [Mon, 11 May 2015 09:54:26 +0000 (10:54 +0100)]
Don't recompile scanner by default.

Since it will cause insignificant changes in bootstrap files
which we'll have to commit every time.

9 years agoDon't rebuild docs by default.
Ulya Trofimovich [Mon, 11 May 2015 09:41:01 +0000 (10:41 +0100)]
Don't rebuild docs by default.

If we configure with "--enable-docs", then docs get rebuilt every
time make is executed, and we'll have to commit these insignificant
changes every time, which is bad.

9 years agoUpdated build system.
Ulya Trofimovich [Fri, 8 May 2015 22:09:00 +0000 (23:09 +0100)]
Updated build system.

Major changes:
- Moved all re2c source files to a separate directory. Top source
  directory should only contain autotools source files and a few other
  files like README.

- Enabled out-of-source builds (and wrote a simple script build.sh
  that makes out-of-source build).

- Improved portable variant of <stdint.h> header (src/c99_stdint.h):
  now it relies only on some few defines in configure-generated
  src/config.h (instead of checking for MSVC version and relying
  on MSVC-defined stuff). Implementation follows C99 standard closely.

- Removed all windows-related build stuff. It was no use keeping it:
  it's been broken for a long time and I can't maintain it.

- Removed all RPM stuff: distro-maintainers use their own hacks anyway.
  Makefile.am is definitely the wrong place to keep such things.
  A separate script and .spec file is a better idea, but again, nobody
  uses it.

- Added make target 'bootstrap' to make maintainers' job easier.

- Merged lessons and examples into one.

- Updated README and doc/index.html.

- Run tests in parallel by default.

Changes concerning particular files:
- configure.ac:
    - removed autoconf version check: developers will need the right
      version anyway (otherwise autoconf will refuse to work);
      users should use configure script provided by package
      distribution

    - removed GCC version (3 or above) check: it's pretty ancient
      and I don't know which features are missing anyway

    - removed some useless checks (which resulted in defines in
      src/config.h used by no one)

    - introduced some new checks (used by src/c99_stdint.h)

    - followed the advice of autoupdate

- Makefile.am:
    - explicitly prefixed all file names with $(srcdir) or $(builddir)

    - removed windows and RPM related rules and targets

    - added target 'bootstrap'

    - sorted out automake variables

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Thu, 23 Apr 2015 14:26:56 +0000 (15:26 +0100)]
Continued adding "--skeleton" switch.

Some renaming and cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 22 Apr 2015 22:24:43 +0000 (23:24 +0100)]
Continued adding "--skeleton" switch.

Simplified 'generate_paths_cover' a bit: NULL-state case is not
really needed, it's artificial and should be embedded into the
only case that needs it.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 22 Apr 2015 16:54:48 +0000 (17:54 +0100)]
Continued adding "--skeleton" switch.

Check possible overflow immediately.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 22 Apr 2015 16:45:24 +0000 (17:45 +0100)]
Continued adding "--skeleton" switch.

Estimate the size of data to be generated rather than the number
of paths in DFA. Returns precise size or 'Skeleton::MAX_PATHS',
whichever is less. Estimate data size in both cases: for all paths
and for path cover.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Tue, 21 Apr 2015 14:49:15 +0000 (15:49 +0100)]
Continued adding "--skeleton" switch.

Use RAII-style object to auto-decrement visit counter when
leaving local scope.
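
The idea is the usual scope guard. A minimal sketch, with the
hypothetical names 'VisitGuard' and 'enter_once' (the real class and
the traversal code are not shown in this log):

```cpp
#include <cstdint>

// Hypothetical guard: increments a visit counter on construction and
// decrements it when the enclosing scope is left, including on early
// returns.
class VisitGuard {
    uint32_t &counter_;
    VisitGuard(const VisitGuard &);             // copying disabled
    VisitGuard &operator=(const VisitGuard &);  // assignment disabled
public:
    explicit VisitGuard(uint32_t &counter) : counter_(counter) { ++counter_; }
    ~VisitGuard() { --counter_; }
};

// Example traversal step: refuses to re-enter a state that is already
// being visited; otherwise marks it as visited for the duration of the
// call and unmarks it automatically on return.
inline bool enter_once(uint32_t &visits) {
    if (visits > 0) return false;  // already on the current path
    VisitGuard guard(visits);
    return visits == 1;            // counter is 1 while 'guard' is alive
}
```

Because the decrement lives in the destructor, no return path can
forget to unmark the state.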

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 20 Apr 2015 22:46:29 +0000 (23:46 +0100)]
Continued adding "--skeleton" switch.

Improved readability of overflow checks when counting the number
of paths.

9 years agoAdded portable version of <stdint.h>, converted the whole project.
Ulya Trofimovich [Mon, 20 Apr 2015 22:38:32 +0000 (23:38 +0100)]
Added portable version of <stdint.h>, converted the whole project.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 20 Apr 2015 16:21:48 +0000 (17:21 +0100)]
Continued adding "--skeleton" switch.

Generate all possible paths (with respect to range lower/upper
bounds) in case their total amount is less than
'Skeleton::PATHS_OVERFLOW'.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 20 Apr 2015 14:03:57 +0000 (15:03 +0100)]
Continued adding "--skeleton" switch.

Be even more careful not to overflow path counter, check possible
multiplication overflow.
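
One common way to guard such a counter is to saturate at a limit and
test with division before multiplying. The sketch below assumes a
made-up limit 'PATH_LIMIT' (the actual value of the limit in re2c is
not stated here) and hypothetical helper names.

```cpp
#include <cstdint>

// Hypothetical saturation limit standing in for the real one.
static const uint32_t PATH_LIMIT = 1024 * 1024;

// Multiplies two path counts; returns PATH_LIMIT instead of overflowing
// or exceeding the limit.
inline uint32_t mul_saturated(uint32_t a, uint32_t b) {
    if (a == 0 || b == 0) return 0;
    if (a >= PATH_LIMIT || b >= PATH_LIMIT) return PATH_LIMIT;
    // a * b >= PATH_LIMIT  <=>  a > (PATH_LIMIT - 1) / b   (b > 0),
    // so the check never computes the possibly overflowing product.
    if (a > (PATH_LIMIT - 1) / b) return PATH_LIMIT;
    return a * b;
}

// Adds two path counts with the same saturation rule.
inline uint32_t add_saturated(uint32_t a, uint32_t b) {
    if (a >= PATH_LIMIT || b >= PATH_LIMIT - a) return PATH_LIMIT;
    return a + b;
}
```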

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 20 Apr 2015 12:43:31 +0000 (13:43 +0100)]
Continued adding "--skeleton" switch.

Fixed backtracking error introduced by previous commit: shouldn't
return prematurely, need to unmark state as visited first.

Some renaming.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 20 Apr 2015 12:26:25 +0000 (13:26 +0100)]
Continued adding "--skeleton" switch.

Give up counting data when a certain limit is reached
('Skeleton::MAX_PATHS').

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Sun, 19 Apr 2015 22:11:32 +0000 (23:11 +0100)]
Continued adding "--skeleton" switch.

Code cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Sun, 19 Apr 2015 21:38:50 +0000 (22:38 +0100)]
Continued adding "--skeleton" switch.

Keep all generated paths in memory (we are careful not to generate
too many of them, and we keep second part of data (positions) in
memory anyway).

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Sun, 19 Apr 2015 21:05:46 +0000 (22:05 +0100)]
Continued adding "--skeleton" switch.

Simplified traversal of outgoing arrows when extending prefixes.
Instead of adding outgoing arrow sets (which was erroneous anyway:
they should remain unmodified for another recursion entry), simply
wrap iterator and keep iterating until both ingoing and outgoing
arrows are counted.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Fri, 17 Apr 2015 17:51:29 +0000 (18:51 +0100)]
Continued adding "--skeleton" switch.

Simplified construction of prefixes.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Fri, 17 Apr 2015 16:15:31 +0000 (17:15 +0100)]
Continued adding "--skeleton" switch.

Renamed: 'Prefix' -> 'Path'.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Fri, 17 Apr 2015 16:00:23 +0000 (17:00 +0100)]
Continued adding "--skeleton" switch.

More code cleanup: pre-initialize paths for final states and
default state, so that NULL can be used as a terminating condition
for path-generating function. Fixed memleaks.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Thu, 16 Apr 2015 15:38:46 +0000 (16:38 +0100)]
Continued adding "--skeleton" switch.

Code cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Thu, 16 Apr 2015 13:46:03 +0000 (14:46 +0100)]
Continued adding "--skeleton" switch.

Tried to simplify data generation a bit: construct skeleton states
so that there are no links to NULL state. Default state is
represented with a state that has no rule and zero outgoing
arrows. Final states are represented with a state that has a rule,
but zero outgoing labels.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 15 Apr 2015 14:09:45 +0000 (15:09 +0100)]
Continued adding "--skeleton" switch.

Code cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 15 Apr 2015 14:02:23 +0000 (15:02 +0100)]
Continued adding "--skeleton" switch.

Code cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 15 Apr 2015 12:40:19 +0000 (13:40 +0100)]
Continued adding "--skeleton" switch.

Reduced the amount of generated data (no exponential growth now).
Generate enough paths to cover all DFA transitions (well, not
exactly all: only upper and lower bounds of a range). Respect
cycles (loop once).

9 years agoCode cleanup.
Ulya Trofimovich [Wed, 8 Apr 2015 15:43:16 +0000 (16:43 +0100)]
Code cleanup.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Wed, 8 Apr 2015 11:58:59 +0000 (12:58 +0100)]
Continued adding "--skeleton" switch.

Use a separate DFA-like structure for DFA skeleton. This makes it
possible to optimize skeleton states for fast traversal (group
transitions by destination state) and avoid messing with the original DFA.

Added 'count' method to estimate the amount of data that will
be generated (it worked too slowly for the original DFA with
ungrouped transitions).
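
Grouping transitions by destination state can be sketched like this.
'Range', 'Transition' and 'group_by_target' are hypothetical names
used only for the illustration; the real skeleton structures are not
shown in this log.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
typedef std::pair<uint32_t, uint32_t> Range;  // [lower, upper] bounds of a character range

struct Transition {
    Range range;  // characters that trigger the transition
    int target;   // index of the destination state
};

// Groups outgoing transitions by destination state, so that a traversal
// recurses into each destination once instead of once per range.
inline std::map<int, std::vector<Range> > group_by_target(const std::vector<Transition> &spans) {
    std::map<int, std::vector<Range> > grouped;
    for (std::size_t i = 0; i < spans.size(); ++i) {
        grouped[spans[i].target].push_back(spans[i].range);
    }
    return grouped;
}
```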

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Tue, 7 Apr 2015 16:01:46 +0000 (17:01 +0100)]
Continued adding "--skeleton" switch.

Output input data to a separate file (otherwise we'll have to
keep all generated data in memory, because output has a complex
structure and cannot be written to file until it's fully
generated).

This reduces memory usage significantly (so that there remain no
memory consumption problems with "--skeleton" switch). However
on some files re2c generated too much data, e.g. case-insensitive
strings:

/*!re2c
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' {}
*/

Exponential growth is a bad thing; must deal with it somehow.
Time grows exponentially as well, of course.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 21:11:05 +0000 (22:11 +0100)]
Continued adding "--skeleton" switch.

Check for rule match in final states.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 19:40:43 +0000 (20:40 +0100)]
Continued adding "--skeleton" switch.

Generate check for input position in final states: first check
that input position equals the expected one; second advance
input position to the beginning of next DFA path (this may be
necessary when the generated lexer rolls back: it should resume
with a new DFA path rather than some default characters remaining
after rollback).

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 19:15:29 +0000 (20:15 +0100)]
Continued adding "--skeleton" switch.

Fixed range lower bound tracking when generating input strings
from DFA.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 18:08:02 +0000 (19:08 +0100)]
Continued adding "--skeleton" switch.

When generating DFA paths respect range lower bound as well as
upper bound.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 17:44:59 +0000 (18:44 +0100)]
Continued adding "--skeleton" switch.

Output generated strings directly to file instead of storing them
in memory.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 15:41:49 +0000 (16:41 +0100)]
Continued adding "--skeleton" switch.

Track exact number of characters to be consumed on each input
string. Note that it's not equal to string length, since many
strings end up with characters from default transitions: on such
strings lexer must roll back to the latest backtracking point (that
is, the latest accepting state).

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 6 Apr 2015 04:11:56 +0000 (05:11 +0100)]
Continued adding "--skeleton" switch.

Moved input-string-generation-from-DFA algorithm earlier, to the
point when DFA is barely constructed and contains only 'Match'
states and starting state (no saving, accepting or split states).
This simplifies string generation: if destination state is NULL,
it is either next to final state (if all spans lead to it), or
default state (if some spans lead to other states).

9 years agoRemoved useless type of DFA state.
Ulya Trofimovich [Sun, 5 Apr 2015 14:39:29 +0000 (15:39 +0100)]
Removed useless type of DFA state.

'Enter' was encapsulated by 'Initial' and never used on its own.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Sun, 5 Apr 2015 11:43:11 +0000 (12:43 +0100)]
Continued adding "--skeleton" switch.

Fixed backtracking in deriving-input-data-from-DFA algorithm.

9 years agoDetermine action type in a simpler way.
Ulya Trofimovich [Fri, 3 Apr 2015 21:40:44 +0000 (22:40 +0100)]
Determine action type in a simpler way.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Fri, 3 Apr 2015 21:30:13 +0000 (22:30 +0100)]
Continued adding "--skeleton" switch.

Tried to sort out various DFA states and how they influence code
generation. Added explicit type field to all actions.

9 years agoRemoved useless NULL check (all NULL destination states handled in DFA::prepare).
Ulya Trofimovich [Fri, 3 Apr 2015 21:23:36 +0000 (22:23 +0100)]
Removed useless NULL check (all NULL destination states handled in DFA::prepare).

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Thu, 2 Apr 2015 14:31:14 +0000 (15:31 +0100)]
Continued adding "--skeleton" switch.

A naive attempt to generate input strings from DFA. The problem is,
if the number of spans equals 1, it's hard to determine whether
it's some kind of a 'transit' state or a normal state with just one
span. All states have spans, but do they use them? The situation is
further complicated with 'readCh' which makes it hard to trace how
actions influence input operations.

9 years agoOmit useless check for NULL: default transitions are handled in DFA::prepare
Ulya Trofimovich [Thu, 2 Apr 2015 14:07:26 +0000 (15:07 +0100)]
Omit useless check for NULL: default transitions are handled in DFA::prepare

9 years agoSome output should be generated with "--skeleton", but not with "-D"
Ulya Trofimovich [Thu, 2 Apr 2015 14:04:45 +0000 (15:04 +0100)]
Some output should be generated with "--skeleton", but not with "-D"

9 years agoMakefile.am: added "-O2" to CXXFLAGS
Ulya Trofimovich [Thu, 2 Apr 2015 14:03:40 +0000 (15:03 +0100)]
Makefile.am: added "-O2" to CXXFLAGS

9 years agoPrettified debug output.
Ulya Trofimovich [Thu, 2 Apr 2015 14:02:33 +0000 (15:02 +0100)]
Prettified debug output.

9 years agoContinued adding "--skeleton" switch.
Ulya Trofimovich [Mon, 30 Mar 2015 13:41:29 +0000 (14:41 +0100)]
Continued adding "--skeleton" switch.

Generate prolog and epilog in the form of a for-loop. The body
of the loop is the hard-coded DFA. The code in DFA final states
is substituted with "continue" statements.

9 years agoStarted adding "--skeleton" switch.
Ulya Trofimovich [Fri, 27 Mar 2015 21:35:49 +0000 (21:35 +0000)]
Started adding "--skeleton" switch.

9 years agoTesting script: trap SIGINT in child threads.
Ulya Trofimovich [Fri, 27 Mar 2015 14:55:11 +0000 (14:55 +0000)]
Testing script: trap SIGINT in child threads.

Now 'Ctrl+C' is handled as in single-threaded script: all threads
stop working.

9 years agoA little code cleanup.
Ulya Trofimovich [Thu, 26 Mar 2015 16:45:57 +0000 (16:45 +0000)]
A little code cleanup.

9 years agoDon't output YYCTXMARKER stuff in .dot mode.
Ulya Trofimovich [Thu, 26 Mar 2015 16:42:58 +0000 (16:42 +0000)]
Don't output YYCTXMARKER stuff in .dot mode.

Added test that revealed error.

9 years agoRun tests in parallel with "-j<threads>" option.
Ulya Trofimovich [Thu, 26 Mar 2015 16:37:25 +0000 (16:37 +0000)]
Run tests in parallel with "-j<threads>" option.

9 years agoComment
Ulya Trofimovich [Thu, 26 Mar 2015 16:31:20 +0000 (16:31 +0000)]
Comment

9 years agoAdded text file 'sf-cheatsheet' to make some notes about sourceforge administration.
Ulya Trofimovich [Thu, 26 Mar 2015 16:27:21 +0000 (16:27 +0000)]
Added text file 'sf-cheatsheet' to make some notes about sourceforge administration.

9 years agoRemoved unused function.
Ulya Trofimovich [Thu, 26 Mar 2015 16:26:51 +0000 (16:26 +0000)]
Removed unused function.

9 years agoMake 'Go' hierarchy independent of relabelling.
Ulya Trofimovich [Wed, 18 Mar 2015 15:01:41 +0000 (15:01 +0000)]
Make 'Go' hierarchy independent of relabelling.

This allows moving 'Go' initialization loop to 'DFA::prepare'
and thus avoid ugly check if it is already initialized (it can
happen in '-r' mode when the same DFA is used multiple times).
Now that we store 'State *' pointers instead of labels in
'CpgotoTable', relabelling won't affect the generated code.
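
Why pointers make the table immune to relabelling can be seen in a
small sketch. 'StateSketch' and 'JumpTableSketch' are hypothetical
simplifications, not the real 'State' or 'CpgotoTable'; the '&&yyN'
output format mimics a computed goto table and is assumed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DFA state.
struct StateSketch {
    unsigned label;
    explicit StateSketch(unsigned l) : label(l) {}
};

// Hypothetical jump table: stores pointers to states, not label numbers.
struct JumpTableSketch {
    std::vector<const StateSketch *> targets;
};

// Labels are read through the pointers only at print time, so any
// relabelling done after the table was built is picked up automatically.
inline std::string print_targets(const JumpTableSketch &table) {
    std::ostringstream o;
    for (std::size_t i = 0; i < table.targets.size(); ++i) {
        if (i > 0) o << ", ";
        o << "&&yy" << table.targets[i]->label;
    }
    return o.str();
}
```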

9 years ago- Track used labels in a separate traversal of 'Go' graph
Ulya Trofimovich [Wed, 18 Mar 2015 14:35:19 +0000 (14:35 +0000)]
- Track used labels in a separate traversal of 'Go' graph
  (first part of effort to reduce codegen to null device)
- Properly destruct 'Go' graph (46 tests failed with '--valgrind'
  because of early exiting on errors)

9 years agoAbstracted commonly used constant into a class variable.
Ulya Trofimovich [Wed, 18 Mar 2015 11:08:44 +0000 (11:08 +0000)]
Abstracted commonly used constant into a class variable.

9 years agoSeparated 'Go' stuff: construction and output.
Ulya Trofimovich [Wed, 18 Mar 2015 10:55:43 +0000 (10:55 +0000)]
Separated 'Go' stuff: construction and output.

9 years agoReduced redundant parameter.
Ulya Trofimovich [Tue, 17 Mar 2015 16:24:46 +0000 (16:24 +0000)]
Reduced redundant parameter.

9 years agoSplit control flow codegen in two phases:
Ulya Trofimovich [Tue, 17 Mar 2015 16:00:52 +0000 (16:00 +0000)]
Split control flow codegen in two phases:

- First, re2c builds a complex structure where it stores
  all control flow codegen decisions: nested ifs or switches,
  bitmaps or computed gotos, etc.
- Second, this structure is traversed and code is generated.

This differentiation is necessary to compute some statistics
(e.g. used labels) in advance, before code generation.
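
A toy version of the two phases is sketched below. 'Node',
'collect_used_labels' and 'print_code' are hypothetical and far
simpler than the real structure; the sketch only shows that once the
decisions exist as data, a first traversal can gather statistics and a
second one can print code that depends on them (here: labels are
printed only if something jumps to them).

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified piece of control flow.
struct Node {
    unsigned label;  // label of this node
    int target;      // label jumped to, or -1 for fall-through
};

// Phase 1 consumer: traverse the structure and gather statistics.
inline std::set<unsigned> collect_used_labels(const std::vector<Node> &nodes) {
    std::set<unsigned> used;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].target >= 0) used.insert(static_cast<unsigned>(nodes[i].target));
    }
    return used;
}

// Phase 2: traverse again and print, emitting only the used labels.
inline std::string print_code(const std::vector<Node> &nodes) {
    const std::set<unsigned> used = collect_used_labels(nodes);
    std::ostringstream o;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (used.count(nodes[i].label)) o << "yy" << nodes[i].label << ":\n";
        if (nodes[i].target >= 0) o << "\tgoto yy" << nodes[i].target << ";\n";
        else o << "\t/* fall through */\n";
    }
    return o.str();
}
```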