Ulya Trofimovich [Thu, 16 Apr 2015 13:46:03 +0000 (14:46 +0100)]
Continued adding "--skeleton" switch.
Tried to simplify data generation a bit: construct skeleton states
so that there are no links to the NULL state. The default state is
represented with a state that has no rule and zero outgoing
arrows. Final states are represented with a state that has a rule,
but zero outgoing labels.
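A minimal sketch of the representation described above, assuming a
hypothetical 'SkeletonState' type (the names and fields below are
illustrative, not re2c's actual classes):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical skeleton state; not the actual re2c data structure.
struct SkeletonState {
    // Marker meaning "this state has no rule".
    static constexpr std::size_t NO_RULE = static_cast<std::size_t>(-1);

    // Outgoing arrows: (index of destination state, code unit on the arrow).
    std::vector<std::pair<std::size_t, unsigned> > arrows;

    // Index of the matched rule, or NO_RULE.
    std::size_t rule = NO_RULE;

    // Default state: no rule and zero outgoing arrows.
    bool is_default() const { return rule == NO_RULE && arrows.empty(); }

    // Final state: has a rule, but zero outgoing arrows.
    bool is_final() const { return rule != NO_RULE && arrows.empty(); }
};
```

With this layout every arrow points at a real state, so traversal code
needs no NULL checks.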
Ulya Trofimovich [Wed, 15 Apr 2015 14:09:45 +0000 (15:09 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Wed, 15 Apr 2015 14:02:23 +0000 (15:02 +0100)]
Continued adding "--skeleton" switch.
Code cleanup.
Ulya Trofimovich [Wed, 15 Apr 2015 12:40:19 +0000 (13:40 +0100)]
Continued adding "--skeleton" switch.
Reduced the amount of generated data (no exponential growth now).
Generate enough paths to cover all DFA transitions (well, not
exactly all: only upper and lower bounds of a range). Respect
cycles (loop once).
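A small sketch of the "only upper and lower bounds of a range" idea,
assuming a hypothetical closed range type 'Range' (illustrative only; the
path generation itself is not shown):

```cpp
#include <vector>

// Hypothetical closed range of code units labelling one transition.
struct Range {
    unsigned lb; // lower bound
    unsigned ub; // upper bound
};

// Pick the code units used to cover a transition: both bounds of the
// range, or a single code unit if the bounds coincide.
std::vector<unsigned> representatives(const Range &r) {
    std::vector<unsigned> out;
    out.push_back(r.lb);
    if (r.ub != r.lb) {
        out.push_back(r.ub);
    }
    return out;
}
```

A transition on [a-z] would then be exercised with 'a' and 'z' only,
rather than with all 26 letters.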
Ulya Trofimovich [Wed, 8 Apr 2015 15:43:16 +0000 (16:43 +0100)]
Code cleanup.
Ulya Trofimovich [Wed, 8 Apr 2015 11:58:59 +0000 (12:58 +0100)]
Continued adding "--skeleton" switch.
Use a separate DFA-like structure for the DFA skeleton. This makes it
possible to optimize skeleton states for fast traversal (group transitions
by destination state) and avoid messing with the original DFA.
Added 'count' method to estimate the amount of data that will
be generated (it was too slow on the original DFA with
ungrouped transitions).
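A rough sketch of why grouping helps with counting, assuming hypothetical
'Arrow' and 'State' types and an acyclic skeleton (cycles and overflow
are ignored here; this is not the actual 'count' method):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical grouped arrow: all code units leading to the same
// destination are collapsed into one arrow with a label count.
struct Arrow {
    std::size_t to;            // index of destination state
    unsigned long long labels; // number of code units on this arrow
};

struct State {
    std::vector<Arrow> arrows;
};

// Number of distinct input strings that start in state 's'.
// Recursion happens once per destination, not once per code unit.
unsigned long long count_paths(const std::vector<State> &dfa, std::size_t s) {
    if (dfa[s].arrows.empty()) {
        return 1; // end of path
    }
    unsigned long long n = 0;
    for (const Arrow &a : dfa[s].arrows) {
        n += a.labels * count_paths(dfa, a.to);
    }
    return n;
}
```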
Ulya Trofimovich [Tue, 7 Apr 2015 16:01:46 +0000 (17:01 +0100)]
Continued adding "--skeleton" switch.
Write input data to a separate file (otherwise we'll have to
keep all generated data in memory, because output has a complex
structure and cannot be written to file until it's fully
generated).
This reduces memory usage significantly (so that there remain no
memory consumption problems with "--skeleton" switch). However
on some files re2c generated too much data, e.g. case-insensitive
strings:
/*!re2c
'
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' {}
*/
Exponential growth is a bad thing; must deal with it somehow.
Time grows exponentially as well, of course.
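To put a number on the growth: assuming every letter of a
case-insensitive string can be matched in two ways (lower and upper case)
and all combinations are generated, n letters give 2^n input strings. A
tiny sketch (the function name is illustrative):

```cpp
// Number of spellings of a case-insensitive string of 'letters' letters,
// assuming two variants per letter. Valid only for letters < 64.
unsigned long long spellings(unsigned letters) {
    return 1ULL << letters;
}
```

Each extra letter doubles both the amount of generated data and the time
needed to produce it.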
Ulya Trofimovich [Mon, 6 Apr 2015 21:11:05 +0000 (22:11 +0100)]
Continued adding "--skeleton" switch.
Check for rule match in final states.
Ulya Trofimovich [Mon, 6 Apr 2015 19:40:43 +0000 (20:40 +0100)]
Continued adding "--skeleton" switch.
Generate check for input position in final states: first check
that input position equals the expected one; second, advance
input position to the beginning of next DFA path (this may be
necessary when the generated lexer rolls back: it should resume
with a new DFA path rather than some default characters remaining
after rollback).
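The two steps can be sketched as follows, assuming input positions are
plain offsets and using a hypothetical helper name (the generated code
may look quite different):

```cpp
#include <cstddef>

// Hypothetical check performed in a final state.
// Returns false if the lexer stopped at an unexpected position;
// otherwise moves the cursor to the beginning of the next DFA path.
bool check_and_advance(std::size_t &cursor,
                       std::size_t expected_end,
                       std::size_t next_path_start) {
    // First: input position must equal the expected one.
    if (cursor != expected_end) {
        return false;
    }
    // Second: skip any characters left over after rollback.
    cursor = next_path_start;
    return true;
}
```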
Ulya Trofimovich [Mon, 6 Apr 2015 19:15:29 +0000 (20:15 +0100)]
Continued adding "--skeleton" switch.
Fixed range lower bound tracking when generating input strings
from DFA.
Ulya Trofimovich [Mon, 6 Apr 2015 18:08:02 +0000 (19:08 +0100)]
Continued adding "--skeleton" switch.
When generating DFA paths respect range lower bound as well as
upper bound.
Ulya Trofimovich [Mon, 6 Apr 2015 17:44:59 +0000 (18:44 +0100)]
Continued adding "--skeleton" switch.
Output generated strings directly to file instead of storing them
in memory.
Ulya Trofimovich [Mon, 6 Apr 2015 15:41:49 +0000 (16:41 +0100)]
Continued adding "--skeleton" switch.
Track exact number of characters to be consumed on each input
string. Note that it's not equal to string length, since many
strings end up with characters from default transitions: on such
strings the lexer must roll back to the latest backtracking point (that
is, the latest accepting state).
Ulya Trofimovich [Mon, 6 Apr 2015 04:11:56 +0000 (05:11 +0100)]
Continued adding "--skeleton" switch.
Moved input-string-generation-from-DFA algorithm earlier, to the
point when DFA is barely constructed and contains only 'Match'
states and starting state (no saving, accepting or split states).
This simplifies string generation: if destination state is NULL,
it is either next to final state (if all spans lead to it), or
default state (if some spans lead to other states).
Ulya Trofimovich [Sun, 5 Apr 2015 14:39:29 +0000 (15:39 +0100)]
Removed useless type of DFA state.
'Enter' was encapsulated by 'Initial' and never used on its own.
Ulya Trofimovich [Sun, 5 Apr 2015 11:43:11 +0000 (12:43 +0100)]
Continued adding "--skeleton" switch.
Fixed backtracking in deriving-input-data-from-DFA algorithm.
Ulya Trofimovich [Fri, 3 Apr 2015 21:40:44 +0000 (22:40 +0100)]
Determine action type in a simpler way.
Ulya Trofimovich [Fri, 3 Apr 2015 21:30:13 +0000 (22:30 +0100)]
Continued adding "--skeleton" switch.
Tried to sort out various DFA states and how they influence code
generation. Added explicit type field to all actions.
Ulya Trofimovich [Fri, 3 Apr 2015 21:23:36 +0000 (22:23 +0100)]
Removed useless NULL check (all NULL destination states handled in DFA::prepare).
Ulya Trofimovich [Thu, 2 Apr 2015 14:31:14 +0000 (15:31 +0100)]
Continued adding "--skeleton" switch.
A naive attempt to generate input strings from DFA. The problem is,
if the number of spans equals 1, it's hard to determine whether
it's some kind of a 'transit' state or a normal state with just one
span. All states have spans, but do they use them? The situation is
further complicated with 'readCh' which makes it hard to trace how
actions influence input operations.
Ulya Trofimovich [Thu, 2 Apr 2015 14:07:26 +0000 (15:07 +0100)]
Omit useless check for NULL: default transitions are handled in DFA::prepare
Ulya Trofimovich [Thu, 2 Apr 2015 14:04:45 +0000 (15:04 +0100)]
Some output should be generated with "--skeleton", but not with "-D"
Ulya Trofimovich [Thu, 2 Apr 2015 14:03:40 +0000 (15:03 +0100)]
Makefile.am: added "-O2" to CXXFLAGS
Ulya Trofimovich [Thu, 2 Apr 2015 14:02:33 +0000 (15:02 +0100)]
Prettified debug output.
Ulya Trofimovich [Mon, 30 Mar 2015 13:41:29 +0000 (14:41 +0100)]
Continued adding "--skeleton" switch.
Generate prolog and epilog in the form of a for-loop. The body
of the loop is the hard-coded DFA. The code in DFA final states
is substituted with "continue" statements.
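A hand-written illustration of that shape, assuming a single rule "ab"
and a hypothetical function name (this is not actual re2c output):

```cpp
#include <cstddef>

// Counts how many loop iterations ended in the final state.
std::size_t run_skeleton(const char *cursor, const char *limit) {
    std::size_t matched = 0;
    for (; cursor < limit;) {   // prolog: start of the for-loop
        // Body: hard-coded DFA for the rule "ab".
        char c = *cursor++;
        if (c != 'a') {
            continue;           // default state
        }
        if (cursor == limit) {
            break;              // input exhausted
        }
        c = *cursor++;
        if (c != 'b') {
            continue;           // default state
        }
        ++matched;
        continue;               // final state: action replaced by "continue"
    }                           // epilog: end of the for-loop
    return matched;
}
```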
Ulya Trofimovich [Fri, 27 Mar 2015 21:35:49 +0000 (21:35 +0000)]
Started adding "--skeleton" switch.
Ulya Trofimovich [Fri, 27 Mar 2015 14:55:11 +0000 (14:55 +0000)]
Testing script: trap SIGINT in child threads.
Now 'Ctrl+C' is handled as in single-threaded script: all threads
stop working.
Ulya Trofimovich [Thu, 26 Mar 2015 16:45:57 +0000 (16:45 +0000)]
A little code cleanup.
Ulya Trofimovich [Thu, 26 Mar 2015 16:42:58 +0000 (16:42 +0000)]
Don't output YYCTXMARKER stuff in .dot mode.
Added test that revealed error.
Ulya Trofimovich [Thu, 26 Mar 2015 16:37:25 +0000 (16:37 +0000)]
Run tests in parallel with "-j<threads>" option.
Ulya Trofimovich [Thu, 26 Mar 2015 16:31:20 +0000 (16:31 +0000)]
Comment
Ulya Trofimovich [Thu, 26 Mar 2015 16:27:21 +0000 (16:27 +0000)]
Added text file 'sf-cheatsheet' to make some notes about sourceforge administration.
Ulya Trofimovich [Thu, 26 Mar 2015 16:26:51 +0000 (16:26 +0000)]
Removed unused function.
Ulya Trofimovich [Wed, 18 Mar 2015 15:01:41 +0000 (15:01 +0000)]
Make 'Go' hierarchy independent of relabelling.
This makes it possible to move the 'Go' initialization loop to
'DFA::prepare' and thus avoid an ugly check if it is already initialized (it can
happen in '-r' mode when the same DFA is used multiple times).
Now that we store 'State *' pointers instead of labels in
'CpgotoTable', relabelling won't affect the generated code.
Ulya Trofimovich [Wed, 18 Mar 2015 14:35:19 +0000 (14:35 +0000)]
- Track used labels in a separate traversal of 'Go' graph
(first part of effort to reduce codegen to null device)
- Properly destruct 'Go' graph (46 tests were failing with '--valgrind'
because of early exiting on errors)
Ulya Trofimovich [Wed, 18 Mar 2015 11:08:44 +0000 (11:08 +0000)]
Abstracted a commonly used constant into a class variable.
Ulya Trofimovich [Wed, 18 Mar 2015 10:55:43 +0000 (10:55 +0000)]
Separated 'Go' stuff: construction and output.
Ulya Trofimovich [Tue, 17 Mar 2015 16:24:46 +0000 (16:24 +0000)]
Reduced redundant parameter.
Ulya Trofimovich [Tue, 17 Mar 2015 16:00:52 +0000 (16:00 +0000)]
Split control flow codegen in two phases:
- First, re2c builds a complex structure where it stores
all control flow codegen decisions: nested ifs or switches,
bitmaps or computed gotos, etc.
- Second, this structure is traversed and code is generated.
This differentiation is necessary to compute some statistics
(e.g. used labels) in advance, before code generation.
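A minimal sketch of the two phases, assuming hypothetical 'Case' and
'Switch' types and a made-up output format (re2c's real 'Go' hierarchy is
richer than this):

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Phase one result: the codegen decision for each state, stored as data.
struct Case {
    char ch;         // input character
    unsigned target; // label of the destination state
};

struct Switch {
    std::vector<Case> cases;
};

// Statistics computed in advance: which labels are jumped to.
std::set<unsigned> used_labels(const std::vector<Switch> &states) {
    std::set<unsigned> used;
    for (const Switch &s : states) {
        for (const Case &c : s.cases) {
            used.insert(c.target);
        }
    }
    return used;
}

// Phase two: traverse the structure and generate code.
// A label is printed only if some jump refers to it.
std::string emit(const std::vector<Switch> &states) {
    const std::set<unsigned> used = used_labels(states);
    std::ostringstream o;
    for (unsigned i = 0; i < states.size(); ++i) {
        if (used.count(i)) {
            o << "yy" << i << ":\n";
        }
        o << "switch (yych) {\n";
        for (const Case &c : states[i].cases) {
            o << "case '" << c.ch << "': goto yy" << c.target << ";\n";
        }
        o << "}\n";
    }
    return o.str();
}
```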
Ulya Trofimovich [Sun, 15 Mar 2015 15:09:46 +0000 (15:09 +0000)]
Extracted duplicated code to a function.
Ulya Trofimovich [Sun, 15 Mar 2015 14:56:57 +0000 (14:56 +0000)]
Updated test results (changed due to the previous commit).
Ulya Trofimovich [Sun, 15 Mar 2015 14:43:45 +0000 (14:43 +0000)]
Removed YYDEBUG call before switch.
We don't call YYDEBUG in analogous situation (when if/else is used
instead of switch), and it seems redundant anyway.
Ulya Trofimovich [Fri, 13 Mar 2015 18:34:14 +0000 (18:34 +0000)]
Don't segfault if span has zero length.
Ulya Trofimovich [Thu, 12 Mar 2015 21:49:55 +0000 (21:49 +0000)]
Simplified codegen decision between switches/ifs.
All tests pass.
The previous condition made more sense: it was clear that the
author intended to consider some frequent corner cases.
But the condition was very tangled and yet too heuristic,
so I substituted it with a meaningless, but simple one.
I'm planning to simplify it even more later on.
Ulya Trofimovich [Wed, 11 Mar 2015 16:42:08 +0000 (16:42 +0000)]
Removed unreachable condition.
Ulya Trofimovich [Wed, 11 Mar 2015 16:32:56 +0000 (16:32 +0000)]
Number of high (wide) spans does not necessarily depend on code unit size.
Ulya Trofimovich [Wed, 11 Mar 2015 15:46:52 +0000 (15:46 +0000)]
Moved bitmap statistics counting inside of 'Go' class.
Ulya Trofimovich [Wed, 11 Mar 2015 15:02:19 +0000 (15:02 +0000)]
Moved bitmap statistics to 'Go' class.
Ulya Trofimovich [Wed, 11 Mar 2015 14:45:12 +0000 (14:45 +0000)]
Moved methods out of class.
Ulya Trofimovich [Wed, 11 Mar 2015 12:52:32 +0000 (12:52 +0000)]
Reduced useless wrappers.
Ulya Trofimovich [Wed, 11 Mar 2015 12:46:58 +0000 (12:46 +0000)]
Pass limited span instead of checking range all the time.
Ulya Trofimovich [Wed, 11 Mar 2015 10:52:05 +0000 (10:52 +0000)]
Simplify handling of high/low spans.
Spans are already ordered, so to get high (wide) spans,
all we need is to find first span with upper bound >0x100
and point at it.
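A sketch of that lookup, assuming a hypothetical 'Span' type that keeps
only the upper bound and spans sorted by it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical span: only the upper bound of the char interval is kept.
struct Span {
    unsigned ub;
};

// Index of the first high (wide) span, i.e. the first span with
// upper bound > 0x100. Returns spans.size() if there is none.
// Relies on spans being ordered by upper bound.
std::size_t first_high_span(const std::vector<Span> &spans) {
    std::size_t i = 0;
    while (i < spans.size() && spans[i].ub <= 0x100) {
        ++i;
    }
    return i;
}
```

Everything before the returned index is a low span; everything from it
onwards is a high span.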
Ulya Trofimovich [Tue, 10 Mar 2015 21:53:11 +0000 (21:53 +0000)]
Moved wide span barrier to 0x100 rather than 0xFF.
0x100 corresponds to condition 'encoding.szCodeUnit () <= 1',
because field 'ub' in 'Span' means upper bound of char interval.
Ulya Trofimovich [Tue, 10 Mar 2015 21:33:13 +0000 (21:33 +0000)]
Removed useless statistics counter.
Verified with 'assert (dSpans >= lTargets);'
Ulya Trofimovich [Tue, 10 Mar 2015 18:53:31 +0000 (18:53 +0000)]
Replaced class member with a local variable.
Ulya Trofimovich [Tue, 10 Mar 2015 18:47:31 +0000 (18:47 +0000)]
Replaced class member with a local variable.
Ulya Trofimovich [Tue, 10 Mar 2015 18:41:50 +0000 (18:41 +0000)]
Eliminated redundant variable.
Ulya Trofimovich [Tue, 10 Mar 2015 17:07:10 +0000 (17:07 +0000)]
Removed unused variable.
Ulya Trofimovich [Tue, 10 Mar 2015 16:01:15 +0000 (16:01 +0000)]
Removed dead code.
Ulya Trofimovich [Tue, 10 Mar 2015 15:58:29 +0000 (15:58 +0000)]
Extracted common functionality in codegen -g and -b flags.
Ulya Trofimovich [Sat, 7 Mar 2015 20:54:08 +0000 (20:54 +0000)]
Removed obsolete condition (makes no sense in single-pass mode).
Ulya Trofimovich [Sat, 7 Mar 2015 20:00:06 +0000 (20:00 +0000)]
Simplified .dot codegen:
- draw a single arrow for all transitions between two given states
- label all arrows with corresponding character ranges in square
brackets (no "default" label, single characters also appear in
square brackets)
- .dot output became much smaller, thus pictures are drawn faster
and generally look better: e.g. it takes ~10x less time to draw
PHP lexer and the resulting graph is shaped better.
Ulya Trofimovich [Thu, 5 Mar 2015 16:47:25 +0000 (16:47 +0000)]
Simplified switch generation in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:26:40 +0000 (17:26 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:23:28 +0000 (17:23 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 17:15:38 +0000 (17:15 +0000)]
Continued separating .dot case from other cases in codegen. Added test.
Ulya Trofimovich [Wed, 4 Mar 2015 15:43:43 +0000 (15:43 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 15:41:10 +0000 (15:41 +0000)]
Continued separating .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 15:16:34 +0000 (15:16 +0000)]
Started to separate .dot case from other cases in codegen.
Ulya Trofimovich [Wed, 4 Mar 2015 13:12:19 +0000 (13:12 +0000)]
Removed dead code.
Used 'if (next) assert(next->label == from->label + 1);'
to ensure that code is really dead (and 'next' is always
the next state to 'from').
Ulya Trofimovich [Mon, 2 Mar 2015 13:54:11 +0000 (13:54 +0000)]
Hid 'OutputFile' structure.
Ulya Trofimovich [Mon, 2 Mar 2015 12:41:53 +0000 (12:41 +0000)]
Code cleanup in input/output:
- Use 'open' function instead of checking return status
(one may forget to check return status, but if one forgets to
open file, the error will be obvious)
- Introduced separate file type for header.
Header is much simpler than output, it doesn't need delayed
code fragments and can be generated in destructor.
Ulya Trofimovich [Fri, 27 Feb 2015 23:42:01 +0000 (23:42 +0000)]
Finally get rid of 'file_info' and 'stream_lc.h'
Ulya Trofimovich [Thu, 26 Feb 2015 13:23:02 +0000 (13:23 +0000)]
Reduce 'file_info' usage (in order to get rid of 'stream_lc.h').
Ulya Trofimovich [Thu, 26 Feb 2015 12:37:07 +0000 (12:37 +0000)]
'token.h' no longer depends on 'file_info'.
Part of campaign to remove 'stream_lc.h'.
Ulya Trofimovich [Thu, 26 Feb 2015 12:32:55 +0000 (12:32 +0000)]
Now input stream is simple 'FILE *'.
This is first part of campaign to remove 'stream_lc.h'.
Ulya Trofimovich [Thu, 26 Feb 2015 11:41:15 +0000 (11:41 +0000)]
Dead code elimination.
Ulya Trofimovich [Thu, 26 Feb 2015 10:48:41 +0000 (10:48 +0000)]
Removed unused enum members.
I was unsure if delayed generation was also needed for genCondGoto
and genCondTable; so I kept those enum members as a reminder. Now
I know that all conditions are known by the moment re2c block is
parsed and code generation starts.
Ulya Trofimovich [Thu, 26 Feb 2015 10:36:16 +0000 (10:36 +0000)]
Moved operator definition to where it belongs.
Ulya Trofimovich [Wed, 25 Feb 2015 23:13:55 +0000 (23:13 +0000)]
One pass.
Second pass was used because some information (which influences
early parts of the generated code, e.g. enum with condition names
or YYMAXFILL definition) becomes available only at the end of first
pass.
I isolate all (I hope so) these things and generate stubs for them,
which are filled later. I restructured output as follows: the whole
output consists of source and header, each of them is a list of
blocks (corresponding to re2c blocks in source file), each block
is a list of code fragments (which can be either regular strings
with code or stubs that will be filled later).
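The structure can be sketched as nested lists, assuming hypothetical
type names and a single stub value for simplicity (real stubs would each
carry their own kind of delayed content):

```cpp
#include <string>
#include <vector>

// Hypothetical code fragment: either a regular string with code,
// or a stub that is filled in later.
struct Fragment {
    bool is_stub;
    std::string text; // used only when is_stub is false
};

typedef std::vector<Fragment> Block; // one re2c block
typedef std::vector<Block> File;     // source or header

// Write out the file once the delayed value is finally known.
std::string render(const File &file, const std::string &stub_value) {
    std::string out;
    for (const Block &b : file) {
        for (const Fragment &f : b) {
            out += f.is_stub ? stub_value : f.text;
        }
    }
    return out;
}
```

Since stubs are filled only at output time, values such as YYMAXFILL can
appear early in the file without a second pass over the input.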
Ulya Trofimovich [Mon, 23 Feb 2015 18:51:38 +0000 (18:51 +0000)]
Updated version to 0.14.1.dev
Ulya Trofimovich [Mon, 23 Feb 2015 17:28:35 +0000 (17:28 +0000)]
Release 0.14.
Ulya Trofimovich [Mon, 23 Feb 2015 17:25:26 +0000 (17:25 +0000)]
Makefile.am: added bootstrap/re2c.1 to dist files.
Ulya Trofimovich [Mon, 23 Feb 2015 17:23:58 +0000 (17:23 +0000)]
Makefile.am: copy re2c.1 to bootstrap files
Ulya Trofimovich [Mon, 23 Feb 2015 17:18:02 +0000 (17:18 +0000)]
Makefile.am: added forgotten header.
Ulya Trofimovich [Mon, 23 Feb 2015 17:15:40 +0000 (17:15 +0000)]
Makefile.am: removed obsolete MSVC files and added forgotten header.
Ulya Trofimovich [Mon, 23 Feb 2015 17:09:20 +0000 (17:09 +0000)]
Updated CHANGELOG and index.html
Ulya Trofimovich [Mon, 23 Feb 2015 17:00:45 +0000 (17:00 +0000)]
Updated docs.
Ulya Trofimovich [Mon, 23 Feb 2015 13:30:42 +0000 (13:30 +0000)]
Added tests from PHP repository: https://github.com/php/php-src
Test results are almost identical to re2c-0.13.6
(there are a few changes; I believe they are due to
commit 255262b02928d3f38c00dd91952e3253c11c78f1 and
completely harmless).
Ulya Trofimovich [Mon, 23 Feb 2015 12:04:38 +0000 (12:04 +0000)]
Renamed YYEOI -> YYLESSTHAN
Ulya Trofimovich [Mon, 23 Feb 2015 11:50:30 +0000 (11:50 +0000)]
Revert "Renamed re2c primitives:"
This reverts commit 7f816a85b03dd26f279868b1d0eaddb50bc8eb4c.
Ulya Trofimovich [Mon, 9 Feb 2015 17:26:36 +0000 (17:26 +0000)]
Renamed re2c primitives:
YYPEEK ---> RE2C_PEEK
YYSKIP ---> RE2C_SKIP
YYBACKUP ---> RE2C_BACKUP
YYBACKUPCTX ---> RE2C_BACKUP_CTX
YYRESTORE ---> RE2C_RESTORE
YYRESTORECTX ---> RE2C_RESTORE_CTX
YYEOI ---> RE2C_LESS_THAN
Updated tests and examples accordingly.
Renamed functions:
expr_eoi ---> expr_less_than
expr_eoi_one ---> expr_less_than_one
stmt_backupctx ---> stmt_backup_ctx
stmt_restorectx ---> stmt_restore_ctx
Ulya Trofimovich [Sun, 18 Jan 2015 20:27:58 +0000 (20:27 +0000)]
Fixed changelog formatting on re2c.org.
Ulya Trofimovich [Sun, 18 Jan 2015 15:56:06 +0000 (15:56 +0000)]
Added some examples of "--input custom" usage.
Had to modify .gitignore to unmask README in subfolders.
Ulya Trofimovich [Sun, 18 Jan 2015 14:12:16 +0000 (14:12 +0000)]
Replaced "YYHAS (n)" with "YYEOI (n)".
The actual meaning of this primitive is to check if
there are not enough characters left in the input stream,
e.g. "(YYLIMIT - YYCURSOR) < n" or whatever else.
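One possible user-side definition, using the expression quoted above and
assuming YYCURSOR and YYLIMIT are plain 'const char *' pointers (the
wrapper function is only there to make the sketch self-contained):

```cpp
// User-supplied primitive: true if fewer than n characters are left.
#define YYEOI(n) ((YYLIMIT - YYCURSOR) < (n))

// Hypothetical wrapper; the macro picks up the local YYCURSOR/YYLIMIT.
bool not_enough_input(const char *YYCURSOR, const char *YYLIMIT, int n) {
    return YYEOI(n);
}
```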
Ulya Trofimovich [Sun, 18 Jan 2015 13:47:51 +0000 (13:47 +0000)]
Added tests for "--input custom".
This implied modifying runtests.sh, as it couldn't handle
test names of the form "basename.--long-switch.re":
it inserted '-' in front of all switches.
Ulya Trofimovich [Sun, 18 Jan 2015 13:43:08 +0000 (13:43 +0000)]
Removed "--input istream".
It's impossible to correctly implement this switch in general,
since not all istreams support seek (simplest counterexample:
std::cin).
Even for simple istreams like std::istringstream, proper
error handling requires adding something like "YYERROR ()".
Ulya Trofimovich [Tue, 13 Jan 2015 15:30:01 +0000 (15:30 +0000)]
A little cleanup of new input API:
- moved enum and pretty-printing functions to a class
- renamed files 'input.{h,cc}' to 'input_api.{h,cc}'
- for "--input istream": moved input position increment to 'stmt_restorectx'
- main.cc: removed useless include
Ulya Trofimovich [Sun, 11 Jan 2015 19:39:32 +0000 (19:39 +0000)]
New input API.
- command-line switch "--input <default | istream | custom>"
- with "--input default" (the default) no changes to generated code
- with "--input istream" assume YYCURSOR is std::istream object
and YYLIMIT is length of input stream
- with "--input custom" expose primitives:
YYPEEK ()
YYSKIP ()
YYBACKUP ()
YYBACKUPCTX ()
YYRESTORE ()
YYRESTORECTX ()
YYHAS (n)
Ulya Trofimovich [Fri, 26 Sep 2014 21:35:28 +0000 (22:35 +0100)]
Added test for bug #46.