]> granicus.if.org Git - re2c/log
re2c
8 years agoKeep fixed and variable tags together in one array.
Ulya Trofimovich [Mon, 9 May 2016 14:13:42 +0000 (15:13 +0100)]
Keep fixed and variable tags together in one array.

This complicates tag deduplication a little (fixed tags have to be
masked), but simplifies tag initialization and rule handling.

8 years agoDon't force mutations of immutable regexp AST.
Ulya Trofimovich [Fri, 6 May 2016 10:29:57 +0000 (11:29 +0100)]
Don't force mutations of immutable regexp AST.

Regexp AST should stay immutable as it may be shared between
different conditions. This means we have to store tag indices
somwhere else.

8 years agoAllow tags in any part of regexp, not only on top-level concatenation.
Ulya Trofimovich [Thu, 5 May 2016 16:17:06 +0000 (17:17 +0100)]
Allow tags in any part of regexp, not only on top-level concatenation.

Fixed-length tag optimization applies only to top-level concatenation:
since we cannot predict which path lexer will choose, we cannot be
sure that any tags except top-level concatenation will ever be
initialized. If a tag may be uninitialized, we cannot fix other
tags on this one (we cannot even fix same-level tags relative to
each other, because fixed tags won't preserve the default value).

8 years agoFixed comparison for struct 'kitem_t' (constified both operands).
Ulya Trofimovich [Mon, 2 May 2016 10:01:57 +0000 (11:01 +0100)]
Fixed comparison for struct 'kitem_t' (constified both operands).

8 years agoBind contexts (a.k.a. tags) to DFA transitions, not states.
Ulya Trofimovich [Mon, 2 May 2016 09:40:04 +0000 (10:40 +0100)]
Bind contexts (a.k.a. tags) to DFA transitions, not states.

This is a very imortant change: it makes tracing tag conflicts as
simple as comparing tags on transitions during DFA construction.
If during determinization we get two identical transitions that
differ only in tags, then we have a tag conflict. Tags that cause
conflicts are called non-deterministic (since they don't allow
deterministic match).

This approach if very similar to Ville Laurikari's TDFA: as he does,
we first build TNFA and then apply determinization; however Laurikari's
TDFA uses complex bookkeeping to track all possible tag values,
while re2c simply forbids tags that cannot be matche efficiently.

Binding tags to transitions allows more fine-grained liveness
analyses, dead tag elimination and tag deduplication.

8 years agoKeep rule number in each NFA state (not only in final states).
Ulya Trofimovich [Sun, 1 May 2016 12:29:47 +0000 (13:29 +0100)]
Keep rule number in each NFA state (not only in final states).

This is currently not necessary, but we'll need it to distinguish
between different rules when comparing tags: if rules are different,
we don't care about tag difference; otherwise tags must be equal.

8 years agoCleanup in codegen (minor changes in '-D, --emit-dot' output).
Ulya Trofimovich [Sun, 1 May 2016 11:59:24 +0000 (12:59 +0100)]
Cleanup in codegen (minor changes in '-D, --emit-dot' output).

    - more elegant hanling of default case when generatin code
      for 'switch' cases
    - dumbed down complex logic behind the generation of sequential
      'if' statements: explicitely list all different cases and
      corresponding conditions

8 years agoMore precise liveness analyses for context deduplication.
Ulya Trofimovich [Sun, 1 May 2016 08:18:07 +0000 (09:18 +0100)]
More precise liveness analyses for context deduplication.

Liveness analyses should exclude contexts that are updated before
they are used past certain point in the program.

8 years agoCleanup in code generation (mostly formatting).
Ulya Trofimovich [Sat, 30 Apr 2016 19:07:54 +0000 (20:07 +0100)]
Cleanup in code generation (mostly formatting).

8 years agoSimplified rule tracking during DFA construction.
Ulya Trofimovich [Sat, 30 Apr 2016 16:10:31 +0000 (17:10 +0100)]
Simplified rule tracking during DFA construction.

8 years agoSimplified collecting context names during code generation.
Ulya Trofimovich [Sat, 30 Apr 2016 16:04:27 +0000 (17:04 +0100)]
Simplified collecting context names during code generation.

8 years agoRule contexts are consecutive, so we only need to store bounds.
Ulya Trofimovich [Sat, 30 Apr 2016 15:47:06 +0000 (16:47 +0100)]
Rule contexts are consecutive, so we only need to store bounds.

8 years agoCount contexts after deduplication.
Ulya Trofimovich [Sat, 30 Apr 2016 15:35:40 +0000 (16:35 +0100)]
Count contexts after deduplication.

8 years agoBreak from loop if the remaining iterations have no effect.
Ulya Trofimovich [Sat, 30 Apr 2016 15:23:32 +0000 (16:23 +0100)]
Break from loop if the remaining iterations have no effect.

8 years agoCleaned up code that merges arcs while tunneling DFA.
Ulya Trofimovich [Sat, 30 Apr 2016 15:18:54 +0000 (16:18 +0100)]
Cleaned up code that merges arcs while tunneling DFA.

Behaviour should be unchanged.

8 years agoMoved conditions and contexts from global scope to block scope.
Ulya Trofimovich [Tue, 5 Apr 2016 22:01:33 +0000 (23:01 +0100)]
Moved conditions and contexts from global scope to block scope.

Directives like '/*!re2c:types*/' and '/*!re2c:contexts*/' have
global scope, they are not binded to any particular block and therefore
accumulate items (conditions/contexts/etc.) from the whole program.

There's currently no way to bind these directives to a particular
block (re2c will probably add scoping rules in future).

However, things like default context declaration or condition dispatch
have block scope (it's only natural that they are binded to the block
that generates them).

Unrelated change: context marker should be set for each condition,
re2c must generate it after condition label: this way code that skips
condition dispatch and jums directly to condition label will still
have context marker adjusted properly.

8 years agoAdded configurations for all command-line options.
Ulya Trofimovich [Tue, 5 Apr 2016 11:38:53 +0000 (12:38 +0100)]
Added configurations for all command-line options.

8 years agoSeparated .dot code generation from common case.
Ulya Trofimovich [Sun, 3 Apr 2016 16:00:37 +0000 (17:00 +0100)]
Separated .dot code generation from common case.

8 years agoOmit some useless newlines in the generated code.
Ulya Trofimovich [Sun, 3 Apr 2016 07:31:25 +0000 (08:31 +0100)]
Omit some useless newlines in the generated code.

8 years agoLexer: minor code cleanup (check errors before doing anything else).
Ulya Trofimovich [Sat, 2 Apr 2016 21:06:25 +0000 (22:06 +0100)]
Lexer: minor code cleanup (check errors before doing anything else).

8 years agoLexer: moved 'end of comment' sublexer to a separate function.
Ulya Trofimovich [Sat, 2 Apr 2016 17:16:55 +0000 (18:16 +0100)]
Lexer: moved 'end of comment' sublexer to a separate function.

8 years agoLexer: unified handling of various re2c directives.
Ulya Trofimovich [Sat, 2 Apr 2016 16:25:23 +0000 (17:25 +0100)]
Lexer: unified handling of various re2c directives.

Now when lexer encounters the beginning of a new directive, it
dumps all the intermediate code to the output.

Before this commit re2c lexer dumped intermediate code on each
newline that occured in the input.

So this commit affects two aspects:
    - Intermediate code is dumped much less often: it's good for
      performance, but it becomes more probable that the intermediate
      code will occasionally occupy too much buffer space and incur
      buffer resize.
    - Some re2c directives ignored characters right before the
      (on the same line). These unfortunate characters are no longer
      ignored.

8 years agoLexer: drop useless rule.
Ulya Trofimovich [Sat, 2 Apr 2016 14:21:50 +0000 (15:21 +0100)]
Lexer: drop useless rule.

The 'end of comment' rule was used to detect end of re2c directives;
however, now it is handled by a special 'end of comment' lexer block.

8 years agoLexer: don't care if end of comment is followed by a newline.
Ulya Trofimovich [Sat, 2 Apr 2016 14:07:59 +0000 (15:07 +0100)]
Lexer: don't care if end of comment is followed by a newline.

re2c used to swallow newline that immediately followes end of directive:
for most of the directives re2c generates code block that ends with a
newline, so the generated code looks better if the newline is not doubled.

However, this unnecessarily complicates the lexer.

8 years agoLexer: suppressed directives in skeleton mode, cleaned up code.
Ulya Trofimovich [Sat, 2 Apr 2016 10:41:04 +0000 (11:41 +0100)]
Lexer: suppressed directives in skeleton mode, cleaned up code.

8 years agoExplicitely handle empty string in lexer.
Ulya Trofimovich [Fri, 1 Apr 2016 20:37:03 +0000 (21:37 +0100)]
Explicitely handle empty string in lexer.

(rather that hide it inside of a definition)

8 years agoAdded directive '/*!contexts:re2c ... */'.
Ulya Trofimovich [Fri, 1 Apr 2016 16:04:12 +0000 (17:04 +0100)]
Added directive '/*!contexts:re2c ... */'.

Directive is a new type of re2c block that consists of zero or more
of the following configurations (example values are the defaults):
    line = "long @@;";
    sep = "";

Renamed configuration: 're2c:ctxprefix' -> 're2c:contexts:prefix'

Added configuration: 're2c:contexts:expr = "@@"' that allows to
override the way context markers are accessed (but nottheir names).

Removed configuration: 're2c:define:YYDISTTYPE' (use the new
directive '/*!contexts:re2c ... */' instead).

8 years agoParser grammar cleanup.
Ulya Trofimovich [Tue, 29 Mar 2016 21:33:32 +0000 (22:33 +0100)]
Parser grammar cleanup.

 - Rearranged some rules to avoid code duplication.
 - Added production and error message about contexts in named definitions.
 - Removed productions and error messages about missing expression
   (simple 'syntax error' is enough, given that only part of the
   errors was captured by the removed productions).

8 years agoOptimized pointer arithmetics generated with '-C, --contexts'.
Ulya Trofimovich [Tue, 29 Mar 2016 14:16:01 +0000 (15:16 +0100)]
Optimized pointer arithmetics generated with '-C, --contexts'.

If default input API is used and fixed-length context is based on
the rightmost context (that is, YYCURSOR), then the following expression
(that calculates pointer corresponding to the given context):
    (YYCTXMARKER + ((YYCURSOR - YYCTXMARKER) - yyctx))
can be optimized to:
    (YYCURSOR - yyctx)

Note: unfortunately, as of GCC < 7.0, expressions like:
    (YYCURSOR - 3) - (YYCURSOR - 5)
trigger warning:
    warning: integer overflow in expression [-Woverflow]
See this GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61240

8 years agoAdded some configurations related to '-C, --contexts' option.
Ulya Trofimovich [Tue, 29 Mar 2016 11:54:02 +0000 (12:54 +0100)]
Added some configurations related to '-C, --contexts' option.

Configurations:
    "define:YYCTX"      (defaults to 'YYCTX')
    "define:YYDIST"     (defaults to 'YYDIST')
    "define:YYDISTTYPE" (defaults to 'long')
    "ctxprefix"         (defaults to 'yyctx')

8 years agoUse 'std::valarray' instead of 'std::vector' to avoid copy semantics.
Ulya Trofimovich [Mon, 28 Mar 2016 18:59:23 +0000 (19:59 +0100)]
Use 'std::valarray' instead of 'std::vector' to avoid copy semantics.

Had to rename 'INFINITY' to 'SCC_INF': inclusion of '<valarray.h>'
pulled '<math.h>' and introduced name collision.

8 years agoUse 'size_t' instead of 'uint32_t' to avoid 'static_cast'.
Ulya Trofimovich [Mon, 28 Mar 2016 14:49:07 +0000 (15:49 +0100)]
Use 'size_t' instead of 'uint32_t' to avoid 'static_cast'.

8 years agoMake sure that default rule is handled together with other rules.
Ulya Trofimovich [Mon, 28 Mar 2016 14:14:37 +0000 (15:14 +0100)]
Make sure that default rule is handled together with other rules.

8 years agoMoved nontrivial context handling from parser to NFA construction phase.
Ulya Trofimovich [Mon, 28 Mar 2016 13:39:11 +0000 (14:39 +0100)]
Moved nontrivial context handling from parser to NFA construction phase.

Parser should simply construct AST; all the complex reasoning about
contexts (fixed vs variable) should be delayed until NFA is being
constructed. This way AST can be immutable and it's very easy to share
parts of AST between different conditions, etc.

Removed rule ranks and rank counter: rules are now stored in NFA-local
array and addressed by index.

8 years agoAdded partial submatch extraction.
Ulya Trofimovich [Wed, 23 Mar 2016 17:40:21 +0000 (17:40 +0000)]
Added partial submatch extraction.

Enabled with '-C, --contexts' option.

Warning '-Wselfoverlapping-contexts'.

8 years agoMoved loop counters out of skeleton; they belong to algorithm-local stuff.
Ulya Trofimovich [Wed, 16 Mar 2016 14:15:25 +0000 (14:15 +0000)]
Moved loop counters out of skeleton; they belong to algorithm-local stuff.

8 years agoPack all stuff relevant to skeleton data generation in a struct.
Ulya Trofimovich [Wed, 16 Mar 2016 13:54:23 +0000 (13:54 +0000)]
Pack all stuff relevant to skeleton data generation in a struct.

8 years agoUse distinct type for path suffixes during skeleton data generation.
Ulya Trofimovich [Wed, 16 Mar 2016 12:20:25 +0000 (12:20 +0000)]
Use distinct type for path suffixes during skeleton data generation.

8 years agoAddress skeleton nodes by indices rather than by pointers.
Ulya Trofimovich [Wed, 16 Mar 2016 09:41:02 +0000 (09:41 +0000)]
Address skeleton nodes by indices rather than by pointers.

This way it is more convenient to add various graph algorithms:
each algorithm can keep its own data in array indexed by skeleton
node numbers (instead of pushing all relevant data inside of the
node).

8 years agoReuse the same struct to represent different kinds of paths in skeleton.
Ulya Trofimovich [Tue, 15 Mar 2016 13:25:32 +0000 (13:25 +0000)]
Reuse the same struct to represent different kinds of paths in skeleton.

8 years ago[-Wundefined-control-flow] analyses: simplified path structure.
Ulya Trofimovich [Tue, 15 Mar 2016 12:41:07 +0000 (12:41 +0000)]
[-Wundefined-control-flow] analyses: simplified path structure.

Now paths involved in [-Wundefined-control-flow] analyses have
exactly the same structure as paths involved in skeleton data generation.

8 years agoAdded comments.
Ulya Trofimovich [Tue, 15 Mar 2016 08:28:02 +0000 (08:28 +0000)]
Added comments.

8 years agoMoved comment.
Ulya Trofimovich [Tue, 15 Mar 2016 08:15:24 +0000 (08:15 +0000)]
Moved comment.

8 years agoMoved calculation of fallback states to earlier stage of DFA construction.
Ulya Trofimovich [Tue, 15 Mar 2016 08:08:45 +0000 (08:08 +0000)]
Moved calculation of fallback states to earlier stage of DFA construction.

We will need this later, when adding multiple context markers
(liveness analyses during context marker deduplication).

8 years agoEradicated occasional copy-paste.
Ulya Trofimovich [Mon, 14 Mar 2016 22:52:53 +0000 (22:52 +0000)]
Eradicated occasional copy-paste.

Used function 'make_rule_copy' introduced by commit
f781c3a3a07cde0243c894da640070d6c62ef4bc.

8 years agoMoved function definitions out of header.
Ulya Trofimovich [Mon, 14 Mar 2016 22:38:42 +0000 (22:38 +0000)]
Moved function definitions out of header.

8 years agoSkeleton: simplified path structure.
Ulya Trofimovich [Mon, 14 Mar 2016 22:23:03 +0000 (22:23 +0000)]
Skeleton: simplified path structure.

Just store pointers to skeleton nodes (instead of bookkeeping arcs,
contexts and rules in each path). All the necessary information can
be easily retrieved from nodes when path is baing dumped to file.

Three tests in '--skeleton' mode have been broken by this commit.
Actually, these are not breakages: these cases reveal incorrect
re2c-generated code. The change is due to the fact that skeleton
now doesn't simulate contexts that go *after* the matched rule:

    ------o------o------> ... (fallback to rule)
        rule   context

8 years agoSkeleton: respect fixed-length contexts.
Ulya Trofimovich [Mon, 14 Mar 2016 21:45:42 +0000 (21:45 +0000)]
Skeleton: respect fixed-length contexts.

8 years agorun_tests.sh: fixed cleanup of intermediate files with '--skeleton'.
Ulya Trofimovich [Fri, 26 Feb 2016 23:03:43 +0000 (23:03 +0000)]
run_tests.sh: fixed cleanup of intermediate files with '--skeleton'.

Broken by commit c56a14582de4fbbed34d231e521ab2e6af5d43c3
"run_tests.sh: don't crash on absolute filepaths in test names."

8 years agorun_tests.sh: don't crash on absolute filepaths in test names.
Ulya Trofimovich [Wed, 24 Feb 2016 16:57:54 +0000 (16:57 +0000)]
run_tests.sh: don't crash on absolute filepaths in test names.

This patch fixes bug #137 "run_tests.sh fail when running configure
script with absolute path".

The fix: copy all test files into temporary build directory (which
is addressed by a relative filepath) and keep messing with relative
filepaths.

8 years agoCode cleanup: moved rule construction to a separate file.
Ulya Trofimovich [Mon, 22 Feb 2016 15:47:44 +0000 (15:47 +0000)]
Code cleanup: moved rule construction to a separate file.

8 years agoMark rules as nullable rather than keep a separate collection of nullable rules.
Ulya Trofimovich [Mon, 22 Feb 2016 13:51:57 +0000 (13:51 +0000)]
Mark rules as nullable rather than keep a separate collection of nullable rules.

8 years agoNow regexp AST is immutable.
Ulya Trofimovich [Mon, 22 Feb 2016 13:17:57 +0000 (13:17 +0000)]
Now regexp AST is immutable.

We don't do any transformations on AST anyway: it is analysed
and converted to NFA. This patch simply fixes the fact that AST
is immutable.

8 years agoCode cleanup: replaced Regexp inheritance hierarchy with tagged union.
Ulya Trofimovich [Mon, 22 Feb 2016 13:03:38 +0000 (13:03 +0000)]
Code cleanup: replaced Regexp inheritance hierarchy with tagged union.

8 years agoCode cleanup: factored RuleInfo out of RuleOp.
Ulya Trofimovich [Mon, 22 Feb 2016 10:22:43 +0000 (10:22 +0000)]
Code cleanup: factored RuleInfo out of RuleOp.

Rule information (line, attached code block, rank, shadowing set, etc.)
is used throughout the program. Before this patch, rule information was
inlined in RuleOp. It was inconvenient because RuleOp belongs to early
stages of compilation (AST, prior to NFA), but it had to live throughout
the whole program.

8 years agoSimplified tracking of fixed-length trailing contexts.
Ulya Trofimovich [Wed, 17 Feb 2016 16:12:59 +0000 (16:12 +0000)]
Simplified tracking of fixed-length trailing contexts.

Static (that is, of fixed length) trailing contexts don't need
recording context position with YYCTXMARKER and restoring it
back on successful match. They can be tracked simply by decreasing
input position by context length.

8 years agoFixed an old bug in function calculating regexp length.
Ulya Trofimovich [Wed, 17 Feb 2016 15:01:28 +0000 (15:01 +0000)]
Fixed an old bug in function calculating regexp length.

The bug: the function treated 'r1 | r2' as 'r1 | r2'
(used 'r1' twice instead of 'r1' and 'r2').

As the function was rarely used the bug went unnoticed for about
two decades.

8 years agoSimplified [-Wmatch-empty-rule] analyses.
Ulya Trofimovich [Wed, 17 Feb 2016 14:46:43 +0000 (14:46 +0000)]
Simplified [-Wmatch-empty-rule] analyses.

Before this patch [-Wmatch-empty-rule] was based on:
    - DFA structural analyses (skeleton phase)
    - rule reachability analyses (skeleton phase)

Now it is based on:
    - NFA structural analyses (NFA phase)
    - rule reachability analyses (skeleton phase)

It's much easier to find nullable rules in NFA than in DFA.
The problem with DFA is in rules with trailing context, both
dynamic and especially static (as it leaves no trace in DFA
states). re2c currently treats static context as dynamic, but
it will change soon.

On the other side NFA may give some false positives because of
unreachable rules:
    [^] {}
    ""  {}
infinite rules:
    [^]* {}
or self-shadowing rules:
    [^]?
Reachability analyses in skeleton helps to filter out unreachable
and infinite rules, but not self-shadowing ones.

8 years agorun_tests.sh: avoid another '+=' in bash arrays (compatibility with old bash).
Ulya Trofimovich [Thu, 28 Jan 2016 08:54:49 +0000 (08:54 +0000)]
run_tests.sh: avoid another '+=' in bash arrays (compatibility with old bash).

See #135 "In installation "make check" give syntax error".

8 years agorun_tests.sh: avoid '+=' in bash arrays (compatibility with old bash).
Ulya Trofimovich [Wed, 27 Jan 2016 19:53:39 +0000 (19:53 +0000)]
run_tests.sh: avoid '+=' in bash arrays (compatibility with old bash).

See #135 "In installation "make check" give syntax error".

8 years agoadd basic support for travis-ci.org integration 136/head
Sergei Trofimovich [Sun, 7 Feb 2016 11:14:55 +0000 (11:14 +0000)]
add basic support for travis-ci.org integration

Signed-off-by: Sergei Trofimovich <siarheit@google.com>
8 years agore2c/Makefile.am: use RST2MAN variable instead of hardcoded rst2man.py
Sergei Trofimovich [Sun, 7 Feb 2016 11:56:07 +0000 (11:56 +0000)]
re2c/Makefile.am: use RST2MAN variable instead of hardcoded rst2man.py

Ubuntu installs docutils binaries without .py prefix.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>
9 years agoRelease 0.16. 0.16
Ulya Trofimovich [Thu, 21 Jan 2016 10:48:30 +0000 (10:48 +0000)]
Release 0.16.

9 years agoMerged branch 'devel'.
Ulya Trofimovich [Thu, 21 Jan 2016 10:45:21 +0000 (10:45 +0000)]
Merged branch 'devel'.

9 years agoUpdated CHANGELOG (preparing release 0.16).
Ulya Trofimovich [Thu, 21 Jan 2016 10:08:56 +0000 (10:08 +0000)]
Updated CHANGELOG (preparing release 0.16).

9 years agoMakefile.am: fixed path to header.
Ulya Trofimovich [Wed, 20 Jan 2016 08:41:04 +0000 (08:41 +0000)]
Makefile.am: fixed path to header.

9 years agoRemoved forgotten 'typename' kewyword in non-template code.
Ulya Trofimovich [Wed, 20 Jan 2016 08:07:33 +0000 (08:07 +0000)]
Removed forgotten 'typename' kewyword in non-template code.

9 years agoFixed includes (applied some of 'include-what-you-use' suggestions).
Ulya Trofimovich [Tue, 19 Jan 2016 18:56:07 +0000 (18:56 +0000)]
Fixed includes (applied some of 'include-what-you-use' suggestions).

9 years agoApplied #131: "Use bash-specific '[[' builtin".
Ulya Trofimovich [Sun, 17 Jan 2016 10:15:09 +0000 (10:15 +0000)]
Applied #131: "Use bash-specific '[[' builtin".

9 years agoStabilized the list of shadowing rules reported by [-Wunreachable-rules].
Ulya Trofimovich [Sat, 16 Jan 2016 23:07:17 +0000 (23:07 +0000)]
Stabilized the list of shadowing rules reported by [-Wunreachable-rules].

Before this commit, the list of rules depended on the order of NFA states
in each DFA state under construction (which is simply a matter of ordering
pointers to heap: the order can be different).

Now all rules for each DFA state are collected and the final choice of
rule is delayed until DFA is constructed, so the order of NFA states
no longer matters.

9 years agoMerged two small headers into one.
Ulya Trofimovich [Fri, 15 Jan 2016 10:07:20 +0000 (10:07 +0000)]
Merged two small headers into one.

9 years agoCompact DFA states after minimization.
Ulya Trofimovich [Wed, 13 Jan 2016 09:12:14 +0000 (09:12 +0000)]
Compact DFA states after minimization.

9 years agoReplaced class method with function.
Ulya Trofimovich [Wed, 13 Jan 2016 08:44:58 +0000 (08:44 +0000)]
Replaced class method with function.

9 years agoKeep data relevant to DFA determinization outsde of DFA states.
Ulya Trofimovich [Wed, 13 Jan 2016 08:14:38 +0000 (08:14 +0000)]
Keep data relevant to DFA determinization outsde of DFA states.

9 years agoMoved YYFILL points calculation to the earlier stage of DFA construction.
Ulya Trofimovich [Mon, 11 Jan 2016 15:01:05 +0000 (15:01 +0000)]
Moved YYFILL points calculation to the earlier stage of DFA construction.

No serious changes intended (mostly cleanup and comments).

The underlying algorithm for finding strongly connected components
(SCC) remains the same: it's a slightly modified Tarjan's algorithm.

We now mark non-YYFILL states by setting YYFILL argument to zero,
which is only logical: why would anyone call YYFILL to provide zero
characters. In fact, re2c didn't generate 'YYFILL(0)' call itself,
but some remnants of YYFILL did remain (which caused changes in tests).

9 years agoSerialize '--skeleton' generated data in little-endian.
Ulya Trofimovich [Thu, 7 Jan 2016 14:24:52 +0000 (14:24 +0000)]
Serialize '--skeleton' generated data in little-endian.

This commit fixes bug #132 "test failure on big endian archs with 0.15.3".

Tests failed because re2c with '--skeleton' option used host endianness
when serializing binary data to file. Expected test result was generated
on little-endian arch, while actual test was run on big-endian arch.

Only three tests failed (out of ~40 tests that are always run with
'--skeleton'), because in most cases data unit is 1 byte and endianness
doesn't matter.

The fix: re2c now converts binary data from host-endian to little-endian
before dumping it to file. Skeleton programs convert data back from
little-endian to host-endian when reading it from file (iff data unit
size is greater than 1 byte).

9 years agoSerialize '--skeleton' generated data in little-endian.
Ulya Trofimovich [Thu, 7 Jan 2016 14:24:52 +0000 (14:24 +0000)]
Serialize '--skeleton' generated data in little-endian.

This commit fixes bug #132 "test failure on big endian archs with 0.15.3".

Tests failed because re2c with '--skeleton' option used host endianness
when serializing binary data to file. Expected test result was generated
on little-endian arch, while actual test was run on big-endian arch.

Only three tests failed (out of ~40 tests that are always run with
'--skeleton'), because in most cases data unit is 1 byte and endianness
doesn't matter.

The fix: re2c now converts binary data from host-endian to little-endian
before dumping it to file. Skeleton programs convert data back from
little-endian to host-endian when reading it from file (iff data unit
size is greater than 1 byte).

9 years agoconfigure.ac: fixed error message.
Ulya Trofimovich [Wed, 6 Jan 2016 10:11:39 +0000 (10:11 +0000)]
configure.ac: fixed error message.

Message should report 'rst2man' as well as 'rst2man.py'.

9 years agoconfigure.ac: check for 'rst2man' as well as 'rst2man.py'.
Ulya Trofimovich [Wed, 6 Jan 2016 09:21:57 +0000 (09:21 +0000)]
configure.ac: check for 'rst2man' as well as 'rst2man.py'.

Fixes bug #133 "rst2man.py depreciated in RHEL7".

9 years agoExplicitely handle default state as special case during DFA construction.
Ulya Trofimovich [Tue, 5 Jan 2016 17:15:37 +0000 (17:15 +0000)]
Explicitely handle default state as special case during DFA construction.

9 years agoCheck accumulated parameter before entering recursion.
Ulya Trofimovich [Sat, 2 Jan 2016 17:27:45 +0000 (17:27 +0000)]
Check accumulated parameter before entering recursion.

9 years agoRenamed files and function.
Ulya Trofimovich [Fri, 1 Jan 2016 10:11:56 +0000 (10:11 +0000)]
Renamed files and function.

9 years agoMoved source subdirectory.
Ulya Trofimovich [Thu, 31 Dec 2015 22:06:13 +0000 (22:06 +0000)]
Moved source subdirectory.

9 years agoMoved source files to a proper subdirectory.
Ulya Trofimovich [Thu, 31 Dec 2015 21:52:54 +0000 (21:52 +0000)]
Moved source files to a proper subdirectory.

9 years agoRemoved obsolete code deduplication mechanism.
Ulya Trofimovich [Thu, 31 Dec 2015 21:17:32 +0000 (21:17 +0000)]
Removed obsolete code deduplication mechanism.

This mechanism was tricky and fragile; it cost us a most unfortunate
bug in PHP lexer: https://bugs.gentoo.org/show_bug.cgi?id=518904
(and a couple of other bugs).

Now that re2c does DFA minimization this is no longer needed. Hoooray!

The updated test changed because skeleton is constructed prior to
DFA minimization.

9 years agoRemoved obsolete comments.
Ulya Trofimovich [Thu, 31 Dec 2015 20:56:41 +0000 (20:56 +0000)]
Removed obsolete comments.

9 years agoAdded test for bug #128 "very slow DFA construction (resulting in a very large DFA)".
Ulya Trofimovich [Thu, 31 Dec 2015 20:17:16 +0000 (20:17 +0000)]
Added test for bug #128 "very slow DFA construction (resulting in a very large DFA)".

After minimization the resulting DFA is much smaller:
    /*!re2c
        [ac]{0,14} [a] [ac]{0,14} {}
    */

Was:
    $ time re2c slow.re > slow.c && stat -c '%s' slow.c

    real    1m54.837s
    user    1m54.733s
    sys     0m0.120s
    5627102

Now:
    $ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

    real    0m0.732s
    user    0m0.684s
    sys     0m0.048s
    15078

9 years agoAdded DFA minimization and option '--dfa-minimization <table | moore>'.
Ulya Trofimovich [Thu, 31 Dec 2015 15:35:30 +0000 (15:35 +0000)]
Added DFA minimization and option '--dfa-minimization <table | moore>'.

Test results changed a lot; it is next to impossible to verify them
by hand. I therefore implemented two different minimization algorithms:
    - "table filling" algorithm (simple and inefficient)
    - Moore's algorithm (not so simple and efficient enough)
They produce identical minimized DFA (up to states relabelling), thus
giving some confidence in that the resulting DFA is correct.

I also checked the results with '--skeleton': re2c constructs
skeleton prior to reordering and minimization, therefore
skeleton-generated data is free of (potential) minimization errors.

9 years agoSplit DFA intermediate representation in two parts: DFA and ADFA.
Ulya Trofimovich [Wed, 30 Dec 2015 20:52:33 +0000 (20:52 +0000)]
Split DFA intermediate representation in two parts: DFA and ADFA.

ADFA stands for 'action DFA', that is, DFA with actions.

During DFA construction (aka NFA determinization) it is convenient
to represent DFA states as indexes to array of states.
Later on, while binding actions, it is more convanient to store
states in a linked list.

9 years agoMerge branch 'master' of git://github.com/jcfp/re2c
Ulya Trofimovich [Mon, 21 Dec 2015 14:29:20 +0000 (14:29 +0000)]
Merge branch 'master' of git://github.com/jcfp/re2c

*  missed some occurrences...

9 years agomissed some occurrences... 130/head
jcfp [Mon, 21 Dec 2015 13:25:49 +0000 (14:25 +0100)]
missed some occurrences...

9 years agoMerge branch 'master' into devel
Ulya Trofimovich [Mon, 21 Dec 2015 11:51:47 +0000 (11:51 +0000)]
Merge branch 'master' into devel

9 years agoMerge branch 'jcfp-master'
Ulya Trofimovich [Mon, 21 Dec 2015 11:46:46 +0000 (11:46 +0000)]
Merge branch 'jcfp-master'

9 years agotyp0 fixes 129/head
jcfp [Mon, 21 Dec 2015 10:57:59 +0000 (11:57 +0100)]
typ0 fixes

9 years agoKeep DFA states in a hash map (to speedup lookup fo an identical state).
Ulya Trofimovich [Sat, 19 Dec 2015 17:17:00 +0000 (17:17 +0000)]
Keep DFA states in a hash map (to speedup lookup fo an identical state).

This partially fixes bug #128: "very slow DFA construction (resulting
in a very large DFA)". DFA construction is no longer slow, but the
resulting DFA is still too large and needs to be minimized.

9 years agoDFA construction: epsilon-closure of NFA states: pick only kernel states.
Ulya Trofimovich [Fri, 18 Dec 2015 21:48:27 +0000 (21:48 +0000)]
DFA construction: epsilon-closure of NFA states: pick only kernel states.

9 years agoChanged bytecode intermediate representation to a simpler NFA representation.
Ulya Trofimovich [Fri, 18 Dec 2015 12:52:17 +0000 (12:52 +0000)]
Changed bytecode intermediate representation to a simpler NFA representation.

9 years agoBase '+' (one or more repetitions) on '*' (zero or more repetitions).
Ulya Trofimovich [Tue, 15 Dec 2015 12:44:47 +0000 (12:44 +0000)]
Base '+' (one or more repetitions) on '*' (zero or more repetitions).

Kleene star '*' (aka iteration, repetition, etc.) is a primitive
operation in regular expressions.

For some reason re2c used '+' as a primitive operation and expressed
'*' in terms of '+'. It is inconvenient, because all algorithms
described in literature are based on '*'.

Because we now express 'a+' as 'a* a', we have to set 'PRIVATE' attribute
on 'a': otherwize 'a' gets shared between the two occurences which causes
complex bugs.

Expressing 'a+' in a more intuitive way as 'a a*' rather than 'a* a'
causes the generated code to duplicate certain states. The generated code
is (supposedly correct), but re2c fails to deduplicate these states.
We therefore prefer 'a* a' expansion, which results in exactly the same
code as before.

9 years agoNo need to preserve special order of states while building DFA (thanks to states...
Ulya Trofimovich [Mon, 14 Dec 2015 14:33:21 +0000 (14:33 +0000)]
No need to preserve special order of states while building DFA (thanks to states reordering).

9 years agoDropped the difference between left and right default rule (thanks to states reordering).
Ulya Trofimovich [Mon, 14 Dec 2015 14:21:13 +0000 (14:21 +0000)]
Dropped the difference between left and right default rule (thanks to states reordering).

Bootstrap lexer changed a lot: this change is caused by commit
a4c192f27ae8806e67a8ff311eeff53d74dacb71: "Reordered states in DFA.".
Changes in parser by this commit triggered lexer regeneration.