granicus.if.org Git - re2c/log

]> granicus.if.org Git - re2c/log

projects / re2c / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Ulya Trofimovich [Tue, 4 Oct 2016 16:05:20 +0000 (17:05 +0100)]

Fixed tags comparison in table minimization.

Bug was introduced in commit 5958a2366cdb1361b5158076924014a36ff19dcc
"Bind contexts (a.k.a. tags) to DFA transitions, not states.":
instead of comparing tags of single transition on one particular
symbol, we compared tags for all outgoing transitions at once
(probably copy-paste from Moore minimization).

The bug was revealed by multiple tests (including '--skeleton' ones)
as soon as I ran testsute with '--dfa-determinization table'.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 3 Oct 2016 11:23:25 +0000 (12:23 +0100)]

More efficiently merge closure tags from different rules.

Exploit the fact that closure items are grouped by rule
instead of messing with a bit mask of rule tags.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 30 Sep 2016 10:22:53 +0000 (11:22 +0100)]

Attribute tag liveness to edges, not states (to refine analysis).

State granularity is too crude: there might be
two different paths to the same state, with some tag alive on one
path but not on the other. If liveness is attributed to states, then
tag liveness on one path implies tag liveness in the join state,
which affects the other path. But if liveness is attributed to
edges, then liveness of one path won't affect liveness of the other.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 29 Sep 2016 17:33:51 +0000 (18:33 +0100)]

Liveness of fallback tags is a forward-propagagation problem.

re2c used to merge all possible fallback tags and backward-propagate
them from states that have default transitions. This is an
overapproximation; some tags are marked alive while they are not.

Liveness of fallback tags should be forward-propagated, because
this is the only way to track paths that really need these tags
(that is, paths that start in final state and end with default
transition *without passing through another final state*).

Note that sometimes we can get to the same default transition
following two different paths, one path may passing through final
state; the other is not. One path cares for fallback tags; the
other doesn't. We have no way to find out if we start bacward
prpagation of fallback tags from default transition.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 28 Sep 2016 16:07:28 +0000 (17:07 +0100)]

Comments and minor code rearrangements; nothing serious.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 28 Sep 2016 14:52:11 +0000 (15:52 +0100)]

Tags can be calculated immediately after closure construction.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 28 Sep 2016 13:20:08 +0000 (14:20 +0100)]

Typo in comment.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 28 Sep 2016 13:14:55 +0000 (14:14 +0100)]

Sort closure right after construction.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 28 Sep 2016 13:02:28 +0000 (14:02 +0100)]

Drop shadowed final NFA states when constructing DFA state.

By construction NFA has exactly one final state per rule.
Thus closure has at most one final item per rule (in other
words, all final items in closure belong to different rules).
The rule with the highest priority shadowes all other rules.
Final items that correspond to shadowed rules are useless
and should be removed as early as possible.

If we let such items remain in closure, they may prevent the
new DFA state from being merged with other states. But this
won't affect the resulting program: meaningless finalizing tags
will be removed by dead code elimination and after that DFA
minimization will merge equivalent final states.

But it is much easier and cleaner to remove useless items
immediately (and thas't what we do).

Some '--skeleton' tests changed: this is caused by collapsed
DFA states (shadowed final NFA states prevented them from
being collapsed before). All changes are verified with '--skeleton'.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 15:22:09 +0000 (16:22 +0100)]

Use 'Tagpool' for tag configurations during closure consruction.

Now that conflicting tag configurations are no longer merged,
we can insert them into 'Tagpool' immediately. This allows us
avoud ugly trick with 'union' in closure items.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 14:48:27 +0000 (15:48 +0100)]

Don't merge conflicting tags, choose arbitrary configuration.

Merging conflicting tags doesn't make sense: information is lost
anyway.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 14:11:05 +0000 (15:11 +0100)]

Use explicit constant 'ZERO_TAGS' instead of 0.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 12:43:23 +0000 (13:43 +0100)]

Ported 'Tagpool' on 'lookup_t' and got rid of 'ord_hash_set_t'.

An effort to reduce entities with similar functionality:
'lookup_t' is more flexible than 'ord_hash_set_t'.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 12:21:48 +0000 (13:21 +0100)]

Removed unused header.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 27 Sep 2016 12:00:23 +0000 (13:00 +0100)]

Use 'std::vector' to store closure items during determinization.

This makes code somewhat clearer compared to using plain C arrays:
all function accept one parameter (vector) instead of two (array
start, array end/array length).

However, vectors have nontrivial structure and 'ord_hash_set_t'
is not suitable for them. That's why instead of 'ord_hash_set_t'
we now use 'lookup_t': just like 'ord_hash_set_t' it has constant
random access ang logarithmic insertion complexity, but unike
'ord_hash_set_t' it is not meant to be a container (rather an index).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 26 Sep 2016 16:09:18 +0000 (17:09 +0100)]

Restructured determinization algorithm DFA construction in a more canonical way.

All textbooks describe determinization (powerset construction)
like this:

    S0 <- epsilon-closure of start NFA state
    add S0 to DFA
    while there is unmarked state S:
        mark S
        for each symbol X in the alphabet:
            R <- the set of NFA states reachable from S on X
            T <- epsilon-closure of R
            if T is not in DFA
                add T to DFA

In re2c the inner loop on alphabet symbols was split in two parts
and somewhat obfuscated; this commit makes re2c follow textbook
version more closely.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 26 Sep 2016 14:35:17 +0000 (15:35 +0100)]

Moved important DFA construction subroutines into separate files.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 26 Sep 2016 11:29:27 +0000 (12:29 +0100)]

Base '*' (zero or more repetitions) on '+' (one or more repetitions).

This commit effectively reverts commit 5cf441dc936c16e2264e49038ebe9a108dc750b9
"Base '+' (one or more repetitions) on '*' (zero or more repetitions)."

It turn out, there is a good reason for using '+' as base operation:
'*' can be expressed in terms of '+' as 'r* ::= r+ | <empty>', while
'+' expands as 'r+ ::= r r*' and 'r' is duplicated.

Duplication becomes crucial in presence of tags: if the duplicated
subexpression has tags, then duplication causes an error.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 28 Jun 2016 20:44:58 +0000 (21:44 +0100)]

Added test fot bug #152 "Line number in #line directive after enum
YYCONDTYPE is 0-based".

(thanks to sirzooro for reporting)

commit | commitdiff | tree

Ulya Trofimovich [Wed, 25 May 2016 09:52:20 +0000 (10:52 +0100)]

Traverse NFA from left to right when building epsilon-closure.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 May 2016 22:10:50 +0000 (23:10 +0100)]

Flip dimensions in 2d array (as it is slightly more convenient).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 May 2016 21:37:10 +0000 (22:37 +0100)]

Renaming only.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 May 2016 21:07:00 +0000 (22:07 +0100)]

Remove useless final states; calculate fallback states more accurately.

Removing useless final states allows some further optimizations
which would otherwise be impossible (see the changed tests for
example).

However, removing useless final states revealed previously hidden
defects in calculating fallback states: in some tests spurious
new 'yyaccept' values appeared. It turned out that we used incorrect
(imprecise, too pessimistic) algorithm to calculate fallback states:
the given state was considered to be fallback state if it was final
and had transitions to non-default and non-final states. This
algorithm erroneousl captures cases when all paths from the given
state are accepting and long enough (longer than one transition).
The new algorithm to find fallback states uses information obtained
by rule liveness analyses (reachability of "none-rule" in particular).

Some important tests changed a lot; skeleton says the changes are
correct (as well as manual looking into the changes).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 May 2016 11:49:33 +0000 (12:49 +0100)]

Fixed rule reachability analyses.

The previous algorithm handled loops incorrectly: if deep-first search
from current DFA state happened to visit looping paths before non-looping
paths, then result accumulated from non-looping paths was not propagated
into loopback states.

Now we use back propagation (from states that have transitions to default
state), which handles loops correctly.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 19 May 2016 12:04:11 +0000 (13:04 +0100)]

Tag deduplication should go before DFA minimization.

Tags prevent minimization in some cases; deduplicating them before
minimzation gives better chances to the latter.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 19 May 2016 10:37:11 +0000 (11:37 +0100)]

Don't bother with reachability when reporting nullable rules.

We couldn't trace all unreachable nullable rules anyway, e.g.:
[^]?
Nullable part of this rule is unreachable, but it was reported.
Besides, there's nothing bad in reporting all probles at once.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 18 May 2016 23:16:02 +0000 (00:16 +0100)]

Give up the idea of sharing reversed DFA across different algorithms.

Reversed DFA reflects state of the original DFA in the moment when
its reversed copy was built. Original DFA undergoes many changes;
it's hard (and needless) to keep reversed DFA in sync with the
original.

We only need reversed DFA in a couple of places, better make a new
copy in each case.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 18 May 2016 15:57:40 +0000 (16:57 +0100)]

There's no use in looping twice when tracing default paths.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 18 May 2016 14:30:49 +0000 (15:30 +0100)]

Use backwards propagation for liveness analyses on tags.

In commit a7a26d17cee825afcbe8659cb31a04a165172eed we added fix-point
iterative algorithm to do liveness analyses on tags:

    The usual algorithm for liveness analyses doesn't use DFS;
    instead, it iterates on DFA states until fix point is reached
    (next iteration adds no changes to live set compared to the
    previous iteration). The fix applies this algorithm.

However, instead of iterating until fix point is reached, one may
use backwards propagation of live variables starting from the points
where these variables are used.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 17 May 2016 17:55:03 +0000 (18:55 +0100)]

Moved loop-invariant code out of loop.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 17 May 2016 16:46:05 +0000 (17:46 +0100)]

Use tagpool to store and manipulate intermediate tag sets.

Tagpool now has a special buffer to accommodate the needs of
intermediate calculations involving tag sets. Tagpool is in charge
of these calculations now; it has various functions to merge,
mask and whatever one has to do with tag sets.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 17 May 2016 16:11:40 +0000 (17:11 +0100)]

Fixed bug in tag liveness analyses (tags lost in loops).

Example that revealed failure (added as test):
(@p "ac")* "b" { @p }

We used forward propagation: starting from initial state, walk DFA
in deep-first order; in each state, add tags associated with rule
or fallback set if necessary, then recurse to child states, then
merge all child sets into this node's live set.

This algorithm cannot propagate live tags in loops in cases when
DFS visits loopbacks before non-looping paths (tags don't propagate
into loopback states).

The usual algorithm for liveness analyses doesn't use DFS;
instead, it iterates on DFA states until fix point is reached
(next iteration adds no changes to live set compared to the
previous iteration). The fix applies this algorithm.

A more efficient algorithm may be possible; one that uses backwards
propagation instead of fix point (I'm not sure).

commit | commitdiff | tree

Ulya Trofimovich [Sun, 15 May 2016 16:49:52 +0000 (17:49 +0100)]

Fixed occasional double-freeing tag names.

Tag names belong to regexps, which may be shared between automata,
so names should be freed on regexp destruction, not on automata
destruction.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 14 May 2016 14:34:54 +0000 (15:34 +0100)]

Code cleanup.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 14 May 2016 13:46:39 +0000 (14:46 +0100)]

Don't forget to dereference tag aliases.

Some tags are merged to other tags during tag deduplication.
These tags are referenced by their distinct name in action code,
but in fact they are just aliases to their representative tag
(origin).

commit | commitdiff | tree

Ulya Trofimovich [Fri, 13 May 2016 13:35:10 +0000 (14:35 +0100)]

Moved dispatch on fixed/variable tags inside of tag-printing function.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 13 May 2016 12:51:44 +0000 (13:51 +0100)]

Fixed tag substitution.

If one tag name, say 'p', is a prefix of another tag name, say 'p1',
and we try to substitute all occurences of '@p' in the given string,
we might occasionally substitute '@p1'.

I made a small re2c lexer that recognizes tag-like lexemes and
tries to match them agains rule's tags. If a tag's name matches the
recognized substring, than it is the perfect match (name cannot be
a prefix of a longer name because lexer recognizes the longest match).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 16:18:01 +0000 (17:18 +0100)]

Code cleanup: removed useless/added missing includes.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 15:49:59 +0000 (16:49 +0100)]

Don't use '-Werror-nondeterministic-tags' with '-T, --tags'.

If tags are explicitly enabled, re2c should always fail with error
on nondeterministic tags. There's no good in '-Werror-nondeterministic-tags'
diagnostic --- the user cannot reset it anyway.

This commit fixes unsuccessful attempt 2bfe75d57c71c7102dd6e7b4e5f7139411c7c080
"Restore user warnings after temporarily enforcing a warning.", which
resulted in hiding tag errors if '-Wnondeterministic-tags' is not set.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 14:47:37 +0000 (15:47 +0100)]

Fixed bug #142 "segvault with null terminated input"

Steps to reproduce:
    $ echo -ne "&\x00" > A
    $ re2c A
    Segmentation fault

Analyses: when re2c finds NULL in the input file, it checks for the
end of input; if indeed it has reached the end of input, it stops.
Otherwise, it's just some NULL byte in the middle of input; it should
be handled like any other character.

The first case (NULL as end of input) was handled correctly, but
in the second case (NULL in the middle of input) re2c crashed:
someone forgot to put an appropriate 'goto' statement, which caused
completely ad-hoc control flow in lexer.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 13:48:25 +0000 (14:48 +0100)]

Don't loose tags on epsilon-loops in NFA.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 12:27:07 +0000 (13:27 +0100)]

Added tests for tags (nondeterminism, unsuccessfull minimization).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 12:17:20 +0000 (13:17 +0100)]

Restore user warnings after temporarily enforcing a warning.

This is a temporary (and quite inconvenient) hack; in fact,
warnings should be handled in the same way as options (via
user config / real config that are synced on each read access
just in case user config has been changed).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 10:48:01 +0000 (11:48 +0100)]

Fixed memleak.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 11 May 2016 10:17:49 +0000 (11:17 +0100)]

Allow to override output and header filenames with configurations.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 10 May 2016 15:48:29 +0000 (16:48 +0100)]

Fixed parsing of some configurations (forgot to parse semicolon).

Added test for various 're2c:flags:' configurations.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 10 May 2016 14:17:51 +0000 (15:17 +0100)]

Added tests for bug #121 "trailing contexts are fundamentally broken".

commit | commitdiff | tree

Ulya Trofimovich [Tue, 10 May 2016 13:12:26 +0000 (14:12 +0100)]

Code cleanup: use anonymous union to shorten names.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 10 May 2016 13:02:59 +0000 (14:02 +0100)]

Renamed 'contexts' to 'tags' (and all related stuff likewise).

No serious changes.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 9 May 2016 14:13:42 +0000 (15:13 +0100)]

Keep fixed and variable tags together in one array.

This complicates tag deduplication a little (fixed tags have to be
masked), but simplifies tag initialization and rule handling.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 6 May 2016 10:29:57 +0000 (11:29 +0100)]

Don't force mutations of immutable regexp AST.

Regexp AST should stay immutable as it may be shared between
different conditions. This means we have to store tag indices
somwhere else.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 5 May 2016 16:17:06 +0000 (17:17 +0100)]

Allow tags in any part of regexp, not only on top-level concatenation.

Fixed-length tag optimization applies only to top-level concatenation:
since we cannot predict which path lexer will choose, we cannot be
sure that any tags except top-level concatenation will ever be
initialized. If a tag may be uninitialized, we cannot fix other
tags on this one (we cannot even fix same-level tags relative to
each other, because fixed tags won't preserve the default value).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 2 May 2016 10:01:57 +0000 (11:01 +0100)]

Fixed comparison for struct 'kitem_t' (constified both operands).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 2 May 2016 09:40:04 +0000 (10:40 +0100)]

Bind contexts (a.k.a. tags) to DFA transitions, not states.

This is a very imortant change: it makes tracing tag conflicts as
simple as comparing tags on transitions during DFA construction.
If during determinization we get two identical transitions that
differ only in tags, then we have a tag conflict. Tags that cause
conflicts are called non-deterministic (since they don't allow
deterministic match).

This approach if very similar to Ville Laurikari's TDFA: as he does,
we first build TNFA and then apply determinization; however Laurikari's
TDFA uses complex bookkeeping to track all possible tag values,
while re2c simply forbids tags that cannot be matche efficiently.

Binding tags to transitions allows more fine-grained liveness
analyses, dead tag elimination and tag deduplication.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 1 May 2016 12:29:47 +0000 (13:29 +0100)]

Keep rule number in each NFA state (not only in final states).

This is currently not necessary, but we'll need it to distinguish
between different rules when comparing tags: if rules are different,
we don't care about tag difference; otherwise tags must be equal.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 1 May 2016 11:59:24 +0000 (12:59 +0100)]

Cleanup in codegen (minor changes in '-D, --emit-dot' output).

    - more elegant hanling of default case when generatin code
      for 'switch' cases
    - dumbed down complex logic behind the generation of sequential
      'if' statements: explicitely list all different cases and
      corresponding conditions

commit | commitdiff | tree

Ulya Trofimovich [Sun, 1 May 2016 08:18:07 +0000 (09:18 +0100)]

More precise liveness analyses for context deduplication.

Liveness analyses should exclude contexts that are updated before
they are used past certain point in the program.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 19:07:54 +0000 (20:07 +0100)]

Cleanup in code generation (mostly formatting).

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 16:10:31 +0000 (17:10 +0100)]

Simplified rule tracking during DFA construction.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 16:04:27 +0000 (17:04 +0100)]

Simplified collecting context names during code generation.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 15:47:06 +0000 (16:47 +0100)]

Rule contexts are consecutive, so we only need to store bounds.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 15:35:40 +0000 (16:35 +0100)]

Count contexts after deduplication.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 15:23:32 +0000 (16:23 +0100)]

Break from loop if the remaining iterations have no effect.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 30 Apr 2016 15:18:54 +0000 (16:18 +0100)]

Cleaned up code that merges arcs while tunneling DFA.

Behaviour should be unchanged.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 5 Apr 2016 22:01:33 +0000 (23:01 +0100)]

Moved conditions and contexts from global scope to block scope.

Directives like '/*!re2c:types*/' and '/*!re2c:contexts*/' have
global scope, they are not binded to any particular block and therefore
accumulate items (conditions/contexts/etc.) from the whole program.

There's currently no way to bind these directives to a particular
block (re2c will probably add scoping rules in future).

However, things like default context declaration or condition dispatch
have block scope (it's only natural that they are binded to the block
that generates them).

Unrelated change: context marker should be set for each condition,
re2c must generate it after condition label: this way code that skips
condition dispatch and jums directly to condition label will still
have context marker adjusted properly.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 5 Apr 2016 11:38:53 +0000 (12:38 +0100)]

Added configurations for all command-line options.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 3 Apr 2016 16:00:37 +0000 (17:00 +0100)]

Separated .dot code generation from common case.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 3 Apr 2016 07:31:25 +0000 (08:31 +0100)]

Omit some useless newlines in the generated code.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 21:06:25 +0000 (22:06 +0100)]

Lexer: minor code cleanup (check errors before doing anything else).

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 17:16:55 +0000 (18:16 +0100)]

Lexer: moved 'end of comment' sublexer to a separate function.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 16:25:23 +0000 (17:25 +0100)]

Lexer: unified handling of various re2c directives.

Now when lexer encounters the beginning of a new directive, it
dumps all the intermediate code to the output.

Before this commit re2c lexer dumped intermediate code on each
newline that occured in the input.

So this commit affects two aspects:
    - Intermediate code is dumped much less often: it's good for
      performance, but it becomes more probable that the intermediate
      code will occasionally occupy too much buffer space and incur
      buffer resize.
    - Some re2c directives ignored characters right before the
      (on the same line). These unfortunate characters are no longer
      ignored.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 14:21:50 +0000 (15:21 +0100)]

Lexer: drop useless rule.

The 'end of comment' rule was used to detect end of re2c directives;
however, now it is handled by a special 'end of comment' lexer block.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 14:07:59 +0000 (15:07 +0100)]

Lexer: don't care if end of comment is followed by a newline.

re2c used to swallow newline that immediately followes end of directive:
for most of the directives re2c generates code block that ends with a
newline, so the generated code looks better if the newline is not doubled.

However, this unnecessarily complicates the lexer.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 2 Apr 2016 10:41:04 +0000 (11:41 +0100)]

Lexer: suppressed directives in skeleton mode, cleaned up code.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 1 Apr 2016 20:37:03 +0000 (21:37 +0100)]

Explicitely handle empty string in lexer.

(rather that hide it inside of a definition)

commit | commitdiff | tree

Ulya Trofimovich [Fri, 1 Apr 2016 16:04:12 +0000 (17:04 +0100)]

Added directive '/*!contexts:re2c ... */'.

Directive is a new type of re2c block that consists of zero or more
of the following configurations (example values are the defaults):
line = "long @@;";
sep = "";

Renamed configuration: 're2c:ctxprefix' -> 're2c:contexts:prefix'

Added configuration: 're2c:contexts:expr = "@@"' that allows to
override the way context markers are accessed (but nottheir names).

Removed configuration: 're2c:define:YYDISTTYPE' (use the new
directive '/*!contexts:re2c ... */' instead).

commit | commitdiff | tree

Ulya Trofimovich [Tue, 29 Mar 2016 21:33:32 +0000 (22:33 +0100)]

Parser grammar cleanup.

- Rearranged some rules to avoid code duplication.
- Added production and error message about contexts in named definitions.
- Removed productions and error messages about missing expression
(simple 'syntax error' is enough, given that only part of the
errors was captured by the removed productions).

commit | commitdiff | tree

Ulya Trofimovich [Tue, 29 Mar 2016 14:16:01 +0000 (15:16 +0100)]

Optimized pointer arithmetics generated with '-C, --contexts'.

If default input API is used and fixed-length context is based on
the rightmost context (that is, YYCURSOR), then the following expression
(that calculates pointer corresponding to the given context):
    (YYCTXMARKER + ((YYCURSOR - YYCTXMARKER) - yyctx))
can be optimized to:
    (YYCURSOR - yyctx)

Note: unfortunately, as of GCC < 7.0, expressions like:
    (YYCURSOR - 3) - (YYCURSOR - 5)
trigger warning:
    warning: integer overflow in expression [-Woverflow]
See this GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61240

commit | commitdiff | tree

Ulya Trofimovich [Tue, 29 Mar 2016 11:54:02 +0000 (12:54 +0100)]

Added some configurations related to '-C, --contexts' option.

Configurations:
    "define:YYCTX"      (defaults to 'YYCTX')
    "define:YYDIST"     (defaults to 'YYDIST')
    "define:YYDISTTYPE" (defaults to 'long')
    "ctxprefix"         (defaults to 'yyctx')

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Mar 2016 18:59:23 +0000 (19:59 +0100)]

Use 'std::valarray' instead of 'std::vector' to avoid copy semantics.

Had to rename 'INFINITY' to 'SCC_INF': inclusion of '<valarray.h>'
pulled '<math.h>' and introduced name collision.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Mar 2016 14:49:07 +0000 (15:49 +0100)]

Use 'size_t' instead of 'uint32_t' to avoid 'static_cast'.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Mar 2016 14:14:37 +0000 (15:14 +0100)]

Make sure that default rule is handled together with other rules.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Mar 2016 13:39:11 +0000 (14:39 +0100)]

Moved nontrivial context handling from parser to NFA construction phase.

Parser should simply construct AST; all the complex reasoning about
contexts (fixed vs variable) should be delayed until NFA is being
constructed. This way AST can be immutable and it's very easy to share
parts of AST between different conditions, etc.

Removed rule ranks and rank counter: rules are now stored in NFA-local
array and addressed by index.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 23 Mar 2016 17:40:21 +0000 (17:40 +0000)]

Added partial submatch extraction.

Enabled with '-C, --contexts' option.

Warning '-Wselfoverlapping-contexts'.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Mar 2016 14:15:25 +0000 (14:15 +0000)]

Moved loop counters out of skeleton; they belong to algorithm-local stuff.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Mar 2016 13:54:23 +0000 (13:54 +0000)]

Pack all stuff relevant to skeleton data generation in a struct.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Mar 2016 12:20:25 +0000 (12:20 +0000)]

Use distinct type for path suffixes during skeleton data generation.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Mar 2016 09:41:02 +0000 (09:41 +0000)]

Address skeleton nodes by indices rather than by pointers.

This way it is more convenient to add various graph algorithms:
each algorithm can keep its own data in array indexed by skeleton
node numbers (instead of pushing all relevant data inside of the
node).

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Mar 2016 13:25:32 +0000 (13:25 +0000)]

Reuse the same struct to represent different kinds of paths in skeleton.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Mar 2016 12:41:07 +0000 (12:41 +0000)]

[-Wundefined-control-flow] analyses: simplified path structure.

Now paths involved in [-Wundefined-control-flow] analyses have
exactly the same structure as paths involved in skeleton data generation.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Mar 2016 08:28:02 +0000 (08:28 +0000)]

Added comments.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Mar 2016 08:15:24 +0000 (08:15 +0000)]

Moved comment.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Mar 2016 08:08:45 +0000 (08:08 +0000)]

Moved calculation of fallback states to earlier stage of DFA construction.

We will need this later, when adding multiple context markers
(liveness analyses during context marker deduplication).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Mar 2016 22:52:53 +0000 (22:52 +0000)]

Eradicated occasional copy-paste.

Used function 'make_rule_copy' introduced by commit
f781c3a3a07cde0243c894da640070d6c62ef4bc.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Mar 2016 22:38:42 +0000 (22:38 +0000)]

Moved function definitions out of header.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Mar 2016 22:23:03 +0000 (22:23 +0000)]

Skeleton: simplified path structure.

Just store pointers to skeleton nodes (instead of bookkeeping arcs,
contexts and rules in each path). All the necessary information can
be easily retrieved from nodes when path is baing dumped to file.

Three tests in '--skeleton' mode have been broken by this commit.
Actually, these are not breakages: these cases reveal incorrect
re2c-generated code. The change is due to the fact that skeleton
now doesn't simulate contexts that go *after* the matched rule:

------o------o------> ... (fallback to rule)
rule context

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Mar 2016 21:45:42 +0000 (21:45 +0000)]

Skeleton: respect fixed-length contexts.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 26 Feb 2016 23:03:43 +0000 (23:03 +0000)]

run_tests.sh: fixed cleanup of intermediate files with '--skeleton'.

Broken by commit c56a14582de4fbbed34d231e521ab2e6af5d43c3
"run_tests.sh: don't crash on absolute filepaths in test names."

commit | commitdiff | tree

Ulya Trofimovich [Wed, 24 Feb 2016 16:57:54 +0000 (16:57 +0000)]

run_tests.sh: don't crash on absolute filepaths in test names.

This patch fixes bug #137 "run_tests.sh fail when running configure
script with absolute path".

The fix: copy all test files into temporary build directory (which
is addressed by a relative filepath) and keep messing with relative
filepaths.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 22 Feb 2016 15:47:44 +0000 (15:47 +0000)]

Code cleanup: moved rule construction to a separate file.

Unnamed repository; edit this file 'description' to name the repository.