]> granicus.if.org Git - re2c/log
re2c
7 years agofix documentation issue #184 devel
Petr Skocik [Thu, 20 Apr 2017 15:53:25 +0000 (17:53 +0200)]
fix documentation issue #184

7 years agoManpage: removed occasionally copy-pasted section.
Ulya Trofimovich [Mon, 10 Apr 2017 12:43:02 +0000 (13:43 +0100)]
Manpage: removed occasionally copy-pasted section.

7 years agoAutogenerate help from manpage.
Ulya Trofimovich [Mon, 10 Apr 2017 12:41:17 +0000 (13:41 +0100)]
Autogenerate help from manpage.

7 years agoManpage: include paths must be relative to 'top_srcdir'.
Ulya Trofimovich [Sun, 9 Apr 2017 18:36:28 +0000 (19:36 +0100)]
Manpage: include paths must be relative to 'top_srcdir'.

Otherwise out-of-source builds won't work.

7 years agocompose manpage out of rsts from gh-pages-gen
Petr Skocik [Sat, 8 Apr 2017 21:14:36 +0000 (23:14 +0200)]
compose manpage out of rsts from gh-pages-gen

7 years agosync --help output w/ manpage
Petr Skocik [Fri, 7 Apr 2017 23:29:38 +0000 (01:29 +0200)]
sync --help output w/ manpage

    + set output width to 80
    + make help output go to stdout rather than stderr

7 years agoGrammar fixes in the manpage
Petr Skocik [Mon, 3 Apr 2017 15:48:10 +0000 (17:48 +0200)]
Grammar fixes in the manpage

7 years agoRecognize newlines in character strings and classes.
Ulya Trofimovich [Mon, 7 Nov 2016 09:46:18 +0000 (09:46 +0000)]
Recognize newlines in character strings and classes.

As for now, newline inside of a character string or class is an error:
re2c should emit clear error message. Different styles of newlines
should be recognized ("\n", "\r\n").

This commit fixes bug #162 reported by pauloscustodio:
    Reading files with "rb" causes issues in Windows

7 years agoFixed line endings in output files on Windows (#162, #163).
Ulya Trofimovich [Sun, 30 Jul 2017 16:45:30 +0000 (17:45 +0100)]
Fixed line endings in output files on Windows (#162, #163).

This fix consists of two issues, both reported and fixed by pauloscustodio.

1. #162 "Open text files with "wb" causes issues on Windows"

    Text files need to be opened for writing with "w", so that stdio does
    the right thing in respect to the correct line endings for the current OS.
    ("\r\n" in Windows, "\n" in Linux).

2. #163 "Reading files with "rb" causes issues in Windows"

    re2c reads input files in binary mode and writes the generated output in
    text mode. This caused CR LF conversion to CR CR LF on Windows: first CR
    comes from reading input in binary mode, second CR is added when writing
    output in text mode. This only happened to those parts of input which are
    not transformed by re2c: we used to copy-paste verbatim, now we patch line
    endings. Now we convert all line endings to LF before writing the generated
    code to file.

7 years agoFixed #147 "Please add symbol name to "can't find symbol" error message".
Ulya Trofimovich [Sat, 25 Jun 2016 15:22:08 +0000 (16:22 +0100)]
Fixed #147 "Please add symbol name to "can't find symbol" error message".

As suggested by sirzooro:
    Please add symbol name to "can't find symbol" error message,
    it would allow to quickly spot what is wrong. Now we have to
    position cursor at given row and column to find that name.

Also tweaked error reporitng function to append "..." at the end
of the message if it didn't fit into buffer.

7 years agoFixed bug #145 "Values for enum YYCONDTYPE are not generated
Ulya Trofimovich [Fri, 24 Jun 2016 21:46:16 +0000 (22:46 +0100)]
Fixed bug #145 "Values for enum YYCONDTYPE are not generated
when default rules with conditions are used".

Default rule is handled in a special (delayed) way;
re2c uses different code for default rule than for normal rules.
This special code simply forgot to add condition name to the list
of conditions.

Thanks to sirzooro for bug report.

7 years agorun_tests.sh: fix permissions after copying source files to build directory.
Ulya Trofimovich [Sun, 9 Apr 2017 18:28:28 +0000 (19:28 +0100)]
run_tests.sh: fix permissions after copying source files to build directory.

`make distcheck` protects source files from writing.
Test script run_tests.sh copies source files into build directory,
but the copied files inherit permissions, so `make distcheck` fails.

7 years agorun_tests.sh: patch line endings in the generated file.
Ulya Trofimovich [Sat, 5 Nov 2016 15:24:02 +0000 (15:24 +0000)]
run_tests.sh: patch line endings in the generated file.

Line endings in the generated code depend on the target platform: e.g.,
"\r\n" on Windows vs. "\n" on Linux. However, reference test results are
(currently) generated on Linux and therefore contain "\n" line endings.
So we have to patch line endings in the generated code in order to pass
the tests on Windows.

Testing script did patch line endings in stdout and stderr, but forgot
to patch them in the generated file (it was broken since we started to
use '-o' option for testing). This commit fixes testing script.

It also deletes a couple of tests in which source code contains "\r\n"
instead of "\n". These tests are duplicates of other tests (they were
added by commit bd2875441cae4ab3934bfafcd34728021295b842 supposedly to
test that re2c preserves line endings in source code). They are broken
by current commit and fixing them is probably not worth of the effort.

7 years agoFixed #141 "Tests under Windows".
Ulya Trofimovich [Fri, 29 Apr 2016 07:16:12 +0000 (08:16 +0100)]
Fixed #141 "Tests under Windows".

Thanks to Abs62, who noted that under Windows (in MSYS) tests fail
because '2>"$outc.stderr"' dumps CRLF to file instead of LF
and proposed a fix:
    sed -i 's/\r//g' "$outc.stderr"

7 years agoMakefile.am: removed inexistent header.
Ulya Trofimovich [Sun, 30 Jul 2017 10:46:31 +0000 (11:46 +0100)]
Makefile.am: removed inexistent header.

7 years agoUpdated paper on Lookahead TDFA.
Ulya Trofimovich [Sat, 29 Jul 2017 18:27:32 +0000 (19:27 +0100)]
Updated paper on Lookahead TDFA.

7 years agoExplicitly pass line/column info in all error messages.
Ulya Trofimovich [Sat, 29 Jul 2017 18:10:44 +0000 (19:10 +0100)]
Explicitly pass line/column info in all error messages.

Updated tests. Some error messages are more precise now, e.g. ill-formed
character classes and escape sequences: column points to the beginning
of the faulty lexeme rather than to the middle of it where the error
occured. Other error messages are less precise (lack column info), but
the column reported before was too inexact and didn't make much sense.

7 years agoRenamed simple tags into 'stags' and tags with history into 'mtags'.
Ulya Trofimovich [Fri, 28 Jul 2017 15:08:14 +0000 (16:08 +0100)]
Renamed simple tags into 'stags' and tags with history into 'mtags'.

YYTAGP                  -> YYSTAGP
YYTAGN                  -> YYSTAGN
YYTAGLISTP              -> YYMTAGP
YYTAGLISTN              -> YYMTAGN
/*!tags:re2c ... */     -> /*!stags:re2c ... */
/*!taglists:re2c ... */ -> /*!mtags:re2c ... */

Updated tests and benchmarks.

Fixed warning message in '-Wnondeterministic-tags'.

7 years agoPOSIX disambiguation: use the same comparison algorithm for orbit and non-orbit tags.
Ulya Trofimovich [Fri, 28 Jul 2017 10:17:10 +0000 (11:17 +0100)]
POSIX disambiguation: use the same comparison algorithm for orbit and non-orbit tags.

Previously we needed a different algorithm for non-orbit tags, because
disambiguation was based on both start and end tags. Non-orbit start tags
cannot be compared incrementally, like orbit tags, becuse default value
may be discovered on a later step than non-default value. Non-orbit end
tags do not have this problem: since negative tags are inserted at the end
of alternatives, default value is always discovered on the same step as
non-default value (provided that all higher-priority tags agree and
comparison reaches this tag at all).

Now that start tags are ignored, we can use incremental comparison for both
orbit and non-orbit subhistories, which simplifies the code.

7 years agoPaper on Lookahead TDFA: added benchmarks.
Ulya Trofimovich [Wed, 19 Jul 2017 17:43:33 +0000 (18:43 +0100)]
Paper on Lookahead TDFA: added benchmarks.

7 years agoReset '--no-lookahead' to default value if '--tags' is not enabled.
Ulya Trofimovich [Tue, 18 Jul 2017 12:56:48 +0000 (13:56 +0100)]
Reset '--no-lookahead' to default value if '--tags' is not enabled.

7 years agoApply tag version compaction even with '--no-optimize-tags'.
Ulya Trofimovich [Tue, 18 Jul 2017 12:54:46 +0000 (13:54 +0100)]
Apply tag version compaction even with '--no-optimize-tags'.

Otherwize RE2C emits too many unused tag variables and bloats the generated code.

7 years agoSplit source file in two.
Ulya Trofimovich [Mon, 17 Jul 2017 22:11:18 +0000 (23:11 +0100)]
Split source file in two.

7 years agoReplaced configuration 'no-lookahead' with 'lookahead' and updated test.
Ulya Trofimovich [Mon, 17 Jul 2017 22:06:55 +0000 (23:06 +0100)]
Replaced configuration 'no-lookahead' with 'lookahead' and updated test.

7 years agoAdded option '--no-optimize-tags' (switches off tag optimizations).
Ulya Trofimovich [Mon, 17 Jul 2017 22:03:53 +0000 (23:03 +0100)]
Added option '--no-optimize-tags' (switches off tag optimizations).

7 years agoFixed memory leak.
Ulya Trofimovich [Mon, 17 Jul 2017 21:38:20 +0000 (22:38 +0100)]
Fixed memory leak.

7 years agoForbid merging of tag variables for tags with and without history.
Ulya Trofimovich [Mon, 17 Jul 2017 21:36:02 +0000 (22:36 +0100)]
Forbid merging of tag variables for tags with and without history.

7 years agoAdded directive '/*!taglists:re2c ... */' to handle tags with history.
Ulya Trofimovich [Sun, 16 Jul 2017 11:13:04 +0000 (12:13 +0100)]
Added directive '/*!taglists:re2c ... */' to handle tags with history.

It is exactly like '/*!tags:re2c ... */', except that the latter is for
simple tags, while the former is for tags with history.

7 years agoPaper on Lookahead TDFA: added section about implementation.
Ulya Trofimovich [Wed, 12 Jul 2017 16:40:01 +0000 (17:40 +0100)]
Paper on Lookahead TDFA: added section about implementation.

7 years agoNicer output with '--dump-dfa-raw' and '--dump-nfa'.
Ulya Trofimovich [Wed, 12 Jul 2017 16:39:03 +0000 (17:39 +0100)]
Nicer output with '--dump-dfa-raw' and '--dump-nfa'.

7 years agoAllocate final/fallback tag versions beforehand.
Ulya Trofimovich [Tue, 11 Jul 2017 15:12:21 +0000 (16:12 +0100)]
Allocate final/fallback tag versions beforehand.

Just a more clear and intuitive way to allocate them, and easier
to document. Updated a lot of tests; changes are trivial.

7 years agoFirst part of the paper on Lookahead TDFA.
Ulya Trofimovich [Mon, 10 Jul 2017 13:43:28 +0000 (14:43 +0100)]
First part of the paper on Lookahead TDFA.

7 years agoNicer output with '--dump-dfa-raw' and '--posix-captures'.
Ulya Trofimovich [Fri, 7 Jul 2017 21:42:08 +0000 (22:42 +0100)]
Nicer output with '--dump-dfa-raw' and '--posix-captures'.

Don't add closure items to "shadowed" set if there is an identical
"unshadowed" item: otherwise Goldberg-Radzik algorithm generates too much
"shadowed" items and the output becomes too noisy.

7 years agoUse different closure algorithms for leftmost greedy and POSIX policies.
Ulya Trofimovich [Fri, 7 Jul 2017 14:08:23 +0000 (15:08 +0100)]
Use different closure algorithms for leftmost greedy and POSIX policies.

With leftmost greedy policy we can use simple depth-first search.
With POSIX policy we need Goldberg-Radzik algorithm, which is more complex
(and the necessity to accommodate both policies complicates it even more).

7 years agoFix for possible invalidated iterator after resize
Ryan Mast [Sun, 2 Jul 2017 19:59:09 +0000 (12:59 -0700)]
Fix for possible invalidated iterator after resize

7 years agoAdd const to newver_t operator()
Ryan Mast [Sat, 1 Jul 2017 23:08:17 +0000 (16:08 -0700)]
Add const to newver_t operator()

A certain set of compilers gives a compiler error due to losing
"const-volatile qualifiers"

7 years agoAllow trivial cycles (of length 1) in tag commands.
Ulya Trofimovich [Sun, 2 Jul 2017 08:05:29 +0000 (09:05 +0100)]
Allow trivial cycles (of length 1) in tag commands.

We forbid cycles of length 2 or more because they would need temporary
variable local to the basic block, which would complicate liveness
analysis. However, trivial cycles don't need a temporary.

7 years agoFixed typo that caused miscalculation of new tag version.
Ulya Trofimovich [Tue, 27 Jun 2017 14:40:42 +0000 (15:40 +0100)]
Fixed typo that caused miscalculation of new tag version.

7 years agoInvert comparison result instead of negating its arguments.
Ulya Trofimovich [Tue, 27 Jun 2017 13:14:33 +0000 (14:14 +0100)]
Invert comparison result instead of negating its arguments.

7 years agoSkip (non-orbit) start tags during POSIX disambiguation.
Ulya Trofimovich [Tue, 27 Jun 2017 13:05:26 +0000 (14:05 +0100)]
Skip (non-orbit) start tags during POSIX disambiguation.

Their position is fixed on some higher-priority tag (except the very
first tag, but in RE2C match is always anchored).
We cannot skip orbit start tag because the corresponding orbit end tag
is hoisted out of loop (by construction) and is, in fact, non-orbit.

7 years agoOnce again fixed history comparison for POSIX disambiguation.
Ulya Trofimovich [Tue, 27 Jun 2017 12:53:09 +0000 (13:53 +0100)]
Once again fixed history comparison for POSIX disambiguation.

7 years agoUse Goldberg-Radzik shortest path algorithm for closure construction.
Ulya Trofimovich [Tue, 6 Jun 2017 07:53:36 +0000 (08:53 +0100)]
Use Goldberg-Radzik shortest path algorithm for closure construction.

It has O(M*N) worst-case complexity, where M is the number of nodes (states)
and N is the number of arcs (transitions).

Papers: 1993, "A heuristic improvement of the Bellman-Ford algorithm"
by Goldberg, Radzik and 1996, Shortest paths algorithms: Theory and
experimental evaluation" by Cherkassky, Goldberg, Radzik.

7 years agoCompare full tag histories, not just the last subhistories.
Ulya Trofimovich [Mon, 5 Jun 2017 21:17:40 +0000 (22:17 +0100)]
Compare full tag histories, not just the last subhistories.

Comparing the last pair of subhistories is not enough: because of
the shortest-path algorithm mismatch may occur at some earlier point
(not at the last pair), in which case we still need to re-run the search.

7 years agoSimplified queue handling in epsilon-closure algorithm.
Ulya Trofimovich [Wed, 24 May 2017 10:21:48 +0000 (11:21 +0100)]
Simplified queue handling in epsilon-closure algorithm.

7 years agoUse the same comparison functions for closure construction and order calculation.
Ulya Trofimovich [Mon, 22 May 2017 13:51:05 +0000 (14:51 +0100)]
Use the same comparison functions for closure construction and order calculation.

7 years agoUse two tags instead of three for captures under iteration.
Ulya Trofimovich [Sun, 21 May 2017 09:07:58 +0000 (10:07 +0100)]
Use two tags instead of three for captures under iteration.

Before we simplified POSIX disambiguation by inserting missing
captures, using three tags to represent captures under iteration
was the only way: we needed to guard where the first iteration starts,
where the last iteration starts and where the last iteration ends.
Therefore we used three tags: opening, closing and orbit.

Now that we insert missing pieces of capture hierarchy, we no longer
need a special opening tag for captures under iteration: there always
is a higher-priority tag that guards the start of the first iteration.

Massive test updates are mostly due to the fact that orbit tag is no
longer fixed (fixed tags go in the end): ~100 out of 147 tests.
Some other ~45 tests have two tag variables flipped. There are 4 tests
with non-trivial changes. All updated tests pass POSIX test
(/test/posix_captures/.run/__run.sh).

7 years agoSilenced '-Wnondeterministic-tags' warning with '--posix-captures'.
Ulya Trofimovich [Sat, 20 May 2017 18:11:19 +0000 (19:11 +0100)]
Silenced '-Wnondeterministic-tags' warning with '--posix-captures'.

7 years agoSkeleton should ignore fictive tags.
Ulya Trofimovich [Fri, 19 May 2017 21:36:25 +0000 (22:36 +0100)]
Skeleton should ignore fictive tags.

The combination of options '--skeleton' and '--posix-captures'
didn't work properly.

7 years agoForbid mixing leftmost greedy and POSIX disambiguation.
Ulya Trofimovich [Fri, 19 May 2017 21:11:24 +0000 (22:11 +0100)]
Forbid mixing leftmost greedy and POSIX disambiguation.

Mixing doesn't make much sense (there is no obvious use case).
It might be possible, but for now better forbid it before someone
starts to use it and it turns out to be impossible or very complex.

7 years agoSimplified POSIX disambiguation by reconstructing capture hierarchy.
Ulya Trofimovich [Fri, 19 May 2017 07:47:36 +0000 (08:47 +0100)]
Simplified POSIX disambiguation by reconstructing capture hierarchy.

POSIX treats captured and non-captured subexpressions on equal terms.
However, non-captured subexpressions have no tags to guard them.
Previously we used default tags to infer ambiguous paths that correspond
to the missing captures: if one path had default tag and the other did
not, then desicion which path is better was made according to the leftmost
strategy. This algorithm works because in POSIX expressions without
captures have the property that leftmost path is the best path (for
example, 'a?' is greedy, it means 'a or epsilon').

However, this algorithm has one downside: because we may need leftmost
comparison, we have to impose leftmost order on NFA substates of each
DFA state, as well as maximize and orbit order for tags. This prevents
us from mapping perfectly mappable DFA states and we end up with a larger
DFA (which is sometimes folded back to smaller DFA, but not always).

Also, default-leftmost algorithm is more complex than inserting missing
hierarchy pieces: proving that it works is non-trivial.

7 years agoFixed history comparison in case both latest subhistories are bottoms.
Ulya Trofimovich [Tue, 16 May 2017 21:11:55 +0000 (22:11 +0100)]
Fixed history comparison in case both latest subhistories are bottoms.

In this case, instead of comparing histories by the leftmost criterion
(as it should be, if only one of them is bottom), we should assume that
from standpoint of this tag histories are equal and move on to other
tags.

7 years agoAvoid exponential blowup in tagged epsilon-closure construction.
Ulya Trofimovich [Tue, 16 May 2017 17:07:00 +0000 (18:07 +0100)]
Avoid exponential blowup in tagged epsilon-closure construction.

The previous algorithm waited until the full epsilon-path is built,
then compared it to already existing paths. The new algorithm compares
and merges partial epsilon-paths as soon as they arrive at the same
NFA state, so the overall number of paths is bounded by the number of
NFA states at all times.

7 years agoDon't split tag history into individual sub-histories for tags.
Ulya Trofimovich [Sat, 29 Apr 2017 20:13:50 +0000 (21:13 +0100)]
Don't split tag history into individual sub-histories for tags.

This is necassary for correct comparison of orbit tag histories:
if orbit tag is nested in an outer capture, this outer capture is
under repetition and there is an epsilon-path through it, then
this epsilon-path may contain pieces of orbit history that belong
to different iterations of outer capture; these pieces will be
glued together and the boundary between them will be lost.

Example: ((""){0,3}){0,2}.

However, in a common history we can always find boundaries
(they are marked by tags that correspond to outer captures).

7 years agoSkeleton: constified tag variables to get rid of compiler warning.
Ulya Trofimovich [Sun, 16 Apr 2017 11:36:44 +0000 (12:36 +0100)]
Skeleton: constified tag variables to get rid of compiler warning.

7 years agoCommand normalization: update pointer to last command, it might change.
Ulya Trofimovich [Sun, 16 Apr 2017 10:10:25 +0000 (11:10 +0100)]
Command normalization: update pointer to last command, it might change.

If normalization of 'save' commands removes the last 'save' command,
then pointer to 'next' will point to the 'next' field of the removed
command and the chain of commands will break.

7 years agoAdded comment.
Ulya Trofimovich [Sun, 16 Apr 2017 10:10:03 +0000 (11:10 +0100)]
Added comment.

7 years agorelease.sh: added cppcheck and headers check to wish list.
Ulya Trofimovich [Sun, 16 Apr 2017 10:09:15 +0000 (11:09 +0100)]
release.sh: added cppcheck and headers check to wish list.

7 years agoForbid dependency cycles in tag commands.
Ulya Trofimovich [Fri, 14 Apr 2017 17:47:11 +0000 (18:47 +0100)]
Forbid dependency cycles in tag commands.

The previous attempt was to forbid 2-cycles; clearly it's not enough
and cycles of length greater than 2 also pose a problem.

We could allow cycles and deal with them by introducind a temporary
variable. However, this would create variables local to basic blocks,
which would complicate liveness analysis and dead code elimination.

7 years agoTag interference analysis: compare actual values, not formal RHS.
Ulya Trofimovich [Fri, 14 Apr 2017 10:45:23 +0000 (11:45 +0100)]
Tag interference analysis: compare actual values, not formal RHS.

Commands in the same basic block may have equal right hand sides,
e.g. 'x = y; z = y;'. Clearly in this case 'x' and 'z' do not interfere:
both versions are equal to 'y'. This optimization allows to deduplicate
cases like '("a" @x @y @z "b")*', where multiple tags have the same
values.

However, it is insufficient to find all commands in the current block
with RHS equal to current RHS. An example when this algorithm fails:
'x = y; w = z; z = y;', here 'x' and 'z' are assigned to the same RHS
'y', but if we merge them into one variable, 'w' will get the wrong
value.

The fix is as follows: first, ignore subsequent commands and consider
only commands that precede current command: if subsequent command that
sets LHS to the same value precedes any use of it, liveness propagation
through basic block would mark this LHS as dead and not interfering
anyway; otherwise (if use precedes setting to the same value), then it
indeed interferes with current LHS. This fixes the above example
'x = y; w = z; z = y;': when analysing 'x = y;' command we find that
'z' is alive (as before), but ignore subsequent 'z = y;' command (due
to the fix).

Second, when analysing preceding commands, calculate actual value for
each LHS (based on pessimistic assumption that on entry of basic block
all used versions are different). Formal RHS of some preceding command
may coincide with current formal RHS, but their values might differ:
'x = y; y = w; z = y;', here 'x = y;' and 'z = y;' have identical formal
RHS, but the value may be different. On the other hand, formal RHS
might be different, while their actual values are equal, as in
'x = y; w = y; z = w;'.

7 years agoSkeleton: don't forget to return error in case of tag mismatch.
Ulya Trofimovich [Thu, 13 Apr 2017 10:02:35 +0000 (11:02 +0100)]
Skeleton: don't forget to return error in case of tag mismatch.

7 years agoFinal tag versions in unreachable rules should be marked as unused.
Ulya Trofimovich [Wed, 12 Apr 2017 12:26:15 +0000 (13:26 +0100)]
Final tag versions in unreachable rules should be marked as unused.

7 years ago'--dump-adfa': fixed pretty-printing of tag commands.
Ulya Trofimovich [Wed, 12 Apr 2017 11:06:17 +0000 (12:06 +0100)]
'--dump-adfa': fixed pretty-printing of tag commands.

7 years agoPreserve order of commands when adding fallback tags.
Ulya Trofimovich [Wed, 12 Apr 2017 11:05:13 +0000 (12:05 +0100)]
Preserve order of commands when adding fallback tags.

7 years agoAdded debug for tag optimizations (CFG and interference table).
Ulya Trofimovich [Wed, 12 Apr 2017 10:26:18 +0000 (11:26 +0100)]
Added debug for tag optimizations (CFG and interference table).

7 years agoRecord full history for history-based tags on epsilon-paths.
Ulya Trofimovich [Tue, 11 Apr 2017 16:25:13 +0000 (17:25 +0100)]
Record full history for history-based tags on epsilon-paths.

7 years agoKeep tag histories for the whole time of determinization.
Ulya Trofimovich [Mon, 10 Apr 2017 21:35:15 +0000 (22:35 +0100)]
Keep tag histories for the whole time of determinization.

7 years agoTopsort: always initialize in-degree with zeroes.
Ulya Trofimovich [Sat, 8 Apr 2017 22:04:48 +0000 (23:04 +0100)]
Topsort: always initialize in-degree with zeroes.

Don't rely on the fact that topsort leaves all-zero in-degree:
it doesn't if cycles are present. One could manually zero the remaining
entries after topsort, but zero initialization is more reliable.
And it's quite efficient since we only need to initialize the entries
we use.

7 years agoDon't forget final and fallback commands when looking for history-based tags.
Ulya Trofimovich [Sat, 8 Apr 2017 18:58:50 +0000 (19:58 +0100)]
Don't forget final and fallback commands when looking for history-based tags.

7 years agoMerged a couple of slightly different 'topsort' versions into one.
Ulya Trofimovich [Sat, 8 Apr 2017 18:37:32 +0000 (19:37 +0100)]
Merged a couple of slightly different 'topsort' versions into one.

7 years ago'save' commands also need topological sorting: they may depend on history.
Ulya Trofimovich [Sat, 8 Apr 2017 10:35:14 +0000 (11:35 +0100)]
'save' commands also need topological sorting: they may depend on history.

7 years agoWhen mapping states, avoid double substitution in 'save' commands.
Ulya Trofimovich [Sat, 8 Apr 2017 08:46:41 +0000 (09:46 +0100)]
When mapping states, avoid double substitution in 'save' commands.

When mapping states, we need to create a list of 'copy' commands.
However, if there is a 'save' command with LHS equal to RHS of the 'copy'
command, then instead of the 'copy' command we just fix LHS of the 'save'
command, as explained in this note:

    /* note [save(X), copy(Y,X) optimization]
     *
     * save(X) command followed by a copy(Y,X) command can be optimized to
     * save(Y). This helps reduce the number commands and versions (new version
     * X is gone), but what is more important, it allows to put copy commands
     * in front of save commands. This order is necessary when it comes to
     * fallback commands.
     *
     * Note that in case of injective mapping there may be more than one copy
     * command matching the same save command: save(X), copy(Y,X), copy(Z,X).
     * In this case save command must be replicated for each copy command:
     * save(Y), save(Z).
     *
     * For each save(X) command there must be at least one copy(Y,X) command
     * (exactly one case of bijective mapping). This is because X version in
     * save(X) command must be a new version which cannot occur in the older
     * DFA state. Thus all save commands are transformed (maybe replicated) by
     * copy commands, and some copy commands are erased by save commands.
     *
     * This optimization is applied after checking priority violation, so it
     * cannot affect the check.
    */

However, the previous implementation would sometimes erroneously re-fix
the already fixed 'save' commands. This patch rewrites it in a way that
avoids re-fixing.

7 years agoFixed checking duplicates when creating tag commands.
Ulya Trofimovich [Fri, 7 Apr 2017 21:36:20 +0000 (22:36 +0100)]
Fixed checking duplicates when creating tag commands.

7 years agoDon't forget histories when comparing right hand sides of tag commands.
Ulya Trofimovich [Fri, 7 Apr 2017 17:49:58 +0000 (18:49 +0100)]
Don't forget histories when comparing right hand sides of tag commands.

7 years agoSkeleton: don't emit declarations for tags that are optimized out.
Ulya Trofimovich [Fri, 7 Apr 2017 16:19:09 +0000 (17:19 +0100)]
Skeleton: don't emit declarations for tags that are optimized out.

7 years agoDon't forget initial tag command when looking for history-based tags.
Ulya Trofimovich [Fri, 7 Apr 2017 16:17:56 +0000 (17:17 +0100)]
Don't forget initial tag command when looking for history-based tags.

7 years agoTag histories in commands must be addressed by absolute value.
Ulya Trofimovich [Fri, 7 Apr 2017 16:00:14 +0000 (17:00 +0100)]
Tag histories in commands must be addressed by absolute value.

7 years agoDon't sort and deduplicate 'save' commands for tags with history.
Ulya Trofimovich [Fri, 7 Apr 2017 15:44:44 +0000 (16:44 +0100)]
Don't sort and deduplicate 'save' commands for tags with history.

Otherwise chains like '2=(1,x); 1=(1,y)' will be rewritten to
'1=(1,y); 2=(1,x)', which is a totally different thing.

7 years ago'save' commands for tags with history may need fallback copies.
Ulya Trofimovich [Fri, 7 Apr 2017 15:19:00 +0000 (16:19 +0100)]
'save' commands for tags with history may need fallback copies.

For simple tags without history 'save' command is self-sufficient:
it does not depend on anything. However, for tags with history 'save'
commands depend on history, which may be overwritten on the way from
accepting state to fallback transition. We must take care and backup
such overwritten histories when leaving the accepting state.

If history is not overwritten, we don't need a copy, but we still
have to propagate its liveness forward on all fallthrough paths
outgoing from accepting state.

7 years agoAdded skeleton support for tags with history.
Ulya Trofimovich [Fri, 7 Apr 2017 12:47:48 +0000 (13:47 +0100)]
Added skeleton support for tags with history.

7 years agoFixed order of tag commands.
Ulya Trofimovich [Thu, 6 Apr 2017 17:01:31 +0000 (18:01 +0100)]
Fixed order of tag commands.

First, fallback 'copy' commands should go before 'save' commands.
Second, codegen should not reorder 'copy' and 'save' commands.
These two bugs shadowed each other and everything seemed to work.

7 years agoChanged generic API for tags.
Ulya Trofimovich [Thu, 6 Apr 2017 16:45:42 +0000 (17:45 +0100)]
Changed generic API for tags.

    - removed 'YYCOPYTAG', use simple assignment insted
      (it is hard to think of any other definition)
    - renamed 'YYBACKUPTAG' to 'YYTAGP'
    - added 'YYTAGN', 'YYTAGLISTP', 'YYTAGLISTN'
    - removed 're2c:tags:default' configuration

7 years agoAdded codegen primitives for tags with history.
Ulya Trofimovich [Thu, 6 Apr 2017 14:18:12 +0000 (15:18 +0100)]
Added codegen primitives for tags with history.

7 years agoFixed tag optimizations to respect tags with history.
Ulya Trofimovich [Wed, 5 Apr 2017 20:56:44 +0000 (21:56 +0100)]
Fixed tag optimizations to respect tags with history.

7 years agoAdded syntax for tags with history.
Ulya Trofimovich [Wed, 5 Apr 2017 19:45:42 +0000 (20:45 +0100)]
Added syntax for tags with history.

7 years agoStarted to add optional history for each tag.
Ulya Trofimovich [Wed, 5 Apr 2017 16:01:54 +0000 (17:01 +0100)]
Started to add optional history for each tag.

This patch adds tracking and saving tag history during determinization;
however, optimizations and codegen totally neglect histories.

7 years agoSeparated lookahead tags from tag versions in closure items.
Ulya Trofimovich [Wed, 5 Apr 2017 11:42:05 +0000 (12:42 +0100)]
Separated lookahead tags from tag versions in closure items.

This commit reverts commit 0b2ca9bf141259baf6cb0b067621c06db5f06b46:
"Pack lookahead tags together with tag versions in TDFA state."

Packing was possible because we were not interested in tag version
if closure item had this tag in lookahead set. However, if we are
to store the full history of each tag, we'll need this intermediate
version in spite of the fact that it is overwritten on the next
transition.

7 years agoFixed liveness analysis inside of basic block.
Ulya Trofimovich [Tue, 4 Apr 2017 12:41:41 +0000 (13:41 +0100)]
Fixed liveness analysis inside of basic block.

Liveness flows backwards through basic block, so it should be
propagated backwards starting with live-out set and applying
commands one by one in reverse order.

7 years agoMerge 'save' and 'copy' command lists into one common list.
Ulya Trofimovich [Tue, 4 Apr 2017 07:25:34 +0000 (08:25 +0100)]
Merge 'save' and 'copy' command lists into one common list.

Previously we used separate lists for them; however the structure
of 'save' and 'copy' commands is almost identical, so its easy to
use one C struct for both types. Also, this simplifies analysis:
both types of commands are handled in a uniform way.

However, now we have to do list transformations like sorting and
deleleting duplicates in chunks.

This is a preliminary step to allow intermixing 'copy' and 'save'
commands.

7 years agoExplicitly track order of closure items for leftmost disambiguation.
Ulya Trofimovich [Thu, 16 Mar 2017 22:39:57 +0000 (22:39 +0000)]
Explicitly track order of closure items for leftmost disambiguation.

Before this commit leftmost order was maintained implicitly by construction
of closure: NFA is traversed in left-first order, so the earliest item added
to closure is also the leftmost. Special care was taken to preserve this
order and never rearrange closure elements.

This is convenient if leftmost disambiguation strategy is the only one.
However, other strategies require explicit bookkeeping of disambiguation
information (POSIX needs tag history for orbit tags, maximize/minimize
needs most recent tag versions). Implicit handling of leftmost strategy
does not fall in line with other strategies.

By tracking leftmost order explicitly we achieve several goals:
    - uniform handling of disambiguation strategies
    - DFA states can be treated as good old sets again, which is more
      conventional and means less work for minimization

POSIX tests changed (but the '/test/posix_captures/.run/__run.sh' script
still passes all tests) because there was an error in the order items were
added to closure, which probably prevented minimization (if the new item
dominated the existing one, the old one was substituted with the new one,
while it should have been removed and the new one appended at the end).

Skeleton tests changed because the order of NFA states in DFA state does
not matter, so more states can be merged by determinization.

7 years agoForbid 2-cycles in tag commands; fixed topsort to skip cycles.
Ulya Trofimovich [Wed, 8 Mar 2017 14:16:19 +0000 (14:16 +0000)]
Forbid 2-cycles in tag commands; fixed topsort to skip cycles.

7 years agoMoved a couple of files.
Ulya Trofimovich [Wed, 8 Mar 2017 08:48:30 +0000 (08:48 +0000)]
Moved a couple of files.

7 years agoMoved the main driver out of parser grammar file.
Ulya Trofimovich [Wed, 8 Mar 2017 08:32:16 +0000 (08:32 +0000)]
Moved the main driver out of parser grammar file.

7 years agoSplit options into constant and mutable; restricted access to mutable ones.
Ulya Trofimovich [Tue, 7 Mar 2017 21:29:17 +0000 (21:29 +0000)]
Split options into constant and mutable; restricted access to mutable ones.

Some options are immutable (target, output files; global switches like
conditions, reuse mode, storable states; support of flex syntax, etc.).
These options are passed as command-line arguments and never change.
It is safe to read them from any program point after parsing command-line
arguments.

Other options are configurable; they have block scope (may be specified
anywhere inside of the block and still affect the whole block).
Reading mutable options of yet unparsed block is not allowed because
they may affect the way RE2C parses current block (RE2C would be tempted
to base decisions on the latest option value, which may not be the final
one).

7 years agoForbid configuration of certain options.
Ulya Trofimovich [Tue, 7 Mar 2017 18:51:29 +0000 (18:51 +0000)]
Forbid configuration of certain options.

Configuration of these options either makes little sense (like changing
target on the fly) or is technically impossible to implement (the only
example is '-F, --flex-syntax' option which non-trivially affects the way
RE2C parses programs).

7 years ago'-Wcondition-order': check '-t, --type-header' option together with other conditions.
Ulya Trofimovich [Tue, 7 Mar 2017 18:43:05 +0000 (18:43 +0000)]
'-Wcondition-order': check '-t, --type-header' option together with other conditions.

7 years agoMoved enum out of struct and added prefix 'TARGET_' to all members.
Ulya Trofimovich [Tue, 7 Mar 2017 18:23:24 +0000 (18:23 +0000)]
Moved enum out of struct and added prefix 'TARGET_' to all members.

7 years agoDelay check if tags are allowed until the whole block has been parsed.
Ulya Trofimovich [Tue, 7 Mar 2017 18:16:56 +0000 (18:16 +0000)]
Delay check if tags are allowed until the whole block has been parsed.

7 years agoNow that we never re-parse anything we can always dicard old data.
Ulya Trofimovich [Tue, 7 Mar 2017 18:07:09 +0000 (18:07 +0000)]
Now that we never re-parse anything we can always dicard old data.

7 years agoTests for POSIX captures: avoid using 'flags:flex-syntax' configuration.
Ulya Trofimovich [Tue, 7 Mar 2017 17:57:48 +0000 (17:57 +0000)]
Tests for POSIX captures: avoid using 'flags:flex-syntax' configuration.

Use '--flex-syntax' command-line option instead.

The actual tests are not changed by this commit.

7 years agoAllow to set new encoding without explicitely resetting the current one.
Ulya Trofimovich [Tue, 7 Mar 2017 17:52:42 +0000 (17:52 +0000)]
Allow to set new encoding without explicitely resetting the current one.