granicus.if.org Git - re2c/log
re2c
6 years agoAdded /*!include:re2c ... */ directive.
Ulya Trofimovich [Tue, 25 Dec 2018 19:53:23 +0000 (19:53 +0000)]
Added /*!include:re2c ... */ directive.

6 years agoPreparations to support #include: keep input files in a stack.
Ulya Trofimovich [Sun, 23 Dec 2018 19:32:29 +0000 (19:32 +0000)]
Preparations to support #include: keep input files in a stack.

6 years agoconfigure.ac: set -Wreturn-type to error.
Ulya Trofimovich [Sun, 23 Dec 2018 19:16:02 +0000 (19:16 +0000)]
configure.ac: set -Wreturn-type to error.

6 years agoInitial support of EOF rule.
Ulya Trofimovich [Sat, 22 Dec 2018 23:34:41 +0000 (23:34 +0000)]
Initial support of EOF rule.

6 years agoUpdated unicode tests and test generators for newer versions of unicode.
Ulya Trofimovich [Sat, 22 Dec 2018 11:48:12 +0000 (11:48 +0000)]
Updated unicode tests and test generators for newer versions of unicode.

6 years agoPaper: added two output() functions that convert t-string to parse tree and offsets.
Ulya Trofimovich [Thu, 20 Dec 2018 00:06:42 +0000 (00:06 +0000)]
Paper: added two output() functions that convert t-string to parse tree and offsets.

6 years agoPaper: tweaked TNFA construction.
Ulya Trofimovich [Tue, 18 Dec 2018 23:56:04 +0000 (23:56 +0000)]
Paper: tweaked TNFA construction.

6 years agoLexer: use YYMAXFILL padding and don't forget to shift tag variables in YYFILL.
Ulya Trofimovich [Thu, 6 Dec 2018 22:03:34 +0000 (22:03 +0000)]
Lexer: use YYMAXFILL padding and don't forget to shift tag variables in YYFILL.

This fixes bug #232, #233 and #234.
Found by american fuzzy lop (thanks to Henri Salo).
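
Note: a minimal sketch of the idea in this commit, not the actual re2c lexer code. It assumes a single buffer with a hypothetical `Buffer` struct, one tag pointer, and hypothetical constants `SIZE` and `MAXFILL` (the latter stands in for YYMAXFILL). For simplicity the sketch writes the zero padding after every refill.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t SIZE = 16;    // hypothetical buffer capacity
const std::size_t MAXFILL = 4;  // stands in for YYMAXFILL

struct Buffer {
    char data[SIZE + MAXFILL]; // room for input plus zero padding
    char *token;               // start of the current lexeme
    char *cursor;              // next character to read
    char *tag;                 // saved tag position inside the lexeme
    char *limit;               // end of valid input; padding follows
};

// Drops the bytes before `token`, moves the rest to the start of the buffer,
// appends up to `len` bytes from `src` and pads with MAXFILL zeros.
// Returns the number of bytes taken from `src`.
inline std::size_t fill(Buffer &b, const char *src, std::size_t len)
{
    const std::size_t shift = static_cast<std::size_t>(b.token - b.data);
    std::memmove(b.data, b.token, static_cast<std::size_t>(b.limit - b.token));
    // Every pointer into the buffer moves by the same amount, tags included.
    b.token -= shift;
    b.cursor -= shift;
    b.tag -= shift;
    b.limit -= shift;
    const std::size_t room = static_cast<std::size_t>(b.data + SIZE - b.limit);
    const std::size_t taken = len < room ? len : room;
    std::memcpy(b.limit, src, taken);
    b.limit += taken;
    std::memset(b.limit, 0, MAXFILL);
    return taken;
}
```

If the tag pointer were left out of the shift, it would keep pointing at the old location of the lexeme and could end up outside the valid data.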

6 years agoCorrectly identify mapped TDFA state with --dump-dfa-raw option.
Ulya Trofimovich [Thu, 6 Dec 2018 22:01:25 +0000 (22:01 +0000)]
Correctly identify mapped TDFA state with --dump-dfa-raw option.

6 years agoMakefile.am: enable RE2C warnings (-W option).
Ulya Trofimovich [Thu, 29 Nov 2018 22:21:43 +0000 (22:21 +0000)]
Makefile.am: enable RE2C warnings (-W option).

6 years agoFixed read past the end of buffer in configuration parser.
Ulya Trofimovich [Thu, 29 Nov 2018 22:15:18 +0000 (22:15 +0000)]
Fixed read past the end of buffer in configuration parser.

This fixes bug #231.
Found by american fuzzy lop (thanks to Henri Salo).
Also reported by re2c -W (shame on me for not using it all this time!).

6 years agoPaper: tweaking TNFA construction.
Ulya Trofimovich [Mon, 26 Nov 2018 22:58:14 +0000 (22:58 +0000)]
Paper: tweaking TNFA construction.

6 years agoMakefile.am: build autogenerated files before other targets (they may create headers).
Ulya Trofimovich [Thu, 22 Nov 2018 01:00:57 +0000 (01:00 +0000)]
Makefile.am: build autogenerated files before other targets (they may create headers).

Note: I used $(@:cc=*) construct as bmake doesn't understand $*.* .

6 years agoUse tags to lex condition goto.
Ulya Trofimovich [Wed, 21 Nov 2018 22:03:12 +0000 (22:03 +0000)]
Use tags to lex condition goto.

6 years agoStarted using tags in re2c own lexer.
Ulya Trofimovich [Wed, 21 Nov 2018 00:15:15 +0000 (00:15 +0000)]
Started using tags in re2c own lexer.

6 years agoRemoved redundant wrapper around output file struct.
Ulya Trofimovich [Mon, 19 Nov 2018 23:38:07 +0000 (23:38 +0000)]
Removed redundant wrapper around output file struct.

6 years agoDump header on stdout if filename is not set, but /*!header:re2c:on*/ is used.
Ulya Trofimovich [Mon, 19 Nov 2018 23:22:33 +0000 (23:22 +0000)]
Dump header on stdout if filename is not set, but /*!header:re2c:on*/ is used.

6 years agoAdded configurations for -o, --output and -t, --type-header options.
Ulya Trofimovich [Fri, 16 Nov 2018 23:43:41 +0000 (23:43 +0000)]
Added configurations for -o, --output and -t, --type-header options.

6 years agoAdded missing #line info after /*!header:re2c: ... */ directive.
Ulya Trofimovich [Sun, 18 Nov 2018 12:19:27 +0000 (12:19 +0000)]
Added missing #line info after /*!header:re2c: ... */ directive.

Renamed:
    /*!header:re2c:1*/ -> /*!header:re2c:on*/
    /*!header:re2c:0*/ -> /*!header:re2c:off*/

6 years agoAdded /*!header:re2c:0*/ and /*!header:re2c:1*/ directives.
Ulya Trofimovich [Sun, 18 Nov 2018 10:50:00 +0000 (10:50 +0000)]
Added /*!header:re2c:0*/ and /*!header:re2c:1*/ directives.

Combined with the -t, --type-header option, this allows putting arbitrary
parts of the generated output in a header file.

6 years agoTweaking condition list lexer.
Ulya Trofimovich [Fri, 16 Nov 2018 00:36:11 +0000 (00:36 +0000)]
Tweaking condition list lexer.

6 years agoMerge pull request #230 from sergeyklay/patch-1
Ulya Trofimovich [Wed, 21 Nov 2018 21:54:58 +0000 (21:54 +0000)]
Merge pull request #230 from sergeyklay/patch-1

Changes for upcoming Travis' infra migration

6 years agoChanges for upcoming Travis' infra migration 230/head
Serghei Iakovlev [Wed, 21 Nov 2018 20:23:09 +0000 (22:23 +0200)]
Changes for upcoming Travis' infra migration

See: https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration

6 years agoFixed segfault caused by out of bounds access.
Ulya Trofimovich [Thu, 15 Nov 2018 07:33:25 +0000 (07:33 +0000)]
Fixed segfault caused by out of bounds access.

This fixes bug #227.
Found by american fuzzy lop (thanks to Henri Salo).

6 years agoMoved tests into subdirectories.
Ulya Trofimovich [Wed, 14 Nov 2018 22:58:47 +0000 (22:58 +0000)]
Moved tests into subdirectories.

6 years agoFixed a couple of lexer/parser errors in flex mode (-F option).
Ulya Trofimovich [Tue, 13 Nov 2018 23:42:11 +0000 (23:42 +0000)]
Fixed a couple of lexer/parser errors in flex mode (-F option).

This fixes bug #229: re2c option -F (flex syntax) broken,
reported by Robert van Engelen.

A well-formed example that caused syntax error (flex-style raw literal
followed by one or more spaces and a curly brace):

/*!re2c
    a {}
*/

The faulty behaviour goes back as far as re2c-0.13.6 (and supposedly
before that): in flex mode, a raw literal may occur in various contexts
both as a regexp (string literal) and an identifier (named definition,
condition name). RE2C uses lookahead to infer the context and determine
the appropriate type of lexer token, but it missed some cases.

The fix has two sides. First, it reduces the number of contexts where
the general lexer may encounter a raw literal (by using specialized lexers
for condition lists <x,y,...,z> and condition goto => and :=>). Second,
it fixes the lookahead regexps used for context inference.

Also added a bunch of tests (generated by a script).

6 years agoSuppress -Wnullable warning on <> condition (it has no regexp -- always empty).
Ulya Trofimovich [Tue, 13 Nov 2018 23:38:02 +0000 (23:38 +0000)]
Suppress -Wnullable warning on <> condition (it has no regexp -- always empty).

6 years agoAdjusting formatting (cosmetic).
Ulya Trofimovich [Mon, 5 Nov 2018 23:35:11 +0000 (23:35 +0000)]
Adjusting formatting (cosmetic).

6 years agoFixed out of bounds read in configuration lexer (not handling EOF in configuration...
Ulya Trofimovich [Sun, 4 Nov 2018 22:38:56 +0000 (22:38 +0000)]
Fixed out of bounds read in configuration lexer (not handling EOF in configuration value).

Found by american fuzzy lop (thanks to Henri Salo).

6 years agoSmall tweaks in lexer subroutines for semantic actions.
Ulya Trofimovich [Thu, 1 Nov 2018 00:01:25 +0000 (00:01 +0000)]
Small tweaks in lexer subroutines for semantic actions.

6 years agoFixed yet another out of bounds read in lexer due to not handling EOF after escape.
Ulya Trofimovich [Wed, 31 Oct 2018 23:15:28 +0000 (23:15 +0000)]
Fixed yet another out of bounds read in lexer due to not handling EOF after escape.

Found by american fuzzy lop (thanks to Henri Salo).

6 years agoFixed some more out of bounds reads in lexer due to not handling EOF properly.
Ulya Trofimovich [Tue, 30 Oct 2018 22:11:32 +0000 (22:11 +0000)]
Fixed some more out of bounds reads in lexer due to not handling EOF properly.

Found by american fuzzy lop (thanks to Henri Salo).

6 years agoAdjusted formatting in the lexer (cosmetic).
Ulya Trofimovich [Mon, 29 Oct 2018 23:32:22 +0000 (23:32 +0000)]
Adjusted formatting in the lexer (cosmetic).

6 years agoFixed out of bounds read in lexer.
Ulya Trofimovich [Mon, 29 Oct 2018 23:00:50 +0000 (23:00 +0000)]
Fixed out of bounds read in lexer.

The error was caused by assuming that a sequence of zeroes (used for
padding in YYFILL) cannot form a valid lexeme suffix. This is not the
case with strings, as they may contain arbitrary characters. The fix
is to manually loop over string characters in lexer, stopping at each
zero to check if it's the end of input.

Found by american fuzzy lop (thanks to Henri Salo).
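
Note: a minimal sketch of the manual loop described above, not the actual lexer code. It assumes a buffer whose real data ends at a `limit` pointer and is followed by at least one zero padding byte; `Input` and `lex_string` are hypothetical names, and escape sequences are left out.

```cpp
#include <string>

// Hypothetical input view: `limit` points one past the last real character,
// and the bytes from `limit` onwards are zero padding.
struct Input {
    const char *cursor;
    const char *limit;
};

// Scans a double-quoted string body starting at in.cursor (just after the
// opening quote). Returns true and stores the contents in `out` if a closing
// quote is found before the end of input, false otherwise.
inline bool lex_string(Input &in, std::string &out)
{
    out.clear();
    for (;;) {
        const char c = *in.cursor;
        if (c == '\0' && in.cursor >= in.limit) {
            return false; // this zero is padding: unterminated string
        }
        ++in.cursor;
        if (c == '"') {
            return true;
        }
        out.push_back(c); // a zero before `limit` is ordinary string content
    }
}
```

The check against `limit` is what tells an embedded zero apart from padding; without it the loop would run past the end of the buffer.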

6 years agoUpdated README in libre2c (added a warning that the library is not maintained).
Ulya Trofimovich [Sun, 28 Oct 2018 09:53:14 +0000 (09:53 +0000)]
Updated README in libre2c (added a warning that the library is not maintained).

6 years agoUpdated README.
Ulya Trofimovich [Sun, 28 Oct 2018 09:49:06 +0000 (09:49 +0000)]
Updated README.

6 years agoPaper: made TNFA description closer to practice.
Ulya Trofimovich [Sat, 27 Oct 2018 21:32:49 +0000 (22:32 +0100)]
Paper: made TNFA description closer to practice.

6 years agoPaper: tweaks in the TNFA example.
Ulya Trofimovich [Tue, 23 Oct 2018 21:12:20 +0000 (22:12 +0100)]
Paper: tweaks in the TNFA example.

6 years agoPaper: changed description of GOR1 following the rework of the algorithm.
Ulya Trofimovich [Sat, 20 Oct 2018 21:06:59 +0000 (22:06 +0100)]
Paper: changed description of GOR1 following the rework of the algorithm.

6 years agoPaper: tweaks in picture layout.
Ulya Trofimovich [Wed, 17 Oct 2018 21:29:31 +0000 (22:29 +0100)]
Paper: tweaks in picture layout.

6 years agoPaper: added TNFA example.
Ulya Trofimovich [Mon, 15 Oct 2018 06:32:04 +0000 (07:32 +0100)]
Paper: added TNFA example.

6 years agoPaper: added example for "empty match is better than no match" POSIX rule.
Ulya Trofimovich [Sun, 14 Oct 2018 09:31:13 +0000 (10:31 +0100)]
Paper: added example for "empty match is better than no match" POSIX rule.

6 years agoPaper: packed multiple examples of PE comparison in one page.
Ulya Trofimovich [Fri, 12 Oct 2018 22:37:47 +0000 (23:37 +0100)]
Paper: packed multiple examples of PE comparison in one page.

6 years agoPaper: dropped explicit submatch indices in TNFA definition.
Ulya Trofimovich [Thu, 11 Oct 2018 22:38:03 +0000 (23:38 +0100)]
Paper: dropped explicit submatch indices in TNFA definition.

6 years agoPaper: handle (e) as (e){1,1} to avoid collapsing multiple submatch groups into one.
Ulya Trofimovich [Thu, 11 Oct 2018 06:08:48 +0000 (07:08 +0100)]
Paper: handle (e) as (e){1,1} to avoid collapsing multiple submatch groups into one.

Previously we collapsed ((e)) into (e).

6 years agoPaper: minor tweaks in pseudocode.
Ulya Trofimovich [Mon, 8 Oct 2018 21:56:57 +0000 (22:56 +0100)]
Paper: minor tweaks in pseudocode.

6 years agoPaper: added description of GTOP closure algorithm.
Ulya Trofimovich [Sat, 6 Oct 2018 22:44:35 +0000 (23:44 +0100)]
Paper: added description of GTOP closure algorithm.

6 years agoPaper: made GOR pseudocode slightly easier to read.
Ulya Trofimovich [Sat, 6 Oct 2018 21:24:04 +0000 (22:24 +0100)]
Paper: made GOR pseudocode slightly easier to read.

6 years agoMerge pull request #224 from trofi/master
Ulya Trofimovich [Mon, 22 Oct 2018 22:33:02 +0000 (23:33 +0100)]
Merge pull request #224 from trofi/master

src/dfa/closure_posix.cc: pack() tweaks

6 years agosrc/dfa/closure_posix.cc: fix pack() to drop two highest bits 224/head
Sergei Trofimovich [Mon, 22 Oct 2018 22:05:56 +0000 (23:05 +0100)]
src/dfa/closure_posix.cc: fix pack() to drop two highest bits

```c
longest | (leftmost << 30);
```
assumes `longest` does not exceed 30 bits. It could if
it's a negative value originally.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
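
Note: a hedged reconstruction of the masking idea, not the code from closure_posix.cc. It assumes that `leftmost` occupies the two highest bits and `longest` the low 30 bits of a 32-bit word; the parameter types are also an assumption.

```cpp
#include <cstdint>

// Packs two values into one 32-bit word: `leftmost` in the two highest bits,
// `longest` in the low 30 bits. Both are masked so that a negative input
// cannot spill into the other field.
inline uint32_t pack(int32_t longest, int32_t leftmost)
{
    const uint32_t low30 = static_cast<uint32_t>(longest) & 0x3fffffffu; // drop two highest bits
    const uint32_t high2 = (static_cast<uint32_t>(leftmost) & 0x3u) << 30;
    return low30 | high2;
}
```

With the mask in place, `pack(-1, 0)` yields `0x3fffffff` instead of a word with all 32 bits set.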

6 years agosrc/dfa/closure_posix.cc: fix signed shift overflow
Sergei Trofimovich [Mon, 22 Oct 2018 21:58:34 +0000 (22:58 +0100)]
src/dfa/closure_posix.cc: fix signed shift overflow

signed shift overflow is not defined by C standard.
clang++ -fsanitize=undefined detects it as:

```
src/dfa/closure_posix.cc:207:32: runtime error: left shift of negative value -1
```

This change performs the bit shift arithmetic on unsigned types.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
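
Note: a minimal sketch of shifting through an unsigned type, assuming 32-bit operands; `shift_left` is a hypothetical helper and not a function from the re2c sources.

```cpp
#include <cstdint>

// Shifts a possibly negative value left by converting it to its unsigned
// 32-bit representation first. Unsigned shifts wrap around modulo 2^32
// instead of overflowing. `bits` must be less than 32.
inline uint32_t shift_left(int32_t value, unsigned bits)
{
    return static_cast<uint32_t>(value) << bits;
}
```

The result is kept unsigned on purpose: converting it back to a signed type would reintroduce a value that may not be representable.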

6 years agoMerge pull request #223 from metab0t/master
Ulya Trofimovich [Wed, 17 Oct 2018 22:29:07 +0000 (23:29 +0100)]
Merge pull request #223 from metab0t/master

Fix typo

6 years agoFix typo 223/head
Nerd [Wed, 17 Oct 2018 08:23:22 +0000 (16:23 +0800)]
Fix typo

6 years agoMerge pull request #222 from trofi/master
Ulya Trofimovich [Tue, 16 Oct 2018 22:36:15 +0000 (23:36 +0100)]
Merge pull request #222 from trofi/master

configure.ac: enable xz tarballs instead of gzip by default

6 years agoconfigure.ac: enable xz tarballs instead of gzip by default 222/head
Sergei Trofimovich [Tue, 16 Oct 2018 19:36:53 +0000 (20:36 +0100)]
configure.ac: enable xz tarballs instead of gzip by default

`xz` compresses twice as well as `gzip` on `re2c` sources:

```
$ ls -lh *1.1.1*
4,8M re2c-1.1.1.tar.gz
2,5M re2c-1.1.1.tar.xz
```

Switch `make dist` to `xz` by default. `gzip` is still available
via `make dist-gzip`.

Reported-by: rofl0r
Bug: https://github.com/skvadrik/re2c/issues/221
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

6 years agoPaper: added examples of the three rules of POSIX disambiguation.
Ulya Trofimovich [Thu, 6 Sep 2018 21:45:30 +0000 (22:45 +0100)]
Paper: added examples of the three rules of POSIX disambiguation.

6 years agoMerge pull request #220 from trofi/master
Ulya Trofimovich [Sat, 29 Sep 2018 21:29:34 +0000 (22:29 +0100)]
Merge pull request #220 from trofi/master

src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug

6 years agosrc/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug 220/head
Sergei Trofimovich [Sat, 29 Sep 2018 21:11:27 +0000 (22:11 +0100)]
src/dfa/dfa.h: simplify constructor to avoid g++-3.4 bug

On g++-3.4.6 re2c tests SIGSEGVed due to use of uninitialized data:

```
$ valgrind ... ./re2c -8 a.re -o foo.c
Conditional jump or move depends on uninitialised value(s)
   at 0x432F23: re2c::tcpool_t::insert(re2c::tcmd_t const*) (tcmd.cc:202)
   by 0x421FDA: re2c::freeze_tags(re2c::dfa_t&) (freeze.cc:45)
   by 0x43A7FF: re2c::ast_to_dfa(re2c::spec_t const&, re2c::Output&) (compile.cc:88)
   by 0x43B052: push_back (stl_iterator.h:614)
   by 0x43B052: re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) (???:0)
   by 0x449D29: main (main.cc:31)
 Uninitialised value was created by a heap allocation
   at 0x403252F: operator new[](unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x42FC9E: re2c::find_state(re2c::determ_context_t&) (dfa.h:37)
   by 0x429BD9: re2c::dfa_t::dfa_t(re2c::nfa_t const&, re2c::opt_t const*, std::string const&, re2c::Warn&) (determinization.cc:56)
   by 0x43A76C: re2c::ast_to_dfa(re2c::spec_t const&, re2c::Output&) (compile.cc:69)
   by 0x43B052: push_back (stl_iterator.h:614)
   by 0x43B052: re2c::compile(re2c::Scanner&, re2c::Output&, re2c::Opt&) (???:0)
   by 0x449D29: main (main.cc:31)
```

The problem arose in the default array constructor:

```c++
     explicit dfa_state_t(size_t nchars)
         : // ...
         , tcmd(new tcmd_t*[nchars + 2]()) // +2 for final and fallback epsilon-transitions
         // ...
```

g++-3.4.6 can't figure out zero-initialization rule (likely a gcc bug).

The change uses non-initializing new[] and memset() instead.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
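
Note: a minimal sketch of the replacement described above (plain `new[]` followed by `memset()`), not the actual constructor. `alloc_tcmd_table` is a hypothetical free function, `tcmd_t` is only forward-declared, and the sketch assumes that a null pointer is represented by all-zero bytes, which holds on mainstream platforms.

```cpp
#include <cstddef>
#include <cstring>

struct tcmd_t; // opaque here: only pointers to it are stored

// Allocates a table of `nchars + 2` null pointers without relying on the
// `new T[n]()` value-initialization syntax.
inline tcmd_t **alloc_tcmd_table(std::size_t nchars)
{
    const std::size_t n = nchars + 2; // +2 for final and fallback epsilon-transitions
    tcmd_t **table = new tcmd_t*[n];  // elements are uninitialized at this point
    std::memset(table, 0, n * sizeof(tcmd_t*)); // zero every element explicitly
    return table;
}
```

The caller owns the table and releases it with `delete[]`.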

6 years agoMerge pull request #216 from trofi/master
Ulya Trofimovich [Tue, 4 Sep 2018 19:49:46 +0000 (20:49 +0100)]
Merge pull request #216 from trofi/master

.travis.yml: run all tests behind 'make check'

6 years agoMerge pull request #217 from trofi/add-msan
Ulya Trofimovich [Tue, 4 Sep 2018 19:42:51 +0000 (20:42 +0100)]
Merge pull request #217 from trofi/add-msan

__alltest.sh: add clang's -fsanitize=memory flavour

6 years agoFixed bug #215 "A memory read overrun issue in s_to_n32_unsafe.cc".
Ulya Trofimovich [Tue, 4 Sep 2018 19:27:40 +0000 (20:27 +0100)]
Fixed bug #215 "A memory read overrun issue in s_to_n32_unsafe.cc".

The error was in the code of the test itself: the special case of zero
wasn't handled correctly by the function that prepares input data for
the test. As a result, a zero-length input string was passed to the test,
which is unexpected: the tested function is an "unsafe" one (as the
name suggests) and is meant to be used on an already validated input.

6 years ago__alltest.sh: add clang's -fsanitize=memory flavour 217/head
Sergei Trofimovich [Tue, 4 Sep 2018 19:16:23 +0000 (20:16 +0100)]
__alltest.sh: add clang's -fsanitize=memory flavour

Bug: https://github.com/skvadrik/re2c/issues/215
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

6 years ago.travis.yml: run all tests behind 'make check' 216/head
Sergei Trofimovich [Tue, 4 Sep 2018 18:59:05 +0000 (19:59 +0100)]
.travis.yml: run all tests behind 'make check'

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

6 years agoRelease 1.1.1. 1.1.1
Ulya Trofimovich [Thu, 30 Aug 2018 22:16:10 +0000 (23:16 +0100)]
Release 1.1.1.

6 years agoConverted tabs to spaces in .re files and autogenerated files.
Ulya Trofimovich [Thu, 30 Aug 2018 22:10:21 +0000 (23:10 +0100)]
Converted tabs to spaces in .re files and autogenerated files.

6 years agoUpdated CHANGELOG.
Ulya Trofimovich [Thu, 30 Aug 2018 22:00:56 +0000 (23:00 +0100)]
Updated CHANGELOG.

6 years agoMakefile.am: reduced redundant variables.
Ulya Trofimovich [Thu, 30 Aug 2018 21:51:53 +0000 (22:51 +0100)]
Makefile.am: reduced redundant variables.

6 years agoMakefile.am: simplified clean-up part of bootstrap rule.
Ulya Trofimovich [Thu, 30 Aug 2018 21:45:04 +0000 (22:45 +0100)]
Makefile.am: simplified clean-up part of bootstrap rule.

6 years agoRewrote version-to-vernum converter in RE2C; added more unit tests.
Ulya Trofimovich [Thu, 30 Aug 2018 21:38:42 +0000 (22:38 +0100)]
Rewrote version-to-vernum converter in RE2C; added more unit tests.

6 years agovernum: move version-string-to-vernum converter to a separate helper
Sergei Trofimovich [Tue, 28 Aug 2018 22:05:59 +0000 (23:05 +0100)]
vernum: move version-string-to-vernum converter to a separate helper

No functional change. While at it, added tests
to cover past failures:
- "1.1": https://github.com/skvadrik/re2c/issues/211
- "0.14": https://sourceforge.net/p/re2c/bugs/55/

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
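
Note: a hedged sketch of what a version-string-to-vernum helper could look like. The output format is an assumption that is not stated in this log: two decimal digits for each of major, minor and patch, with missing components treated as zero. `version_to_vernum` is a hypothetical name, and suffixes such as development markers are rejected rather than handled.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Converts a dotted version string to a fixed-width number string.
// Assumed format: "MMmmpp" (major, minor, patch; two digits each).
// Returns an empty string on malformed input.
inline std::string version_to_vernum(const std::string &version)
{
    unsigned parts[3] = {0, 0, 0};
    std::size_t index = 0;
    bool have_digit = false; // true once the current component has a digit
    for (char c : version) {
        if (c >= '0' && c <= '9') {
            parts[index] = parts[index] * 10 + static_cast<unsigned>(c - '0');
            if (parts[index] > 99) return ""; // does not fit in two digits
            have_digit = true;
        } else if (c == '.' && have_digit && index < 2) {
            ++index;
            have_digit = false;
        } else {
            return ""; // empty component, too many components, or bad character
        }
    }
    if (!have_digit) return ""; // empty input or trailing dot
    char buffer[7];
    std::snprintf(buffer, sizeof(buffer), "%02u%02u%02u", parts[0], parts[1], parts[2]);
    return buffer;
}
```

Under that assumed format, the two inputs from the past failures map to "010100" for "1.1" and "001400" for "0.14".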

6 years agoRewrite vernum function
Mike Gilbert [Tue, 28 Aug 2018 16:01:07 +0000 (12:01 -0400)]
Rewrite vernum function

Fixes: https://github.com/skvadrik/re2c/issues/211

6 years agoRelease 1.1. 1.1
Ulya Trofimovich [Mon, 27 Aug 2018 21:44:50 +0000 (22:44 +0100)]
Release 1.1.

6 years agoRegenerated docs.
Ulya Trofimovich [Mon, 27 Aug 2018 20:45:44 +0000 (21:45 +0100)]
Regenerated docs.

6 years agoUpdated CHANGELOG.
Ulya Trofimovich [Mon, 27 Aug 2018 20:42:33 +0000 (21:42 +0100)]
Updated CHANGELOG.

6 years agoPaper: more tweaks in examples of trace computation.
Ulya Trofimovich [Mon, 27 Aug 2018 20:26:56 +0000 (21:26 +0100)]
Paper: more tweaks in examples of trace computation.

6 years agoIncrease allocator alignment to pointer size to avoid unaligned reads/writes.
Ulya Trofimovich [Tue, 14 Aug 2018 06:10:26 +0000 (07:10 +0100)]
Increase allocator alignment to pointer size to avoid unaligned reads/writes.

Unaligned operations found by ubsan.
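
Note: a minimal sketch of rounding allocation sizes up to the pointer size, as a bump allocator might do. This is not the re2c allocator; `align_to_pointer` is a hypothetical helper, and the bit trick assumes that `sizeof(void*)` is a power of two.

```cpp
#include <cstddef>

// Rounds a requested size up to the next multiple of the pointer size, so
// that consecutive chunks carved from a pointer-aligned block all start at
// pointer-aligned offsets.
inline std::size_t align_to_pointer(std::size_t size)
{
    const std::size_t alignment = sizeof(void*); // assumed to be a power of two
    return (size + alignment - 1) & ~(alignment - 1);
}
```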

6 years agoFixed memory corruption bug (caused by wrong size passed to memcpy).
Ulya Trofimovich [Mon, 13 Aug 2018 22:41:56 +0000 (23:41 +0100)]
Fixed memory corruption bug (caused by wrong size passed to memcpy).

Found by asan.

6 years agoReordered function definitions.
Ulya Trofimovich [Mon, 13 Aug 2018 22:21:57 +0000 (23:21 +0100)]
Reordered function definitions.

6 years agoMoved different closure construction algorithms to separate files.
Ulya Trofimovich [Mon, 13 Aug 2018 22:11:42 +0000 (23:11 +0100)]
Moved different closure construction algorithms to separate files.

6 years agoMoved POSIX disambiguation algorithm to a separate file.
Ulya Trofimovich [Mon, 13 Aug 2018 22:02:01 +0000 (23:02 +0100)]
Moved POSIX disambiguation algorithm to a separate file.

6 years agoConverted tabs to spaces.
Ulya Trofimovich [Mon, 13 Aug 2018 21:49:44 +0000 (22:49 +0100)]
Converted tabs to spaces.

6 years agoRenamed a couple of structs.
Ulya Trofimovich [Mon, 13 Aug 2018 21:43:16 +0000 (22:43 +0100)]
Renamed a couple of structs.

6 years agoMerged a couple of small headers into one.
Ulya Trofimovich [Mon, 13 Aug 2018 20:47:13 +0000 (21:47 +0100)]
Merged a couple of small headers into one.

6 years agoGathered all determinization-related data in a struct to avoid passing many parameters.
Ulya Trofimovich [Sun, 12 Aug 2018 19:38:11 +0000 (20:38 +0100)]
Gathered all determinization-related data in a struct to avoid passing many parameters.

6 years agoUse fixed 32-bit indices in lookup tables instead of 'size_t'.
Ulya Trofimovich [Sun, 12 Aug 2018 19:28:09 +0000 (20:28 +0100)]
Use fixed 32-bit indices in lookup tables instead of 'size_t'.

6 years agoMoved all notes (lengthy comments with names) to the beginning of file.
Ulya Trofimovich [Sat, 11 Aug 2018 08:01:57 +0000 (09:01 +0100)]
Moved all notes (lengthy comments with names) to the beginning of file.

6 years agoRearranged the code a bit with a couple of helper subroutines.
Ulya Trofimovich [Fri, 10 Aug 2018 23:59:09 +0000 (00:59 +0100)]
Rearranged the code a bit with a couple of helper subroutines.

6 years agoDon't use a dedicated struct for returning multiple values from function.
Ulya Trofimovich [Fri, 10 Aug 2018 23:32:11 +0000 (00:32 +0100)]
Don't use a dedicated struct for returning multiple values from function.

6 years agoSimplified back up / restore of tag actions when mapping TDFA states.
Ulya Trofimovich [Fri, 10 Aug 2018 23:13:04 +0000 (00:13 +0100)]
Simplified back up / restore of tag actions when mapping TDFA states.

6 years agoGathered various buffers used for TDFA state mapping in a struct.
Ulya Trofimovich [Fri, 10 Aug 2018 05:57:05 +0000 (06:57 +0100)]
Gathered various buffers used for TDFA state mapping in a struct.

6 years agoUse somewhat more consistent variable naming.
Ulya Trofimovich [Thu, 9 Aug 2018 21:28:41 +0000 (22:28 +0100)]
Use somewhat more consistent variable naming.

6 years agoReplaced Kuklewicz POSIX disambiguation algorithm with Okui algorithm.
Ulya Trofimovich [Thu, 9 Aug 2018 20:45:38 +0000 (21:45 +0100)]
Replaced Kuklewicz POSIX disambiguation algorithm with Okui algorithm.

Changes in the test results are caused by putting negative tags of the
right alternative *before* the alternative.

6 years agoPack tag index and sign into one 32-bit field.
Ulya Trofimovich [Mon, 6 Aug 2018 21:37:38 +0000 (22:37 +0100)]
Pack tag index and sign into one 32-bit field.

6 years agoCompute and store tag "height" (needed for Okui disambiguation).
Ulya Trofimovich [Sun, 5 Aug 2018 09:55:58 +0000 (10:55 +0100)]
Compute and store tag "height" (needed for Okui disambiguation).

6 years agoAlways add structural tags to the RHS of alternative/catenation in POSIX captures.
Ulya Trofimovich [Sat, 4 Aug 2018 09:44:56 +0000 (10:44 +0100)]
Always add structural tags to the RHS of alternative/catenation in POSIX captures.

(Preliminary work before switching from Kuklewicz POSIX disambiguation
algorithm to Okui algorithm.)

6 years agoDon't move the closing tag of POSIX capture group out of the enclosing iteration.
Ulya Trofimovich [Sat, 4 Aug 2018 09:25:01 +0000 (10:25 +0100)]
Don't move the closing tag of POSIX capture group out of the enclosing iteration.

RE2C used to perform the following optimization: when a POSIX capture is
under iteration, we only need to get tag values of the last iteration
(according to the POSIX standard). Therefore we can move the closing tag
out of loop.

This commit removes this optimization (as part of the effort to switch
from Kuklewicz POSIX disambiguation algorithm to Okui algorithm).

In other words, for RE (x)* re2c used to generate this "optimized" IRE:
    1 (3 x)* 4 2
and now it generates the "canonical" IRE:
    1 (3 x 4)* 2

Updated tests for '--posix-captures' that have been affected by the change.

6 years agoAllow default copy for POD struct (fixes [-Wclass-memaccess] GCC warning).
Ulya Trofimovich [Sat, 4 Aug 2018 09:09:01 +0000 (10:09 +0100)]
Allow default copy for POD struct (fixes [-Wclass-memaccess] GCC warning).

6 years agoUpdated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).
Ulya Trofimovich [Sat, 28 Jul 2018 22:30:04 +0000 (23:30 +0100)]
Updated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).

Also, depth-first traversal was done in a slightly incorrect way:
we checked outgoing nodes for admissibility and pushed the corresponding
child states on stack all at once. This is not the same as checking
the first child and recursing into it, then checking the next child,
and so on (because we might discover the second child while exploring
the first, and the admissibility check for the second child *after* that
might yield false, while *before* exploring the first child it yielded
true).
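
Note: a small sketch of the difference between the two traversal orders, not the GOR1 code. As a stand-in for the real admissibility check it uses "the child has not been discovered yet", which is an assumption made only for illustration; `Graph`, `parents_eager` and `parents_lazy` are hypothetical names. Each function returns, for every node, the node it was discovered from.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > Graph; // adjacency lists

// Eager variant: checks all children of a node at once and pushes every
// admissible one on the stack before exploring any of them.
inline std::vector<int> parents_eager(const Graph &g, int root)
{
    std::vector<int> parent(g.size(), -1);
    std::vector<bool> found(g.size(), false);
    std::vector<int> stack(1, root);
    found[root] = true;
    while (!stack.empty()) {
        const int node = stack.back();
        stack.pop_back();
        const std::vector<int> &children = g[node];
        // Push in reverse so that the first child is explored first.
        for (std::size_t i = children.size(); i-- > 0; ) {
            const int child = children[i];
            if (!found[child]) { // stand-in admissibility check
                found[child] = true;
                parent[child] = node;
                stack.push_back(child);
            }
        }
    }
    return parent;
}

// Lazy variant: checks one child, recurses into it, and only then checks
// the next child, so the check sees everything discovered in between.
inline void visit_lazy(const Graph &g, int node,
    std::vector<bool> &found, std::vector<int> &parent)
{
    found[node] = true;
    for (int child : g[node]) {
        if (!found[child]) { // stand-in admissibility check
            parent[child] = node;
            visit_lazy(g, child, found, parent);
        }
    }
}

inline std::vector<int> parents_lazy(const Graph &g, int root)
{
    std::vector<int> parent(g.size(), -1);
    std::vector<bool> found(g.size(), false);
    visit_lazy(g, root, found, parent);
    return parent;
}
```

On a graph with edges 0→1, 0→2 and 1→2, the eager variant records node 2 as discovered from node 0, while the lazy variant discovers it from node 1 and the later check from node 0 yields false.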

6 years agoPick the shortest available path suffix when generating skeleton path cover.
Ulya Trofimovich [Sat, 28 Jul 2018 21:52:44 +0000 (22:52 +0100)]
Pick the shortest available path suffix when generating skeleton path cover.

This also fixes an error in the generation process: sometimes in case
of loops the current node's suffix was set before all of its children
were processed.

Updated test results (in some cases .input files became larger because
of the above fix, in some cases they became smaller because we now pick
the shortest suffix).

Added new test; this one was found by slyfox's fuzzer and revealed the
above bug.

6 years agoChanged the name of a local variable in the test to avoid collision with skeleton...
Ulya Trofimovich [Sat, 28 Jul 2018 09:53:34 +0000 (10:53 +0100)]
Changed the name of a local variable in the test to avoid collision with skeleton names.

Before tags were added to re2c, skeleton programs only used a limited
number of predefined names, such as 'yych', 'yystate', etc. With tags,
however, this is no longer true as tags may have any names. So now we need
to be more cautious when picking names for skeleton variables.

This patch is only a workaround to make all tests pass; the real solution
requires inventing a good naming scheme for skeleton programs and
regenerating all skeleton test results.