Sergei Trofimovich [Tue, 4 Sep 2018 19:16:23 +0000 (20:16 +0100)]
__alltest.sh: add clang's -fsanitize=memory flavour
Bug: https://github.com/skvadrik/re2c/issues/215
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Ulya Trofimovich [Thu, 30 Aug 2018 22:16:10 +0000 (23:16 +0100)]
Release 1.1.1.
Ulya Trofimovich [Thu, 30 Aug 2018 22:10:21 +0000 (23:10 +0100)]
Converted tabs to spaces in .re files and autogenerated files.
Ulya Trofimovich [Thu, 30 Aug 2018 22:00:56 +0000 (23:00 +0100)]
Updated CHANGELOG.
Ulya Trofimovich [Thu, 30 Aug 2018 21:51:53 +0000 (22:51 +0100)]
Makefile.am: reduced redundant variables.
Ulya Trofimovich [Thu, 30 Aug 2018 21:45:04 +0000 (22:45 +0100)]
Makefile.am: simplified clean-up part of bootstrap rule.
Ulya Trofimovich [Thu, 30 Aug 2018 21:38:42 +0000 (22:38 +0100)]
Rewrote version-to-vernum converter in RE2C; added more unit tests.
Sergei Trofimovich [Tue, 28 Aug 2018 22:05:59 +0000 (23:05 +0100)]
vernum: move version-string-to-vernum converter to a separate helper
No functional change. While at it added tests
to cover past failures:
- "1.1": https://github.com/skvadrik/re2c/issues/211
- "0.14": https://sourceforge.net/p/re2c/bugs/55/
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
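A minimal sketch of what such a version-string-to-vernum helper might look like. It is not the code from the repository: the function name 'version_to_vernum', the "two decimal digits per component" layout and the rejection of malformed strings are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (name and output layout are assumptions):
// converts "MAJOR.MINOR[.PATCH]" into a 6-digit string with two digits
// per component, e.g. "1.1" becomes "010100".
// Returns an empty string if the input is malformed.
std::string version_to_vernum(const std::string &version)
{
    unsigned parts[3] = {0, 0, 0};
    std::size_t n = 0;        // index of the component being parsed
    bool digit_seen = false;  // has the current component got any digits?

    for (std::size_t i = 0; i < version.size(); ++i) {
        const char c = version[i];
        if (c >= '0' && c <= '9') {
            parts[n] = parts[n] * 10 + static_cast<unsigned>(c - '0');
            if (parts[n] > 99) return ""; // does not fit in two digits
            digit_seen = true;
        } else if (c == '.' && digit_seen && n < 2) {
            ++n;
            digit_seen = false;
        } else {
            return ""; // unexpected character, empty or extra component
        }
    }
    if (!digit_seen || n == 0) return ""; // require at least MAJOR.MINOR

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02u%02u%02u", parts[0], parts[1], parts[2]);
    return buf;
}
```

Under these assumptions the two past failures listed above, "1.1" and "0.14", map to "010100" and "001400" respectively.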
Mike Gilbert [Tue, 28 Aug 2018 16:01:07 +0000 (12:01 -0400)]
Rewrite vernum function
Fixes: https://github.com/skvadrik/re2c/issues/211
Ulya Trofimovich [Mon, 27 Aug 2018 21:44:50 +0000 (22:44 +0100)]
Release 1.1.
Ulya Trofimovich [Mon, 27 Aug 2018 20:45:44 +0000 (21:45 +0100)]
Regenerated docs.
Ulya Trofimovich [Mon, 27 Aug 2018 20:42:33 +0000 (21:42 +0100)]
Updated CHANGELOG.
Ulya Trofimovich [Mon, 27 Aug 2018 20:26:56 +0000 (21:26 +0100)]
Paper: more tweaks in examples of trace computation.
Ulya Trofimovich [Tue, 14 Aug 2018 06:10:26 +0000 (07:10 +0100)]
Increase allocator alignment to pointer size to avoid unaligned reads/writes.
Unaligned operations found by ubsan.
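One common way to get pointer-size alignment in a simple allocator is to round every requested size up to a multiple of sizeof(void*). The sketch below shows only that rounding step; the function name 'align_to_pointer' is hypothetical and the actual allocator code may differ.

```cpp
#include <cstddef>

// Hypothetical helper: rounds 'size' up to the nearest multiple of the
// pointer size, so that consecutive allocations carved from an aligned
// buffer all start at pointer-aligned offsets.
inline std::size_t align_to_pointer(std::size_t size)
{
    const std::size_t a = sizeof(void*);
    return ((size + a - 1) / a) * a;
}
```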
Ulya Trofimovich [Mon, 13 Aug 2018 22:41:56 +0000 (23:41 +0100)]
Fixed memory corruption bug (caused by wrong size passed to memcpy).
Found by asan.
Ulya Trofimovich [Mon, 13 Aug 2018 22:21:57 +0000 (23:21 +0100)]
Reordered function definitions.
Ulya Trofimovich [Mon, 13 Aug 2018 22:11:42 +0000 (23:11 +0100)]
Moved different closure construction algorithms to separate files.
Ulya Trofimovich [Mon, 13 Aug 2018 22:02:01 +0000 (23:02 +0100)]
Moved POSIX disambiguation algorithm to a separate file.
Ulya Trofimovich [Mon, 13 Aug 2018 21:49:44 +0000 (22:49 +0100)]
Converted tabs to spaces.
Ulya Trofimovich [Mon, 13 Aug 2018 21:43:16 +0000 (22:43 +0100)]
Renamed a couple of structs.
Ulya Trofimovich [Mon, 13 Aug 2018 20:47:13 +0000 (21:47 +0100)]
Merged a couple of small headers into one.
Ulya Trofimovich [Sun, 12 Aug 2018 19:38:11 +0000 (20:38 +0100)]
Gathered all determinization-related data in a struct to avoid passing many parameters.
Ulya Trofimovich [Sun, 12 Aug 2018 19:28:09 +0000 (20:28 +0100)]
Use fixed 32-bit indices in lookup tables instead of 'size_t'.
Ulya Trofimovich [Sat, 11 Aug 2018 08:01:57 +0000 (09:01 +0100)]
Moved all notes (lengthy comments with names) to the beginning of file.
Ulya Trofimovich [Fri, 10 Aug 2018 23:59:09 +0000 (00:59 +0100)]
Rearranged the code a bit with a couple of helper subroutines.
Ulya Trofimovich [Fri, 10 Aug 2018 23:32:11 +0000 (00:32 +0100)]
Don't use a dedicated struct for returning multiple values from function.
Ulya Trofimovich [Fri, 10 Aug 2018 23:13:04 +0000 (00:13 +0100)]
Simplified back up / restore of tag actions when mapping TDFA states.
Ulya Trofimovich [Fri, 10 Aug 2018 05:57:05 +0000 (06:57 +0100)]
Gathered various buffers used for TDFA state mapping in a struct.
Ulya Trofimovich [Thu, 9 Aug 2018 21:28:41 +0000 (22:28 +0100)]
Use somewhat more consistent variable naming.
Ulya Trofimovich [Thu, 9 Aug 2018 20:45:38 +0000 (21:45 +0100)]
Replaced Kuklewicz POSIX disambiguation algorithm with Okui algorithm.
Changes in the test results are caused by putting negative tags of the
right alternative *before* the alternative.
Ulya Trofimovich [Mon, 6 Aug 2018 21:37:38 +0000 (22:37 +0100)]
Pack tag index and sign into one 32-bit field.
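A rough illustration of packing an index and a sign into one 32-bit value. The layout used here (sign in the lowest bit, index in the remaining 31 bits) and all names are assumptions for the sake of the example, not the representation chosen in the commit.

```cpp
#include <stdint.h>

// Hypothetical packed representation: bit 0 holds the sign,
// bits 1..31 hold the tag index (so the index must be below 2^31).
typedef uint32_t packed_tag_t;

inline packed_tag_t pack_tag(uint32_t index, bool negative)
{
    return (index << 1) | (negative ? 1u : 0u);
}

inline uint32_t tag_index(packed_tag_t t)
{
    return t >> 1;
}

inline bool tag_negative(packed_tag_t t)
{
    return (t & 1u) != 0;
}
```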
Ulya Trofimovich [Sun, 5 Aug 2018 09:55:58 +0000 (10:55 +0100)]
Compute and store tag "height" (needed for Okui disambiguation).
Ulya Trofimovich [Sat, 4 Aug 2018 09:44:56 +0000 (10:44 +0100)]
Always add structural tags to the RHS of alternative/catenation in POSIX captures.
(Preliminary work before switching from Kuklewicz POSIX disambiguation
algorithm to Okui algorithm.)
Ulya Trofimovich [Sat, 4 Aug 2018 09:25:01 +0000 (10:25 +0100)]
Don't move the closing tag of POSIX capture group out of the enclosing iteration.
RE2C used to perform the following optimization: when a POSIX capture is
under iteration, we only need to get tag values of the last iteration
(according to the POSIX standard). Therefore we can move the closing tag
out of loop.
This commit removes this optimization (as part of the effort to switch
from Kuklewicz POSIX disambiguation algorithm to Okui algorithm).
In other words, for RE (x)* re2c used to generate this "optimized" IRE:
1 (3 x)* 4 2
and now it generates the "canonical" IRE:
1 (3 x 4)* 2
Updated tests for '--posix-captures' that have been affected by the change.
Ulya Trofimovich [Sat, 4 Aug 2018 09:09:01 +0000 (10:09 +0100)]
Allow default copy for POD struct (fixes [-Wclass-memaccess] GCC warning).
Ulya Trofimovich [Sat, 28 Jul 2018 22:30:04 +0000 (23:30 +0100)]
Updated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).
Also, depth-first traversal was done in a slightly incorrect way:
we checked outgoing nodes for admissibility and pushed the corresponding
child states on stack all at once. This is not the same as checking
the first child and recursing into it, then checking the next child,
..., and so on (because we might discover the second child while exploring
the first, and admissibility check for the second child *after* that
might yield false, while *before* exploring the first child it yielded
true).
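The difference between the two traversal orders can be shown on a toy graph. This is not GOR1: the admissibility check is replaced by a plain "not yet visited" test, and all names are hypothetical. The sketch only demonstrates how checking all children up front lets a stale result cause a repeated scan.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::size_t> > graph_t; // adjacency lists

// Stand-in for the admissibility check (assumption: "not visited yet").
bool admissible(const std::vector<bool> &visited, std::size_t node)
{
    return !visited[node];
}

// Checks each child immediately before descending into it.
// Returns the number of node scans performed.
std::size_t dfs_lazy(const graph_t &g, std::vector<bool> &visited, std::size_t node)
{
    visited[node] = true;
    std::size_t scans = 1;
    for (std::size_t i = 0; i < g[node].size(); ++i) {
        const std::size_t child = g[node][i];
        if (admissible(visited, child)) {
            scans += dfs_lazy(g, visited, child);
        }
    }
    return scans;
}

// Checks all children at once, then descends into those that passed.
// By the time a later child is explored its check may be out of date.
std::size_t dfs_eager(const graph_t &g, std::vector<bool> &visited, std::size_t node)
{
    visited[node] = true;
    std::size_t scans = 1;
    std::vector<std::size_t> batch;
    for (std::size_t i = 0; i < g[node].size(); ++i) {
        if (admissible(visited, g[node][i])) batch.push_back(g[node][i]);
    }
    for (std::size_t i = 0; i < batch.size(); ++i) {
        scans += dfs_eager(g, visited, batch[i]);
    }
    return scans;
}

// Toy graph with edges 0->1, 0->2 and 1->2.
graph_t diamond()
{
    graph_t g(3);
    g[0].push_back(1);
    g[0].push_back(2);
    g[1].push_back(2);
    return g;
}
```

On this graph the lazy variant scans each of the three nodes once, while the eager variant scans node 2 twice: node 2 passes the check at node 0, is then discovered and scanned while exploring node 1, and is scanned again when node 0 gets to its second child.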
Ulya Trofimovich [Sat, 28 Jul 2018 21:52:44 +0000 (22:52 +0100)]
Pick the shortest available path suffix when generating skeleton path cover.
This also fixes an error in the generation process: sometimes in case
of loops the current node's suffix was set before all of its children
were processed.
Updated test results (in some cases .input files became larger because
of the above fix, in some cases they became smaller because we now pick
the shortest suffix).
Added new test; this one was found by slyfox's fuzzer and revealed the
above bug.
Ulya Trofimovich [Sat, 28 Jul 2018 09:53:34 +0000 (10:53 +0100)]
Changed the name of a local variable in the test to avoid collision with skeleton names.
Before tags were added to re2c, skeleton programs only used a limited
number of predefined names, such as 'yych', 'yystate', etc. With tags,
however, this is no longer true as tags may have any names. So now we need
to be more cautious when picking names for skeleton variables.
This patch is only a workaround to make all tests pass; the real solution
requires inventing a good naming scheme for skeleton programs and
regenerating all skeleton test results.
Ulya Trofimovich [Sat, 28 Jul 2018 09:36:08 +0000 (10:36 +0100)]
Paper: more tweaks of GOR1.
Ulya Trofimovich [Sat, 28 Jul 2018 09:21:37 +0000 (10:21 +0100)]
Fixed error in calculation of maximal skeleton path length.
The error was found by slyfox's fuzzer (a randomly-generated skeleton test).
The bug in the code was, apparently, a premature modification of the state's
estimated maximal distance to the end states: the distance was set before
all of the state's children were processed, which resulted in aborting the
accumulation of distance from the remaining children, and, as a consequence,
shorter than necessary max distance for the root itself.
Ulya Trofimovich [Wed, 25 Jul 2018 21:12:23 +0000 (22:12 +0100)]
Paper: updated version of GOR1.
Ulya Trofimovich [Fri, 6 Jul 2018 23:01:15 +0000 (00:01 +0100)]
Paper: some tweaks for the examples of traces computation.
Ulya Trofimovich [Mon, 2 Jul 2018 22:06:40 +0000 (23:06 +0100)]
Paper: another example of traces computation.
Ulya Trofimovich [Sat, 30 Jun 2018 20:23:13 +0000 (21:23 +0100)]
Paper: added example of PEs and traces computation.
Ulya Trofimovich [Mon, 25 Jun 2018 21:42:33 +0000 (22:42 +0100)]
Fixed processing of #line directives in input files.
The correct behaviour was broken somewhere in between 0.16 and 1.0:
re2c was forgetting to output the chunk of input file that precedes
the #line directive.
Reported by pskocik in #98.
Ulya Trofimovich [Sun, 24 Jun 2018 22:06:06 +0000 (23:06 +0100)]
Paper: re-worked the theorem about compatibility of total and partial orders.
Ulya Trofimovich [Sun, 24 Jun 2018 08:37:48 +0000 (09:37 +0100)]
Paper: started re-working the theorem about compatibility of total and partial orders.
Ulya Trofimovich [Sat, 23 Jun 2018 10:06:02 +0000 (11:06 +0100)]
Paper: made example about parse trees consistent with its description.
Ulya Trofimovich [Sat, 23 Jun 2018 09:57:58 +0000 (10:57 +0100)]
Paper: continued restructuring the part about indexed parse trees.
Ulya Trofimovich [Wed, 20 Jun 2018 21:29:17 +0000 (22:29 +0100)]
Paper: restructured the IRE construction example.
Ulya Trofimovich [Mon, 18 Jun 2018 22:14:22 +0000 (23:14 +0100)]
Paper: added an example of IRE construction.
Ulya Trofimovich [Sun, 17 Jun 2018 09:21:02 +0000 (10:21 +0100)]
Paper: added introduction to the second chapter.
Ulya Trofimovich [Sat, 16 Jun 2018 09:41:18 +0000 (10:41 +0100)]
Paper: revise basic definitions before introducing partial order on trees.
Ulya Trofimovich [Wed, 13 Jun 2018 22:00:45 +0000 (23:00 +0100)]
Paper: taken care of Angelo's remarks.
Ulya Trofimovich [Mon, 11 Jun 2018 20:27:27 +0000 (21:27 +0100)]
Added option "--conditions" (an alias for "-c" and "--start-conditions").
Fixes issue #206 "wrong long option for -c mode".
Ulya Trofimovich [Thu, 24 May 2018 22:44:23 +0000 (23:44 +0100)]
Added first part of TDFA paper v2.
Ulya Trofimovich [Wed, 25 Apr 2018 21:49:15 +0000 (22:49 +0100)]
Improved error reporting in fuzz-testing script.
Ulya Trofimovich [Sat, 14 Apr 2018 20:50:32 +0000 (21:50 +0100)]
If the input starts with a re2c block, apply re2c configurations immediately. (see #201).
Ulya Trofimovich [Fri, 13 Apr 2018 23:23:35 +0000 (00:23 +0100)]
Escape backslashes in file names (see #201).
Ulya Trofimovich [Wed, 8 Nov 2017 20:40:53 +0000 (20:40 +0000)]
Release 1.0.3.
Ulya Trofimovich [Wed, 8 Nov 2017 07:19:21 +0000 (07:19 +0000)]
Fix for #198.
GCC-4.2.1 is unable to compile code like this:
std::vector<int> v;
std::vector<int>::const_reverse_iterator i;
for (i = v.rbegin(); i != v.rend(); ++i) ;
It's unable to deduce const overload for 'rend':
"no match for ‘operator!=’ in ‘i != std::vector<_Tp, _Alloc>::rend()"
However, the following code compiles fine:
std::vector<int> v;
std::vector<int>::const_reverse_iterator i = v.rbegin(), e = v.rend();
for (; i != e; ++i) ;
This was reported by Ryan Shmidt.
Ulya Trofimovich [Thu, 14 Sep 2017 19:08:37 +0000 (20:08 +0100)]
Fixed typo in docs (found by Maxim Reznik).
Ulya Trofimovich [Mon, 28 Aug 2017 16:33:46 +0000 (17:33 +0100)]
Removed inaccurate example.
Parsing floating-point numbers is hard and re2c doesn't help much,
so this example was somewhat misleading.
Ulya Trofimovich [Sat, 26 Aug 2017 20:07:06 +0000 (21:07 +0100)]
Release 1.0.2.
Ulya Trofimovich [Sat, 26 Aug 2017 20:02:22 +0000 (21:02 +0100)]
Updated changelog.
Ulya Trofimovich [Sat, 26 Aug 2017 19:26:26 +0000 (20:26 +0100)]
Some more fixes to the documentation.
Ulya Trofimovich [Sat, 26 Aug 2017 18:10:24 +0000 (19:10 +0100)]
Updated documentation.
Ulya Trofimovich [Sat, 26 Aug 2017 09:31:35 +0000 (10:31 +0100)]
Disallow condition names and named definitions to start with digit.
This has always been the intended behavior and was accidentally broken
by commit
e3db638fc3e9bfb318edafedbefd02f25f1c1b8c.
Ulya Trofimovich [Tue, 22 Aug 2017 20:39:33 +0000 (21:39 +0100)]
Renamed tests.
Ulya Trofimovich [Tue, 22 Aug 2017 17:06:55 +0000 (18:06 +0100)]
Updated examples and added them to 'run_tests.sh' script.
Ulya Trofimovich [Tue, 22 Aug 2017 08:15:47 +0000 (09:15 +0100)]
Updated changelog.
Ulya Trofimovich [Tue, 22 Aug 2017 08:09:35 +0000 (09:09 +0100)]
Added forgotten 'genhelp.sh' to distribution files.
This fixes bug #194 "Build with "--enable-docs" fails".
Ulya Trofimovich [Fri, 18 Aug 2017 15:09:59 +0000 (16:09 +0100)]
Added examples to test suite.
Ulya Trofimovich [Wed, 16 Aug 2017 17:43:13 +0000 (18:43 +0100)]
Added benchmarks to test suite.
Ulya Trofimovich [Fri, 11 Aug 2017 21:43:05 +0000 (22:43 +0100)]
Updated changelog for 1.0.1 version.
Ulya Trofimovich [Fri, 11 Aug 2017 21:31:09 +0000 (22:31 +0100)]
Release 1.0.1.
Ulya Trofimovich [Fri, 11 Aug 2017 21:16:04 +0000 (22:16 +0100)]
Makefile.am: add paper on Lookahead TDFA to distribution.
Ulya Trofimovich [Fri, 11 Aug 2017 21:04:05 +0000 (22:04 +0100)]
Fixed #193: "1.0 build failure on macOS: error: calling a private constructor of class 're2c::Rule'".
Copy constructor and assignment are required by std::valarray
implementation on macOS.
Ulya Trofimovich [Fri, 11 Aug 2017 13:46:23 +0000 (14:46 +0100)]
Release 1.0.
Ulya Trofimovich [Fri, 11 Aug 2017 11:52:10 +0000 (12:52 +0100)]
Paper on lookahead TDFA: finished.
Ulya Trofimovich [Thu, 10 Aug 2017 15:03:46 +0000 (16:03 +0100)]
Updated help and manpage.
Ulya Trofimovich [Thu, 10 Aug 2017 12:25:07 +0000 (13:25 +0100)]
Leave the definition of 'yynmatch' and 'yypmatch' to the user.
With '--posix-captures' RE2C stores submatch results in 'yynmatch'
(the total number of capturing groups for the matching rule) and
'yypmatch' (an array of submatch values for each group).
These variables should be user-defined, so that users can override
default implementation (e.g. make 'yypmatch' an array of integer
offsets rather than an array of pointers). Overriding is only possible
with generic API: if default API is used, then RE2C can autogenerate
'yynmatch' and 'yypmatch' (and so it did prior to this commit).
However, it is better to have the same behavior with both APIs; also,
it is coherent with '--tags' option (RE2C leaves tag definition to
the user).
Ulya Trofimovich [Wed, 9 Aug 2017 17:35:15 +0000 (18:35 +0100)]
Updated options list and regenerated docs.
Ulya Trofimovich [Wed, 9 Aug 2017 17:07:50 +0000 (18:07 +0100)]
Added short option '-P' corresponding to '--posix-captures'.
Ulya Trofimovich [Wed, 9 Aug 2017 16:17:09 +0000 (17:17 +0100)]
Fixed includes with 'include-what-you-use'.
Command:
$ configure CXX=include-what-you-use CXXFLAGS="--check-also" \
&& make -k 2>log \
&& python2 `which fix_includes.py` < log
Ulya Trofimovich [Wed, 9 Aug 2017 13:04:10 +0000 (14:04 +0100)]
Paper on Lookahead TDFA: added bibliography.
Ulya Trofimovich [Wed, 9 Aug 2017 07:47:51 +0000 (08:47 +0100)]
Amended README instructions for benchmarks.
Ulya Trofimovich [Mon, 7 Aug 2017 11:57:30 +0000 (12:57 +0100)]
Paper on Lookahead TDFA: added pictures.
Ulya Trofimovich [Mon, 7 Aug 2017 11:54:29 +0000 (12:54 +0100)]
Paper on Lookahead TDFA: fixed captions and ran through aspell.
Ulya Trofimovich [Sat, 5 Aug 2017 08:33:30 +0000 (09:33 +0100)]
Paper on Lookahead TDFA: added benchmark results and graphs.
Ulya Trofimovich [Fri, 4 Aug 2017 08:57:58 +0000 (09:57 +0100)]
Paper on Lookahead TDFA: reformatted examples.
Ulya Trofimovich [Fri, 4 Aug 2017 08:52:14 +0000 (09:52 +0100)]
Tweaked CXXFLAGS in asan build script.
Ulya Trofimovich [Fri, 4 Aug 2017 08:50:59 +0000 (09:50 +0100)]
A small tweak in benchmarking scripts that reduces warmup time.
Ulya Trofimovich [Thu, 3 Aug 2017 11:14:19 +0000 (12:14 +0100)]
Fuzzers: a bunch of small tweaks.
Ulya Trofimovich [Thu, 3 Aug 2017 10:52:49 +0000 (11:52 +0100)]
Skeleton: fixed initialization of maximal path length.
Broken by commit
fffb5932ee52127e03b9f7f5ccca83a421d69061.
Path lengths were initialized with 0 instead of 'DIST_ERROR', which caused
incorrect calculation of maximal path length. This in turn caused errors
in estimating the number of bytes necessary to hold keys during data
generation in skeleton. The resulting keys were one-byte while maximal
path length was more than one byte, which (fortunately!) caused runtime
errors in skeleton programs.
Example of program that caused skeleton error:
/*!re2c
(@t [\x00] [^]{5,6})* {}
*/
The error was hidden for so long because in practice inputs that need
more than one-byte keys are rare, and fuzzer sets 'ulimit -t 10' when
running re2c, so most of such programs were simply aborted. Those that
were not aborted still had a chance of estimating key size correctly.
Ulya Trofimovich [Wed, 2 Aug 2017 22:11:13 +0000 (23:11 +0100)]
Fixed cppcheck 'style' warnings.
Used the following command to run cppcheck:
cppcheck --enable=all --inconclusive --std=posix --quiet --force -I. src/
Ulya Trofimovich [Wed, 2 Aug 2017 20:43:36 +0000 (21:43 +0100)]
Fixed headers that were not self-contained.
Used the following command to find errors:
for h in $(find src/ -name '*.h*'); do echo "CHECKING $h"; g++ -I. -c $h -o foo.o; done
Ulya Trofimovich [Wed, 2 Aug 2017 17:07:16 +0000 (18:07 +0100)]
Benchmarks: added README and small samples of input data.
Ulya Trofimovich [Wed, 2 Aug 2017 14:49:07 +0000 (15:49 +0100)]
Added scripts that run benchmarks.
Ulya Trofimovich [Tue, 1 Aug 2017 15:41:58 +0000 (16:41 +0100)]
Added fuzzers (contributed by Sergei Trofimovich).