Don't move the closing tag of a POSIX capture group out of the enclosing iteration.
RE2C used to perform the following optimization: when a POSIX capture is
under iteration, we only need to get tag values of the last iteration
(according to the POSIX standard). Therefore we can move the closing tag
out of the loop.
This commit removes this optimization (as part of the effort to switch
from the Kuklewicz POSIX disambiguation algorithm to the Okui algorithm).
In other words, for RE (x)* re2c used to generate this "optimized" IRE:
1 (3 x)* 4 2
and now it generates the "canonical" IRE:
1 (3 x 4)* 2
Updated tests for '--posix-captures' that have been affected by the change.
Updated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).
Also, depth-first traversal was done in a slightly incorrect way:
we checked outgoing nodes for admissibility and pushed the corresponding
child states on the stack all at once. This is not the same as checking
the first child and recursing into it, then checking the next child,
and so on (because we might discover the second child while exploring
the first, and the admissibility check for the second child *after* that
might yield false, while *before* exploring the first child it yielded
true).
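The difference between the two traversal orders can be illustrated with a
small C++ sketch. It is not the GOR1 code: the graph type, the function names
and the admissibility test ("not scanned yet") are all assumptions made for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency lists: g[s] holds the children of state s (hypothetical layout).
typedef std::vector<std::vector<std::size_t> > Graph;

// Stand-in admissibility test (an assumption): a state is admissible
// as long as it has not been scanned yet.
inline bool admissible(const std::vector<bool> &scanned, std::size_t s)
{
    return !scanned[s];
}

// Eager variant: all children are tested and pushed at once,
// so a test result may be stale by the time the child is popped.
inline std::vector<std::size_t> scan_eager(const Graph &g, std::size_t root)
{
    std::vector<bool> scanned(g.size(), false);
    std::vector<std::size_t> order, stack(1, root);
    while (!stack.empty()) {
        const std::size_t s = stack.back();
        stack.pop_back();
        scanned[s] = true;
        order.push_back(s);
        // Push in reverse so that the first child is popped first.
        for (std::size_t i = g[s].size(); i-- > 0; ) {
            if (admissible(scanned, g[s][i])) stack.push_back(g[s][i]);
        }
    }
    return order;
}

// Lazy variant: each child is tested immediately before recursing into it.
inline void scan_lazy_rec(const Graph &g, std::size_t s,
    std::vector<bool> &scanned, std::vector<std::size_t> &order)
{
    scanned[s] = true;
    order.push_back(s);
    for (std::size_t i = 0; i < g[s].size(); ++i) {
        if (admissible(scanned, g[s][i])) {
            scan_lazy_rec(g, g[s][i], scanned, order);
        }
    }
}

inline std::vector<std::size_t> scan_lazy(const Graph &g, std::size_t root)
{
    std::vector<bool> scanned(g.size(), false);
    std::vector<std::size_t> order;
    scan_lazy_rec(g, root, scanned, order);
    return order;
}
```

On a graph with edges 0->1, 0->2 and 1->2, the eager variant scans state 2
twice (it is pushed once from state 0 and once more from state 1), while the
lazy variant scans each state exactly once.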
Pick the shortest available path suffix when generating skeleton path cover.
This also fixes an error in the generation process: sometimes in case
of loops the current node's suffix was set before all of its children
were processed.
Updated test results (in some cases .input files became larger because
of the above fix, in some cases they became smaller because we now pick
the shortest suffix).
Added new test; this one was found by slyfox's fuzzer and revealed the
above bug.
Changed the name of a local variable in the test to avoid collision with skeleton names.
Before tags were added to re2c, skeleton programs only used a limited
number of predefined names, such as 'yych', 'yystate', etc. With tags,
however, this is no longer true as tags may have any names. So now we need
to be more cautious when picking names for skeleton variables.
This patch is only a workaround to make all tests pass; the real solution
requires inventing a good naming scheme for skeleton programs and
regenerating all skeleton test results.
Fixed error in calculation of maximal skeleton path length.
The error was found by slyfox's fuzzer (a randomly-generated skeleton test).
The bug was, apparently, that the state's estimated maximal distance to
the end states was modified too early: the distance was set before all of
the state's children were processed, which aborted the accumulation of
distance from the remaining children and, as a consequence, gave a
shorter than necessary max distance for the root itself.
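A simplified C++ sketch of the idea behind the fix follows. It is not the
skeleton code: the 'Node' layout, the 'max_dist' function, the 'on_path' loop
guard and the value of 'DIST_ERROR' are assumptions made for the example. The
point is only that the distance is accumulated in a local variable and stored
into the node's slot after the last child has been processed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sentinel meaning "distance not known yet" (value is hypothetical).
static const unsigned DIST_ERROR = ~0u;

// Hypothetical graph node: list of child indices and an end-state flag.
struct Node {
    std::vector<std::size_t> children;
    bool end;
    Node() : end(false) {}
};

// Maximal distance from node i to an end node; edges that lead back to
// a node on the current path contribute nothing (a simplification).
inline unsigned max_dist(const std::vector<Node> &nodes,
    std::vector<unsigned> &dist, std::vector<bool> &on_path, std::size_t i)
{
    if (dist[i] != DIST_ERROR) return dist[i]; // already computed
    if (nodes[i].end) return dist[i] = 0;
    if (on_path[i]) return DIST_ERROR;         // loop back to an open node

    on_path[i] = true;
    unsigned best = DIST_ERROR;                // local accumulator, not dist[i]
    for (std::size_t k = 0; k < nodes[i].children.size(); ++k) {
        const unsigned d = max_dist(nodes, dist, on_path, nodes[i].children[k]);
        if (d != DIST_ERROR && (best == DIST_ERROR || d > best)) best = d;
    }
    on_path[i] = false;

    // Store the result only after every child has been processed.
    if (best != DIST_ERROR) dist[i] = best + 1;
    return dist[i];
}
```

Writing into 'dist[i]' inside the loop instead would make a re-entry through a
loop see a partially computed value and treat it as final.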
Ulya Trofimovich [Mon, 25 Jun 2018 21:42:33 +0000 (22:42 +0100)]
Fixed processing of #line directives in input files.
The correct behaviour was broken somewhere in between 0.16 and 1.0:
re2c was forgetting to output the chunk of input file that precedes
the #line directive.
Ulya Trofimovich [Thu, 10 Aug 2017 12:25:07 +0000 (13:25 +0100)]
Leave the definition of 'yynmatch' and 'yypmatch' to the user.
With '--posix-captures' RE2C stores submatch results in 'yynmatch'
(the total number of capturing groups for the matching rule) and
'yypmatch' (an array of submatch values for each group).
These variables should be user-defined, so that users can override the
default implementation (e.g. make 'yypmatch' an array of integer
offsets rather than an array of pointers). Overriding is only possible
with the generic API: if the default API is used, then RE2C can autogenerate
'yynmatch' and 'yypmatch' (and so it did prior to this commit).
However, it is better to have the same behavior with both APIs; also,
it is consistent with the '--tags' option (RE2C leaves tag definition to
the user).
Path lengths were initialized with 0 instead of 'DIST_ERROR', which caused
incorrect calculation of maximal path length. This in turn caused errors
in estimating the number of bytes necessary to hold keys during data
generation in skeleton. The resulting keys were one-byte while maximal
path length required more than one byte, which (fortunately!) caused runtime
errors in skeleton programs.
Example of program that caused skeleton error:
/*!re2c
(@t [\x00] [^]{5,6})* {}
*/
The error was hidden for so long because in practice inputs that need
more than one-byte keys are rare, and fuzzer sets 'ulimit -t 10' when
running re2c, so most of such programs were simply aborted. Those that
were not aborted still had a chance of estimating key size correctly.
Makefile.am: use portable POSIX primitive to get directory name.
Use '$(@D)' instead of '$(dir $@)', as the latter requires secondary
expansion feature specific to GNU make; it causes build failures on
bmake.
'$(@D)', on the other hand, is a POSIX make feature documented e.g. in
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html
(section "Internal Macros").
Makefile.am: fixed other custom rules to avoid writing into nonexistent directory.
The fixed rules do not presently trigger any build errors: the target directory
is created by configure as it needs to put autogenerated files doc/manpage.rst
and doc/help.rst in it.
However, this behaviour is accidental: if one removes the .in files, the build
failure would be unmasked. So it makes sense to ensure that the target directory
exists.
Ross Burton [Mon, 31 Jul 2017 14:43:41 +0000 (15:43 +0100)]
Makefile.am: create target directory before writing into it
In some situations src/parse/ may not exist before a file is copied into the
directory. Ensure that this doesn't happen by creating the directory first.
Don't assert that comparator arguments are non-equal.
Comparator is used in 'std::sort'.
All items in closure have unique TNFA states, therefore we assumed
that the compared items must always be different. However, 'std::sort'
does not have this requirement and some implementations of it compare
an element with itself.
The removed assert caused crashes with an old version of GCC (4.2).
Thanks to Sergei Trofimovich for debugging the issue.
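A minimal C++ sketch of a comparator that tolerates being called with the same
element on both sides is shown below. 'ClosureItem', 'compare_items' and
'sorted_closure' are hypothetical names, not the ones used in the code; the
only requirement taken from 'std::sort' is that the comparator is a strict
weak ordering, so comparing an element with itself must return false.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a closure item: only the TNFA state matters here.
struct ClosureItem {
    unsigned state;
};

// Strict weak ordering on states. When both arguments are the same item,
// it returns false instead of asserting that they differ.
inline bool compare_items(const ClosureItem &x, const ClosureItem &y)
{
    return x.state < y.state;
}

// Sort a copy of the items by state.
inline std::vector<ClosureItem> sorted_closure(std::vector<ClosureItem> items)
{
    std::sort(items.begin(), items.end(), compare_items);
    return items;
}
```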
With CXXFLAGS='-fsanitize=undefined' GCC complains about unaligned access:
if a custom allocator is used to allocate structs or other alignment-sensitive
things, then it must take care of the alignment (for example, add padding
to all unaligned blocks of memory which it allocates).
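One way to add such padding is sketched below in C++. This is an illustration
under stated assumptions, not the allocator from the code: 'align_up' and
'BumpAllocator' are hypothetical names, the buffer size is arbitrary, and the
alignment is taken to be that of 'std::max_align_t'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round n up to the nearest multiple of align (align must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical bump allocator over a fixed, suitably aligned buffer.
// Every block is padded so that the next block starts aligned as well.
class BumpAllocator {
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_;

public:
    BumpAllocator() : used_(0) {}

    void *alloc(std::size_t size)
    {
        // Reject requests that do not fit before padding (avoids overflow).
        if (size > sizeof(buffer_) - used_) return nullptr;
        const std::size_t padded = align_up(size, alignof(std::max_align_t));
        void *p = buffer_ + used_;
        used_ += padded; // padded size keeps used_ a multiple of the alignment
        return p;
    }
};
```

Because the buffer size is itself a multiple of the alignment, a request that
fits unpadded also fits after padding.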