Don't move the closing tag of a POSIX capture group out of the enclosing iteration.
RE2C used to perform the following optimization: when a POSIX capture is
under iteration, we only need to get tag values of the last iteration
(according to the POSIX standard). Therefore we can move the closing tag
out of the loop.
This commit removes this optimization (as part of the effort to switch
from Kuklewicz POSIX disambiguation algorithm to Okui algorithm).
In other words, for RE (x)* re2c used to generate this "optimized" IRE:
1 (3 x)* 4 2
and now it generates the "canonical" IRE:
1 (3 x 4)* 2
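A minimal sketch of what the canonical form means at run time, assuming
that tags 1 and 2 delimit the whole match and tags 3 and 4 delimit the
capture group (the 'Tags' struct and 'match_canonical' are hypothetical
names, not re2c code). Both group tags are set inside the loop body, so
each iteration overwrites the previous one and the last iteration wins:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical tag record: t1/t2 delimit the whole match, t3/t4 delimit
// the capture group; -1 means "tag was never set".
struct Tags { long t1, t2, t3, t4; };

// Hand-written matcher for (x)* that mirrors the canonical IRE
// 1 (3 x 4)* 2: the opening and the closing group tag both live
// inside the loop body.
Tags match_canonical(const std::string &s)
{
    Tags t = {-1, -1, -1, -1};
    std::size_t pos = 0;
    t.t1 = static_cast<long>(pos);
    while (pos < s.size() && s[pos] == 'x') {
        t.t3 = static_cast<long>(pos); // opening tag of this iteration
        ++pos;
        t.t4 = static_cast<long>(pos); // closing tag of this iteration
    }
    t.t2 = static_cast<long>(pos);
    return t;
}
```

On input "xxx" only the offsets of the third iteration survive in t3 and
t4; on empty input both group tags stay unset.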
Updated tests for '--posix-captures' that have been affected by the change.
Updated GOR1 (fixed the core algorithm to avoid useless re-scans of the same state).
Also, depth-first traversal was done in a slightly incorrect way:
we checked outgoing nodes for admissibility and pushed the corresponding
child states on stack all at once. This is not the same as checking
the first child and recursing into it, then checking the next child,
..., and so on (because we might discover the second child while exploring
the first, and the admissibility check for the second child *after* that
might yield false, while *before* exploring the first child it yielded
true).
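The difference between the two traversal orders can be sketched on a toy
graph. This is not the GOR1 code: the admissibility check is replaced
here by a plain "not yet visited" test, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::size_t> > Graph;

// Variant A: check all children up front and push the admissible ones
// on the stack at once. A child may pass the check here and then be
// reached again through a sibling before it is popped.
std::vector<std::size_t> visit_all_at_once(const Graph &g, std::size_t root)
{
    std::vector<bool> seen(g.size(), false);
    std::vector<std::size_t> order, stack(1, root);
    while (!stack.empty()) {
        const std::size_t n = stack.back();
        stack.pop_back();
        seen[n] = true;
        order.push_back(n);
        // Reverse iteration so that the first child is popped first.
        for (std::size_t i = g[n].size(); i-- > 0; ) {
            const std::size_t c = g[n][i];
            if (!seen[c]) stack.push_back(c);
        }
    }
    return order;
}

// Variant B: check a child, recurse into it, and only then check the
// next child, so the check sees everything discovered in between.
void visit_one_by_one(const Graph &g, std::size_t n,
    std::vector<bool> &seen, std::vector<std::size_t> &order)
{
    seen[n] = true;
    order.push_back(n);
    for (std::size_t i = 0; i < g[n].size(); ++i) {
        const std::size_t c = g[n][i];
        if (!seen[c]) visit_one_by_one(g, c, seen, order);
    }
}
```

With edges 0->1, 0->2 and 1->2, variant A processes node 2 twice (it was
admissible when node 0 was expanded), while variant B processes it once.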
Pick the shortest available path suffix when generating skeleton path cover.
This also fixes an error in the generation process: sometimes in case
of loops the current node's suffix was set before all of its children
were processed.
Updated test results (in some cases .input files became larger because
of the above fix, in some cases they became smaller because we now pick
the shortest suffix).
Added new test; this one was found by slyfox's fuzzer and revealed the
above bug.
Changed the name of a local variable in the test to avoid collision with skeleton names.
Before tags were added to re2c, skeleton programs only used a limited
number of predefined names, such as 'yych', 'yystate', etc. With tags,
however, this is no longer true as tags may have any names. So now we need
to be more cautious when picking names for skeleton variables.
This patch is only a workaround to make all tests pass; the real solution
requires inventing a good naming scheme for skeleton programs and
regenerating all skeleton test results.
Fixed error in calculation of maximal skeleton path length.
The error was found by slyfox's fuzzer (a randomly-generated skeleton test).
The bug in the code was, apparently, too early modification of the state's
estimated maximal distance to the end states: the distance was set before
all of the state's children were processed, which resulted in aborting the
accumulation of distance from the remaining children, and, as a consequence,
shorter than necessary max distance for the root itself.
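A sketch of the corrected order of operations, under simplifying
assumptions (a hypothetical 'Node' type, back edges of loops ignored,
distances counted in edges). It is not the skeleton code itself; the
point is that the state's distance is stored only after the loop over
all children has finished:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node: successor indices and an end-state flag.
struct Node { std::vector<std::size_t> next; bool is_end; };

static const long UNKNOWN = -1; // distance not computed (or no end reachable)
static const long ON_PATH = -2; // node is on the current DFS path

long max_dist(const std::vector<Node> &g, std::size_t n, std::vector<long> &dist)
{
    if (dist[n] == ON_PATH) return UNKNOWN; // back edge of a loop: ignore
    if (dist[n] != UNKNOWN) return dist[n]; // already computed
    dist[n] = ON_PATH;
    long best = g[n].is_end ? 0 : UNKNOWN;
    for (std::size_t i = 0; i < g[n].next.size(); ++i) {
        const long d = max_dist(g, g[n].next[i], dist);
        if (d != UNKNOWN) best = std::max(best, d + 1);
    }
    // Store the result only now: storing it inside the loop would make
    // later lookups see a partial (too short) distance.
    dist[n] = best;
    return best;
}
```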
Ulya Trofimovich [Mon, 25 Jun 2018 21:42:33 +0000 (22:42 +0100)]
Fixed processing of #line directives in input files.
The correct behaviour was broken somewhere in between 0.16 and 1.0:
re2c was forgetting to output the chunk of input file that precedes
the #line directive.
Ulya Trofimovich [Thu, 10 Aug 2017 12:25:07 +0000 (13:25 +0100)]
Leave the definition of 'yynmatch' and 'yypmatch' to the user.
With '--posix-captures' RE2C stores submatch results in 'yynmatch'
(the total number of capturing groups for the matching rule) and
'yypmatch' (an array of submatch values for each group).
These variables should be user-defined, so that users can override
default implementation (e.g. make 'yypmatch' an array of integer
offsets rather than an array of pointers). Overriding is only possible
with generic API: if default API is used, then RE2C can autogenerate
'yynmatch' and 'yypmatch' (and so it did prior to this commit).
However, it is better to have the same behavior with both APIs; also,
it is coherent with '--tags' option (RE2C leaves tag definition to
the user).
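A sketch of what user-side definitions might look like. The layout
assumed here (start and end pointers of group i stored at indices 2*i
and 2*i+1) and the names 'MAX_NMATCH', 'Submatch' and 'group' are
assumptions made for illustration, not something this commit specifies:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical upper bound on the number of capturing groups.
static const std::size_t MAX_NMATCH = 4;

// User-defined storage for submatch results. The generated code would
// fill these in; here they are grouped in a struct for convenience.
struct Submatch {
    std::size_t yynmatch;                 // number of groups in the matched rule
    const char *yypmatch[2 * MAX_NMATCH]; // assumed layout: [2*i] start, [2*i+1] end
};

// Returns the text of group i, or an empty string if the group does
// not exist or did not participate in the match.
std::string group(const Submatch &m, std::size_t i)
{
    if (i >= m.yynmatch) return std::string();
    const char *b = m.yypmatch[2 * i];
    const char *e = m.yypmatch[2 * i + 1];
    if (!b || !e) return std::string();
    return std::string(b, e);
}
```

Because the definitions belong to the user, the pointer array could be
swapped for an array of integer offsets without touching the generator.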
Path lengths were initialized with 0 instead of 'DIST_ERROR', which caused
incorrect calculation of maximal path length. This in turn caused errors
in estimating the number of bytes necessary to hold keys during data
generation in skeleton. The resulting keys were one-byte while maximal
path length was more than one byte, which (fortunately!) caused runtime
errors in skeleton programs.
Example of program that caused skeleton error:
/*!re2c
(@t [\x00] [^]{5,6})* {}
*/
The error was hidden for so long because in practice inputs that need
more than one-byte keys are rare, and fuzzer sets 'ulimit -t 10' when
running re2c, so most of such programs were simply aborted. Those that
were not aborted still had a chance of estimating key size correctly.
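To illustrate why an underestimated path length is harmful, here is a
sketch of how a key width could be derived from the largest value a key
must hold. The function name and the choice of widths are assumptions,
not the actual skeleton code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Smallest of 1, 2, 4 or 8 bytes that can hold max_value.
// Hypothetical helper: if max_value is underestimated (for example,
// because path lengths started from 0 rather than from an error
// sentinel), the chosen width is too small and keys get truncated.
std::size_t key_size(std::uint64_t max_value)
{
    if (max_value <= 0xFFu) return 1;
    if (max_value <= 0xFFFFu) return 2;
    if (max_value <= 0xFFFFFFFFu) return 4;
    return 8;
}
```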
Makefile.am: use portable POSIX primitive to get directory name.
Use '$(@D)' instead of '$(dir $@)', as the latter requires secondary
expansion feature specific to GNU make; it causes build failures on
bmake.
'$(@D)', on the other hand, is a POSIX make feature documented e.g. in
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html
(section "Internal Macros").
Makefile.am: fixed other custom rules to avoid writing into nonexistent directory.
The fixed rules do not presently trigger any build errors: target directory
is created by configure as it needs to put autogenerated files doc/manpage.rst
and doc/help.rst in it.
However, this behaviour is incidental: if one removes the .in files, build
failure would be unmasked. So it makes sense to ensure that target directory
exists.
Ross Burton [Mon, 31 Jul 2017 14:43:41 +0000 (15:43 +0100)]
Makefile.am: create target directory before writing into it
In some situations src/parse/ may not exist before a file is copied into the
directory. Ensure that this doesn't happen by creating the directory first.
Don't assert that comparator arguments are non-equal.
Comparator is used in 'std::sort'.
All items in closure have unique TNFA states, therefore we assumed
that the compared items must always be different. However, 'std::sort'
does not have this requirement and some implementations of it compare
the element with itself.
The removed assert caused crashes with an old version of GCC (4.2).
Thanks to Sergei Trofimovich for debugging the issue.
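A small sketch of a comparator that tolerates self-comparison. The
'Item' struct and its fields are hypothetical stand-ins for closure
items; the relevant part is that the comparator returns false for equal
arguments instead of asserting that they differ:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical closure item identified by its TNFA state number.
struct Item { unsigned state; unsigned payload; };

// Strict weak ordering on the state number. There is deliberately no
// assert(a.state != b.state) here: std::sort is allowed to call the
// comparator with the same element on both sides, and the only
// requirement is that the result is false in that case.
bool item_less(const Item &a, const Item &b)
{
    return a.state < b.state;
}

std::vector<Item> sorted_by_state(std::vector<Item> items)
{
    std::sort(items.begin(), items.end(), item_less);
    return items;
}
```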
With CXXFLAGS='-fsanitize=undefined' GCC complains about unaligned access:
if custom allocator is used to allocate structs or other alignment-sensitive
things, then it must take care of the alignment (for example, add padding
to all unaligned blocks of memory which it allocates).
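One way to add such padding is to round every request up to a multiple
of the strictest fundamental alignment. The sketch below is a
hypothetical bump allocator written for illustration, not re2c's
allocator; it assumes the underlying buffer itself starts at a suitably
aligned address:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds n up to the next multiple of alignment (a power of two).
std::size_t align_up(std::size_t n, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator: each block is padded so that the next
// block starts at an offset that is a multiple of the strictest
// fundamental alignment.
class Arena {
    std::vector<unsigned char> buf_;
    std::size_t used_;
public:
    explicit Arena(std::size_t capacity): buf_(capacity), used_(0) {}

    void *alloc(std::size_t size)
    {
        const std::size_t step = align_up(size, alignof(std::max_align_t));
        // 'step < size' catches wrap-around for huge requests.
        if (step < size || step > buf_.size() - used_) return nullptr;
        void *p = buf_.data() + used_;
        used_ += step;
        return p;
    }
};
```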
Recognize newlines in character strings and classes.
For now, a newline inside a character string or class is an error, and
re2c should emit a clear error message. Different styles of newlines
should be recognized ("\n", "\r\n").
This commit fixes bug #162 reported by pauloscustodio:
Reading files with "rb" causes issues in Windows
Fixed line endings in output files on Windows (#162, #163).
This fix addresses two issues, both reported and fixed by pauloscustodio.
1. #162 "Open text files with "wb" causes issues on Windows"
Text files need to be opened for writing with "w", so that stdio does
the right thing with respect to the correct line endings for the current OS
("\r\n" on Windows, "\n" on Linux).
2. #163 "Reading files with "rb" causes issues in Windows"
re2c reads input files in binary mode and writes the generated output in
text mode. This caused CR LF conversion to CR CR LF on Windows: first CR
comes from reading input in binary mode, second CR is added when writing
output in text mode. This only happened to those parts of input which are
not transformed by re2c: we used to copy-paste verbatim, now we patch line
endings. Now we convert all line endings to LF before writing the generated
code to file.
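The conversion step can be sketched as follows. This is an illustrative
helper with a hypothetical name, not the code from the fix; it rewrites
CR LF pairs to LF and leaves a lone CR untouched, so that a later write
in text mode adds at most one CR per line:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts every "\r\n" in the input to "\n". A CR that is not
// followed by LF is kept as-is.
std::string to_lf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool crlf = in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n';
        if (crlf) continue; // drop the CR, the LF is copied on the next step
        out.push_back(in[i]);
    }
    return out;
}
```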
Ulya Trofimovich [Sat, 25 Jun 2016 15:22:08 +0000 (16:22 +0100)]
Fixed #147 "Please add symbol name to "can't find symbol" error message".
As suggested by sirzooro:
Please add symbol name to "can't find symbol" error message,
it would allow to quickly spot what is wrong. Now we have to
position cursor at given row and column to find that name.
Also tweaked error reporting function to append "..." at the end
of the message if it didn't fit into buffer.
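A sketch of the truncation behaviour, assuming a caller-supplied
fixed-size buffer and a hypothetical function name (this is not the
re2c error reporting function). 'vsnprintf' returns the length the full
message would have had, which is enough to detect that it was cut off:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats a message into buf; if it does not fit, the last three
// visible characters are replaced with "...".
void format_truncated(char *buf, std::size_t size, const char *fmt, ...)
{
    if (size == 0) return;
    va_list args;
    va_start(args, fmt);
    const int n = std::vsnprintf(buf, size, fmt, args);
    va_end(args);
    if (n < 0) { // encoding error: leave an empty message
        buf[0] = '\0';
        return;
    }
    if (static_cast<std::size_t>(n) >= size && size >= 4) {
        // Message was cut off: overwrite the tail (copies the NUL too).
        std::memcpy(buf + size - 4, "...", 4);
    }
}
```

With a 16-byte buffer, a message such as "can't find symbol 'averylongname'"
would come out as "can't find s...".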
Ulya Trofimovich [Fri, 24 Jun 2016 21:46:16 +0000 (22:46 +0100)]
Fixed bug #145 "Values for enum YYCONDTYPE are not generated
when default rules with conditions are used".
Default rule is handled in a special (delayed) way;
re2c uses different code for default rule than for normal rules.
This special code simply forgot to add condition name to the list
of conditions.
run_tests.sh: fix permissions after copying source files to build directory.
`make distcheck` protects source files from writing.
Test script run_tests.sh copies source files into build directory,
but the copied files inherit permissions, so `make distcheck` fails.