Ulya Trofimovich [Thu, 10 Aug 2017 12:25:07 +0000 (13:25 +0100)]
Leave the definition of 'yynmatch' and 'yypmatch' to the user.
With '--posix-captures' RE2C stores submatch results in 'yynmatch'
(the total number of capturing groups for the matching rule) and
'yypmatch' (an array of submatch values for each group).
These variables should be user-defined, so that users can override the
default implementation (e.g. make 'yypmatch' an array of integer
offsets rather than an array of pointers). Overriding is only possible
with the generic API: if the default API is used, then RE2C can autogenerate
'yynmatch' and 'yypmatch' (as it did prior to this commit).
However, it is better to have the same behavior with both APIs; also,
it is coherent with '--tags' option (RE2C leaves tag definition to
the user).
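A minimal sketch of the two representations a user might choose for these variables. It assumes the POSIX-style layout in which group i occupies yypmatch[2*i] (start) and yypmatch[2*i + 1] (end), and that a group which did not participate is marked with a null pointer. The struct names, MAX_GROUPS and the helpers are hypothetical illustrations; in real code the re2c-generated matcher is what fills 'yynmatch' and 'yypmatch'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical upper bound on capturing groups (including the whole match).
constexpr std::size_t MAX_GROUPS = 8;

// Pointer-based representation: start/end pointers into the input buffer.
struct PointerMatch {
    std::size_t yynmatch = 0;                  // groups for the matching rule
    const char *yypmatch[2 * MAX_GROUPS] = {}; // assumed: [2*i] start, [2*i+1] end
};

// Alternative user representation: integer offsets from the buffer start.
struct OffsetMatch {
    std::size_t yynmatch = 0;
    std::ptrdiff_t yypmatch[2 * MAX_GROUPS] = {}; // -1 marks an unset group
};

// Convert pointer-based submatches to offsets relative to 'base'.
OffsetMatch to_offsets(const PointerMatch &m, const char *base) {
    assert(m.yynmatch <= MAX_GROUPS);
    OffsetMatch r;
    r.yynmatch = m.yynmatch;
    for (std::size_t i = 0; i < 2 * m.yynmatch; ++i) {
        r.yypmatch[i] = m.yypmatch[i] ? m.yypmatch[i] - base : -1;
    }
    return r;
}

// Extract group i as a string (empty if it is out of range or unset).
std::string group(const OffsetMatch &m, const char *base, std::size_t i) {
    if (i >= m.yynmatch) return std::string();
    std::ptrdiff_t s = m.yypmatch[2 * i], e = m.yypmatch[2 * i + 1];
    if (s < 0 || e < 0) return std::string();
    return std::string(base + s, base + e);
}
```

Offsets stay valid if the input buffer is moved or refilled, which is one reason a user may prefer them over raw pointers.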
Path lengths were initialized with 0 instead of 'DIST_ERROR', which caused
incorrect calculation of the maximal path length. This in turn caused errors
in estimating the number of bytes necessary to hold keys during data
generation in skeleton. The resulting keys were one byte wide while the maximal
path length needed more than one byte, which (fortunately!) caused runtime
errors in skeleton programs.
Example of a program that caused a skeleton error:
/*!re2c
(@t [\x00] [^]{5,6})* {}
*/
The error was hidden for so long because in practice inputs that need
more than one-byte keys are rare, and the fuzzer sets 'ulimit -t 10' when
running re2c, so most such programs were simply aborted. Those that
were not aborted still had a chance of estimating the key size correctly.
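A hypothetical sketch (not re2c's actual code) of why the initial value matters. It assumes path lengths are memoized with a "not computed yet" sentinel named after 'DIST_ERROR', and that a key must be wide enough to hold the maximal path length, with 1-, 2- or 4-byte key widths. If the memo table starts at 0, every node looks already computed, the maximum collapses to 0, and one-byte keys are chosen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel meaning "not computed yet" (hypothetical stand-in for DIST_ERROR).
constexpr std::uint32_t DIST_ERROR = UINT32_MAX;

// Longest path (in edges) from node v in a DAG, memoized in 'dist'.
// 'dist' must be initialized with DIST_ERROR, not 0: a 0 entry is
// indistinguishable from a computed path of length 0.
std::uint32_t max_path(const std::vector<std::vector<std::size_t>> &succ,
                       std::vector<std::uint32_t> &dist, std::size_t v) {
    if (dist[v] != DIST_ERROR) return dist[v];
    std::uint32_t best = 0;
    for (std::size_t w : succ[v]) {
        best = std::max(best, max_path(succ, dist, w) + 1);
    }
    return dist[v] = best;
}

// Smallest key width (in bytes) able to hold values up to 'max_len'.
std::size_t key_bytes(std::uint32_t max_len) {
    if (max_len <= UINT8_MAX) return 1;
    if (max_len <= UINT16_MAX) return 2;
    return 4;
}
```

For a chain of 300 nodes the correct maximum is 299, which needs two-byte keys; with a zero-initialized table the same call returns 0 and `key_bytes` picks one byte.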
Makefile.am: use portable POSIX primitive to get directory name.
Use '$(@D)' instead of '$(dir $@)', as the latter requires secondary
expansion feature specific to GNU make; it causes build failures on
bmake.
'$(@D)', on the other hand, is a POSIX make feature documented e.g. in
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html
(section "Internal Macros").
Makefile.am: fixed other custom rules to avoid writing into nonexistent directories.
The fixed rules do not presently trigger any build errors: the target directory
is created by configure, as it needs to put the autogenerated files doc/manpage.rst
and doc/help.rst in it.
However, this behaviour is incidental: if one removes the .in files, the build
failure would be unmasked. So it makes sense to ensure that the target directory
exists.
Ross Burton [Mon, 31 Jul 2017 14:43:41 +0000 (15:43 +0100)]
Makefile.am: create target directory before writing into it
In some situations src/parse/ may not exist before a file is copied into the
directory. Ensure that this doesn't happen by creating the directory first.
Don't assert that comparator arguments are non-equal.
The comparator is used in 'std::sort'.
All items in a closure have unique TNFA states, therefore we assumed
that the compared items must always be different. However, 'std::sort'
does not have this requirement and some implementations of it compare
an element with itself.
The removed assert caused crashes with an old version of GCC (4.2).
Thanks to Sergei Trofimovich for debugging the issue.
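A minimal sketch of a comparator that is safe for 'std::sort'. The Item type and its fields are hypothetical stand-ins for closure items. The point is that a strict weak ordering must return false for cmp(x, x), and since 'std::sort' may pass the same element as both arguments, the comparator must simply answer that case rather than assert against it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical closure item: each item has a unique TNFA state id.
struct Item {
    int state;
    int priority;
};

// Strict weak ordering: compares by priority, then by state.
// No assert(&a != &b): item_less(x, x) is a legal call and returns false.
bool item_less(const Item &a, const Item &b) {
    if (a.priority != b.priority) return a.priority < b.priority;
    return a.state < b.state;
}

void sort_items(std::vector<Item> &items) {
    std::sort(items.begin(), items.end(), item_less);
}
```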
With CXXFLAGS='-fsanitize=undefined' GCC complains about unaligned access:
if a custom allocator is used to allocate structs or other alignment-sensitive
objects, then it must take care of alignment (for example, add padding
to all unaligned blocks of memory that it allocates).
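A minimal bump-allocator sketch (hypothetical, not re2c's allocator) showing the padding approach: every request is rounded up to a multiple of alignof(std::max_align_t), so each returned block starts on a suitably aligned address, given that chunks come from malloc, which returns memory aligned for any fundamental type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

class BumpAllocator {
    static constexpr std::size_t ALIGN = alignof(std::max_align_t);
    std::vector<void *> chunks_;
    char *cur_ = nullptr;
    std::size_t left_ = 0;
    std::size_t chunk_size_;

    // Round n up to a multiple of ALIGN (a power of two).
    static std::size_t round_up(std::size_t n) {
        return (n + ALIGN - 1) & ~(ALIGN - 1);
    }

public:
    explicit BumpAllocator(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
    ~BumpAllocator() { for (void *c : chunks_) std::free(c); }
    BumpAllocator(const BumpAllocator &) = delete;
    BumpAllocator &operator=(const BumpAllocator &) = delete;

    void *alloc(std::size_t n) {
        if (n == 0) n = 1;          // always hand out a distinct block
        n = round_up(n);            // padding keeps the next block aligned too
        if (n > left_) {
            std::size_t sz = n > chunk_size_ ? n : chunk_size_;
            void *c = std::malloc(sz); // max_align_t-aligned start of chunk
            if (!c) throw std::bad_alloc();
            chunks_.push_back(c);
            cur_ = static_cast<char *>(c);
            left_ = sz;
        }
        void *p = cur_;
        cur_ += n;
        left_ -= n;
        return p;
    }
};
```

Without `round_up`, a 1-byte allocation followed by a struct allocation would place the struct at an odd address, which is exactly what the sanitizer reports.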
Recognize newlines in character strings and classes.
For now, a newline inside a character string or class is an error:
re2c should emit a clear error message. Different styles of newlines
should be recognized ("\n", "\r\n").
This commit fixes bug #163 reported by pauloscustodio:
Reading files with "rb" causes issues in Windows
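A minimal sketch of the check, assuming a hand-written scanner over a std::string rather than re2c's own lexer (which is itself generated by re2c). Both newline styles are recognized, including a newline right after a backslash, and the scanner stops so that the caller can report a clear "newline in string" error. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the newline starting at s[i]: "\n" -> 1, "\r\n" -> 2, else 0.
std::size_t newline_len(const std::string &s, std::size_t i) {
    if (i < s.size() && s[i] == '\n') return 1;
    if (i + 1 < s.size() && s[i] == '\r' && s[i + 1] == '\n') return 2;
    return 0;
}

// Scan a double-quoted string starting at s[pos] == '"'. Returns the index just
// past the closing quote, or std::string::npos if a newline (either style) or
// end of input comes first; the caller then emits the error message.
std::size_t scan_string(const std::string &s, std::size_t pos) {
    for (std::size_t i = pos + 1; i < s.size(); ++i) {
        if (newline_len(s, i) != 0) return std::string::npos;
        if (s[i] == '\\') {
            // An escaped newline is still a newline inside the string.
            if (i + 1 >= s.size() || newline_len(s, i + 1) != 0) return std::string::npos;
            ++i; // skip the escaped character
            continue;
        }
        if (s[i] == '"') return i + 1;
    }
    return std::string::npos;
}
```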
Fixed line endings in output files on Windows (#162, #163).
This fix addresses two issues, both reported and fixed by pauloscustodio.
1. #162 "Open text files with "wb" causes issues on Windows"
Text files need to be opened for writing with "w", so that stdio does
the right thing with respect to line endings for the current OS
("\r\n" on Windows, "\n" on Linux).
2. #163 "Reading files with "rb" causes issues in Windows"
re2c reads input files in binary mode and writes the generated output in
text mode. This caused CR LF to be converted to CR CR LF on Windows: the first CR
comes from reading input in binary mode, the second CR is added when writing
output in text mode. This only affected those parts of the input that are
not transformed by re2c, which we used to copy verbatim. Now we convert all
line endings to LF before writing the generated code to file.
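A minimal sketch of the normalization step (the function name is hypothetical): every "\r\n" pair is reduced to "\n" before the text is handed to a text-mode stream, which then produces the platform's native line ending exactly once. A lone "\r" is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert CRLF line endings to LF. Copied verbatim, a "\r\n" read in binary
// mode would become "\r\r\n" when written through a text-mode stream on
// Windows, because text mode expands the '\n' into "\r\n".
std::string normalize_newlines(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue; // drop the CR of a CR LF pair; the LF is copied next
        }
        out += in[i];
    }
    return out;
}
```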
Ulya Trofimovich [Sat, 25 Jun 2016 15:22:08 +0000 (16:22 +0100)]
Fixed #147 "Please add symbol name to "can't find symbol" error message".
As suggested by sirzooro:
Please add symbol name to "can't find symbol" error message,
it would allow to quickly spot what is wrong. Now we have to
position cursor at given row and column to find that name.
Also tweaked the error reporting function to append "..." at the end
of the message if it doesn't fit into the buffer.
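A minimal sketch of the truncation behaviour, assuming the message is formatted into a fixed-size buffer with vsnprintf. The function name and the message text are illustrative, not re2c's actual wording. vsnprintf returns the length the full message would have had, so a return value at least as large as the buffer means the tail was cut and gets replaced with "...".

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

std::string format_error(std::size_t bufsize, const char *fmt, ...) {
    std::string buf(bufsize, '\0');
    va_list args;
    va_start(args, fmt);
    int n = std::vsnprintf(&buf[0], bufsize, fmt, args);
    va_end(args);
    if (n < 0) return std::string();            // encoding error
    std::size_t written = std::strlen(buf.c_str());
    if (static_cast<std::size_t>(n) >= bufsize && written >= 3) {
        buf.replace(written - 3, 3, "...");     // mark the truncation
    }
    buf.resize(written);
    return buf;
}
```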
Ulya Trofimovich [Fri, 24 Jun 2016 21:46:16 +0000 (22:46 +0100)]
Fixed bug #145 "Values for enum YYCONDTYPE are not generated
when default rules with conditions are used".
The default rule is handled in a special (delayed) way:
re2c uses different code for the default rule than for normal rules.
This special code simply forgot to add the condition name to the list
of conditions.
run_tests.sh: fix permissions after copying source files to build directory.
`make distcheck` makes source files read-only.
Test script run_tests.sh copies source files into the build directory,
but the copies inherit the read-only permissions, so `make distcheck` fails.
run_tests.sh: patch line endings in the generated file.
Line endings in the generated code depend on the target platform: e.g.,
"\r\n" on Windows vs. "\n" on Linux. However, reference test results are
(currently) generated on Linux and therefore contain "\n" line endings.
So we have to patch line endings in the generated code in order to pass
the tests on Windows.
The testing script did patch line endings in stdout and stderr, but forgot
to patch them in the generated file (this had been broken since we started
using the '-o' option for testing). This commit fixes the testing script.
It also deletes a couple of tests in which source code contains "\r\n"
instead of "\n". These tests are duplicates of other tests (they were
added by commit bd2875441cae4ab3934bfafcd34728021295b842 supposedly to
test that re2c preserves line endings in source code). They are broken
by the current commit and fixing them is probably not worth the effort.
Thanks to Abs62, who noted that under Windows (in MSYS) tests fail
because '2>"$outc.stderr"' dumps CRLF to file instead of LF,
and proposed a fix:
sed -i 's/\r//g' "$outc.stderr"
Explicitly pass line/column info in all error messages.
Updated tests. Some error messages are more precise now, e.g. ill-formed
character classes and escape sequences: the column points to the beginning
of the faulty lexeme rather than to the middle of it where the error
occurred. Other error messages are less precise (they lack column info), but
the column reported before was too inexact to make much sense.
POSIX disambiguation: use the same comparison algorithm for orbit and non-orbit tags.
Previously we needed a different algorithm for non-orbit tags, because
disambiguation was based on both start and end tags. Unlike orbit tags,
non-orbit start tags cannot be compared incrementally, because the default value
may be discovered on a later step than the non-default value. Non-orbit end
tags do not have this problem: since negative tags are inserted at the end
of alternatives, the default value is always discovered on the same step as
the non-default value (provided that all higher-priority tags agree and
comparison reaches this tag at all).
Now that start tags are ignored, we can use incremental comparison for both
orbit and non-orbit subhistories, which simplifies the code.
Nicer output with '--dump-dfa-raw' and '--posix-captures'.
Don't add closure items to "shadowed" set if there is an identical
"unshadowed" item: otherwise the Goldberg-Radzik algorithm generates too many
"shadowed" items and the output becomes too noisy.
Use different closure algorithms for leftmost greedy and POSIX policies.
With the leftmost greedy policy we can use a simple depth-first search.
With the POSIX policy we need the Goldberg-Radzik algorithm, which is more complex
(and the necessity to accommodate both policies complicates it even more).
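A minimal sketch of why depth-first search is enough for the leftmost greedy policy, on a hypothetical epsilon-NFA without tags (not re2c's TNFA data structures). If the epsilon-transitions of each state are listed in priority order and explored in that order, the first path that reaches a state is its highest-priority path, so later arrivals can be discarded. Under the POSIX policy a later-discovered path may turn out better, which is why the closure there needs the Goldberg-Radzik algorithm instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// eps[s]: epsilon-successors of s in priority order (highest first).
// is_core[s]: s has non-epsilon transitions or is final, so it belongs
// in the closure result.
struct Nfa {
    std::vector<std::vector<std::size_t>> eps;
    std::vector<bool> is_core;
};

static void dfs(const Nfa &nfa, std::size_t s, std::vector<bool> &visited,
                std::vector<std::size_t> &out) {
    if (visited[s]) return;  // a higher-priority path already reached s
    visited[s] = true;
    if (nfa.is_core[s]) out.push_back(s);
    for (std::size_t t : nfa.eps[s]) dfs(nfa, t, visited, out);
}

// Leftmost greedy epsilon-closure of 'start' (states given in priority order);
// the result lists core states in priority order.
std::vector<std::size_t> closure_leftmost(const Nfa &nfa,
                                          const std::vector<std::size_t> &start) {
    std::vector<bool> visited(nfa.eps.size(), false);
    std::vector<std::size_t> out;
    for (std::size_t s : start) dfs(nfa, s, visited, out);
    return out;
}
```

In the test below, state 1 is reachable both directly from 0 and via 2; the direct, higher-priority path wins and the second arrival is ignored.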