granicus.if.org Git - re2c/log

commit | commitdiff | tree

Ulya Trofimovich [Sat, 28 Nov 2015 15:44:04 +0000 (15:44 +0000)]

run_tests.sh: use '/usr/bin/env bash' to locate bash.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 28 Nov 2015 15:39:56 +0000 (15:39 +0000)]

Makefile.am: use '=' instead of '==' to compare strings.

'==' appears to be a bash feature.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 28 Nov 2015 11:36:41 +0000 (11:36 +0000)]

Don't use overloaded constructors with integral types.

This causes ambiguity in overload resolution on OS X:

    src/codegen/skeleton/generate_data.cc:308:30: error: ambiguous conversion for functional-style cast from 'const size_t' (aka 'const unsigned long') to 'Node::covers_t'
          (aka 'u32lim_t<1024 * 1024 * 1024>')
            const Node::covers_t size = Node::covers_t (len) * Node::covers_t (count);
                                        ^~~~~~~~~~~~~~~~~~~
    ./src/util/u32lim.h:20:11: note: candidate constructor
            explicit u32lim_t (uint32_t x)
                     ^
    ./src/util/u32lim.h:23:11: note: candidate constructor
            explicit u32lim_t (uint64_t x)

Use static constructor-like methods with explicit names.
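
A minimal sketch of the named-constructor approach, assuming a simplified saturating counter. The class body and the method names 'from32', 'from64', 'uint32' and 'overflow' are illustrative assumptions, not necessarily what 'src/util/u32lim.h' contains:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified counterpart of 'u32lim_t': a 32-bit counter that
// saturates at LIMIT. All method names here are assumptions for illustration.
template <uint32_t LIMIT>
class u32lim_t
{
    uint32_t value;

    // The only constructor is private, so there is no overload set for the
    // compiler to resolve when a 'size_t' is passed in.
    explicit u32lim_t (uint32_t x) : value (x < LIMIT ? x : LIMIT) {}

public:
    // Named factories: the caller states the source width explicitly.
    static u32lim_t from32 (uint32_t x) { return u32lim_t (x); }
    static u32lim_t from64 (uint64_t x)
    {
        return u32lim_t (x < LIMIT ? static_cast<uint32_t> (x) : LIMIT);
    }

    uint32_t uint32 () const { return value; }
    bool overflow () const { return value == LIMIT; }

    friend u32lim_t operator * (u32lim_t a, u32lim_t b)
    {
        assert (a.value <= LIMIT && b.value <= LIMIT); // class invariant
        // Both operands are at most 2^30 here, so the product fits 64 bits.
        const uint64_t product = static_cast<uint64_t> (a.value) * b.value;
        return u32lim_t::from64 (product);
    }
};

typedef u32lim_t<1024 * 1024 * 1024> covers_t;

// Counterpart of the failing line from the error message above.
inline covers_t cover_size (size_t len, size_t count)
{
    return covers_t::from64 (len) * covers_t::from64 (count);
}
```

With a single candidate per name, 'size_t' converts to the parameter type without any ranking of overloads, whichever basic type the platform uses for it.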

commit | commitdiff | tree

Oleksii Taran [Sat, 28 Nov 2015 04:08:09 +0000 (20:08 -0800)]

Fix "CODE" symbol collision on OS X (see #122)

On OS X bison generates token enums as CPP macro
constants (y.tab.h):
    #define CODE 260
while on my box it's
    enum yytokentype {
      ...
      CODE = 260,
      ...
    };

That #define causes symbol collision as:

    ../src/parse/lex.re:169:38: error: expected unqualified-id
                                            else if (opts->target == opt_t::CODE)
                                                                            ^
    src/parse/y.tab.h:58:14: note: expanded from macro 'CODE'
    #define CODE 260

Renamed enum entry to TOKEN_CODE.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 27 Nov 2015 14:29:16 +0000 (14:29 +0000)]

Allowed chaining for all 'OutputFile' methods; renamed them in a uniform way.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 27 Nov 2015 13:58:29 +0000 (13:58 +0000)]

Use local re2c (in '$(top_builddir)') rather than system re2c for 'make bootstrap'.

Correct behaviour was broken by commit 38f526d04415adb7b5e6bca228fc26409833f5c3.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 27 Nov 2015 13:41:42 +0000 (13:41 +0000)]

Don't use 'operator <<' overloads with integral types: resolution is platform-dependent.

See bug #122 "clang does not compile re2c 0.15.x".

Example of error on Mac OS X:
    src/codegen/emit_dfa.cc:250:65: error: use of overloaded operator '<<' is ambiguous (with operand types 're2c::OutputFile' and 'const size_t'
          (aka 'const unsigned long'))
            o << indent(ind++) << "static void *" << opts->yyctable << "[" << conds << "] = {\n";
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~
    ./src/codegen/output.h:84:22: note: candidate function
            friend OutputFile & operator << (OutputFile & o, char c);
                                ^
    ./src/codegen/output.h:85:22: note: candidate function
            friend OutputFile & operator << (OutputFile & o, uint32_t n);
                                ^
    ./src/codegen/output.h:86:22: note: candidate function
            friend OutputFile & operator << (OutputFile & o, uint64_t n);
                            ^

On OS X 'size_t' is neither 'uint32_t' nor 'uint64_t', so resolution is ambiguous.
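
A rough sketch of the alternative, assuming named chainable methods in place of the 'operator <<' overloads. 'OutputSketch' and the method names 'ws', 'wc', 'wu32' and 'wu64' are hypothetical stand-ins, not the actual 're2c::OutputFile' interface:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for 're2c::OutputFile'; names are assumptions.
class OutputSketch
{
    std::ostringstream stream;

public:
    // Each method has exactly one signature and returns '*this',
    // so calls can be chained and nothing is left to overload resolution.
    OutputSketch & ws (const char * s)
    {
        assert (s != NULL);
        stream << s;
        return *this;
    }
    OutputSketch & wc (char c) { stream << c; return *this; }
    OutputSketch & wu32 (uint32_t n) { stream << n; return *this; }
    OutputSketch & wu64 (uint64_t n) { stream << n; return *this; }

    std::string str () const { return stream.str (); }
};

// Counterpart of the failing line: 'conds' is a 'size_t', and the caller
// picks the 64-bit writer by name.
inline std::string emit_table_header (size_t conds)
{
    OutputSketch o;
    o.ws ("static void *yyctable[").wu64 (conds).ws ("] = {\n");
    return o.str ();
}
```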

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 Nov 2015 21:20:12 +0000 (21:20 +0000)]

Release 0.15.2.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 Nov 2015 21:15:38 +0000 (21:15 +0000)]

Prepare release 0.15.2: updated CHANGELOG.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 23 Nov 2015 21:11:19 +0000 (21:11 +0000)]

Makefile.am: lexer depends on bison-generated parser; fixed rule order.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 21:03:29 +0000 (21:03 +0000)]

Release 0.15.1.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 20:59:29 +0000 (20:59 +0000)]

Prepare release 0.15.1: updated CHANGELOG.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 20:55:04 +0000 (20:55 +0000)]

run_tests.sh: fix the order of files in test results.

'sort' behavior depends on current locale; set 'LC_ALL=C LANG=C'
before doing locale-sensitive things. Updated test results.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 20:50:15 +0000 (20:50 +0000)]

release.sh: don't forget to push tags.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 19:53:04 +0000 (19:53 +0000)]

Release 0.15.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 19:48:37 +0000 (19:48 +0000)]

Prepare release 0.15: updated release instructions.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 19:46:45 +0000 (19:46 +0000)]

Prepare release 0.15: updated CHANGELOG.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 22 Nov 2015 19:42:21 +0000 (19:42 +0000)]

Use 'rst2man.py' to build manpage; updated manpage.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 21 Nov 2015 20:03:10 +0000 (20:03 +0000)]

Merge branch 'master' into simplified_codegen.

* master:
  Updated version to 0.14.4.dev
  Release 0.14.3.
  Added simple test for yacc-style brackets (see patch #27)
  Fixed '#27 re2c crashes reading files containing %{ %}' (patch by Rui)
  Makefile.am: dropped distfiles for MSVC (they are broken anyway)
  Added another full test for bug #57.
  Updated version to 0.14.3.dev
  Release 0.14.2.
  Fixed bug #57: Wrong result only if another rule is present
  Updated version to 0.14.2.dev
  Release 0.14.1.
  Pad version with '0' instead of nulls

commit | commitdiff | tree

Ulya Trofimovich [Wed, 18 Nov 2015 14:45:49 +0000 (14:45 +0000)]

Skeleton: data generation (linear): don't forget to dump path in end nodes.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 16 Nov 2015 14:48:53 +0000 (14:48 +0000)]

Skeleton: changed formatting of the generated code (no significant changes).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 16 Nov 2015 14:10:49 +0000 (14:10 +0000)]

Skeleton: disregard default rule when estimating maximum rule size (in bytes).

Default rule '*' (not to be confused with 'none' rule) used to have
a normal number just like other rules. Now that re2c has to distinguish
the default rule from other rules (because of [-Wunreachable-rules]),
it reserves a special number (UINT32_MAX - 1) for it.
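
A small sketch of why the reserved number has to be skipped, assuming rule numbers are kept in a plain vector. The names 'RULE_DEFAULT' and 'max_rule' are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reserved number for the default rule, as stated in the commit message.
const uint32_t RULE_DEFAULT = UINT32_MAX - 1;

// Largest ordinary rule number. Counting RULE_DEFAULT would force the
// widest key type even when only a handful of rules exist.
inline uint32_t max_rule (const std::vector<uint32_t> & rules)
{
    assert (!rules.empty ());
    uint32_t max = 0;
    for (size_t i = 0; i < rules.size (); ++i) {
        if (rules[i] != RULE_DEFAULT && rules[i] > max) {
            max = rules[i];
        }
    }
    return max;
}
```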

commit | commitdiff | tree

Ulya Trofimovich [Tue, 10 Nov 2015 15:28:28 +0000 (15:28 +0000)]

Lex strings and character classes in a more elegant way.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 9 Nov 2015 16:06:40 +0000 (16:06 +0000)]

Recognize escaped dash '\-' in character class.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 16 Oct 2015 12:21:50 +0000 (13:21 +0100)]

Fixed tests for bug #119: "-f with -b/-g generates incorrect dispatch on fill labels".

Somehow configuration 're2c:state:abort = 1;' was present in all the
tests; it was meant to be only in half of them.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 15 Oct 2015 13:54:17 +0000 (14:54 +0100)]

run_tests.sh: tried to clarify regexp that splits options from filename.

Note: should keep to POSIX, so no '+' or '?' is allowed.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 14 Oct 2015 22:11:01 +0000 (23:11 +0100)]

run_tests.sh: run each test in a separate directory and paste all generated files into one.

Updated tests: changes are insignificant (the order in which multiple
generated files are concatenated has changed).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 14 Oct 2015 14:09:41 +0000 (15:09 +0100)]

run_tests.sh: don't change filenames to '<stdout>'.

Updated test. Used the following shell script to validate changes:

    #!/bin/bash

    for f2 in *.temp
    do
        f1=${f2%.temp}

        diff1=`diff $f1 $f2 | grep '^< ' | wc -l`
        diff1_fname=`diff $f1 $f2 | grep '^<$ #line [0-9]\+ "<stdout>"\|[ ]\+("<stdout>[^"]\+"$' | wc -l`
        diff2=`diff $f1 $f2 | grep '^> ' | wc -l`
        diff2_fname=`diff $f1 $f2 | grep '^>$ #line [0-9]\+ "[^"]\+"\|[ ]\+("[^"]\+"$' | wc -l`

        # missing: only changed filenames
        [ $diff1 -ne $diff1_fname ] && echo "FAIL1: $f1" && exit 1

        # added: only changed filenames
        [ $diff2 -ne $diff2_fname ] && echo "FAIL2: $f1" && exit 1

        # the number of missing changed filenames
        # equals to the number of added changed filenames
        [ $diff1_fname -ne $diff2_fname ] && echo "FAIL4: $f1" && exit 1
    done

    echo "OK"

commit | commitdiff | tree

Ulya Trofimovich [Wed, 14 Oct 2015 13:19:55 +0000 (14:19 +0100)]

run_tests.sh: paste type headers into source file and diff all at once.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 14 Oct 2015 12:04:19 +0000 (13:04 +0100)]

run_tests.sh: use '-o' option. Added tests for '--skeleton' option.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 13 Oct 2015 20:20:22 +0000 (21:20 +0100)]

Omit unnecessary null pointer check (suggested by Markus Elfring).

commit | commitdiff | tree

Ulya Trofimovich [Tue, 13 Oct 2015 14:36:44 +0000 (15:36 +0100)]

run_tests.sh: added option '--keep-tmp-files'.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 13 Oct 2015 13:26:33 +0000 (14:26 +0100)]

run_tests.sh: added '--skeleton' option.

With this option script runs re2c with '--skeleton' and
'-Werror-undefined-control-flow' and instead of comparing results with
reference test results, it compiles the generated skeleton programs and
runs them. If C compiler or binary return nonzero error status, script
reports an error. Note that cases when re2c failed to generate code are
not considered errors (re2c has lots of test cases for its errors).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 12 Oct 2015 14:14:16 +0000 (15:14 +0100)]

Split main lexer and configuration lexer in two separate files.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 12 Oct 2015 13:12:11 +0000 (14:12 +0100)]

Factored out some common lexing pieces into separate routines.

re2c lacks submatch extraction; it would be much more convenient
to memorize input positions for some parts of regular expressions
than break each regexp in the middle and move parts to separate blocks.

Submatch extraction is difficult to implement in general, but supporting
submatch in some simple cases (like the case where trailing context is
allowed) would be not so difficult and most helpful.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 12 Oct 2015 12:32:45 +0000 (13:32 +0100)]

Parse inplace configurations in lexer; don't pass them to parser.

This removes a lot of copy-pasting.

The change of error location in test is insignificant: the reported
location was incorrect and it still remains imprecise.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 8 Oct 2015 13:38:19 +0000 (14:38 +0100)]

Improved '-Wmatch-empty-string' warning.

- recognize empty match with nonempty trailing context
- don't report unreachable empty match

commit | commitdiff | tree

Ulya Trofimovich [Thu, 8 Oct 2015 09:51:08 +0000 (10:51 +0100)]

Added '-Wunreachable-rules' warning.

Warns about unreachable rules:
  - rules that are shadowed by other rules, e.g. rule '[a]' is shadowed by
    '[a] [^]'
  - infinite rules that consume infinitely many characters and fail on
    YYFILL, e.g. '[^]*'
  - rules that contain never-matching link, e.g. '[]' with option
    '--empty-class match-none'
default rule '*' should not be reported

commit | commitdiff | tree

Ulya Trofimovich [Wed, 7 Oct 2015 17:30:19 +0000 (18:30 +0100)]

Fixed memleaks and grouped options in one big macro.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 7 Oct 2015 15:20:38 +0000 (16:20 +0100)]

Merge default rules on the fly, assign them the same lowest priority.

re2c used to postpone merging default rules because rank counter could
only assign consecutive ranks to rules, and default rules must have
the lowest priority. Now rank counter has been modified to return
a special value as the default rule rank.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 5 Oct 2015 20:20:23 +0000 (21:20 +0100)]

Autogenerated configuration tests: added default rule to each test.

It's not a bunch of unnecessary warnings I want to avoid, it's a bunch of
unnecessary runtime failures in programs generated with '--skeleton'
(failures caused by undefined control flow; re2c recogizes such cases
and the generated program reports a warning before failing).

commit | commitdiff | tree

Ulya Trofimovich [Mon, 5 Oct 2015 14:44:50 +0000 (15:44 +0100)]

Support trailing context with '--skeleton'.

Trailing contexts are currently broken (overlapping trailing contexts
cannot be tracked with a single 'YYCTXMARKER'). For now, re2c with
'--skeleton' mimics this incorrect behaviour: information about context
is lost by the time DFA is constructed, so skeleton has no way to
figure out the right order of things.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 4 Oct 2015 18:46:34 +0000 (19:46 +0100)]

Moved path-combining magic closer to path definition.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 4 Oct 2015 18:25:00 +0000 (19:25 +0100)]

Fixed bug #116: "empty string with non-empty trailing context consumes code units".

Prior to this commit backup of trailing context position was done
before advancing input position and re2c either had to emit
    YYCTXMARKER = YYCURSOR + 1;
(with default input API), or
    YYRESTORECTX ();
    YYSKIP ();
(with custom input API).

The problem is that sometimes the initial state doesn't advance input position
at all. Now re2c emits context backup after advancing input position and it
no longer needs '+1' or 'YYSKIP' hacks. It always backs up the correct position.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 30 Sep 2015 16:33:58 +0000 (17:33 +0100)]

'--skeleton': don't forget to jump to start label when needed.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 30 Sep 2015 16:04:29 +0000 (17:04 +0100)]

'--skeleton': give more info when reporting unused data and keys.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 30 Sep 2015 15:12:27 +0000 (16:12 +0100)]

'--skeleton': fixed codegen error with '-b' (don't forget last bitmap element).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 30 Sep 2015 15:11:45 +0000 (16:11 +0100)]

'--skeleton': added missing newline in the generated code.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 30 Sep 2015 15:10:38 +0000 (16:10 +0100)]

'--skeleton': tell function name when reporting errors and warnings.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 29 Sep 2015 15:47:25 +0000 (16:47 +0100)]

'--skeleton': respect empty string match.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 29 Sep 2015 15:04:27 +0000 (16:04 +0100)]

Fixed skeleton generation in '-r' mode.

'-r' is different from normal mode in two aspects:
    - single DFA may be used multiple times (unchanged, we only
      need a single copy for skeleton)
    - DFA may be generated but not used at all

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Sep 2015 21:34:06 +0000 (22:34 +0100)]

Split skeleton arc count limits for permutations, cover and default paths.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Sep 2015 14:30:20 +0000 (15:30 +0100)]

Docs: updated descriptions of some inplace configurations.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Sep 2015 12:54:26 +0000 (13:54 +0100)]

Unified meaning and mutual relations of some inplace configurations.

This commit changes the behaviour of three groups of options:

    re2c:define:YYSETCONDITION
    re2c:define:YYSETCONDITION@cond
    re2c:define:YYSETCONDITION:naked

    re2c:define:YYSETSTATE
    re2c:define:YYSETSTATE@state
    re2c:define:YYSETSTATE:naked (added by this commit)

    re2c:define:YYFILL
    re2c:define:YYFILL@len
    re2c:yyfill:parameter
    re2c:define:YYFILL:naked

The changes should be backwards compatible (meaning that old code that
compiled should still compile), but it may add empty statements or statements
with no effect for some configurations, e.g.:
    YYSETCONDITION(0);(0);
These changes were necessary to unify re2c behaviour, remove counter-intuitive
cases and make it possible to write comprehensible option descriptions.

In short, the changes are:
    - 'naked' triggers generation of argument-in-braces and semicolon;
    - 'parameter' triggers generation of argument-in-braces (when applicable,
      'naked' has priority over 'parameter');
    - argument templates ('@cond', '@state', '@len') don't force other
      configurations; they also don't influence argument-in-braces.

Added test generator and autogenerated tests.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 28 Sep 2015 12:49:05 +0000 (13:49 +0100)]

run_tests.sh: preserve nested directory structure when dumping errors.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 27 Sep 2015 11:03:03 +0000 (12:03 +0100)]

Don't hang forever trying to replace empty configuration arguments.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 27 Sep 2015 10:58:40 +0000 (11:58 +0100)]

Default options should be synchronized as well.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 24 Sep 2015 14:21:45 +0000 (15:21 +0100)]

Reduced redundant global flags.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 24 Sep 2015 13:52:24 +0000 (14:52 +0100)]

Handle all inplace configurations in a uniform way.

This commit removes check (and error) for overwritten configurations
(like setting 're2c:define:YYCURSOR' twice in the same block).
This check was useful in principle, but it was applied to a somewhat
arbitrarily chosen set of parameters. If in the future we feel a need
for such a check, it should respect all options equally and report
a warning rather than an error.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 23 Sep 2015 21:08:39 +0000 (22:08 +0100)]

Automatically resync options on read access (if they have been updated).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 23 Sep 2015 16:32:36 +0000 (17:32 +0100)]

Merged 'DFlag' and 'flag_skeleton' into one option 'target'.

The nature of these options makes them mutually exclusive; so instead
of checking that they are not both set just make them a single option.
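
A sketch of the idea, assuming 'DFlag' corresponds to the '-D' (emit dot) option. The struct names and the enumerator list are hypothetical; only 'opt_t::CODE' appears elsewhere in this log:

```cpp
#include <cassert>
#include <string>

// Before (hypothetical): two independent flags admit a meaningless state
// where both are set, which then has to be checked for at run time.
struct flags_sketch_t
{
    bool DFlag;
    bool flag_skeleton;
};

// After (hypothetical): exactly one target is selected at any time.
struct opt_sketch_t
{
    enum target_t { CODE, DOT, SKELETON };
    target_t target;

    opt_sketch_t () : target (CODE) {}
};

inline std::string target_name (const opt_sketch_t & opts)
{
    switch (opts.target) {
        case opt_sketch_t::CODE:     return "code";
        case opt_sketch_t::DOT:      return "dot";
        case opt_sketch_t::SKELETON: return "skeleton";
    }
    assert (false); // unreachable: every enumerator is handled above
    return "";
}
```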

commit | commitdiff | tree

Ulya Trofimovich [Wed, 23 Sep 2015 14:45:54 +0000 (15:45 +0100)]

Separated user config and effective config.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 22 Sep 2015 12:15:31 +0000 (13:15 +0100)]

Prepare to separate user config and effective config.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 21 Sep 2015 21:14:45 +0000 (22:14 +0100)]

Keep name table together with other options.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 21 Sep 2015 20:50:55 +0000 (21:50 +0100)]

Grouped options together in a struct.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 14:16:38 +0000 (15:16 +0100)]

Documentation: added warning descriptions to manpage and online manual.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 14:03:24 +0000 (15:03 +0100)]

Documentation: added warning descriptions to '-h, -?, --help' option.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 12:55:46 +0000 (13:55 +0100)]

Removed unused method declaration.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 12:52:09 +0000 (13:52 +0100)]

Gather some DFA statistics and use it to omit unused code with '--skeleton'.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 09:53:35 +0000 (10:53 +0100)]

Omit useless 'yyaccept' variable in '--skeleton' programs.

Normally re2c generates single 'yyaccept' variable for all conditions.
With '--skeleton' re2c handles conditions separately, so each condition
needs (or needs not) its own 'yyaccept'.

Prior to this commit re2c used the same criterion to determine if
'yyaccept' is needed with '--skeleton' as it uses generally: whether
'yyaccept' was used in any of conditions. Now re2c looks if 'yyaccept'
was used with this particular condition.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 17 Sep 2015 09:27:57 +0000 (10:27 +0100)]

Generate better code with '--skeleton': C90-compliant, free resources on errors.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Sep 2015 11:19:28 +0000 (12:19 +0100)]

Check 'fread' return value in program generated with '--skeleton'.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 16 Sep 2015 10:25:29 +0000 (11:25 +0100)]

Support '--skeleton' with conditions and multiple re2c blocks.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Sep 2015 12:35:27 +0000 (13:35 +0100)]

Make 'filesize' function for '--skeleton' preserve original file position.
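
One common way to do this, shown as a sketch only (the function name 'filesize_sketch' and the error convention are assumptions, and the code re2c actually generates may differ):

```cpp
#include <cassert>
#include <cstdio>

// Returns the size of an open binary file in bytes, or -1 on error.
// The file position is saved first and restored before returning, so the
// caller can keep reading from where it was.
inline long filesize_sketch (std::FILE * f)
{
    assert (f != NULL);

    const long pos = std::ftell (f);
    if (pos == -1L) return -1;

    if (std::fseek (f, 0, SEEK_END) != 0) return -1;
    const long size = std::ftell (f);

    if (std::fseek (f, pos, SEEK_SET) != 0) return -1;
    return size;
}
```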

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Sep 2015 12:08:10 +0000 (13:08 +0100)]

Fixed MINGW builds where 'sizeof (int)' is equal to 'sizeof (long)'.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Sep 2015 10:51:36 +0000 (11:51 +0100)]

Changed '-Wcondition-order' to warn even if 'YYSETCONDITION' is used.

Tests 'condtype_yysetcondition.c{s,g}.re' show the reason why I changed
how '-Wcondition-order' works in presence of 'YYSETCONDITION' calls:
programs generated from these tests work differently depending on
condition numbering. Explicit use of condition names cannot guarantee
that these explicit names were generated by re2c (and not hardcoded as
in these examples).

commit | commitdiff | tree

Ulya Trofimovich [Tue, 15 Sep 2015 09:45:19 +0000 (10:45 +0100)]

More accurate handling of default rule for '--skeleton'.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Sep 2015 22:11:51 +0000 (23:11 +0100)]

The generated '--skeleton' program now warns about undefined control flow.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Sep 2015 14:28:52 +0000 (15:28 +0100)]

Fixed error in calculation of maximal path length in skeleton.

commit | commitdiff | tree

Ulya Trofimovich [Mon, 14 Sep 2015 12:19:15 +0000 (13:19 +0100)]

Compacted keys representation (with '--skeleton').

Determine maximal path length and maximal rule number while constructing
skeleton; take the maximum of these two values; choose the unsigned integer
type of minimal width capable of holding the maximum.

Note: re2c operates on exact-width integers, but the generated program
doesn't (it might not have <stdint.h>). When generating the program,
re2c chooses one of unsigned 'char', 'short', 'int' and 'long' types
(the one whose 'sizeof' is equal to the desired key size). re2c makes
some implicit assumptions (generated program is run on the same platform
as re2c, byte consists of 8 bits, etc.). Perhaps re2c should hardcode
these assumptions in the generated program and check them on start.
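
The selection described above might look roughly like this. It is a sketch under the stated assumptions (8-bit bytes, same platform for re2c and the generated program); the function names 'key_size' and 'key_type' are hypothetical:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal key width in bytes that can hold both the maximal path length
// and the maximal rule number.
inline size_t key_size (uint32_t max_path_len, uint32_t max_rule)
{
    const uint32_t max = max_path_len > max_rule ? max_path_len : max_rule;
    if (max <= UINT8_MAX)  return 1;
    if (max <= UINT16_MAX) return 2;
    return 4;
}

// Name of a basic unsigned type whose 'sizeof' equals 'size' on the
// platform running this code, or an empty string if there is none.
inline std::string key_type (size_t size)
{
    assert (CHAR_BIT == 8); // one of the implicit assumptions, made explicit

    if (size == sizeof (unsigned char))  return "unsigned char";
    if (size == sizeof (unsigned short)) return "unsigned short";
    if (size == sizeof (unsigned int))   return "unsigned int";
    if (size == sizeof (unsigned long))  return "unsigned long";
    return "";
}
```

Testing 'int' before 'long' matters on platforms where both have the same size: the first match wins and the result stays deterministic.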

commit | commitdiff | tree

Ulya Trofimovich [Fri, 11 Sep 2015 17:48:35 +0000 (18:48 +0100)]

Store keys for '--skeleton' in binary.

A single key is formed of three values:
    1. the length of string
    2. the length of matched part of string
    3. the number of matched rule
All these values are guaranteed to fit 32 bits, so for now we just
dump them as 'uint32_t' and read as 'unsigned int'. re2c asserts that
'sizeof (uint32_t) == sizeof (unsigned int)'.

Avoid structs, as they cause padding issues.
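
A sketch of the layout, assuming keys are appended to an in-memory byte buffer in native byte order (the real code writes to a '.keys' file; 'write_key' and 'read_key' are hypothetical names). An array of 'uint32_t' has no padding between elements, which is the point of avoiding a struct:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Writer side: append one key as three consecutive 32-bit values.
inline void write_key (std::vector<unsigned char> & out,
    uint32_t len, uint32_t matched_len, uint32_t rule)
{
    const uint32_t fields[3] = { len, matched_len, rule };
    const unsigned char * p = reinterpret_cast<const unsigned char *> (fields);
    out.insert (out.end (), p, p + sizeof (fields));
}

// Reader side: the generated program has no <stdint.h>, so it reads the
// same bytes as 'unsigned int'.
inline void read_key (const std::vector<unsigned char> & in,
    size_t index, unsigned int key[3])
{
    assert (sizeof (uint32_t) == sizeof (unsigned int));
    const size_t key_bytes = 3 * sizeof (unsigned int);
    assert ((index + 1) * key_bytes <= in.size ());
    std::memcpy (key, &in[index * key_bytes], key_bytes);
}
```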

commit | commitdiff | tree

Ulya Trofimovich [Wed, 9 Sep 2015 21:43:31 +0000 (22:43 +0100)]

Estimate maximal path length in skeleton and abort if it overflows.

Maximal skeleton path length is a bit different from YYMAXFILL:
it assumes that loops are iterated once (unlike YYMAXFILL calculation,
which disregards loops) and returns zero for empty regexp.

We need to know it in order:
- to be sure it won't overflow
- to store keys in a compact form (yet to be done)

This commit also makes DFA and skeleton store condition name and
source file line corresponding to current condition: it gets quite
annoying to pass these things around. This change caused another
change of test results (line numbers in error messages changed
for tests that use '-r' and reuse old DFA (don't reconstruct DFA
in 'use:re2c' blocks)).

commit | commitdiff | tree

Ulya Trofimovich [Wed, 9 Sep 2015 15:26:28 +0000 (16:26 +0100)]

Fixed memleaks (skeleton nodes were not destructed properly).

Found with 'make vtests'

commit | commitdiff | tree

Ulya Trofimovich [Wed, 9 Sep 2015 14:30:09 +0000 (15:30 +0100)]

Make skeleton a part of DFA.

This lets us create skeleton right after DFA creation (but before DFA
has been mangled in different ways), but call skeleton methods any time.

Undefined control flow is now checked at the time of real code generation,
that's why all those tests that use '-r' changed: re2c stopped reporting
'rules:re2c' blocks and reports 'use:re2c' blocks instead.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 9 Sep 2015 12:09:25 +0000 (13:09 +0100)]

Suffixes of skeleton end nodes should be initialized by algorithm that uses them.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 8 Sep 2015 16:57:06 +0000 (17:57 +0100)]

Renamed function.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 8 Sep 2015 16:33:34 +0000 (17:33 +0100)]

A more logical way to update rules when constructing skeleton paths.

There's no need to keep rule one step behind path's arcs and update
it manually. Also, it is reasonable to set rule in constructor.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 8 Sep 2015 14:14:09 +0000 (15:14 +0100)]

Reduced the time of path generation with '--skeleton'.

The algorithm now stores partially constructed paths in a compact
form and delays expansion until it reaches the end of current
branch of recursion. This has several advantages:
    - no need to store large structures in memory
    - write data to file in large chunks
    - path expansion is faster than step-by-step construction

Speedup is actually tenfold (but the way keys are dumped to file is
still not optimized and spoils benchmarks). Real speedup can be
observed on such files as:
    test/php20150211_zend_ini_scanner_trimmed.icF.re
which cause ~5Gb dumps. Time has gone down: 8.5m -> 2.5m and will
be further reduced to ~1m when key dumps are fixed. This will result
in generation speed ~20Mb/s which is quite good.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 8 Sep 2015 11:06:52 +0000 (12:06 +0100)]

Renamed and restructured various kinds of skeleton paths.

commit | commitdiff | tree

Ulya Trofimovich [Sun, 6 Sep 2015 21:22:12 +0000 (22:22 +0100)]

Fixed eternal loop in path cover generation algorithm for '--skeleton'.

The simplest example I was able to come up with that reveals eternal
loop is the following:
    /*!re2c
        ( [^acf] | "0b" | "a"[^] | "a0"[^] )+ {}
    */
The problem was caused by my assumption that from any node there is
at least one non-looping path to end node. The assumption is true;
what I didn't take into account was that all such paths may go via
nodes that have already occurred twice on the way to current node
(their loop counter is greater than 1). In this case the algorithm
would find no path to end node. Since not all prefixes would have
been covered (exactly none of them) the algorithm would loop forever.

Such branch may be abandoned safely: the algorithm will later find
another path to current state without loops.

As soon as I realized the problem the fix was trivial: if all outgoing
arcs have been exhausted and none of them yielded any results, abandon
current branch.

commit | commitdiff | tree

Ulya Trofimovich [Sat, 5 Sep 2015 08:39:28 +0000 (09:39 +0100)]

With '--skeleton', store input data in binary form (rather than C/C++ code).

There's a limitation on the size of input files for C/C++ compiler and
the compiled binary will have to contain all that data (and thus may grow
very large).

Storing data in binary form and reading it from file dynamically is
the way it should be.

commit | commitdiff | tree

Ulya Trofimovich [Fri, 4 Sep 2015 14:07:06 +0000 (15:07 +0100)]

With '--skeleton', dump data to file immediately as it is generated.

This way re2c won't consume memory even on large inputs.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 3 Sep 2015 14:18:45 +0000 (15:18 +0100)]

Changed '.keys' file format (generated with '--skeleton').

Store length of strings instead of pointers to string start and end.

Storing pointers requires us to remember total length of strings already
written to file by the time we want to write next string. This is very
inconvenient if we want to dump strings as we perform DFS on graph:
we'll have to track size all the time (path-cover-generator already does
that, but all-paths-generator doesn't).

This adds a new local variable to the generated code ('token'), which
is used to back up cursor position when entering DFA.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 3 Sep 2015 12:26:47 +0000 (13:26 +0100)]

Split ".data" files (generated with '--skeleton') into two parts.

This is necessary to dump generated data as soon as possible instead
of keeping it until all data has been generated: we generate input
strings and keys simultaneously, but have to write all input strings
at once as one big string. Keys alone occupy lots of space, so
keeping only keys instead of keys and strings won't help.

commit | commitdiff | tree

Ulya Trofimovich [Thu, 3 Sep 2015 11:33:02 +0000 (12:33 +0100)]

Combined path cover generation with path cover size estimation.

We don't have any fallback algorithm in case path cover is too large.
We want to generate some paths anyway, so we have to construct path cover.
We shouldn't generate arbitrarily large amounts of data.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 2 Sep 2015 12:28:48 +0000 (13:28 +0100)]

Omit some highly unlikely conditional exits from depth-first search.

With '--skeleton' re2c builds DFA skeleton and performs DFS in order to
estimate the size of data to be generated. It maintains size counter: as
soon as the counter reaches certain limit, DFS should stop.

Size counter is always checked when recursion returns.
Sometimes it is clear that size counter will overflow upon recursion
return (when arguments overflow already) and DFS can exit early (before
entering recursion).

This commit omits checks of some arguments (and corresponding early exits
from DFS): first, path length is very unlikely to overflow (one has to
write/generate a regular expression with length of ~1Gb, in which case
skeleton generation won't be the worst problem); second, the number of
outgoing arcs in each vertex is also highly unlikely to exceed 1Gb limit.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 2 Sep 2015 12:11:36 +0000 (13:11 +0100)]

Split large source file into smaller files with distinct functionality.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 2 Sep 2015 11:39:47 +0000 (12:39 +0100)]

Narrowed the scope of ".data" file.

commit | commitdiff | tree

Ulya Trofimovich [Wed, 2 Sep 2015 10:44:25 +0000 (11:44 +0100)]

Hid internals of skeleton paths under construction.

commit | commitdiff | tree

Ulya Trofimovich [Tue, 1 Sep 2015 14:27:48 +0000 (15:27 +0100)]

Renamed and fixed warning about undefined control flow in generated lexer.

Renamed '-Wnaked-default' to '-Wundefined-control-flow': the latter sounds
much scarier. :D

Completely changed the algorithm that is used to determine if default case
is not handled properly. Prior to this commit a simple and incorrect criterion
was used: whether there are code units that (alone) do not match any rule.
This gave false positives in cases like this:
    [^] [^] { rule }
Here all code units meet the criterion: no single code unit matches a rule.
But obviously, default case is handled properly, because any input string
matches 'rule' (strictly speaking, any input string of length 2 or more, but
that's YYFILL's problem).

The new algorithm is more complex (in terms of time and space), yet it is
less heuristic: re2c performs exhaustive depth-first search on graph skeleton
and collects all bad paths.
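
A heavily simplified illustration of the path-based criterion, not re2c's actual algorithm or data structures. It assumes a toy graph where each node records whether it accepts a rule and whether it has an outgoing arc for every code unit; 'Node', 'Path', 'collect_bad_paths' and 'bad_paths' are all hypothetical names. A path is treated as bad if it reaches a node with a missing arc before any rule has matched:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node
{
    bool accepts;             // some rule matches in this state
    bool total;               // there is an outgoing arc for every code unit
    std::vector<size_t> next; // successor nodes
};

typedef std::vector<size_t> Path;

// Depth-first search over simple paths (each node at most once per path).
inline void collect_bad_paths (const std::vector<Node> & g, size_t n,
    std::vector<bool> & on_path, Path & path, std::vector<Path> & bad)
{
    assert (n < g.size ());
    // Once a rule has matched, later failures fall back to that rule.
    if (g[n].accepts) return;

    path.push_back (n);
    on_path[n] = true;

    // No rule matched so far and some code unit has nowhere to go.
    if (!g[n].total) bad.push_back (path);

    for (size_t i = 0; i < g[n].next.size (); ++i) {
        const size_t m = g[n].next[i];
        if (!on_path[m]) collect_bad_paths (g, m, on_path, path, bad);
    }

    on_path[n] = false;
    path.pop_back ();
}

// Node 0 is assumed to be the initial state.
inline std::vector<Path> bad_paths (const std::vector<Node> & g)
{
    std::vector<Path> bad;
    if (g.empty ()) return bad;
    std::vector<bool> on_path (g.size (), false);
    Path path;
    collect_bad_paths (g, 0, on_path, path, bad);
    return bad;
}
```

For the '[^] [^]' example above, the toy graph has two non-accepting nodes with arcs for every code unit followed by an accepting node, so no bad path is reported, whereas the old per-code-unit criterion would have warned.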
