]> granicus.if.org Git - re2c/log
re2c
5 years agoAdded option --input-encoding <ascii | utf8> that allows to use UTF-8 literals in...
Ulya Trofimovich [Fri, 24 May 2019 12:15:07 +0000 (13:15 +0100)]
Added option --input-encoding <ascii | utf8> that allows to use UTF-8 literals in regular expressions.

5 years agoAllow to mix multiple /*!rules:re2c*/, /*!use:re2c*/ and /*!re2c*/ blocks in -r mode.
Ulya Trofimovich [Fri, 24 May 2019 11:04:47 +0000 (12:04 +0100)]
Allow to mix multiple /*!rules:re2c*/, /*!use:re2c*/ and /*!re2c*/ blocks in -r mode.

5 years agolibre2c: added GOR1 option for lazy disambiguation algorithm.
Ulya Trofimovich [Sat, 18 May 2019 18:07:57 +0000 (19:07 +0100)]
libre2c: added GOR1 option for lazy disambiguation algorithm.

5 years agolibre2c: do not include benchmark in 'make check' programs.
Ulya Trofimovich [Thu, 9 May 2019 21:38:12 +0000 (22:38 +0100)]
libre2c: do not include benchmark in 'make check' programs.

5 years agolibre2c: added forgotten benchmark data samples.
Ulya Trofimovich [Thu, 9 May 2019 20:44:42 +0000 (21:44 +0100)]
libre2c: added forgotten benchmark data samples.

5 years agolibre2c: added missing include.
Ulya Trofimovich [Thu, 9 May 2019 20:29:53 +0000 (21:29 +0100)]
libre2c: added missing include.

5 years agolibre2c: updated benchmark.
Ulya Trofimovich [Thu, 9 May 2019 17:30:32 +0000 (18:30 +0100)]
libre2c: updated benchmark.

5 years agolibre2c: extended lexer to handle some escape sequences in charachter classes.
Ulya Trofimovich [Wed, 8 May 2019 13:31:14 +0000 (14:31 +0100)]
libre2c: extended lexer to handle some escape sequences in charachter classes.

5 years agoInlined closure cleanup in the pruning procedure to further speedup determinization.
Ulya Trofimovich [Wed, 8 May 2019 10:01:48 +0000 (11:01 +0100)]
Inlined closure cleanup in the pruning procedure to further speedup determinization.

5 years agoMake unreachable rule analysis conditional to further sppedup closure pruning.
Ulya Trofimovich [Wed, 8 May 2019 09:54:00 +0000 (10:54 +0100)]
Make unreachable rule analysis conditional to further sppedup closure pruning.

5 years agoRewrite of closure pruning function to speedup determinization on large automata.
Ulya Trofimovich [Wed, 8 May 2019 09:44:25 +0000 (10:44 +0100)]
Rewrite of closure pruning function to speedup determinization on large automata.

5 years agoHash in 4-byte chunks to speedup determinization on large automata.
Ulya Trofimovich [Wed, 8 May 2019 08:57:59 +0000 (09:57 +0100)]
Hash in 4-byte chunks to speedup determinization on large automata.

5 years agoUse correct formula to find the next aligned address.
Ulya Trofimovich [Wed, 8 May 2019 06:35:42 +0000 (07:35 +0100)]
Use correct formula to find the next aligned address.

The previous formula worked incorrectly for already aligned addresses
(it added alignment to them).

5 years agoPaper: updated "TNFA construction" section.
Ulya Trofimovich [Mon, 8 Apr 2019 12:21:50 +0000 (13:21 +0100)]
Paper: updated "TNFA construction" section.

5 years agoconfigure.ac: use AC_USE_SYSTEM_EXTENSIONS, as --std=c++98 disables POSIX extensions...
Ulya Trofimovich [Sat, 6 Apr 2019 10:16:47 +0000 (11:16 +0100)]
configure.ac: use AC_USE_SYSTEM_EXTENSIONS, as --std=c++98 disables POSIX extensions on Cygwin.

Attempted fix for #247.

5 years agoMakefile.am: reflected the rename of README to README.md.
Ulya Trofimovich [Sat, 6 Apr 2019 09:33:20 +0000 (10:33 +0100)]
Makefile.am: reflected the rename of README to README.md.

This fixes bug #248.

5 years agoPaper: updated sections about representation of paths and match results.
Ulya Trofimovich [Fri, 5 Apr 2019 16:21:30 +0000 (17:21 +0100)]
Paper: updated sections about representation of paths and match results.

5 years agoPaper: rearranged "epsilon-closure" section.
Ulya Trofimovich [Fri, 5 Apr 2019 09:34:03 +0000 (10:34 +0100)]
Paper: rearranged "epsilon-closure" section.

5 years agoPaper: updated pseudocode for precedence procedures.
Ulya Trofimovich [Tue, 2 Apr 2019 12:48:52 +0000 (13:48 +0100)]
Paper: updated pseudocode for precedence procedures.

5 years agoPaper: restructured "formalization" section.
Ulya Trofimovich [Mon, 1 Apr 2019 21:49:50 +0000 (22:49 +0100)]
Paper: restructured "formalization" section.

5 years agoPaper: wrote "main idea" section.
Ulya Trofimovich [Mon, 1 Apr 2019 13:48:48 +0000 (14:48 +0100)]
Paper: wrote "main idea" section.

5 years agolibre2c: restructured main loop to avoid double call to closure.
Ulya Trofimovich [Sat, 30 Mar 2019 09:59:50 +0000 (09:59 +0000)]
libre2c: restructured main loop to avoid double call to closure.

5 years agoPaper: wrote the "introduction" section.
Ulya Trofimovich [Fri, 29 Mar 2019 09:37:38 +0000 (09:37 +0000)]
Paper: wrote the "introduction" section.

5 years agoFixed compilation error caused by extra namespace qualification (second attempt).
Ulya Trofimovich [Tue, 26 Mar 2019 18:22:42 +0000 (18:22 +0000)]
Fixed compilation error caused by extra namespace qualification (second attempt).

5 years agoFixed compilation error caused by extra namespace qualification.
Ulya Trofimovich [Tue, 26 Mar 2019 18:18:34 +0000 (18:18 +0000)]
Fixed compilation error caused by extra namespace qualification.

5 years agolibre2c: implemented Kuklewicz disambiguation algorithm for POSIX TNFA matching.
Ulya Trofimovich [Tue, 26 Mar 2019 17:40:51 +0000 (17:40 +0000)]
libre2c: implemented Kuklewicz disambiguation algorithm for POSIX TNFA matching.

5 years agolibre2c: free tag history after each step (it's no longer needed).
Ulya Trofimovich [Mon, 25 Mar 2019 16:40:04 +0000 (16:40 +0000)]
libre2c: free tag history after each step (it's no longer needed).

Seems to be a copy-paste error from TDFA implementation, where history
for each state is needed until the end of determinization.

5 years agoParameterize determinization/simulation context directly by history type.
Ulya Trofimovich [Mon, 25 Mar 2019 14:32:39 +0000 (14:32 +0000)]
Parameterize determinization/simulation context directly by history type.

5 years agoAdded logs with test failures of backward matching algorithm.
Ulya Trofimovich [Mon, 25 Mar 2019 07:26:43 +0000 (07:26 +0000)]
Added logs with test failures of backward matching algorithm.

5 years agolibre2c: added Cox backward matching algorithm (incorrect, but interesting).
Ulya Trofimovich [Sun, 24 Mar 2019 10:14:54 +0000 (10:14 +0000)]
libre2c: added Cox backward matching algorithm (incorrect, but interesting).

5 years agoFictive tags for right alt/cat must be added after adding all tags for left alt/cat.
Ulya Trofimovich [Sun, 24 Mar 2019 09:31:56 +0000 (09:31 +0000)]
Fictive tags for right alt/cat must be added after adding all tags for left alt/cat.

Previous order was incorrect: fictive tags for both left and right
alternative or catenation were added before recursing into the right
alternative or catenation. This resulted in right fictive tags habving
higher precedence than left non-fictive tags.

5 years agoDebug: don't print NFA in-degree in each node, it takes too much space.
Ulya Trofimovich [Sun, 24 Mar 2019 09:28:59 +0000 (09:28 +0000)]
Debug: don't print NFA in-degree in each node, it takes too much space.

5 years agolibre2c: added POSIX tests that show why closure can't use DFS and needs worst-case...
Ulya Trofimovich [Sat, 16 Mar 2019 23:45:48 +0000 (23:45 +0000)]
libre2c: added POSIX tests that show why closure can't use DFS and needs worst-case quadratic shortest path algorithm.

In particular, test ((((a?)+)|(aa))+) and string "aaa" demonstrates why
we cannot replace SSSP (GOR1 or GTOP) with a liner-time DFS that sets
predecessor chains of TNFA states and sorts initial closure configurations
according to POSIX relation imposed by precedence matrix.

Consider what happens after consuming the first to 'a's (see the .dot
encoded TNFA below):

- initial states are 8, 14 and 9 (in the order imposed by D-matrix)

- start DFS from 8

- find path 8->7->6->5->4->3->2->22->21->11->10

- find path 8->7->6->5->4->3->2->22->21->20->19->18->17->16->15

- find path
  8->7->6->5->4->3->2->22->21->20->19->18->17->16->14->13->17
  compare it to the previous path to 17 and discard the new looping path

- find path
  8->7->6->5->4->3->2->22->21->20->19->18->17->16->14->13->12->3
  compare it to the previous path to 3 and discard the new looping path

- find path 8->7->6->5->4->3->2->1->0

- DFS from 8 complete, start DFS from 14

- find path 14, compare it to the previous path to 14 and accept the new
  one as it is better (but do not attempt to propagate the improvement
  and rescan states: this is DFS, not GOR1, and the whole idea is to
  avoid rescanning and have linear complexity)

- DFS from 14 complete, start DFS from 9

- find path 9

The end, and we have an incorrect path to 15: the long path through the
outer loop from 8, rather than the short path through the inner loop
from 14. This further leads to incorrect results.

digraph NFA {
  rankdir=LR
  node[shape=Mrecord fontname=Courier height=0.2 width=0.2]
  edge[arrowhead=vee fontname=Courier label=" "]

  n23 [label="23"]
  n23 -> n22 [label="/0&uarr;(1)"]
  n22 [label="22"]
  n22 -> n21 [label="/2&uarr;(2)"]
  n21 [label="21"]
  n21 -> n11
  n21 -> n20 [color=lightgray]
  n20 [label="20"]
  n20 -> n19 [label="/4&darr;(3)"]
  n19 [label="19"]
  n19 -> n18 [label="/5&darr;(2)"]
  n18 [label="18"]
  n18 -> n17 [label="/6&uarr;(3)"]
  n17 [label="17"]
  n17 -> n16 [label="/8&uarr;(4)"]
  n16 [label="16"]
  n16 -> n15
  n16 -> n14 [color=lightgray]
  n15 [label="15"]
  n15 -> n14 [label="97"]
  n14 [label="14"]
  n14 -> n13 [label="/9&uarr;(3)"]
  n13 [label="13"]
  n13 -> n17
  n13 -> n12 [color=lightgray]
  n12 [label="12"]
  n12 -> n3 [label="/7&uarr;(2)"]
  n11 [label="11"]
  n11 -> n10 [label="/4&uarr;(3)"]
  n10 [label="10"]
  n10 -> n9 [label="97"]
  n9 [label="9"]
  n9 -> n8 [label="97"]
  n8 [label="8"]
  n8 -> n7 [label="/5&uarr;(2)"]
  n7 [label="7"]
  n7 -> n6 [label="/6&darr;(3)"]
  n6 [label="6"]
  n6 -> n5 [label="/8&darr;(4)"]
  n5 [label="5"]
  n5 -> n4 [label="/9&darr;(3)"]
  n4 [label="4"]
  n4 -> n3 [label="/7&darr;(2)"]
  n3 [label="3"]
  n3 -> n2 [label="/3&uarr;(1)"]
  n2 [label="2"]
  n2 -> n22
  n2 -> n1 [color=lightgray]
  n1 [label="1"]
  n1 -> n0 [label="/1&uarr;(0)"]
  n0 [label="0"] [fillcolor=gray]
}

5 years agoUse state index in closure instead of core index, and don't keep core indices at...
Ulya Trofimovich [Fri, 8 Mar 2019 07:21:49 +0000 (07:21 +0000)]
Use state index in closure instead of core index, and don't keep core indices at all.

5 years agoAdded tests for errors in case of out-of-bounds EOF value.
Ulya Trofimovich [Thu, 7 Mar 2019 23:05:40 +0000 (23:05 +0000)]
Added tests for errors in case of out-of-bounds EOF value.

5 years agoGive a concise error message when EOF rule is present, but no other rules are.
Ulya Trofimovich [Thu, 7 Mar 2019 22:51:11 +0000 (22:51 +0000)]
Give a concise error message when EOF rule is present, but no other rules are.

5 years agoUpdated .gitignore.
Ulya Trofimovich [Thu, 7 Mar 2019 22:09:05 +0000 (22:09 +0000)]
Updated .gitignore.

5 years agoUpdating .travis.yml after global move.
Ulya Trofimovich [Thu, 7 Mar 2019 18:47:04 +0000 (18:47 +0000)]
Updating .travis.yml after global move.

5 years agoMoved re2c subdirectory to root directory and renamed old libre2c to libre2c_old.
Ulya Trofimovich [Thu, 7 Mar 2019 18:28:49 +0000 (18:28 +0000)]
Moved re2c subdirectory to root directory and renamed old libre2c to libre2c_old.

5 years agoMakefile.am: use wildcard instead of braces in macro expansion (the latter breaks...
Ulya Trofimovich [Thu, 7 Mar 2019 17:21:34 +0000 (17:21 +0000)]
Makefile.am: use wildcard instead of braces in macro expansion (the latter breaks on BSD).

5 years agoAvoid using ULLONG_MAX as it is non-standard prior to C++11.
Ulya Trofimovich [Thu, 7 Mar 2019 16:24:16 +0000 (16:24 +0000)]
Avoid using ULLONG_MAX as it is non-standard prior to C++11.

5 years agoHandle cases when rename() fails because destination file exists.
Ulya Trofimovich [Thu, 7 Mar 2019 13:08:51 +0000 (13:08 +0000)]
Handle cases when rename() fails because destination file exists.

In C/C++ rename() behaviour is implementation-defined. POSIX and Windows
have different behaviour when destination file exists: POSIX says rename()
should overwrite it, but Windows says rename() should fail.

5 years agoAdded more tests for EOF rule.
Ulya Trofimovich [Wed, 6 Mar 2019 23:28:29 +0000 (23:28 +0000)]
Added more tests for EOF rule.

5 years agoImproved label generation with EOF rule (removed unused and added missing labels).
Ulya Trofimovich [Wed, 6 Mar 2019 21:47:49 +0000 (21:47 +0000)]
Improved label generation with EOF rule (removed unused and added missing labels).

5 years agoAdded some tests for EOF rule.
Ulya Trofimovich [Wed, 6 Mar 2019 16:06:39 +0000 (16:06 +0000)]
Added some tests for EOF rule.

5 years agoFixed EOF rule handling in case when EOF symbol is the upper bound of charset.
Ulya Trofimovich [Wed, 6 Mar 2019 14:54:00 +0000 (14:54 +0000)]
Fixed EOF rule handling in case when EOF symbol is the upper bound of charset.

5 years agorun_tests.sh: ignore difference in trailing whitespace with --wine option.
Ulya Trofimovich [Wed, 6 Mar 2019 12:15:48 +0000 (12:15 +0000)]
run_tests.sh: ignore difference in trailing whitespace with --wine option.

5 years agoAdded test for newline conversion.
Ulya Trofimovich [Wed, 6 Mar 2019 12:05:25 +0000 (12:05 +0000)]
Added test for newline conversion.

5 years agoConsistently convert all newlines in the generated file to Unix-style LF.
Ulya Trofimovich [Wed, 6 Mar 2019 11:52:53 +0000 (11:52 +0000)]
Consistently convert all newlines in the generated file to Unix-style LF.

Some newlines originate in user-defined code (including semantic actions
and code fragments in configurations and directives), some are generated
by re2c itself. In order to maintain consistency we convert all newlines
to LF when writing output to file.

5 years agoNeed to include <fcntl.h> with MSVC to get definitions of _O_CREAT, _O_EXCL and _O_RDWR.
Ulya Trofimovich [Wed, 6 Mar 2019 08:43:53 +0000 (08:43 +0000)]
Need to include <fcntl.h> with MSVC to get definitions of _O_CREAT, _O_EXCL and _O_RDWR.

5 years agoAvoid warnings about uninitialized variables.
Ulya Trofimovich [Wed, 6 Mar 2019 07:12:06 +0000 (07:12 +0000)]
Avoid warnings about uninitialized variables.

5 years agoUse macro _MSC_VER instead of nonexistent _MSVC to detect Visual Studio.
Ulya Trofimovich [Wed, 6 Mar 2019 07:07:51 +0000 (07:07 +0000)]
Use macro _MSC_VER instead of nonexistent _MSVC to detect Visual Studio.

5 years agoDo not access vector element by reference that is invalidated due to reallocation.
Ulya Trofimovich [Tue, 5 Mar 2019 22:35:23 +0000 (22:35 +0000)]
Do not access vector element by reference that is invalidated due to reallocation.

5 years agoUse autoconf header detection to guard non-C++98 includes (POSIX file API).
Ulya Trofimovich [Tue, 5 Mar 2019 22:09:08 +0000 (22:09 +0000)]
Use autoconf header detection to guard non-C++98 includes (POSIX file API).

5 years agoUse something that Mingw understands instead of UINTMAX_MAX from <stdint.h>.
Ulya Trofimovich [Tue, 5 Mar 2019 17:40:42 +0000 (17:40 +0000)]
Use something that Mingw understands instead of UINTMAX_MAX from <stdint.h>.

5 years agoWrite output to temporary file and then rename it to output file atomically.
Ulya Trofimovich [Tue, 5 Mar 2019 17:24:34 +0000 (17:24 +0000)]
Write output to temporary file and then rename it to output file atomically.

This is more conveniet for build systems: if code generation is interrupted
for some reason, we don't end up with half-generated or empty output file.

Note that re2c there still can be discrepancy between the generated header
file and output file.

5 years agoUse C array instead of vector (of known size) to avoid iterator invalidation warnings.
Ulya Trofimovich [Tue, 5 Mar 2019 15:50:51 +0000 (15:50 +0000)]
Use C array instead of vector (of known size) to avoid iterator invalidation warnings.

5 years agoRespect platforms with 32-bit 'long' and 64-bit pointers.
Ulya Trofimovich [Tue, 5 Mar 2019 10:50:00 +0000 (10:50 +0000)]
Respect platforms with 32-bit 'long' and 64-bit pointers.

Previously used ~0U works incorrectly in such cases: ~0U equals
0xFFFFffff, and maximum (numerical) pointer value is 0xFFFFffffFFFFffff.
This caused incorrect comparison result when tracking include files that
have been fully processed and can be removed frm the include stack.

Problem report thanks to Denis Naumov.

5 years agoRenamed README to README.md to make github render it nicely.
Ulya Trofimovich [Mon, 4 Mar 2019 17:54:23 +0000 (17:54 +0000)]
Renamed README to README.md to make github render it nicely.

5 years agoPrettified README somewhat.
Ulya Trofimovich [Mon, 4 Mar 2019 17:40:15 +0000 (17:40 +0000)]
Prettified README somewhat.

5 years agoMoved LICENSE, README, NO_WARRANTY and CHANGELOG to the root directory.
Ulya Trofimovich [Mon, 4 Mar 2019 17:32:50 +0000 (17:32 +0000)]
Moved LICENSE, README, NO_WARRANTY and CHANGELOG to the root directory.

5 years agoAdded license file.
Ulya Trofimovich [Mon, 4 Mar 2019 17:25:04 +0000 (17:25 +0000)]
Added license file.

5 years agoDeduplicated leftmost greedy closure implementations in re2c and libre2c.
Ulya Trofimovich [Mon, 4 Mar 2019 17:14:49 +0000 (17:14 +0000)]
Deduplicated leftmost greedy closure implementations in re2c and libre2c.

5 years agoReuse second closure buffer as a stack for leftmost greedy closure.
Ulya Trofimovich [Mon, 4 Mar 2019 16:43:59 +0000 (16:43 +0000)]
Reuse second closure buffer as a stack for leftmost greedy closure.

5 years agoFurther deduplicated POSIX closure implementation in libre2c and re2c.
Ulya Trofimovich [Mon, 4 Mar 2019 15:52:16 +0000 (15:52 +0000)]
Further deduplicated POSIX closure implementation in libre2c and re2c.

5 years agoParameterized determinization/simulation context over history type.
Ulya Trofimovich [Mon, 4 Mar 2019 11:15:24 +0000 (11:15 +0000)]
Parameterized determinization/simulation context over history type.

5 years agolibre2c: parameterized context type over semantics (POSX/leftmost) and eval strategy...
Ulya Trofimovich [Sat, 2 Mar 2019 10:31:17 +0000 (10:31 +0000)]
libre2c: parameterized context type over semantics (POSX/leftmost) and eval strategy (lazy/strict).

5 years agoDeduplicated POSIX closure computation in re2c and libre2c.
Ulya Trofimovich [Sat, 2 Mar 2019 09:10:51 +0000 (09:10 +0000)]
Deduplicated POSIX closure computation in re2c and libre2c.

5 years agolibre2c: updated benchmark.
Ulya Trofimovich [Fri, 1 Mar 2019 22:40:23 +0000 (22:40 +0000)]
libre2c: updated benchmark.

5 years agoDeduplicated algorithm that computes POSIX precedence matrix in re2c and libre2c.
Ulya Trofimovich [Fri, 1 Mar 2019 22:15:34 +0000 (22:15 +0000)]
Deduplicated algorithm that computes POSIX precedence matrix in re2c and libre2c.

5 years agoDeduplicated tag history definitions used in re2c and libre2c.
Ulya Trofimovich [Fri, 1 Mar 2019 10:55:38 +0000 (10:55 +0000)]
Deduplicated tag history definitions used in re2c and libre2c.

5 years agoAvoid allocating new scratch buffer for each closure, reuse the same buffer.
Ulya Trofimovich [Thu, 28 Feb 2019 10:57:10 +0000 (10:57 +0000)]
Avoid allocating new scratch buffer for each closure, reuse the same buffer.

5 years agoExit early from loops in GOR1.
Ulya Trofimovich [Wed, 27 Feb 2019 11:46:55 +0000 (11:46 +0000)]
Exit early from loops in GOR1.

5 years agolibre2c: updated benchmark.
Ulya Trofimovich [Wed, 27 Feb 2019 11:28:33 +0000 (11:28 +0000)]
libre2c: updated benchmark.

5 years agolibre2c: use a more efficient algorithm for computing POSIX precedence matrix.
Ulya Trofimovich [Wed, 27 Feb 2019 10:50:07 +0000 (10:50 +0000)]
libre2c: use a more efficient algorithm for computing POSIX precedence matrix.

The old alrorithm was straightforward: it computed precedence for each
pair of core closure items. This has worst-case cubic complexity in the
size of TNFA, because closure can have O(N) states and tag history length
can be also O(N). One example is ((a?){1,1000})*.

The new algorithm avoids that by computing parecedence for all pairs of
core items in one traversal of the shortest path tree constructed by
closure. It avoids traversing deadend branches (those that were cancelled
by search because a better path was found) by marking used nodes of the
tree from each leaf down to the root. The new algorithm has worst-case
quadratic behaviour. It can be slightly slower for simple cases when the
number of core items is low and histories are short.

5 years agoMoved hot functions to header to enable inlining.
Ulya Trofimovich [Wed, 27 Feb 2019 10:44:15 +0000 (10:44 +0000)]
Moved hot functions to header to enable inlining.

5 years agolibre2c (benchmark): do not try to link with re2 if it's not available.
Ulya Trofimovich [Wed, 20 Feb 2019 17:29:03 +0000 (17:29 +0000)]
libre2c (benchmark): do not try to link with re2 if it's not available.

5 years agolibre2c: found pathological case for constant-memory POSIX algorithms.
Ulya Trofimovich [Wed, 20 Feb 2019 17:00:47 +0000 (17:00 +0000)]
libre2c: found pathological case for constant-memory POSIX algorithms.

The regexp is ((a?){1,200})*, and the input string is just "a".

Takes quibic time in the size of counter. This is caused by quadratic-
time computation of precedence matrix on each step (the number of TNFA
states in the closure approaches TNFA size), multiplied by the length
of compared histores (which also approaches TNFA size). Trie-based
algorithms are not affected, but they consume memory proportional to
the length of the input string, and so are also not practical.

5 years agolibre2c: added simplistic benchmark to compare different libre2c algorithms and re2.
Ulya Trofimovich [Wed, 20 Feb 2019 14:46:29 +0000 (14:46 +0000)]
libre2c: added simplistic benchmark to compare different libre2c algorithms and re2.

5 years agoSimplification of GOR1 (following similar changes in libre2c).
Ulya Trofimovich [Tue, 19 Feb 2019 23:10:37 +0000 (23:10 +0000)]
Simplification of GOR1 (following similar changes in libre2c).

5 years agoNo need to fully unwind tag histories when calculating POSIX precedence.
Ulya Trofimovich [Tue, 19 Feb 2019 21:51:43 +0000 (21:51 +0000)]
No need to fully unwind tag histories when calculating POSIX precedence.

Histories are stored in a trie. Each history is represented by an index
to its tail in the trie. In order to compare two histories, we can simply
unwind both indices, step by step, until they coincide or root index is
reached.

Related commits in libre2c:
    062da2dca6908c0f6da54711b897e3bafca2628c
    2a7158b401d7e925696fc1b0c8da37a59e364dd4

5 years agolibre2c: deleted unused buffers.
Ulya Trofimovich [Tue, 19 Feb 2019 21:44:38 +0000 (21:44 +0000)]
libre2c: deleted unused buffers.

5 years agoUse signed integers as tag history indices (this way root index -1 is the least one).
Ulya Trofimovich [Tue, 19 Feb 2019 21:40:46 +0000 (21:40 +0000)]
Use signed integers as tag history indices (this way root index -1 is the least one).

5 years agorun_tests.sh: ignore trailing whitespace in test diffs if compiler name matches ...
Ulya Trofimovich [Tue, 19 Feb 2019 17:52:36 +0000 (17:52 +0000)]
run_tests.sh: ignore trailing whitespace in test diffs if compiler name matches "mingw".

We already ignore trailing whitespace when running in Wine, but binaries
that built with Mingw and run on windows (not in Wine) also need this option.

5 years agoConstified std::map comparator (apparently MSVC fails to compile otherwise).
Ulya Trofimovich [Tue, 19 Feb 2019 16:51:02 +0000 (16:51 +0000)]
Constified std::map comparator (apparently MSVC fails to compile otherwise).

Problem reported by Denis Naumov.

5 years agoCorrectly handle multi-character newlines (e.g. CR LF) when parsing #line directives.
Ulya Trofimovich [Tue, 19 Feb 2019 16:32:15 +0000 (16:32 +0000)]
Correctly handle multi-character newlines (e.g. CR LF) when parsing #line directives.

Bug report thanks to Denis Naumov.

5 years agolibre2c: avoid -Wreturn compiler warnings (no return value from non-void function).
Ulya Trofimovich [Tue, 19 Feb 2019 15:59:20 +0000 (15:59 +0000)]
libre2c: avoid -Wreturn compiler warnings (no return value from non-void function).

5 years agolibre2c: use core state indices when possible.
Ulya Trofimovich [Tue, 19 Feb 2019 14:22:40 +0000 (14:22 +0000)]
libre2c: use core state indices when possible.

5 years agolibre2c: inlined hot function.
Ulya Trofimovich [Tue, 19 Feb 2019 13:50:57 +0000 (13:50 +0000)]
libre2c: inlined hot function.

5 years agolibre2c: updating marker of last matched rule after closure has been constructed...
Ulya Trofimovich [Tue, 19 Feb 2019 13:47:01 +0000 (13:47 +0000)]
libre2c: updating marker of last matched rule after closure has been constructed (non-constant-memory version).

See commit 9998f616a314d635adbfea360d9882276c237e6d (same for constant-memory version).

5 years agolibre2c: compare histories in a trie without reconstructing them (non-constant-memory...
Ulya Trofimovich [Tue, 19 Feb 2019 13:32:34 +0000 (13:32 +0000)]
libre2c: compare histories in a trie without reconstructing them (non-constant-memory version.)

See commit 2a7158b401d7e925696fc1b0c8da37a59e364dd4.

5 years agolibre2c: added forgotten file.
Ulya Trofimovich [Tue, 19 Feb 2019 12:43:27 +0000 (12:43 +0000)]
libre2c: added forgotten file.

5 years agolibre2c: rearranged code to avoid checking configuration type when computing preceden...
Ulya Trofimovich [Tue, 19 Feb 2019 12:12:46 +0000 (12:12 +0000)]
libre2c: rearranged code to avoid checking configuration type when computing precedence matrix.

5 years agolibre2c: ensure TNFA simulation algorithms clean up context before exiting.
Ulya Trofimovich [Tue, 19 Feb 2019 12:03:12 +0000 (12:03 +0000)]
libre2c: ensure TNFA simulation algorithms clean up context before exiting.

5 years agolibre2c: use templates to avoid dynamic dispatch between closure algorithms.
Ulya Trofimovich [Tue, 19 Feb 2019 08:49:27 +0000 (08:49 +0000)]
libre2c: use templates to avoid dynamic dispatch between closure algorithms.

5 years agolibre2c: hid some of the implementation details behind a pointer in regex_t struct.
Ulya Trofimovich [Mon, 18 Feb 2019 22:48:00 +0000 (22:48 +0000)]
libre2c: hid some of the implementation details behind a pointer in regex_t struct.

5 years agolibre2c: simplified construction of configurations.
Ulya Trofimovich [Mon, 18 Feb 2019 22:15:50 +0000 (22:15 +0000)]
libre2c: simplified construction of configurations.

5 years agolibre2c: added GOR1 algorithm for POSIX closure construction.
Ulya Trofimovich [Mon, 18 Feb 2019 21:44:29 +0000 (21:44 +0000)]
libre2c: added GOR1 algorithm for POSIX closure construction.

5 years agolibre2c: compare histories in a trie without reconstructing them.
Ulya Trofimovich [Mon, 18 Feb 2019 16:42:07 +0000 (16:42 +0000)]
libre2c: compare histories in a trie without reconstructing them.

Previously we used to unwing each history, store it as a linear sequence
in a separate buffer, and then iterate buffers from the beginning of
histories to fing the fork point.

This reconstruction step can be avoided, because we can find fork by
tracing both histories together from tail to root, element by element,
until their indices coincide (this is the fork point. Someone called
thos "two-fingers algorithm".

5 years agolibre2c: use signed indices in tag history (this way root index -1 is the least one).
Ulya Trofimovich [Mon, 18 Feb 2019 16:26:44 +0000 (16:26 +0000)]
libre2c: use signed indices in tag history (this way root index -1 is the least one).