]>
granicus.if.org Git - re2c/log
Ulya Trofimovich [Tue, 23 Jul 2019 16:10:55 +0000 (17:10 +0100)]
Paper: use pgfplots instead of gnuplot to render benchmark results.
Ulya Trofimovich [Tue, 23 Jul 2019 11:11:58 +0000 (12:11 +0100)]
Paper: taken care of Angelo's remarks (mostly grammar fixes).
Ulya Trofimovich [Thu, 18 Jul 2019 16:26:27 +0000 (17:26 +0100)]
Don't generate 'YYFILL' calls when EOF rule is used with 're2c:yyfill:enable = 0;' configuration.
Note: in the non-EOF-rule case 're2c:yyfill:enable = 0;' results in suppression
of the whole 'YYLIMIT' check together with 'YYFILL' call. In case of EOF rule,
however, we are not bound by backward compatibility and can do a more logical
thing.
Ulya Trofimovich [Thu, 18 Jul 2019 16:16:04 +0000 (17:16 +0100)]
Fixed the case of missing default label which is used only by EOF checks.
The current fix may sometimes cause generation of unused default label, because
at the time when it's generated we haven't geerated the code for EOF checks yet
and do nt know if default label is really needed. This is what happens in the
updated test cases.
Ulya Trofimovich [Thu, 18 Jul 2019 16:04:05 +0000 (17:04 +0100)]
Omit unnecessary 'else' when generating code for EOF checks.
Ulya Trofimovich [Thu, 18 Jul 2019 15:56:47 +0000 (16:56 +0100)]
Renamed misleadingly named parameter.
Ulya Trofimovich [Wed, 17 Jul 2019 22:43:25 +0000 (23:43 +0100)]
Updated CHANGELOG.
Ulya Trofimovich [Wed, 17 Jul 2019 22:40:51 +0000 (23:40 +0100)]
Updated documentation for new options and directives.
Ulya Trofimovich [Mon, 15 Jul 2019 14:56:29 +0000 (15:56 +0100)]
Moved include file with definitions of Unicode categories to 'include/' subdirectory.
Ulya Trofimovich [Sat, 13 Jul 2019 23:01:35 +0000 (00:01 +0100)]
Use cast to 'void' instead of attribute 'unused', otherwise we get -Wused-but-marked-unused in debug mode.
Ulya Trofimovich [Sat, 13 Jul 2019 22:41:30 +0000 (23:41 +0100)]
Added a few casts to 'printf' args to silence warnings caused by different type width in 32/64 modes.
Ulya Trofimovich [Sat, 13 Jul 2019 21:54:59 +0000 (22:54 +0100)]
libre2c test: always run Cox algorithm and skip expected failures.
Ulya Trofimovich [Sat, 13 Jul 2019 18:09:00 +0000 (19:09 +0100)]
Cosmetic: shortened macro name a bit and removed 'GXX' as it is also relevant to Clang.
Ulya Trofimovich [Sat, 13 Jul 2019 18:00:12 +0000 (19:00 +0100)]
Marked a few debug variables as unused to avoid warnings in release builds.
Ulya Trofimovich [Sat, 13 Jul 2019 17:45:48 +0000 (18:45 +0100)]
Makefile.am: added forgotten 'AM_V_GEN' prefix to a couple of build rules.
Ulya Trofimovich [Sat, 13 Jul 2019 17:36:03 +0000 (18:36 +0100)]
Disable ceratain warnings on bison-generated output (we have no control over it).
Ulya Trofimovich [Sat, 13 Jul 2019 10:42:37 +0000 (11:42 +0100)]
Fixed Clang warning -Wshift-sign-overflow.
Ulya Trofimovich [Sat, 13 Jul 2019 10:39:52 +0000 (11:39 +0100)]
Fixed Clang warning -Wexit-time-destructors.
Ulya Trofimovich [Sat, 13 Jul 2019 10:37:11 +0000 (11:37 +0100)]
Fixed Clang warning -Wconditional-uninitialized.
Ulya Trofimovich [Sat, 13 Jul 2019 10:34:34 +0000 (11:34 +0100)]
Fixed Clang warning -Wmissing-variable-declarations.
Ulya Trofimovich [Sat, 13 Jul 2019 10:32:04 +0000 (11:32 +0100)]
Fixed Clang warning -Wunused-template.
Ulya Trofimovich [Sat, 13 Jul 2019 10:25:25 +0000 (11:25 +0100)]
Fixed Clang warning -Wunreachable-code (these were real serious errors).
Ulya Trofimovich [Sat, 13 Jul 2019 10:23:29 +0000 (11:23 +0100)]
Fixed Clang warning -Wunreachable-code-return.
Ulya Trofimovich [Sat, 13 Jul 2019 10:20:38 +0000 (11:20 +0100)]
Fixed Clang warning -Wmissing-prototypes.
Ulya Trofimovich [Sat, 13 Jul 2019 10:16:21 +0000 (11:16 +0100)]
Fixed Clang warning -Wc++11-extensions.
Ulya Trofimovich [Sat, 13 Jul 2019 10:13:18 +0000 (11:13 +0100)]
Fixed Clang warning -Wmismatched-tags.
Ulya Trofimovich [Sat, 13 Jul 2019 10:09:39 +0000 (11:09 +0100)]
Fixed Clang warnings -Wswitch-enum and -Wcovered-switch-default.
Ulya Trofimovich [Sat, 13 Jul 2019 10:03:13 +0000 (11:03 +0100)]
Fixed Clang warning -Wshadow.
Ulya Trofimovich [Sat, 13 Jul 2019 09:49:09 +0000 (10:49 +0100)]
Fixed Clang warnings -Wextra-semi and -Wextra-semi-stmt.
Ulya Trofimovich [Sat, 13 Jul 2019 09:37:36 +0000 (10:37 +0100)]
configure.ac: silenced a few Clang warnings (too noisy, not very helpful).
Ulya Trofimovich [Fri, 12 Jul 2019 13:57:38 +0000 (14:57 +0100)]
Fixed test (added forgotten qualification after moving functions into 're2c' namespace).
Ulya Trofimovich [Fri, 12 Jul 2019 13:11:20 +0000 (14:11 +0100)]
Added build script that checks that all headers are self-contained.
Ulya Trofimovich [Fri, 12 Jul 2019 12:35:40 +0000 (13:35 +0100)]
Fixed includes using include-what-you-use.
Ulya Trofimovich [Fri, 12 Jul 2019 10:40:43 +0000 (11:40 +0100)]
Added build script for include-what-you-use.
Ulya Trofimovich [Thu, 11 Jul 2019 18:12:04 +0000 (19:12 +0100)]
Updated Mingw build scripts to workaround libtool/slibtool issues.
Currently we need both libtool and slibtool:
- libtool fails to link self-contained DLL (it ignores '-static-libstdc++
-static-libgcc'), but manages to link EXEs with static library
- slibtool manages to link self-contained DLL, but then fails to link the
final EXEs with it (it passes '-lre2c' before some other predefined object
file that contains definitions from libgcc)
Ulya Trofimovich [Thu, 11 Jul 2019 11:28:44 +0000 (12:28 +0100)]
run_tests.sh: check for re2c existence after (possibly) appending .exe extension.
Ulya Trofimovich [Thu, 11 Jul 2019 10:34:59 +0000 (11:34 +0100)]
Build scripts: changed -j5 to -j$(nproc).
Ulya Trofimovich [Wed, 10 Jul 2019 18:03:19 +0000 (19:03 +0100)]
Paper: fixed broken references.
Ulya Trofimovich [Wed, 10 Jul 2019 10:07:07 +0000 (11:07 +0100)]
Paper: references and spell-checking.
Ulya Trofimovich [Mon, 8 Jul 2019 10:55:34 +0000 (11:55 +0100)]
Paper: updated and fixed all the proofs.
Ulya Trofimovich [Wed, 26 Jun 2019 10:20:14 +0000 (11:20 +0100)]
Paper: reworked "Benchmarks" section, added "Conclusions and future work" section.
Ulya Trofimovich [Wed, 26 Jun 2019 10:15:40 +0000 (11:15 +0100)]
libre2c: don't forget to free static lists (AST nodes, etc.) at the end of 'regfree()'.
Otherwise they accumulate at every 'regcomp()'.
Ulya Trofimovich [Wed, 26 Jun 2019 07:56:44 +0000 (08:56 +0100)]
libre2c: clean up cache after matching in lazy algorithms; small tweaks in benchmark.
Cleaning up cache may take considerable time (e.g. if cache is a large
std::map), which should be included in the match time.
Ulya Trofimovich [Tue, 25 Jun 2019 21:57:24 +0000 (22:57 +0100)]
Added 'clear()' method to slab allocator.
Ulya Trofimovich [Sat, 22 Jun 2019 21:27:15 +0000 (22:27 +0100)]
Paper: removed nested negative tags from TNFA and added complexity analysis.
Ulya Trofimovich [Fri, 21 Jun 2019 13:51:26 +0000 (14:51 +0100)]
libre2c: don't add nested negative tags to TNFA, as it increases its size and makes matching slower.
Instead, add only one negative tag (top-level closing one) and record
the remaining tags in its metadata. Use this metadata to update nested
negative tags during TNFA simulation.
Ulya Trofimovich [Mon, 17 Jun 2019 19:46:48 +0000 (20:46 +0100)]
libre2c benchmark: make a couple of warmup iterations before the main run.
Ulya Trofimovich [Mon, 17 Jun 2019 09:29:58 +0000 (10:29 +0100)]
Paper: added "Benchmarks" section.
Ulya Trofimovich [Mon, 17 Jun 2019 09:27:07 +0000 (10:27 +0100)]
libre2c benchmark: print size of each regular expression and the number of capturing groups in it.
Ulya Trofimovich [Thu, 13 Jun 2019 15:49:02 +0000 (16:49 +0100)]
libre2c: removed unused variable.
Ulya Trofimovich [Thu, 13 Jun 2019 15:43:36 +0000 (16:43 +0100)]
Makefile.am: added 'bench_libre2c' to noinst_PROGRAMS (so that automake picks it as a target).
Ulya Trofimovich [Wed, 12 Jun 2019 22:45:46 +0000 (23:45 +0100)]
Paper: added section about lazy disambiguation.
Ulya Trofimovich [Fri, 24 May 2019 14:12:28 +0000 (15:12 +0100)]
Added a test for '--input-encoding utf8' option.
Ulya Trofimovich [Fri, 24 May 2019 12:15:07 +0000 (13:15 +0100)]
Added option --input-encoding <ascii | utf8> that allows to use UTF-8 literals in regular expressions.
Ulya Trofimovich [Fri, 24 May 2019 11:04:47 +0000 (12:04 +0100)]
Allow to mix multiple /*!rules:re2c*/, /*!use:re2c*/ and /*!re2c*/ blocks in -r mode.
Ulya Trofimovich [Sat, 18 May 2019 18:07:57 +0000 (19:07 +0100)]
libre2c: added GOR1 option for lazy disambiguation algorithm.
Ulya Trofimovich [Thu, 9 May 2019 21:38:12 +0000 (22:38 +0100)]
libre2c: do not include benchmark in 'make check' programs.
Ulya Trofimovich [Thu, 9 May 2019 20:44:42 +0000 (21:44 +0100)]
libre2c: added forgotten benchmark data samples.
Ulya Trofimovich [Thu, 9 May 2019 20:29:53 +0000 (21:29 +0100)]
libre2c: added missing include.
Ulya Trofimovich [Thu, 9 May 2019 17:30:32 +0000 (18:30 +0100)]
libre2c: updated benchmark.
Ulya Trofimovich [Wed, 8 May 2019 13:31:14 +0000 (14:31 +0100)]
libre2c: extended lexer to handle some escape sequences in charachter classes.
Ulya Trofimovich [Wed, 8 May 2019 10:01:48 +0000 (11:01 +0100)]
Inlined closure cleanup in the pruning procedure to further speedup determinization.
Ulya Trofimovich [Wed, 8 May 2019 09:54:00 +0000 (10:54 +0100)]
Make unreachable rule analysis conditional to further sppedup closure pruning.
Ulya Trofimovich [Wed, 8 May 2019 09:44:25 +0000 (10:44 +0100)]
Rewrite of closure pruning function to speedup determinization on large automata.
Ulya Trofimovich [Wed, 8 May 2019 08:57:59 +0000 (09:57 +0100)]
Hash in 4-byte chunks to speedup determinization on large automata.
Ulya Trofimovich [Wed, 8 May 2019 06:35:42 +0000 (07:35 +0100)]
Use correct formula to find the next aligned address.
The previous formula worked incorrectly for already aligned addresses
(it added alignment to them).
Ulya Trofimovich [Mon, 8 Apr 2019 12:21:50 +0000 (13:21 +0100)]
Paper: updated "TNFA construction" section.
Ulya Trofimovich [Sat, 6 Apr 2019 10:16:47 +0000 (11:16 +0100)]
configure.ac: use AC_USE_SYSTEM_EXTENSIONS, as --std=c++98 disables POSIX extensions on Cygwin.
Attempted fix for #247.
Ulya Trofimovich [Sat, 6 Apr 2019 09:33:20 +0000 (10:33 +0100)]
Makefile.am: reflected the rename of README to README.md.
This fixes bug #248.
Ulya Trofimovich [Fri, 5 Apr 2019 16:21:30 +0000 (17:21 +0100)]
Paper: updated sections about representation of paths and match results.
Ulya Trofimovich [Fri, 5 Apr 2019 09:34:03 +0000 (10:34 +0100)]
Paper: rearranged "epsilon-closure" section.
Ulya Trofimovich [Tue, 2 Apr 2019 12:48:52 +0000 (13:48 +0100)]
Paper: updated pseudocode for precedence procedures.
Ulya Trofimovich [Mon, 1 Apr 2019 21:49:50 +0000 (22:49 +0100)]
Paper: restructured "formalization" section.
Ulya Trofimovich [Mon, 1 Apr 2019 13:48:48 +0000 (14:48 +0100)]
Paper: wrote "main idea" section.
Ulya Trofimovich [Sat, 30 Mar 2019 09:59:50 +0000 (09:59 +0000)]
libre2c: restructured main loop to avoid double call to closure.
Ulya Trofimovich [Fri, 29 Mar 2019 09:37:38 +0000 (09:37 +0000)]
Paper: wrote the "introduction" section.
Ulya Trofimovich [Tue, 26 Mar 2019 18:22:42 +0000 (18:22 +0000)]
Fixed compilation error caused by extra namespace qualification (second attempt).
Ulya Trofimovich [Tue, 26 Mar 2019 18:18:34 +0000 (18:18 +0000)]
Fixed compilation error caused by extra namespace qualification.
Ulya Trofimovich [Tue, 26 Mar 2019 17:40:51 +0000 (17:40 +0000)]
libre2c: implemented Kuklewicz disambiguation algorithm for POSIX TNFA matching.
Ulya Trofimovich [Mon, 25 Mar 2019 16:40:04 +0000 (16:40 +0000)]
libre2c: free tag history after each step (it's no longer needed).
Seems to be a copy-paste error from TDFA implementation, where history
for each state is needed until the end of determinization.
Ulya Trofimovich [Mon, 25 Mar 2019 14:32:39 +0000 (14:32 +0000)]
Parameterize determinization/simulation context directly by history type.
Ulya Trofimovich [Mon, 25 Mar 2019 07:26:43 +0000 (07:26 +0000)]
Added logs with test failures of backward matching algorithm.
Ulya Trofimovich [Sun, 24 Mar 2019 10:14:54 +0000 (10:14 +0000)]
libre2c: added Cox backward matching algorithm (incorrect, but interesting).
Ulya Trofimovich [Sun, 24 Mar 2019 09:31:56 +0000 (09:31 +0000)]
Fictive tags for right alt/cat must be added after adding all tags for left alt/cat.
Previous order was incorrect: fictive tags for both left and right
alternative or catenation were added before recursing into the right
alternative or catenation. This resulted in right fictive tags habving
higher precedence than left non-fictive tags.
Ulya Trofimovich [Sun, 24 Mar 2019 09:28:59 +0000 (09:28 +0000)]
Debug: don't print NFA in-degree in each node, it takes too much space.
Ulya Trofimovich [Sat, 16 Mar 2019 23:45:48 +0000 (23:45 +0000)]
libre2c: added POSIX tests that show why closure can't use DFS and needs worst-case quadratic shortest path algorithm.
In particular, test ((((a?)+)|(aa))+) and string "aaa" demonstrates why
we cannot replace SSSP (GOR1 or GTOP) with a liner-time DFS that sets
predecessor chains of TNFA states and sorts initial closure configurations
according to POSIX relation imposed by precedence matrix.
Consider what happens after consuming the first to 'a's (see the .dot
encoded TNFA below):
- initial states are 8, 14 and 9 (in the order imposed by D-matrix)
- start DFS from 8
- find path 8->7->6->5->4->3->2->22->21->11->10
- find path 8->7->6->5->4->3->2->22->21->20->19->18->17->16->15
- find path
8->7->6->5->4->3->2->22->21->20->19->18->17->16->14->13->17
compare it to the previous path to 17 and discard the new looping path
- find path
8->7->6->5->4->3->2->22->21->20->19->18->17->16->14->13->12->3
compare it to the previous path to 3 and discard the new looping path
- find path 8->7->6->5->4->3->2->1->0
- DFS from 8 complete, start DFS from 14
- find path 14, compare it to the previous path to 14 and accept the new
one as it is better (but do not attempt to propagate the improvement
and rescan states: this is DFS, not GOR1, and the whole idea is to
avoid rescanning and have linear complexity)
- DFS from 14 complete, start DFS from 9
- find path 9
The end, and we have an incorrect path to 15: the long path through the
outer loop from 8, rather than the short path through the inner loop
from 14. This further leads to incorrect results.
digraph NFA {
rankdir=LR
node[shape=Mrecord fontname=Courier height=0.2 width=0.2]
edge[arrowhead=vee fontname=Courier label=" "]
n23 [label="23"]
n23 -> n22 [label="/0↑(1)"]
n22 [label="22"]
n22 -> n21 [label="/2↑(2)"]
n21 [label="21"]
n21 -> n11
n21 -> n20 [color=lightgray]
n20 [label="20"]
n20 -> n19 [label="/4↓(3)"]
n19 [label="19"]
n19 -> n18 [label="/5↓(2)"]
n18 [label="18"]
n18 -> n17 [label="/6↑(3)"]
n17 [label="17"]
n17 -> n16 [label="/8↑(4)"]
n16 [label="16"]
n16 -> n15
n16 -> n14 [color=lightgray]
n15 [label="15"]
n15 -> n14 [label="97"]
n14 [label="14"]
n14 -> n13 [label="/9↑(3)"]
n13 [label="13"]
n13 -> n17
n13 -> n12 [color=lightgray]
n12 [label="12"]
n12 -> n3 [label="/7↑(2)"]
n11 [label="11"]
n11 -> n10 [label="/4↑(3)"]
n10 [label="10"]
n10 -> n9 [label="97"]
n9 [label="9"]
n9 -> n8 [label="97"]
n8 [label="8"]
n8 -> n7 [label="/5↑(2)"]
n7 [label="7"]
n7 -> n6 [label="/6↓(3)"]
n6 [label="6"]
n6 -> n5 [label="/8↓(4)"]
n5 [label="5"]
n5 -> n4 [label="/9↓(3)"]
n4 [label="4"]
n4 -> n3 [label="/7↓(2)"]
n3 [label="3"]
n3 -> n2 [label="/3↑(1)"]
n2 [label="2"]
n2 -> n22
n2 -> n1 [color=lightgray]
n1 [label="1"]
n1 -> n0 [label="/1↑(0)"]
n0 [label="0"] [fillcolor=gray]
}
Ulya Trofimovich [Fri, 8 Mar 2019 07:21:49 +0000 (07:21 +0000)]
Use state index in closure instead of core index, and don't keep core indices at all.
Ulya Trofimovich [Thu, 7 Mar 2019 23:05:40 +0000 (23:05 +0000)]
Added tests for errors in case of out-of-bounds EOF value.
Ulya Trofimovich [Thu, 7 Mar 2019 22:51:11 +0000 (22:51 +0000)]
Give a concise error message when EOF rule is present, but no other rules are.
Ulya Trofimovich [Thu, 7 Mar 2019 22:09:05 +0000 (22:09 +0000)]
Updated .gitignore.
Ulya Trofimovich [Thu, 7 Mar 2019 18:47:04 +0000 (18:47 +0000)]
Updating .travis.yml after global move.
Ulya Trofimovich [Thu, 7 Mar 2019 18:28:49 +0000 (18:28 +0000)]
Moved re2c subdirectory to root directory and renamed old libre2c to libre2c_old.
Ulya Trofimovich [Thu, 7 Mar 2019 17:21:34 +0000 (17:21 +0000)]
Makefile.am: use wildcard instead of braces in macro expansion (the latter breaks on BSD).
Ulya Trofimovich [Thu, 7 Mar 2019 16:24:16 +0000 (16:24 +0000)]
Avoid using ULLONG_MAX as it is non-standard prior to C++11.
Ulya Trofimovich [Thu, 7 Mar 2019 13:08:51 +0000 (13:08 +0000)]
Handle cases when rename() fails because destination file exists.
In C/C++ rename() behaviour is implementation-defined. POSIX and Windows
have different behaviour when destination file exists: POSIX says rename()
should overwrite it, but Windows says rename() should fail.
Ulya Trofimovich [Wed, 6 Mar 2019 23:28:29 +0000 (23:28 +0000)]
Added more tests for EOF rule.
Ulya Trofimovich [Wed, 6 Mar 2019 21:47:49 +0000 (21:47 +0000)]
Improved label generation with EOF rule (removed unused and added missing labels).
Ulya Trofimovich [Wed, 6 Mar 2019 16:06:39 +0000 (16:06 +0000)]
Added some tests for EOF rule.
Ulya Trofimovich [Wed, 6 Mar 2019 14:54:00 +0000 (14:54 +0000)]
Fixed EOF rule handling in case when EOF symbol is the upper bound of charset.
Ulya Trofimovich [Wed, 6 Mar 2019 12:15:48 +0000 (12:15 +0000)]
run_tests.sh: ignore difference in trailing whitespace with --wine option.