From: Ulya Trofimovich
Date: Sat, 5 Aug 2017 08:33:30 +0000 (+0100)
Subject: Paper on Lookahead TDFA: added benchmark results and graphs.
X-Git-Tag: 1.0~11
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=a626fc5dc1dd5e4763b595afb341e06ec06feb75;p=re2c

Paper on Lookahead TDFA: added benchmark results and graphs.
---
diff --git a/re2c/doc/tdfa/img/bench/data b/re2c/doc/tdfa/img/bench/data
new file mode 100644
index 00000000..d972222e
--- /dev/null
+++ b/re2c/doc/tdfa/img/bench/data
@@ -0,0 +1,47 @@
+1 TDFA(0) 45 452 250 63 135 339 247 12.86 10.27 99.09 55.83
+2 TDFA(0) 18 70 32 15 31 41 31 7.66 5.47 71.60 33.90
+3 TDFA(0) 23 252 152 39 75 203 155 10.01 6.01 111.76 73.75
+4 TDFA(0) 16 26 17 11 19 23 19 8.34 3.55 102.72 59.84
+
+
+1 TDFA(1) 42 457 183 55 139 213 151 6.43 5.59 67.00 27.93
+2 TDFA(1) 16 73 33 15 35 41 31 5.30 3.83 63.30 26.74
+3 TDFA(1) 20 256 115 35 75 138 103 6.78 3.23 104.36 51.00
+4 TDFA(1) 13 28 19 11 19 25 23 6.04 3.12 100.28 47.85
+
+
+1 DFA -- 414 135 35 111 145 91 4.96 4.46 62.04 23.67
+2 DFA -- 69 25 15 31 31 23 4.90 3.34 62.00 23.59
+3 DFA -- 198 67 23 55 73 55 7.06 3.19 97.87 51.37
+4 DFA -- 22 10 11 15 14 15 5.89 2.66 97.95 47.01
+
+
+1 TDFA(0) 45 452 295 63 59 352 267 11.95 10.30 65.47 36.95
+2 TDFA(0) 18 70 31 15 19 31 31 7.12 7.30 31.81 17.44
+3 TDFA(0) 23 252 165 39 35 181 151 8.36 8.58 39.51 31.81
+4 TDFA(0) 16 26 20 11 11 22 23 7.14 6.67 23.19 18.73
+
+
+1 TDFA(1) 42 457 171 55 51 144 111 6.01 5.40 15.94 10.53
+2 TDFA(1) 16 73 29 15 19 29 27 5.24 4.43 13.50 8.84
+3 TDFA(1) 20 256 127 55 31 130 107 5.21 4.81 12.02 10.01
+4 TDFA(1) 13 28 17 11 11 19 19 4.02 3.08 8.56 6.90
+
+
+1 DFA -- 414 123 35 39 75 51 4.71 4.76 10.88 5.61
+2 DFA -- 69 19 11 15 15 15 4.64 3.94 11.00 5.77
+3 DFA -- 198 60 19 23 39 35 4.04 4.06 9.13 8.17
+4 DFA -- 22 7 11 11 8 11 3.90 2.52 8.00 4.40
+
+
+1 TDFA(0) 2054 625 816 275 267 1107 839 14.11 13.25 105.58 59.60
+2 TDFA(0) 72 106 57 23 55 73 55 8.61 6.77 72.96 34.63
+3 TDFA(0) 611 280 426 127
151 536 463 10.39 7.51 127.35 75.23
+4 TDFA(0) 79 29 33 19 23 43 39 7.43 4.05 105.06 61.74
+
+
+1 TDFA(1) 149 462 200 63 147 233 167 6.47 5.90 68.43 29.09
+2 TDFA(1) 44 82 39 19 43 49 39 6.00 5.39 63.79 27.37
+3 TDFA(1) 64 256 131 43 87 156 123 6.74 3.54 103.91 51.08
+4 TDFA(1) 40 31 28 15 23 36 31 6.27 3.32 101.79 48.15
+
diff --git a/re2c/doc/tdfa/img/bench/size_gcc_clang.png b/re2c/doc/tdfa/img/bench/size_gcc_clang.png
new file mode 100644
index 00000000..4a221425
Binary files /dev/null and b/re2c/doc/tdfa/img/bench/size_gcc_clang.png differ
diff --git a/re2c/doc/tdfa/img/bench/size_tcc_pcc.png b/re2c/doc/tdfa/img/bench/size_tcc_pcc.png
new file mode 100644
index 00000000..552b423f
Binary files /dev/null and b/re2c/doc/tdfa/img/bench/size_tcc_pcc.png differ
diff --git a/re2c/doc/tdfa/img/bench/time_gcc_clang.png b/re2c/doc/tdfa/img/bench/time_gcc_clang.png
new file mode 100644
index 00000000..17f1c385
Binary files /dev/null and b/re2c/doc/tdfa/img/bench/time_gcc_clang.png differ
diff --git a/re2c/doc/tdfa/img/bench/time_tcc_pcc.png b/re2c/doc/tdfa/img/bench/time_tcc_pcc.png
new file mode 100644
index 00000000..24cfbacf
Binary files /dev/null and b/re2c/doc/tdfa/img/bench/time_tcc_pcc.png differ
diff --git a/re2c/doc/tdfa/tdfa.tex b/re2c/doc/tdfa/tdfa.tex
index 2fb0c8c1..f88f51cb 100644
--- a/re2c/doc/tdfa/tdfa.tex
+++ b/re2c/doc/tdfa/tdfa.tex
@@ -2155,7 +2155,7 @@ Tag $1$ is deterministic for both automata.
 \begin{multicols}{2}
 From these examples we can draw the following conclusions.
-First, TDFA(1) are generally better than TDFA(0): delaying register operations allows to get rid of many conflicts.
+First, TDFA(1) is generally better than TDFA(0): delaying register operations allows to get rid of many conflicts.
 Second, both kinds of automata are only suitable for RE with modest levels of ambiguity and low submatch detalisation:
 TDFA can be applied to full parsing, but other methods would probably outperform them.
 However, RE of such form are very common in practice and for them TDFA can be very efficient.
@@ -2271,7 +2271,7 @@ we can also use it for backup.
 \subsection*{Fixed tags}
-It may happen that two tags in TRE are bound: separated by a fixed number of characters, so that
+It may happen that two tags in TRE are separated by a fixed number of characters:
 each offset of one tag is equal to the corresponding offset of the other tag plus some static offset.
 %the value of one tag is always equal to the value of the other plus some static offset.
 In this case we can track only one of the tags; we say that the second tag is \emph{fixed} on the first one.
@@ -2489,28 +2489,20 @@ and visualized on subsequent plots.
 \hline \hline \multicolumn{12}{|c|}{re2c} \\ \hline
- TDFA(0) & 45 & 452 & 250 & 63 & 135 & 339 & 247 & 12.88 & 10.31 & 99.12 & 55.91 \\
- TDFA(1) & 42 & 457 & 183 & 55 & 139 & 213 & 151 & 6.42 & 5.59 & 67.04 & 27.96 \\
- DFA & -- & 414 & 135 & 35 & 111 & 145 & 91 & 4.96 & 4.46 & 62.15 & 23.74 \\
-% TDFA(0) & 45 & 452 & 255712 & 63544 & 137320 & 346408 & 252024 & 12.88 & 10.31 & 99.12 & 55.91 \\
-% TDFA(1) & 42 & 457 & 186600 & 55352 & 141416 & 217160 & 153720 & 6.42 & 5.59 & 67.04 & 27.96 \\
-% DFA & -- & 414 & 137816 & 34864 & 112728 & 148048 & 92256 & 4.96 & 4.46 & 62.15 & 23.74 \\
+ TDFA(0) & 45 & 452 & 250 & 63 & 135 & 339 & 247 & 12.86 & 10.27 & 99.09 & 55.83 \\
+ TDFA(1) & 42 & 457 & 183 & 55 & 139 & 213 & 151 & 6.43 & 5.59 & 67.00 & 27.93 \\
+ DFA & -- & 414 & 135 & 35 & 111 & 145 & 91 & 4.96 & 4.46 & 62.04 & 23.67 \\
 \hline \hline \multicolumn{12}{|c|}{re2c -b} \\ \hline
- TDFA(0) & 45 & 452 & 295 & 63 & 59 & 352 & 267 & 11.96 & 10.31 & 65.53 & 36.98 \\
- TDFA(1) & 42 & 457 & 171 & 55 & 51 & 144 & 111 & 6.01 & 5.40 & 15.96 & 10.59 \\
- DFA & -- & 414 & 123 & 35 & 39 & 75 & 51 & 4.73 & 4.78 & 10.93 & 5.63 \\
-% TDFA(0) & 45 & 452 & 301968 & 63544 & 59496 & 360136 & 272504 & 11.96 & 10.31 & 65.53 & 36.98 \\
-% TDFA(1) & 42 & 457 & 174903 & 55352 & 51304 &
147016 & 112760 & 6.01 & 5.40 & 15.96 & 10.59 \\
-% DFA & -- & 414 & 125389 & 34864 & 39000 & 76272 & 51296 & 4.73 & 4.78 & 10.93 & 5.63 \\
+ TDFA(0) & 45 & 452 & 295 & 63 & 59 & 352 & 267 & 11.95 & 10.30 & 65.47 & 36.95 \\
+ TDFA(1) & 42 & 457 & 171 & 55 & 51 & 144 & 111 & 6.01 & 5.40 & 15.94 & 10.53 \\
+ DFA & -- & 414 & 123 & 35 & 39 & 75 & 51 & 4.71 & 4.76 & 10.88 & 5.61 \\
 \hline \hline \multicolumn{12}{|c|}{re2c --no-optimize-tags} \\ \hline
- TDFA(0) & 2054 & 625 & 816 & 275 & 267 & 1107 & 839 & 14.14 & 13.24 & 105.87 & 59.71 \\
- TDFA(1) & 149 & 462 & 200 & 63 & 147 & 233 & 167 & 6.64 & 5.90 & 68.50 & 29.39 \\
-% TDFA(0) & 2054 & 625 & 835285 & 280632 & 272488 & 1132616 & 858232 & 14.14 & 13.24 & 105.87 & 59.71 \\
-% TDFA(1) & 149 & 462 & 204119 & 63544 & 149608 & 238568 & 170104 & 6.64 & 5.90 & 68.50 & 29.39 \\
+ TDFA(0) & 2054 & 625 & 816 & 275 & 267 & 1107 & 839 & 14.11 & 13.25 & 105.58 & 59.60 \\
+ TDFA(1) & 149 & 462 & 200 & 63 & 147 & 233 & 167 & 6.47 & 5.90 & 68.43 & 29.09 \\
 \hline
 \end{tabular}\\*
 \medskip
@@ -2536,28 +2528,20 @@ and visualized on subsequent plots.
 \hline \hline \multicolumn{12}{|c|}{re2c} \\ \hline
- TDFA(0) & 18 & 70 & 32 & 15 & 31 & 41 & 31 & 7.65 & 5.50 & 71.60 & 33.96 \\
- TDFA(1) & 16 & 73 & 33 & 15 & 35 & 41 & 31 & 5.31 & 3.83 & 63.36 & 26.78 \\
- DFA & -- & 69 & 25 & 15 & 31 & 31 & 23 & 4.90 & 3.34 & 62.12 & 23.64 \\
-% TDFA(0) & & & & 14392 & 30816 & 41160 & 30840 & 7.65 & 5.50 & 71.60 & 33.96 \\
-% TDFA(1) & & & & 14392 & 34912 & 41704 & 30840 & 5.31 & 3.83 & 63.36 & 26.78 \\
-% DFA & -- & 69 & 24937 & 14384 & 30808 & 31280 & 22624 & 4.90 & 3.34 & 62.12 & 23.64 \\
+ TDFA(0) & 18 & 70 & 32 & 15 & 31 & 41 & 31 & 7.66 & 5.47 & 71.60 & 33.90 \\
+ TDFA(1) & 16 & 73 & 33 & 15 & 35 & 41 & 31 & 5.30 & 3.83 & 63.30 & 26.74 \\
+ DFA & -- & 69 & 25 & 15 & 31 & 31 & 23 & 4.90 & 3.34 & 62.00 & 23.59 \\
 \hline \hline \multicolumn{12}{|c|}{re2c -b} \\ \hline
- TDFA(0) & 18 & 70 & 31 & 15 & 19 & 31 & 31 & 7.12 & 7.31 & 31.85 & 17.47 \\
- TDFA(1) & 16 & 73 & 29 & 15 & 19 & 29 & 27 & 5.25 & 4.42 & 13.52 & 8.86 \\
- DFA & -- & 69 & 19 & 11 & 15 & 15 & 15 & 4.66 & 3.96 & 11.00 & 5.79 \\
-% TDFA(0) & & & & 14392 & 18528 & 31336 & 30840 & 7.12 & 7.31 & 31.85 & 17.47 \\
-% TDFA(1) & & & & 14392 & 18528 & 29288 & 26744 & 5.25 & 4.42 & 13.52 & 8.86 \\
-% DFA & -- & 69 & 18472 & 10288 & 14424 & 14832 & 14432 & 4.66 & 3.96 & 11.00 & 5.79 \\
+ TDFA(0) & 18 & 70 & 31 & 15 & 19 & 31 & 31 & 7.12 & 7.30 & 31.81 & 17.44 \\
+ TDFA(1) & 16 & 73 & 29 & 15 & 19 & 29 & 27 & 5.24 & 4.43 & 13.50 & 8.84 \\
+ DFA & -- & 69 & 19 & 11 & 15 & 15 & 15 & 4.64 & 3.94 & 11.00 & 5.77 \\
 \hline \hline \multicolumn{12}{|c|}{re2c --no-optimize-tags} \\ \hline
- TDFA(0) & 72 & 106 & 57 & 23 & 55 & 73 & 55 & 8.61 & 6.77 & 73.05 & 34.68 \\
- TDFA(1) & 44 & 82 & 39 & 19 & 43 & 49 & 39 & 6.01 & 5.38 & 63.87 & 27.44 \\
-% TDFA(0) & 72 & 106 & 57956 & 22584 & 55400 & 73928 & 55416 & 8.61 & 6.77 & 73.05 & 34.68 \\
-% TDFA(1) & 44 & 82 & 39674 & 18488 & 43112 & 49480 & 39032 & 6.01 & 5.38 & 63.87 & 27.44 \\
+ TDFA(0) & 72 & 106 & 57 & 23 & 55 & 73 & 55 & 8.61
& 6.77 & 72.96 & 34.63 \\
+ TDFA(1) & 44 & 82 & 39 & 19 & 43 & 49 & 39 & 6.00 & 5.39 & 63.79 & 27.37 \\
 \hline
 \end{tabular}\\*
 \medskip
@@ -2584,28 +2568,20 @@ and visualized on subsequent plots.
 \hline \hline \multicolumn{12}{|c|}{re2c} \\ \hline
- TDFA(0) & 23 & 252 & 152 & 39 & 75 & 203 & 155 & 10.03 & 6.10 & 111.90 & 73.81 \\
- TDFA(1) & 20 & 256 & 115 & 35 & 75 & 138 & 103 & 6.75 & 3.24 & 104.56 & 50.90 \\
- DFA & -- & 198 & 67 & 23 & 55 & 73 & 55 & 7.05 & 3.21 & 97.89 & 51.43 \\
-% TDFA(0) & 23 & 252 & 154776 & 38960 & 75864 & 207600 & 157792 & 10.03 & 6.10 & 111.90 & 73.81 \\
-% TDFA(1) & 20 & 256 & 117498 & 34864 & 75864 & 140560 & 104544 & 6.75 & 3.24 & 104.56 & 50.90 \\
-% DFA & -- & 198 & 67617 & 22576 & 55384 & 74384 & 55392 & 7.05 & 3.21 & 97.89 & 51.43 \\
+ TDFA(0) & 23 & 252 & 152 & 39 & 75 & 203 & 155 & 10.01 & 6.01 & 111.76 & 73.75 \\
+ TDFA(1) & 20 & 256 & 115 & 35 & 75 & 138 & 103 & 6.78 & 3.23 & 104.36 & 51.00 \\
+ DFA & -- & 198 & 67 & 23 & 55 & 73 & 55 & 7.06 & 3.19 & 97.87 & 51.37 \\
 \hline \hline \multicolumn{12}{|c|}{re2c -b} \\ \hline
- TDFA(0) & 23 & 252 & 165 & 39 & 35 & 181 & 151 & 8.40 & 8.56 & 39.56 & 31.84 \\
- TDFA(1) & 20 & 256 & 127 & 55 & 31 & 130 & 107 & 5.23 & 4.83 & 12.04 & 10.02 \\
- DFA & -- & 198 & 60 & 19 & 23 & 39 & 35 & 4.05 & 4.08 & 9.23 & 8.19 \\
-% TDFA(0) & 23 & 252 & 168684 & 38960 & 34904 & 186704 & 153696 & 8.40 & 8.56 & 39.56 & 31.84 \\
-% TDFA(1) & 20 & 256 & 129322 & 55344 & 30808 & 132912 & 108640 & 5.23 & 4.83 & 12.04 & 10.02 \\
-% DFA & -- & 198 & 60759 & 18480 & 22616 & 39376 & 34912 & 4.05 & 4.08 & 9.23 & 8.19 \\
+ TDFA(0) & 23 & 252 & 165 & 39 & 35 & 181 & 151 & 8.36 & 8.58 & 39.51 & 31.81 \\
+ TDFA(1) & 20 & 256 & 127 & 55 & 31 & 130 & 107 & 5.21 & 4.81 & 12.02 & 10.01 \\
+ DFA & -- & 198 & 60 & 19 & 23 & 39 & 35 & 4.04 & 4.06 & 9.13 & 8.17 \\
 \hline \hline \multicolumn{12}{|c|}{re2c --no-optimize-tags} \\ \hline
- TDFA(0) & 611 & 280 & 426 & 127 & 151 & 536 & 463 & 10.41 & 7.56 & 127.48 & 75.46 \\
-
TDFA(1) & 64 & 256 & 131 & 43 & 87 & 156 & 123 & 6.74 & 3.55 & 103.98 & 51.12 \\
-% TDFA(0) & 611 & 280 & 435350 & 129072 & 153696 & 548272 & 473184 & 10.41 & 7.56 & 127.48 & 75.46 \\
-% TDFA(1) & 64 & 256 & 133518 & 43056 & 88160 & 159248 & 125024 & 6.74 & 3.55 & 103.98 & 51.12 \\
+ TDFA(0) & 611 & 280 & 426 & 127 & 151 & 536 & 463 & 10.39 & 7.51 & 127.35 & 75.23 \\
+ TDFA(1) & 64 & 256 & 131 & 43 & 87 & 156 & 123 & 6.74 & 3.54 & 103.91 & 51.08 \\
 \hline
 \end{tabular}\\*
 \medskip
@@ -2632,28 +2608,20 @@ and visualized on subsequent plots.
 \hline \hline \multicolumn{12}{|c|}{re2c} \\ \hline
- TDFA(0) & 16 & 26 & 17 & 11 & 19 & 23 & 19 & 8.34 & 3.57 & 102.84 & 59.88 \\
- TDFA(1) & 13 & 28 & 19 & 11 & 19 & 25 & 23 & 6.06 & 3.14 & 100.33 & 48.02 \\
- DFA & -- & 22 & 10 & 11 & 15 & 14 & 15 & 5.91 & 2.68 & 98.10 & 47.25 \\
-% TDFA(0) & & & & 10288 & 18520 & 22960 & 18528 & 8.34 & 3.57 & 102.84 & 59.88 \\
-% TDFA(1) & & & & 10288 & 18520 & 25424 & 22624 & 6.06 & 3.14 & 100.33 & 48.02 \\
-% DFA & -- & & & 10288 & 14424 & 14256 & 14432 & 5.91 & 2.68 & 98.10 & 47.25 \\
+ TDFA(0) & 16 & 26 & 17 & 11 & 19 & 23 & 19 & 8.34 & 3.55 & 102.72 & 59.84 \\
+ TDFA(1) & 13 & 28 & 19 & 11 & 19 & 25 & 23 & 6.04 & 3.12 & 100.28 & 47.85 \\
+ DFA & -- & 22 & 10 & 11 & 15 & 14 & 15 & 5.89 & 2.66 & 97.95 & 47.01 \\
 \hline \hline \multicolumn{12}{|c|}{re2c -b} \\ \hline
- TDFA(0) & 16 & 26 & 20 & 11 & 11 & 22 & 23 & 7.17 & 6.66 & 23.21 & 18.77 \\
- TDFA(1) & 13 & 28 & 17 & 11 & 11 & 19 & 19 & 4.05 & 3.09 & 8.59 & 6.94 \\
- DFA & -- & 22 & 7 & 11 & 11 & 8 & 11 & 3.92 & 2.56 & 8.06 & 4.42 \\
-% TDFA(0) & & & & 10288 & 10328 & 22352 & 22624 & 7.17 & 6.66 & 23.21 & 18.77 \\
-% TDFA(1) & & & & 10288 & 10328 & 18960 & 18528 & 4.05 & 3.09 & 8.59 & 6.94 \\
-% DFA & -- & & 6483 & 10288 & 10328 & 7888 & 10336 & 3.92 & 2.56 & 8.06 & 4.42 \\
+ TDFA(0) & 16 & 26 & 20 & 11 & 11 & 22 & 23 & 7.14 & 6.67 & 23.19 & 18.73 \\
+ TDFA(1) & 13 & 28 & 17 & 11 & 11 & 19 & 19 & 4.02 & 3.08 & 8.56 & 6.90 \\
+ DFA & -- &
22 & 7 & 11 & 11 & 8 & 11 & 3.90 & 2.52 & 8.00 & 4.40 \\
 \hline \hline \multicolumn{12}{|c|}{re2c --no-optimize-tags} \\ \hline
- TDFA(0) & 79 & 29 & 33 & 19 & 23 & 43 & 39 & 7.46 & 3.94 & 105.22 & 61.72 \\
- TDFA(1) & 40 & 31 & 28 & 15 & 23 & 36 & 31 & 6.29 & 3.33 & 102.00 & 48.22 \\
-% TDFA(0) & 79 & 29 & 33745 & 18480 & 22624 & 43504 & 39008 & 7.46 & 3.94 & 105.22 & 61.72 \\
-% TDFA(1) & 40 & 31 & 28013 & 14384 & 22624 & 36080 & 30816 & 6.29 & 3.33 & 102.00 & 48.22 \\
+ TDFA(0) & 79 & 29 & 33 & 19 & 23 & 43 & 39 & 7.43 & 4.05 & 105.06 & 61.74 \\
+ TDFA(1) & 40 & 31 & 28 & 15 & 23 & 36 & 31 & 6.27 & 3.32 & 101.79 & 48.15 \\
 \hline
 \end{tabular}\\*
 \medskip
@@ -2745,6 +2713,7 @@ Premnogoe spasibo drugu na bukvu S ! ! ! :)
 \item \! [Cox10] Russ Cox, \textit{"Regular Expression Matching in the Wild"}, March 2010, \\
 https://swtch.com/\textasciitilde rsc/regexp/regexp3.html
+ \item https://github.com/google/re2/issues/146
 \end{enumerate}