From: Ulya Trofimovich
Date: Fri, 11 Aug 2017 12:13:21 +0000 (+0100)
Subject: Removed obsolete article stubs.
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=ea8e4df9ef594994f1f2426066ceebfa44249c95;p=re2c

Removed obsolete article stubs.
---
diff --git a/src/about/about.rst b/src/about/about.rst
index a554c5cc..c427db0c 100644
--- a/src/about/about.rst
+++ b/src/about/about.rst
@@ -14,7 +14,7 @@ and described in research article by Peter Bumbulis, Donald D. Cowan,
 1994, ACM Letters on Programming Languages and Systems (LOPLAS).
 
 
-The implementation of submatch extraction in RE2C is described in article
+The implementation of submatch extraction in re2c is described in the article
 :download:`"Tagged Deterministic Finite Automata with Lookahead" <2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf>`
 by Ulya Trofimovich, 2017.
 
diff --git a/src/news/contexts/an_old_bug_in_trailing_contexts.rst b/src/news/contexts/an_old_bug_in_trailing_contexts.rst
deleted file mode 100644
index f4551adf..00000000
--- a/src/news/contexts/an_old_bug_in_trailing_contexts.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-===============================
-An old bug in trailing contexts
-===============================
-
-.. toctree::
-    :hidden:
-
diff --git a/src/news/contexts/parsing_regular_languages.rst b/src/news/contexts/parsing_regular_languages.rst
deleted file mode 100644
index bb5e98f2..00000000
--- a/src/news/contexts/parsing_regular_languages.rst
+++ /dev/null
@@ -1,172 +0,0 @@
-=========================
-Parsing regular languages
-=========================
-
-.. toctree::
-    :hidden:
-
-It is a universally acknowledged truth
-that *parsing* regular languages is not the same as just *recognizing* them.
-Parsing is way more difficult: it introduces the notion of ambiguity,
-which immediately poses an ambiguity decision problem
-and gives rise to a bunch of disambiguation techniques.
-Even if we put ambiguity aside, there is still the problem of extracting parsing results efficiently:
-non-determinism in the underlying automata is a clear advantage in this respect,
-but NFAs are not as fast as DFAs.
-
-.. note on pluralizing NFA and DFA: http://english.stackexchange.com/questions/377849/what-is-the-correct-way-to-pluralize-an-initialism-in-which-the-final-word-is-no/377864
-
-The question is, given a regular expression recognizer, what is the best way of turning it into a parser?
-Well, the real question is, how do we add submatch extraction to re2c
-while retaining the extreme speed of a directly executable DFA?
-As usual, there's no solution to the problem in general:
-
-    *No servant can serve two masters:
-    for either he will hate the one, and love the other;
-    or else he will hold to the one, and despise the other.*
-
-All existing techniques for parsing regular grammars (those that I'm aware of)
-are kind of a compromise: they trade off recognition speed for parsing ability.
-In many cases, this is quite acceptable:
-interpreting engines are inherently slower than compiled regular expressions.
-They are usually NFA-based and don't aim at extreme speed (they are *fast enough*).
-Captures? Eh. Such engines don't even hesitate to allow backreferences,
-which throws them far behind regular languages and implies exponential recognition complexity.
-Captures are simply not an issue.
-Well, they have a clear practical goal: regular expressions should be *easy to use*.
-
-On the other hand, there are compiling engines that are crazy about generating fast code.
-Those are willing to increase compilation time and complexity
-in order to gain even a little run-time speed.
-Their definition of *practical* is somewhat less popular: regular expressions should be a *zero-cost abstraction*;
-in fact, the compiled code should be better than hand-crafted code (faster, smaller).
-Re2c is one such engine, and in the battle of speed against generality, re2c will surely hold to speed.
-
-Then there is a third camp: engines that try to get best of both worlds,
-kind of like JIT-compilers for regular expressions.
-Such engines usually incorporate a bunch of techniques ranging from fast substring search to backtracking,
-including NFAs, DFAs and mixed approaches such as cached NFAs, also known as lazy DFAs.
-The choice of technique is based on the given regular expression:
-it must be the fastest technique still capable of handling the given expression.
-In the case of backreferences, the engine will fall back to exponential backtracking;
-in the case of captures, it will probably use an NFA.
-
-And than there are experiments: various research efforts
-that yield interesting tools (usually tailored to a particular domain-specific problem).
-Some of them look very promising, and we'll definitely take a look at them.
-Yet it becomes clear that even DFA-based approaches are not suitable for re2c:
-they incur too much overhead.
-
-So in the end, it seems logical that re2c should have *partial* parsing capabilities.
-It should allow captures in unambiguous cases
-that can be technically implemented with a simple DFA.
-The rest of the article tries to formalize these requirements
-and define an effective procedure to find out if a given regular expression conforms to them.
-
-The analysis is based on the work of many people:
-some ideas are taken from papers,
-others are inspired by fellow regular expression engines like flex and quex.
-The discussion is rather informal; a thoughtful reader might notice
-a couple of references at the bottom of the page.
-
-
-Ambiguity
-=========
-
-In short, *ambiguity* in a grammar is the possibility of parsing the same sentence in multiple different ways.
-
-One should clearly see the difference between *recognition* and *parsing*.
-To recognize a sentence means to tell if it belongs to the language.
-To parse a sentence means to recognize it and find out its meaning in the given language.
-The notion of ambiguity applies to parsing, not to recognition:
-if a sentence has two different meanings, then it is ambiguous.
-
-Ambiguity in regular grammars can take two forms: horizontal or vertical. [2]
-Roughly speaking, vertical ambiguity is concerned with intersection of alternative grammar rules,
-while horizontal ambiguity deals with overlap in rule concatenation.
-Both kinds can be defined in terms of operations on finite-state automata.
-The ambiguity problem for regular languages is decidable and has ... complexity.
-
-Formal definition
------------------
-
-Ambiguity in regular expressions is defined in terms of the corresponding parse trees,
-so we first need to define regular expressions,
-
-    A *regular expression* over finite alphabet :math:`\Sigma` is:
-    - :math:`\emptyset` (empty set)
-    - :math:`\epsilon` (set that contains empty string)
-    - :math:`a \in \Sigma` (set that contains one-symbol string :math:`a`)
-    - :math:`R_1 R_2`, where :math:`R_1, R_2` are regular expressions (concatenation)
-    - :math:`R_1 | R_2`, where :math:`R_1, R_2` are regular expressions (alternative)
-    - :math:`R^*`, where :math:`R` is a regular expression (iteration, or Kleene star)
-
-    Regular expression :math:`R` is *ambiguous* iff :math:`\exists \thickspace T, T' \in AST_R`
-    such that :math:`T \neq T'` and :math:`\| T \| = \| T' \|`.
-
-
-
-    Let grammar :math:`G` be a 4-tuple :math:`(N, \Sigma, P, S)`, where:
-    - :math:`N` is the set of non-terminal symbols
-    - :math:`\Sigma` is the set of terminal symbols
-    - :math:`P` is the set of rules of the form :math:`xAy \longrightarrow z`,
-      where :math:`x,y,z \in (\Sigma \cup N)^*` (the set of all strings over :math:`\Sigma \cup N`),
-      :math:`A \in N`
-    - :math:`S \in N` is the start symbol
-
-    Derivation in one step:
-    :math:`x \Rightarrow_{G} y`
-    :math:`\iff`
-    :math:`\exists u,v \in (\Sigma \cup N)^*`
-    :math:`| x=upv, y=uqv, p \longrightarrow q \in P`
-
-    Derivation:
-    :math:`x \Rightarrow_{G}^* y`
-    :math:`\iff`
-    :math:`\exists n \in \mathbb{Z} : \thickspace x=z_0 \Rightarrow_{G} ... \Rightarrow_{G} z_n=y`
-
-    Sentential form :math:`u` is a string in :math:`(\Sigma \cup N)^*`
-    that is derived from the start symbol: :math:`S \Rightarrow_{G}^* u`
-
-    Sentence :math:`v` is a sentential form that is free of non-terminals: :math:`v \in \Sigma^*`
-
-    Language :math:`L(G)` is the set of all sentences generated by grammar :math:`G`.
-
-    Grammar :math:`G` is ambiguous iff a sentence in :math:`L(G)`
-    has two different derivations: :math:`\exists v \in L(G)`
-
-    Given a grammar :math:`G = (N, \Sigma, P, S)` that generates language :math:`L(G)`, we say that
-    :math:`G` is ambiguous iff there is a sentence :math:`w \in L(G)` that
-    has two different parse derivations: :math:`D_{G}(w)` and :math:`D'_{G}(w)`
-.. math::
-
-    G \in AMB \iff \exists w \in L(G) | T_{G}(w)
-
-G is ambiguous <==> exists W | :math:`W^{3\beta}_{\delta_1 \rho_1 \sigma_2} \approx U^{3\beta}_{\delta_1 \rho_1}` aaaa
-
-
-.. math::
-
-    W^{3\beta}_{\delta_1 \rho_1 \sigma_2} \approx U^{3\beta}_{\delta_1 \rho_1}
-
-Horizontal ambiguity
---------------------
-
-Vertical ambiguity
-------------------
-
-
-Submatch extraction
-===================
-
-NFA
----
-
-TDFA
-----
-
-Two DFA
--------
-
-
diff --git a/src/news/contexts/submatch_extraction_the_re2c_way.rst b/src/news/contexts/submatch_extraction_the_re2c_way.rst
deleted file mode 100644
index dd5734dd..00000000
--- a/src/news/contexts/submatch_extraction_the_re2c_way.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-=================================
-Submatch extraction: the re2c way
-=================================
-
-.. toctree::
-    :hidden:
-
-For all those who wished re2c had captures, brace yourselves:
-the upcoming 0.17 release is about to add a light form of submatch extraction.
-Now is the time you can speak out and influence the course of history.
-
-
-`Bug <../../news/contexts/an_old_bug_in_trailing_contexts.html>`_
-
-`The problem <../../news/contexts/parsing_regular_languages.html>`_
-
-The re2c way
-------------
-
diff --git a/src/news/news.rst b/src/news/news.rst
index 7abe3677..a562358e 100644
--- a/src/news/news.rst
+++ b/src/news/news.rst
@@ -5,9 +5,6 @@ News
 .. toctree::
     :maxdepth: 1
 
-    Submatch extraction: the re2c way
-    Parsing regular languages
-    An old bug in trailing contexts
     Release 0.16
     Release 0.15.3
     Release 0.15.2