Parse character classes in lexer.
author Ulya Trofimovich <skvadrik@gmail.com>
Tue, 18 Aug 2015 14:54:22 +0000 (15:54 +0100)
committer Ulya Trofimovich <skvadrik@gmail.com>
Tue, 18 Aug 2015 14:54:22 +0000 (15:54 +0100)
commit e42f271e4b284cd6ed590dc06a5239ce13942fc5
tree 4594062c7824d1d8a50252e81d9ee86100a78d87
parent 6d3c3f46510daa5aa9eda4f6be089a4305d73d22

Before this commit, the lexer would merely recognize a class as a whole
lexeme and pass it to functions that would further parse it by manually
picking individual code points and control characters out of the lexeme.

Heh, re2c is made for this kind of stuff (and it does it much better).
So now the lexer parses individual code points or control characters and
stores them as a sequence of code points. Then an outer function
splits this sequence into ranges and individual characters.
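
The second step (splitting a flat sequence of code points into ranges and
single characters) could look roughly like the sketch below. This is an
illustration only, not the code from this commit: the names RANGE_DASH,
Range, literal and split_class are hypothetical, and so are the choices to
mark an unescaped '-' with a sentinel value, to treat a dangling dash as a
literal '-', and to swap the bounds of a reversed range.

```cpp
#include <stdint.h>
#include <cassert>
#include <cstddef>
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical sentinel: assume the lexer emits this for an unescaped '-',
// so that it cannot be confused with an escaped literal dash (0x2D).
static const uint32_t RANGE_DASH = 0xFFFFFFFFu;

// Inclusive range [first, second]; a single character c is stored as [c, c].
typedef std::pair<uint32_t, uint32_t> Range;

// Map the sentinel back to a literal '-' when it is not used as an operator.
static uint32_t literal(uint32_t c)
{
    return c == RANGE_DASH ? static_cast<uint32_t>('-') : c;
}

// Split a flat sequence of code points into ranges and single characters.
// "lo RANGE_DASH hi" becomes one range; anything else becomes [c, c].
std::vector<Range> split_class(const std::vector<uint32_t> &cps)
{
    std::vector<Range> out;
    size_t i = 0;
    while (i < cps.size()) {
        uint32_t lo = literal(cps[i]);
        if (i + 2 < cps.size() && cps[i + 1] == RANGE_DASH) {
            uint32_t hi = literal(cps[i + 2]);
            // Assumption for this sketch: reversed bounds are swapped
            // rather than rejected.
            if (hi < lo) std::swap(lo, hi);
            assert(lo <= hi);
            out.push_back(Range(lo, hi));
            i += 3;
        } else {
            out.push_back(Range(lo, lo));
            i += 1;
        }
    }
    return out;
}
```

With this sketch, the sequence for a class like [a-z_] yields the range
[a, z] followed by the single character [_, _], and a trailing dash as in
[a-] yields the two single characters 'a' and '-'.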

I tried to preserve existing behaviour (judging from the test suite, only
the text of some error messages has changed). Added some autogenerated
tests (and the generator script itself), but these tests are not exhaustive.

25 files changed:
re2c/Makefile.am
re2c/bootstrap/src/parse/scanner_lex.cc
re2c/src/ir/regexp/encoding/enc.cc
re2c/src/ir/regexp/regexp.cc
re2c/src/parse/scanner.h
re2c/src/parse/scanner_lex.re
re2c/src/parse/unescape.cc
re2c/src/parse/unescape.h [new file with mode: 0644]
re2c/test/class1.i.c [new file with mode: 0644]
re2c/test/class1.i.re [new file with mode: 0644]
re2c/test/class2.i.c [new file with mode: 0644]
re2c/test/class2.i.re [new file with mode: 0644]
re2c/test/class3.i8.c [new file with mode: 0644]
re2c/test/class3.i8.re [new file with mode: 0644]
re2c/test/class4.i.c [new file with mode: 0644]
re2c/test/class4.i.re [new file with mode: 0644]
re2c/test/error10.c
re2c/test/error11.c
re2c/test/error4.c
re2c/test/error5.c
re2c/test/error6.c
re2c/test/error7.c
re2c/test/error8.c
re2c/test/error9.c
re2c/test/gen_class_examples.hs [new file with mode: 0755]