From: helly Date: Sat, 13 Mar 2004 12:34:15 +0000 (+0000) Subject: Not needed here X-Git-Tag: 0.13.6~736 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=56d15c2b15d6b10403ee82923a271cb4800ee3f2;p=re2c Not needed here --- diff --git a/bootstrap/re2c.man b/bootstrap/re2c.man deleted file mode 100644 index b74aaafb..00000000 --- a/bootstrap/re2c.man +++ /dev/null @@ -1,660 +0,0 @@ - - - -RE2C(1) RE2C(1) - - -NNAAMMEE - re2c - convert regular expressions to C/C++ - - -SSYYNNOOPPSSIISS - rree22cc [--eessbb] _n_a_m_e - - -DDEESSCCRRIIPPTTIIOONN - rree22cc is a preprocessor that generates C-based recognizers - from regular expressions. The input to rree22cc consists of - C/C++ source interleaved with comments of the form //**!!rree22cc - ... **// which contain scanner specifications. In the out- - put these comments are replaced with code that, when exe- - cuted, will find the next input token and then execute - some user-supplied token-specific code. - - For example, given the following code - - #define NULL ((char*) 0) - char *scan(char *p){ - char *q; - #define YYCTYPE char - #define YYCURSOR p - #define YYLIMIT p - #define YYMARKER q - #define YYFILL(n) - /*!re2c - [0-9]+ {return YYCURSOR;} - [\000-\377] {return NULL;} - */ - } - - rree22cc will generate - - /* Generated by re2c on Sat Apr 16 11:40:58 1994 */ - #line 1 "simple.re" - #define NULL ((char*) 0) - char *scan(char *p){ - char *q; - #define YYCTYPE char - #define YYCURSOR p - #define YYLIMIT p - #define YYMARKER q - #define YYFILL(n) - { - YYCTYPE yych; - unsigned int yyaccept; - goto yy0; - yy1: ++YYCURSOR; - yy0: - if((YYLIMIT - YYCURSOR) < 2) YYFILL(2); - yych = *YYCURSOR; - if(yych <= '/') goto yy4; - - - -Version 0.5 8 April 1994 1 - - - - - -RE2C(1) RE2C(1) - - - if(yych >= ':') goto yy4; - yy2: yych = *++YYCURSOR; - goto yy7; - yy3: - #line 10 - {return YYCURSOR;} - yy4: yych = *++YYCURSOR; - yy5: - #line 11 - {return NULL;} - yy6: ++YYCURSOR; - if(YYLIMIT == YYCURSOR) YYFILL(1); - yych = *YYCURSOR; - yy7: if(yych <= '/') goto yy3; - if(yych <= '9') goto yy6; - goto yy3; - } - #line 12 - - } - - -OOPPTTIIOONNSS - rree22cc provides the following options: - - --ee Cross-compile from an ASCII platform to an EBCDIC - one. - - --ss Generate nested iiffs for some sswwiittcchhes. Many com- - pilers need this assist to generate better code. - - --bb Implies --ss. Use bit vectors as well in the attempt - to coax better code out of the compiler. Most use- - ful for specifications with more than a few key- - words (e.g. for most programming languages). - - -IINNTTEERRFFAACCEE CCOODDEE - Unlike other scanner generators, rree22cc does not generate - complete scanners: the user must supply some interface - code. In particular, the user must define the following - macros: - - YYYYCCHHAARR Type used to hold an input symbol. Usually cchhaarr or - uunnssiiggnneedd cchhaarr. - - YYYYCCUURRSSOORR - _l-expression of type **YYYYCCHHAARR that points to the - current input symbol. The generated code advances - YYYYCCUURRSSOORR as symbols are matched. On entry, YYYYCCUURR-- - SSOORR is assumed to point to the first character of - the current token. On exit, YYYYCCUURRSSOORR will point to - the first character of the following token. - - - - -Version 0.5 8 April 1994 2 - - - - - -RE2C(1) RE2C(1) - - - YYLLIIMMIITT Expression of type **YYYYCCHHAARR that marks the end of - the buffer (YYLLIIMMIITT[[--11]] is the last character in the - buffer). The generated code repeatedly compares - YYYYCCUURRSSOORR to YYLLIIMMIITT to determine when the buffer - needs (re)filling. - - YYYYMMAARRKKEERR - _l-expression of type **YYYYCCHHAARR. The generated code - saves backtracking information in YYYYMMAARRKKEERR. - - YYYYFFIILLLL((_n)) - The generated code "calls" YYYYFFIILLLL when the buffer - needs (re)filling: at least _n additional charac- - ters should be provided. YYYYFFIILLLL should adjust - YYYYCCUURRSSOORR, YYYYLLIIMMIITT and YYYYMMAARRKKEERR as needed. Note - that for typical programming languages _n will be - the length of the longest keyword plus one. - - -SSCCAANNNNEERR SSPPEECCIIFFIICCAATTIIOONNSS - Each scanner specification consists of a set of _r_u_l_e_s and - name definitions. Rules consist of a regular expression - along with a block of C/C++ code that is to be executed - when the associated regular expression is matched. Name - definitions are of the form ``_n_a_m_e == _r_e_g_u_l_a_r _e_x_p_r_e_s_- - _s_i_o_n;;''. - - -SSUUMMMMAARRYY OOFF RREE22CC RREEGGUULLAARR EEXXPPRREESSSSIIOONNSS - ""ffoooo"" the literal string ffoooo. ANSI-C escape sequences - can be used. - - [[xxyyzz]] a "character class"; in this case, the regular - expression matches either an 'xx', a 'yy', or a 'zz'. - - [[aabbjj--ooZZ]] - a "character class" with a range in it; matches an - 'aa', a 'bb', any letter from 'jj' through 'oo', or a - 'ZZ'. - - _r\\_s match any _r which isn't an _s. _r and _s must be regu- - lar expressions which can be expressed as character - classes. - - _r** zero or more _r's, where _r is any regular expression - - _r++ one or more _r's - - _r?? zero or one _r's (that is, "an optional _r") - - name the expansion of the "name" definition (see above) - - ((_r)) an _r; parentheses are used to override precedence - (see below) - - - -Version 0.5 8 April 1994 3 - - - - - -RE2C(1) RE2C(1) - - - _r_s an _r followed by an _s ("concatenation") - - _r||_s either an _r or an _s - - _r//_s an _r but only if it is followed by an _s. The s is - not part of the matched text. This type of regular - expression is called "trailing context". - - The regular expressions listed above are grouped according - to precedence, from highest precedence at the top to low- - est at the bottom. Those grouped together have equal - precedence. - - -AA LLAARRGGEERR EEXXAAMMPPLLEE - #include - #include - #include - #include - - #define ADDEQ 257 - #define ANDAND 258 - #define ANDEQ 259 - #define ARRAY 260 - #define ASM 261 - #define AUTO 262 - #define BREAK 263 - #define CASE 264 - #define CHAR 265 - #define CONST 266 - #define CONTINUE 267 - #define DECR 268 - #define DEFAULT 269 - #define DEREF 270 - #define DIVEQ 271 - #define DO 272 - #define DOUBLE 273 - #define ELLIPSIS 274 - #define ELSE 275 - #define ENUM 276 - #define EQL 277 - #define EXTERN 278 - #define FCON 279 - #define FLOAT 280 - #define FOR 281 - #define FUNCTION 282 - #define GEQ 283 - #define GOTO 284 - #define ICON 285 - #define ID 286 - #define IF 287 - #define INCR 288 - #define INT 289 - #define LEQ 290 - - - -Version 0.5 8 April 1994 4 - - - - - -RE2C(1) RE2C(1) - - - #define LONG 291 - #define LSHIFT 292 - #define LSHIFTEQ 293 - #define MODEQ 294 - #define MULEQ 295 - #define NEQ 296 - #define OREQ 297 - #define OROR 298 - #define POINTER 299 - #define REGISTER 300 - #define RETURN 301 - #define RSHIFT 302 - #define RSHIFTEQ 303 - #define SCON 304 - #define SHORT 305 - #define SIGNED 306 - #define SIZEOF 307 - #define STATIC 308 - #define STRUCT 309 - #define SUBEQ 310 - #define SWITCH 311 - #define TYPEDEF 312 - #define UNION 313 - #define UNSIGNED 314 - #define VOID 315 - #define VOLATILE 316 - #define WHILE 317 - #define XOREQ 318 - #define EOI 319 - - typedef unsigned int uint; - typedef unsigned char uchar; - - #define BSIZE 8192 - - #define YYCTYPE uchar - #define YYCURSOR cursor - #define YYLIMIT s->lim - #define YYMARKER s->ptr - #define YYFILL(n) {cursor = fill(s, cursor);} - - #define RET(i) {s->cur = cursor; return i;} - - typedef struct Scanner { - int fd; - uchar *bot, *tok, *ptr, *cur, *pos, *lim, *top, *eof; - uint line; - } Scanner; - - uchar *fill(Scanner *s, uchar *cursor){ - if(!s->eof){ - uint cnt = s->tok - s->bot; - if(cnt){ - memcpy(s->bot, s->tok, s->lim - s->tok); - - - -Version 0.5 8 April 1994 5 - - - - - -RE2C(1) RE2C(1) - - - s->tok = s->bot; - s->ptr -= cnt; - cursor -= cnt; - s->pos -= cnt; - s->lim -= cnt; - } - if((s->top - s->lim) < BSIZE){ - uchar *buf = (uchar*) - malloc(((s->lim - s->bot) + BSIZE)*sizeof(uchar)); - memcpy(buf, s->tok, s->lim - s->tok); - s->tok = buf; - s->ptr = &buf[s->ptr - s->bot]; - cursor = &buf[cursor - s->bot]; - s->pos = &buf[s->pos - s->bot]; - s->lim = &buf[s->lim - s->bot]; - s->top = &s->lim[BSIZE]; - free(s->bot); - s->bot = buf; - } - if((cnt = read(s->fd, (char*) s->lim, BSIZE)) != BSIZE){ - s->eof = &s->lim[cnt]; *(s->eof)++ = '\n'; - } - s->lim += cnt; - } - return cursor; - } - - int scan(Scanner *s){ - uchar *cursor = s->cur; - std: - s->tok = cursor; - /*!re2c - any = [\000-\377]; - O = [0-7]; - D = [0-9]; - L = [a-zA-Z_]; - H = [a-fA-F0-9]; - E = [Ee] [+-]? D+; - FS = [fFlL]; - IS = [uUlL]*; - ESC = [\\] ([abfnrtv?'"\\] | "x" H+ | O+); - */ - - /*!re2c - "/*" { goto comment; } - - "auto" { RET(AUTO); } - "break" { RET(BREAK); } - "case" { RET(CASE); } - "char" { RET(CHAR); } - "const" { RET(CONST); } - "continue" { RET(CONTINUE); } - "default" { RET(DEFAULT); } - "do" { RET(DO); } - - - -Version 0.5 8 April 1994 6 - - - - - -RE2C(1) RE2C(1) - - - "double" { RET(DOUBLE); } - "else" { RET(ELSE); } - "enum" { RET(ENUM); } - "extern" { RET(EXTERN); } - "float" { RET(FLOAT); } - "for" { RET(FOR); } - "goto" { RET(GOTO); } - "if" { RET(IF); } - "int" { RET(INT); } - "long" { RET(LONG); } - "register" { RET(REGISTER); } - "return" { RET(RETURN); } - "short" { RET(SHORT); } - "signed" { RET(SIGNED); } - "sizeof" { RET(SIZEOF); } - "static" { RET(STATIC); } - "struct" { RET(STRUCT); } - "switch" { RET(SWITCH); } - "typedef" { RET(TYPEDEF); } - "union" { RET(UNION); } - "unsigned" { RET(UNSIGNED); } - "void" { RET(VOID); } - "volatile" { RET(VOLATILE); } - "while" { RET(WHILE); } - - L (L|D)* { RET(ID); } - - ("0" [xX] H+ IS?) | ("0" D+ IS?) | (D+ IS?) | - (['] (ESC|any\[\n\\'])* [']) - { RET(ICON); } - - (D+ E FS?) | (D* "." D+ E? FS?) | (D+ "." D* E? FS?) - { RET(FCON); } - - (["] (ESC|any\[\n\\"])* ["]) - { RET(SCON); } - - "..." { RET(ELLIPSIS); } - ">>=" { RET(RSHIFTEQ); } - "<<=" { RET(LSHIFTEQ); } - "+=" { RET(ADDEQ); } - "-=" { RET(SUBEQ); } - "*=" { RET(MULEQ); } - "/=" { RET(DIVEQ); } - "%=" { RET(MODEQ); } - "&=" { RET(ANDEQ); } - "^=" { RET(XOREQ); } - "|=" { RET(OREQ); } - ">>" { RET(RSHIFT); } - "<<" { RET(LSHIFT); } - "++" { RET(INCR); } - "--" { RET(DECR); } - "->" { RET(DEREF); } - "&&" { RET(ANDAND); } - - - -Version 0.5 8 April 1994 7 - - - - - -RE2C(1) RE2C(1) - - - "||" { RET(OROR); } - "<=" { RET(LEQ); } - ">=" { RET(GEQ); } - "==" { RET(EQL); } - "!=" { RET(NEQ); } - ";" { RET(';'); } - "{" { RET('{'); } - "}" { RET('}'); } - "," { RET(','); } - ":" { RET(':'); } - "=" { RET('='); } - "(" { RET('('); } - ")" { RET(')'); } - "[" { RET('['); } - "]" { RET(']'); } - "." { RET('.'); } - "&" { RET('&'); } - "!" { RET('!'); } - "~" { RET('~'); } - "-" { RET('-'); } - "+" { RET('+'); } - "*" { RET('*'); } - "/" { RET('/'); } - "%" { RET('%'); } - "<" { RET('<'); } - ">" { RET('>'); } - "^" { RET('^'); } - "|" { RET('|'); } - "?" { RET('?'); } - - - [ \t\v\f]+ { goto std; } - - "\n" - { - if(cursor == s->eof) RET(EOI); - s->pos = cursor; s->line++; - goto std; - } - - any - { - printf("unexpected character: %c\n", *s->tok); - goto std; - } - */ - - comment: - /*!re2c - "*/" { goto std; } - "\n" - { - if(cursor == s->eof) RET(EOI); - s->tok = s->pos = cursor; s->line++; - - - -Version 0.5 8 April 1994 8 - - - - - -RE2C(1) RE2C(1) - - - goto comment; - } - any { goto comment; } - */ - } - - main(){ - Scanner in; - int t; - memset((char*) &in, 0, sizeof(in)); - in.fd = 0; - while((t = scan(&in)) != EOI){ - /* - printf("%d\t%.*s\n", t, in.cur - in.tok, in.tok); - printf("%d\n", t); - */ - } - close(in.fd); - } - - -SSEEEE AALLSSOO - flex(1), lex(1). - - -FFEEAATTUURREESS - rree22cc does not provide a default action: the generated code - assumes that the input will consist of a sequence of - tokens. Typically this can be dealt with by adding a rule - such as the one for unexpected characters in the example - above. - - The user must arrange for a sentinel token to appear at - the end of input (and provide a rule for matching it): - rree22cc does not provide an <<<>>> expression. If the - source is from a null-byte terminated string, a rule - matching a null character will suffice. If the source is - from a file then the approach taken in the example can be - used: pad the input with a newline (or some other charac- - ter that can't appear within another token); upon recog- - nizing such a character check to see if it is the sentinel - and act accordingly. - - rree22cc does not provide start conditions: use a separate - scanner specification for each start condition (as illus- - trated in the above example). - - No [^x]. Use difference instead. - -BBUUGGSS - Only fixed length trailing context can be handled. - - The maximum value appearing as a parameter _n to YYYYFFIILLLL is - not provided to the generated code (this value is needed - - - -Version 0.5 8 April 1994 9 - - - - - -RE2C(1) RE2C(1) - - - for constructing the interface code). Note that this - value is usually relatively small: for typical programming - languages _n will be the length of the longest keyword plus - one. - - Difference only works for character sets. - - The rree22cc internal algorithms need documentation. - - -AAUUTTHHOORR - Please send bug reports, fixes and feedback to: - - Peter Bumbulis - Computer Systems Group - University of Waterloo - Waterloo, Ontario - N2L 3G1 - Internet: peter@csg.uwaterloo.ca - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Version 0.5 8 April 1994 10 - -