From 04836e580b034780199d6c2a3b4374c3651d8943 Mon Sep 17 00:00:00 2001 From: Will Estes Date: Fri, 9 Aug 2002 14:20:01 +0000 Subject: [PATCH] remove faq.texi as it's included in flex.texi --- Makefile.am | 1 - faq.texi | 2893 --------------------------------------------------- 2 files changed, 2894 deletions(-) delete mode 100644 faq.texi diff --git a/Makefile.am b/Makefile.am index c22fc5e..184c761 100644 --- a/Makefile.am +++ b/Makefile.am @@ -78,7 +78,6 @@ include_HEADERS = \ FlexLexer.h info_TEXINFOS = flex.texi -flex_TEXINFOS = faq.texi man_MANS = flex.1 EXTRA_DIST = \ diff --git a/faq.texi b/faq.texi deleted file mode 100644 index ced5933..0000000 --- a/faq.texi +++ /dev/null @@ -1,2893 +0,0 @@ -@c This file is part of flex. - -@c Copyright (c) 1990, 1997 The Regents of the University of California. -@c All rights reserved. - -@c This code is derived from software contributed to Berkeley by -@c Vern Paxson. - -@c The United States Government has rights in this work pursuant -@c to contract no. DE-AC03-76SF00098 between the United States -@c Department of Energy and the University of California. - -@c Redistribution and use in source and binary forms, with or without -@c modification, are permitted provided that the following conditions -@c are met: - -@c 1. Redistributions of source code must retain the above copyright -@c notice, this list of conditions and the following disclaimer. -@c 2. Redistributions in binary form must reproduce the above copyright -@c notice, this list of conditions and the following disclaimer in the -@c documentation and/or other materials provided with the distribution. - -@c Neither the name of the University nor the names of its contributors -@c may be used to endorse or promote products derived from this software -@c without specific prior written permission. 
- -@c THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR -@c IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED -@c WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -@c PURPOSE. - -@node FAQ -@unnumbered FAQ - -@menu -* When was flex born?:: -* How do I expand \ escape sequences in C-style quoted strings?:: -* Why do flex scanners call fileno if it is not ANSI compatible?:: -* Does flex support recursive pattern definitions?:: -* How do I skip huge chunks of input (tens of megabytes) while using flex?:: -* Flex is not matching my patterns in the same order that I defined them.:: -* My actions are executing out of order or sometimes not at all.:: -* How can I have multiple input sources feed into the same scanner at the same time?:: -* Can I build nested parsers that work with the same input file?:: -* How can I match text only at the end of a file?:: -* How can I make REJECT cascade across start condition boundaries?:: -* Why cant I use fast or full tables with interactive mode?:: -* How much faster is -F or -f than -C?:: -* If I have a simple grammar cant I just parse it with flex?:: -* Why doesnt yyrestart() set the start state back to INITIAL?:: -* How can I match C-style comments?:: -* The period isnt working the way I expected.:: -* Can I get the flex manual in another format?:: -* Does there exist a "faster" NDFA->DFA algorithm?:: -* How does flex compile the DFA so quickly?:: -* How can I use more than 8192 rules?:: -* How do I abandon a file in the middle of a scan and switch to a new file?:: -* How do I execute code only during initialization (only before the first scan)?:: -* How do I execute code at termination?:: -* Where else can I find help?:: -* Can I include comments in the "rules" section of the file file?:: -* I get an error about undefined yywrap().:: -* How can I change the matching pattern at run time?:: -* Is there a way to increase the rules (NFA states to a bigger number?):: -* How can I expand macros in the 
input?:: -* How can I build a two-pass scanner?:: -* How do I match any string not matched in the preceding rules?:: -* I am trying to port code from AT&T lex that uses yysptr and yysbuf.:: -* Is there a way to make flex treat NULL like a regular character?:: -* Whenever flex can not match the input it says "flex scanner jammed".:: -* Why doesnt flex have non-greedy operators like perl does?:: -* Memory leak - 16386 bytes allocated by malloc.:: -* How do I track the byte offset for lseek()?:: -* unnamed-faq-16:: -* How do I skip as many chars as possible?:: -* unnamed-faq-33:: -* unnamed-faq-42:: -* unnamed-faq-43:: -* unnamed-faq-44:: -* unnamed-faq-45:: -* unnamed-faq-46:: -* unnamed-faq-47:: -* unnamed-faq-48:: -* unnamed-faq-49:: -* unnamed-faq-50:: -* unnamed-faq-51:: -* unnamed-faq-52:: -* unnamed-faq-53:: -* unnamed-faq-54:: -* unnamed-faq-55:: -* unnamed-faq-56:: -* unnamed-faq-57:: -* unnamed-faq-58:: -* unnamed-faq-59:: -* unnamed-faq-60:: -* unnamed-faq-61:: -* unnamed-faq-62:: -* unnamed-faq-63:: -* unnamed-faq-64:: -* unnamed-faq-65:: -* unnamed-faq-66:: -* unnamed-faq-67:: -* unnamed-faq-68:: -* unnamed-faq-69:: -* unnamed-faq-70:: -* unnamed-faq-71:: -* unnamed-faq-72:: -* unnamed-faq-73:: -* unnamed-faq-74:: -* unnamed-faq-75:: -* unnamed-faq-76:: -* unnamed-faq-77:: -* unnamed-faq-78:: -* unnamed-faq-79:: -* unnamed-faq-80:: -* unnamed-faq-81:: -* unnamed-faq-82:: -* unnamed-faq-83:: -* unnamed-faq-84:: -* unnamed-faq-85:: -* unnamed-faq-86:: -* unnamed-faq-87:: -* unnamed-faq-88:: -* unnamed-faq-89:: -* unnamed-faq-90:: -* unnamed-faq-91:: -* unnamed-faq-92:: -* unnamed-faq-93:: -* unnamed-faq-94:: -* unnamed-faq-95:: -* unnamed-faq-96:: -* unnamed-faq-97:: -* unnamed-faq-98:: -* unnamed-faq-99:: -* unnamed-faq-100:: -* unnamed-faq-101:: -@end menu - -@node When was flex born? -@unnumberedsec When was flex born? - -Vern Paxson took over -the @cite{Software Tools} lex project from Jef Poskanzer in 1982. At that point it -was written in Ratfor. 
Around 1987 or so, Paxson translated it into C, and -a legend was born :-). - -@node How do I expand \ escape sequences in C-style quoted strings? -@unnumberedsec How do I expand \ escape sequences in C-style quoted strings? - -A key point when scanning quoted strings is that you cannot (easily) write -a single rule that will precisely match the string if you allow things -like embedded escape sequences and newlines. If you try to match strings -with a single rule then you'll wind up having to rescan the string anyway -to find any escape sequences. - -Instead you can use exclusive start conditions and a set of rules, one for -matching non-escaped text, one for matching a single escape, one for -matching an embedded newline, and one for recognizing the end of the -string. Each of these rules is then faced with the question of where to -put its intermediary results. The best solution is for the rules to -append their local value of @code{yytext} to the end of a ``string literal'' -buffer. A rule like the escape-matcher will append to the buffer the -meaning of the escape sequence rather than the literal text in @code{yytext}. -In this way, @code{yytext} does not need to be modified at all. - -@node Why do flex scanners call fileno if it is not ANSI compatible? -@unnumberedsec Why do flex scanners call fileno if it is not ANSI compatible? - -Flex scanners call @code{fileno()} in order to get the file descriptor -corresponding to @code{yyin}. The file descriptor may be passed to -@code{isatty()} or @code{read()}, depending upon which @code{%options} you specified. -If your system does not have @code{fileno()} support, to get rid of the -@code{read()} call, do not specify @code{%option read}. To get rid of the @code{isatty()} -call, you must specify one of @code{%option always-interactive} or -@code{%option never-interactive}. - -@node Does flex support recursive pattern definitions? -@unnumberedsec Does flex support recursive pattern definitions? 
- -Does flex support recursive pattern definitions? -e.g., - -@example -@verbatim -%% -block "{"({block}|{statement})*"}" -@end verbatim -@end example - -No. You cannot have recursive definitions. The pattern-matching power of -regular expressions in general (and therefore flex scanners, too) is -limited. In particular, regular expressions cannot "balance" parentheses -to an arbitrary degree. For example, it's impossible to write a regular -expression that matches all strings containing the same number of '@{'s -as '@}'s. For more powerful pattern matching, you need a parser, such -as GNU bison. - -@node How do I skip huge chunks of input (tens of megabytes) while using flex? -@unnumberedsec How do I skip huge chunks of input (tens of megabytes) while using flex? - -Use fseek (or lseek) to position yyin, then call yyrestart(). - -@node Flex is not matching my patterns in the same order that I defined them. -@unnumberedsec Flex is not matching my patterns in the same order that I defined them. - -Flex is not matching my patterns in the same order that I defined them. - -This is indeed the natural way to expect it to work, however, flex picks the -rule that matches the most text (i.e., the longest possible input string). -This is because flex uses an entirely different matching technique -("deterministic finite automata") that actually does all of the matching -simultaneously, in parallel. (Seems impossible, but it's actually a fairly -simple technique once you understand the principles.) - -A side-effect of this parallel matching is that when the input matches more -than one rule, flex scanners pick the rule that matched the *most* text. This -is explained further in the manual, in the section "How the input -is Matched". 
- -If you want flex to choose a shorter match, then you can work around this -behavior by expanding your short -rule to match more text, then put back the extra: - -@example -@verbatim -data_.* yyless( 5 ); BEGIN BLOCKIDSTATE; -@end verbatim -@end example - -Another fix would be to make the second rule active only during the -<BLOCKIDSTATE> start condition, and make that start condition exclusive -by declaring it with %x instead of %s. - -A final fix is to change the input language so that the ambiguity for -data_ is removed, by adding characters to it that don't match the -identifier rule, or by removing characters (such as '_') from the -identifier rule so it no longer matches "data_". (Of course, you might -also not have the option of changing the input language ...) - -@node My actions are executing out of order or sometimes not at all. -@unnumberedsec My actions are executing out of order or sometimes not at all. - -My actions are executing out of order or sometimes not at all. What's -happening? - -Most likely, you have (in error) placed the opening @samp{@{} of the action -block on a different line than the rule, e.g., - -@example -@verbatim -^(foo|bar) -{ <<<--- WRONG! - -} -@end verbatim -@end example - -flex requires that the opening @samp{@{} of an action associated with a rule -begin on the same line as does the rule. You need instead to write your rules -as follows: - -@example -@verbatim -^(foo|bar) { // CORRECT! - -} -@end verbatim -@end example - -@node How can I have multiple input sources feed into the same scanner at the same time? -@unnumberedsec How can I have multiple input sources feed into the same scanner at the same time? - -How can I have multiple input sources feed into the same scanner at -the same time? - -If...
-@itemize -@item -your scanner is free of backtracking (verified using flex's -b flag), -@item -AND you run it interactively (-I option; default unless using special table -compression options), -@item -AND you feed it one character at a time by redefining YY_INPUT to do so, -@end itemize - -then every time it matches a token, it will have exhausted its input -buffer (because the scanner is free of backtracking). This means you -can safely use select() at the point and only call yylex() for another -token if select() indicates there's data available. - -That is, move the select() out from the input function to a point where -it determines whether yylex() gets called for the next token. - -With this approach, you will still have problems if your input can arrive -piecemeal; select() could inform you that the beginning of a token is -available, you call yylex() to get it, but it winds up blocking waiting -for the later characters in the token. - -Here's another way: Move your input multiplexing inside of YY_INPUT. That -is, whenever YY_INPUT is called, it select()'s to see where input is -available. If input is available for the scanner, it reads and returns the -next byte. If input is available from another source, it calls whatever -function is responsible for reading from that source. (If no input is -available, it blocks until some is.) I've used this technique in an -interpreter I wrote that both reads keyboard input using a flex scanner and -IPC traffic from sockets, and it works fine. - -@node Can I build nested parsers that work with the same input file? -@unnumberedsec Can I build nested parsers that work with the same input file? - -Can I build nested parsers that work with the same input file? - -This is not going to work without some additional effort. The reason is -that flex block-buffers the input it reads from yyin. 
This means that the -"outermost" yylex(), when called, will automatically slurp up the first 8K -of input available on yyin, and subsequent calls to other yylex()'s won't -see that input. You might be tempted to work around this problem by -redefining YY_INPUT to only return a small amount of text, but it turns out -that that approach is quite difficult. Instead, the best solution is to -combine all of your scanners into one large scanner, using a different -exclusive start condition for each. - -@node How can I match text only at the end of a file? -@unnumberedsec How can I match text only at the end of a file? - -How can I match text only at the end of a file? - -There is no way to write a rule which is "match this text, but only if -it comes at the end of the file". You can fake it, though, if you happen -to have a character lying around that you don't allow in your input. -Then you redefine YY_INPUT to call your own routine which, if it sees -an EOF, returns the magic character first (and remembers to return a -real EOF next time it's called). Then you could write: - -@example -@verbatim -(.|\n)*{EOF_CHAR} /* saw comment at EOF */ -@end verbatim -@end example - -@node How can I make REJECT cascade across start condition boundaries? -@unnumberedsec How can I make REJECT cascade across start condition boundaries? - -How can I make REJECT cascade across start condition boundaries? - -You can do this as follows. Suppose you have a start condition A, and -after exhausting all of the possible matches in <A>, you want to try -matches in <INITIAL>. Then you could use the following: - -@example -@verbatim -%x A -%% -<A>rule_that_is_long ...; REJECT; -<A>rule ...; REJECT; /* shorter rule */ -<A>etc. -... -<A>.|\n { -/* Shortest and last rule in <A>, so -* cascaded REJECT's will eventually -* wind up matching this rule. We want -* to now switch to the initial state -* and try matching from there instead.
-*/ -yyless(0); /* put back matched text */ -BEGIN(INITIAL); -} -@end verbatim -@end example - -@node Why cant I use fast or full tables with interactive mode? -@unnumberedsec Why can't I use fast or full tables with interactive mode? - -One of the assumptions -flex makes is that interactive applications are inherently slow (they're -waiting on a human after all). -It has to do with how the scanner detects that it must be finished scanning -a token. For interactive scanners, after scanning each character the current -state is looked up in a table (essentially) to see whether there's a chance -of another input character possibly extending the length of the match. If -not, the scanner halts. For non-interactive scanners, the end-of-token test -is much simpler, basically a compare with 0, so no memory bus cycles. Since -the test occurs in the innermost scanning loop, one would like to make it go -as fast as possible. - -Still, it seems reasonable to allow the user to choose to trade off a bit -of performance in this area to gain the corresponding flexibility. There -might be another reason, though, why fast scanners don't support the -interactive option - -@node How much faster is -F or -f than -C? -@unnumberedsec How much faster is -F or -f than -C? - -How much faster is -F or -f than -C? - -Much faster (factor of 2-3). - -@node If I have a simple grammar cant I just parse it with flex? -@unnumberedsec If I have a simple grammar can't I just parse it with flex? - -Is your grammar recursive? That's almost always a sign that you're -better off using a parser/scanner rather than just trying to use a scanner -alone. -@node Why doesnt yyrestart() set the start state back to INITIAL? -@unnumberedsec Why doesn't yyrestart() set the start state back to INITIAL? - -There are two reasons. The first is that there might -be programs that rely on the start state not changing across file changes. 
The second is that with flex 2.4, use of yyrestart() is no longer required, -so fixing the problem there doesn't solve the more general problem. - -@node How can I match C-style comments? -@unnumberedsec How can I match C-style comments? - -How can I match C-style comments? - -You might be tempted to try something like this: - -@example -@verbatim -"/*".*"*/" // WRONG! -@end verbatim -@end example - -or, worse, this: - -@example -@verbatim -"/*"(.|\n)"*/" // WRONG! -@end verbatim -@end example - -The above rules will eat too much input, and blow up on things like: - -@example -@verbatim -/* a comment */ do_my_thing( "oops */" ); -@end verbatim -@end example - -Here is one way which allows you to track line information: - -@example -@verbatim -<INITIAL>{ -"/*" BEGIN(IN_COMMENT); -} -<IN_COMMENT>{ -"*/" BEGIN(INITIAL); -[^*\n]+ // eat comment in chunks -"*" // eat the lone star -\n yylineno++; -} -@end verbatim -@end example - -@node The period isnt working the way I expected. -@unnumberedsec The '.' isn't working the way I expected. - -Here are some tips for using @samp{.}: - -@itemize -@item -A common mistake is to place the grouping parenthesis AFTER an operator, when -you really meant to place the parenthesis BEFORE the operator, e.g., you -probably want this @code{(foo|bar)+} and NOT this @code{(foo|bar+)}. - -The first pattern matches the words @code{foo} or @code{bar} any number of -times, e.g., it matches the text @code{barfoofoobarfoo}. The -second pattern matches a single instance of @code{foo} or a single instance of -@code{ba} followed by one or more @samp{r}s, e.g., it matches the text @code{barrrr}. -@item -A @samp{.} inside []'s just means a literal @samp{.} (period), -and NOT "any character except newline". -@item -Remember that @samp{.} matches any character EXCEPT @samp{\n} (and EOF). -If you really want to match ANY character, including newlines, then use @code{(.|\n)} ---- Beware that the regex @code{(.|\n)+} will match your entire input!
-@item -Finally, if you want to match a literal @samp{.} (a period), then use [.] or "." -@end itemize - -@node Can I get the flex manual in another format? -@unnumberedsec Can I get the flex manual in another format? - -Can I get the flex manual in another format? - -As of flex 2.5, the manual is distributed in texinfo format. -You can use the "texi2*" tools to convert the manual to any format -you desire (e.g., @samp{texi2html}). - -@node Does there exist a "faster" NDFA->DFA algorithm? -@unnumberedsec Does there exist a "faster" NDFA->DFA algorithm? - -Does there exist a "faster" NDFA->DFA algorithm? Most standard texts (e.g., -Aho), imply that NDFA->DFA can take exponential time, since there are -exponential number of potential states in NDFA. - -There's no way around the potential exponential running time - it -can take you exponential time just to enumerate all of the DFA states. -In practice, though, the running time is closer to linear, or sometimes -quadratic. - -@node How does flex compile the DFA so quickly? -@unnumberedsec How does flex compile the DFA so quickly? - -How does flex compile the DFA so quickly? - -There are two big speed wins that flex uses: - -@enumerate -@item -It analyzes the input rules to construct equivalence classes for those -characters that always make the same transitions. It then rewrites the NFA -using equivalence classes for transitions instead of characters. This cuts -down the NFA->DFA computation time dramatically, to the point where, for -uncompressed DFA tables, the DFA generation is often I/O bound in writing out -the tables. -@item -It maintains hash values for previously computed DFA states, so testing -whether a newly constructed DFA state is equivalent to a previously constructed -state can be done very quickly, by first comparing hash values. -@end enumerate - -@node How can I use more than 8192 rules? -@unnumberedsec How can I use more than 8192 rules? - -How can I use more than 8192 rules? 
- -Flex is compiled with an upper limit of 8192 rules per scanner. -If you need more than 8192 rules in your scanner, you'll have to recompile flex -with the following changes in flexdef.h: - -@example -@verbatim -< #define YY_TRAILING_MASK 0x2000 -< #define YY_TRAILING_HEAD_MASK 0x4000 --- -> #define YY_TRAILING_MASK 0x20000000 -> #define YY_TRAILING_HEAD_MASK 0x40000000 -@end verbatim -@end example - -This should work okay as long as your C compiler uses 32-bit integers. -But you might want to think about whether using such a huge number of rules -is the best way to solve your problem. - -@node How do I abandon a file in the middle of a scan and switch to a new file? -@unnumberedsec How do I abandon a file in the middle of a scan and switch to a new file? - -How do I abandon a file in the middle of a scan and switch to a new file? - -Just call yyrestart(newfile). Be sure to reset the start state if you want a -"fresh" start, since yyrestart does NOT reset the start state back to INITIAL. - -@node How do I execute code only during initialization (only before the first scan)? -@unnumberedsec How do I execute code only during initialization (only before the first scan)? - -How do I execute code only during initialization (only before the first scan)? - -You can specify an initial action by defining the macro YY_USER_INIT (though -note that yyout may not be available at the time this macro is executed). Or you -can add to the beginning of your rules section: - -@example -@verbatim -%% -/* Must be indented! */ -static int did_init = 0; - -if ( ! did_init ){ -do_my_init(); -did_init = 1; -} -@end verbatim -@end example - -@node How do I execute code at termination? -@unnumberedsec How do I execute code at termination? - -How do I execute code at termination (i.e., only after the last scan)? - -You can specify an action for the <<EOF>> rule. -@node Where else can I find help? -@unnumberedsec Where else can I find help? - -Where else can I find help?
- -The @code{help-flex} email list is served by GNU. See http://www.gnu.org/ for -details on how to subscribe or search the archives. - -@node Can I include comments in the "rules" section of the file file? -@unnumberedsec Can I include comments in the "rules" section of the file file? - -Can I include comments in the "rules" section of the file? - -Yes, just about anywhere you want to. See the manual for the specific syntax. - -@node I get an error about undefined yywrap(). -@unnumberedsec I get an error about undefined yywrap(). - -I get an error about undefined yywrap(). - -You must supply a yywrap() function of your own, or link to libfl.a -(which provides one), or use - -%option noyywrap - -in your source to say you don't want a yywrap() function. -See the manual page for more details concerning yywrap(). - -@node How can I change the matching pattern at run time? -@unnumberedsec How can I change the matching pattern at run time? - -How can I change the matching pattern at run time? - -You can't; it's compiled into a static table when flex builds the scanner. - -@node Is there a way to increase the rules (NFA states to a bigger number?) -@unnumberedsec Is there a way to increase the rules (NFA states to a bigger number?) - -Is there a way to increase the rules (NFA states to a bigger number?) - -With luck, you should be able to increase the definitions in flexdef.h for: - -@example -@verbatim -#define JAMSTATE -32766 /* marks a reference to the state that always jams */ -#define MAXIMUM_MNS 31999 -#define BAD_SUBSCRIPT -32767 -@end verbatim -@end example - -recompile everything, and it'll all work. Flex only has these 16-bit-like -values built into it because a long time ago it was developed on a machine -with 16-bit ints. I've given this advice to others in the past but haven't -heard back from them whether it worked okay or not... - -@node How can I expand macros in the input? -@unnumberedsec How can I expand macros in the input?
- -How can I expand macros in the input? - -The best way to approach this problem is at a higher level, e.g., in the parser. - -However, you can do this using multiple input buffers. - -@example -@verbatim -%% -macro/[a-z]+ { -/* Saw the macro "macro" followed by extra stuff. */ -main_buffer = YY_CURRENT_BUFFER; -expansion_buffer = yy_scan_string(expand(yytext)); -yy_switch_to_buffer(expansion_buffer); -} - -<<EOF>> { -if ( expansion_buffer ) -{ -// We were doing an expansion, return to where -// we were. -yy_switch_to_buffer(main_buffer); -yy_delete_buffer(expansion_buffer); -expansion_buffer = 0; -} -else -yyterminate(); -} -@end verbatim -@end example - -You probably will want a stack of expansion buffers to allow nested macros. -Hopefully the idea is clear from the above. - -@node How can I build a two-pass scanner? -@unnumberedsec How can I build a two-pass scanner? - -How can I build a two-pass scanner? - -One way to do it is to filter the first pass to a temporary file, -then process the temporary file on the second pass. You will probably see a -performance hit, due to all the disk I/O. - -When you need to look ahead far forward like this, it almost always means -that the right solution is to build a parse tree of the entire input, then -walk it after the parse in order to generate the output. In a sense, this -is a two-pass approach, once through the text and once through the parse -tree, but the performance hit for the latter is usually an order of magnitude -smaller, since everything is already classified, in binary format, and -residing in memory. - -@node How do I match any string not matched in the preceding rules? -@unnumberedsec How do I match any string not matched in the preceding rules? - -How do I match any string not matched in the preceding rules? - -One way to assign precedence is to place the more specific rules first. If -two rules would match the same input (same sequence of characters) then the -first rule listed in the flex input wins.
e.g., - -@example -@verbatim -%% -foo[a-zA-Z_]+ return FOO_ID; -bar[a-zA-Z_]+ return BAR_ID; -[a-zA-Z_]+ return GENERIC_ID; -@end verbatim -@end example - -Note that the rule @code{[a-zA-Z_]+} must come *after* the others. It will match the -same amount of text as the more specific rules, and in that case the -flex scanner will pick the first rule listed in your scanner as the -one to match. - -@node I am trying to port code from AT&T lex that uses yysptr and yysbuf. -@unnumberedsec I am trying to port code from AT&T lex that uses yysptr and yysbuf. - -I am trying to port code from AT&T lex that uses yysptr and yysbuf. - -Those are internal variables pointing into the AT&T scanner's input buffer. I -imagine they're being manipulated in user versions of the input() and unput() -functions. If so, what you need to do is analyze those functions to figure out -what they're doing, and then replace input() with an appropriate definition of -YY_INPUT (see the flex man page). You shouldn't need to (and must not) replace -flex's unput() function. - -@node Is there a way to make flex treat NULL like a regular character? -@unnumberedsec Is there a way to make flex treat NULL like a regular character? - -Is there a way to make flex treat NULL like a regular character? - -Yes, \0 and \x00 should both do the trick. Perhaps you have an ancient -version of flex. The latest release is version @value{VERSION}. - -@node Whenever flex can not match the input it says "flex scanner jammed". -@unnumberedsec Whenever flex can not match the input it says "flex scanner jammed". - -Whenever flex can not match the input it says "flex scanner jammed". - -You need to add a rule that matches the otherwise-unmatched text. -e.g., - -@example -@verbatim -%option yylineno -%% -[[a bunch of rules here]] - -. printf("bad input character '%s' at line %d\n", yytext, yylineno); -@end verbatim -@end example - -See %option default for more information. 
- -@node Why doesnt flex have non-greedy operators like perl does? -@unnumberedsec Why doesn't flex have non-greedy operators like perl does? - -A DFA can do a non-greedy match by stopping -the first time it enters an accepting state, instead of consuming input until -it determines that no further matching is possible (a ``jam'' state). This -is actually easier to implement than longest leftmost match (which flex does). - -But it's also much less useful than longest leftmost match. In general, -when you find yourself wishing for non-greedy matching, that's usually a -sign that you're trying to make the scanner do some parsing. That's -generally the wrong approach, since it lacks the power to do a decent job. -Better is to either introduce a separate parser, or to split the scanner -into multiple scanners using (exclusive) start conditions. - -You might have -a separate start state once you've seen the BEGIN. In that state, you -might then have a regex that will match END (to kick you out of the -state), and perhaps (.|\n) to get a single character within the chunk ... - -This approach also has much better error-reporting properties. - -@node Memory leak - 16386 bytes allocated by malloc. -@unnumberedsec Memory leak - 16386 bytes allocated by malloc. -@anchor{faq-memory-leak} -UPDATED 2002-07-10: As of flex version 2.5.9, this leak means that you did not -call yylex_destroy(). If you are using an earlier version of flex, then read -on. - -The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the read-buffer, and -about 40 for struct yy_buffer_state (depending upon alignment). The leak is in -the non-reentrant C scanner only (NOT in the reentrant scanner, NOT in the C++ -scanner). Since flex doesn't know when you are done, the buffer is never freed. - -However, the leak won't multiply since the buffer is reused no matter how many -times you call yylex(). 
- -If you want to reclaim the memory when you are completely done scanning, then -you might try this: - -@example -@verbatim -/* For non-reentrant C scanner only. */ -yy_delete_buffer(yy_current_buffer); -yy_init = 1; -@end verbatim -@end example - -Note: yy_init is an "internal variable", and hasn't been tested in this -situation. It is possible that some other globals may need resetting as well. - -@node How do I track the byte offset for lseek()? -@unnumberedsec How do I track the byte offset for lseek()? - -@example -@verbatim -> We thought that it would be possible to have this number through the -> evaluation of the following expression: -> -> seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - yy_current_buffer->yy_ch_buf -@end verbatim -@end example - -While this is the right idea, it has two problems. The first is that -it's possible that flex will request less than YY_READ_BUF_SIZE during -an invocation of YY_INPUT (or that your input source will return less -even though YY_READ_BUF_SIZE bytes were requested). The second problem -is that when refilling its internal buffer, flex keeps some characters -from the previous buffer (because usually it's in the middle of a match, -and needs those characters to construct yytext for the match once it's -done). Because of this, yy_c_buf_p - yy_current_buffer->yy_ch_buf won't -be exactly the number of characters already read from the current buffer. - -An alternative solution is to count the number of characters you've matched -since starting to scan. This can be done by using YY_USER_ACTION. For -example, - - #define YY_USER_ACTION num_chars += yyleng; - -(You need to be careful to update your bookkeeping if you use yymore(), -yyless(), unput(), or input().) - -@c TODO: Evaluate this faq. -@node unnamed-faq-16 -@unnumberedsec unnamed-faq-16 -@example -@verbatim -To: steves@telebase.com -Subject: Re: flex C++ question -In-reply-to: Your message of Thu, 08 Dec 94 13:10:58 EST.
-Date: Wed, 14 Dec 94 16:40:47 PST -From: Vern Paxson - -> We'd like to override the provided LexerInput() and LexerOutput() -> functions, but we'd like to *not* use iostreams. Instead, we'd like -> to use some of our own I/O classes. Is this possible? - -You can do this by passing the various functions nil iostream*'s, and then -dealing with your own I/O classes surreptitiously (i.e., stashing them in -special member variables). This works because the only assumption about -the lexer regarding what's done with the iostream's is that they're -ultimately passed to LexerInput and LexerOutput, which then do whatever -necessary with them. - -When the flex C++ scanning class rewrite finally happens (no date for this -in sight), then this sort of thing should become much easier. - - Vern -@end verbatim -@end example - -@node How do I skip as many chars as possible? -@unnumberedsec How do I skip as many chars as possible? - -How do I skip as many chars as possible -- without interfering with the other -patterns? - -In the example below, we want to skip over characters until we see the phrase -"endskip". The following will @emph{NOT} work correctly (do you see why not?) - -@example -@verbatim -/* INCORRECT SCANNER */ -%x SKIP -%% -startskip BEGIN(SKIP); -... -"endskip" BEGIN(INITIAL); -.* ; -@end verbatim -@end example - -The problem is that the pattern .* will eat up the word "endskip." -The simplest (but slow) fix is: - -@example -@verbatim -"endskip" BEGIN(INITIAL); -. ; -@end verbatim -@end example - -The fix involves making the second rule match more, without -making it match "endskip" plus something else. So for example: - -@example -@verbatim -"endskip" BEGIN(INITIAL); -[^e]+ ; -. ;/* so you eat up e's, too */ -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-33 -@unnumberedsec unnamed-faq-33 -@example -@verbatim -QUESTION: -When was flex born? - -Vern Paxson took over -the Software Tools lex project from Jef Poskanzer in 1982. 
At that point it -was written in Ratfor. Around 1987 or so, Paxson translated it into C, and -a legend was born :-). -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-42 -@unnumberedsec unnamed-faq-42 -@example -@verbatim -To: Adoram Rogel -Subject: Re: Flex 2.5.2 performance questions -In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT. -Date: Wed, 18 Sep 96 10:51:02 PDT -From: Vern Paxson - -[Note, the most recent flex release is 2.5.4, which you can get from -ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.] - -> 1. Using the pattern -> ([Ff](oot)?)?[Nn](ote)?(\.)? -> instead of -> (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.))) -> (in a very complicated flex program) caused the program to slow from -> 300K+/min to 100K/min (no other changes were done). - -These two are not equivalent. For example, the first can match "footnote." -but the second can only match "footnote". This is almost certainly the -cause of the discrepancy - the slower scanner run is matching more tokens, -and/or having to do more backing up. - -> 2. Which of these two are better: [Ff]oot or (F|f)oot ? - -From a performance point of view, they're equivalent (modulo presumably -minor effects such as memory cache hit rates; and the presence of trailing -context, see below). From a space point of view, the first is slightly -preferable. - -> 3. I have a pattern that look like this: -> pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd) -> -> running yet another complicated program that includes the following rule: -> {and}/{no4}{bb}{pats} -> -> gets me to "too complicated - over 32,000 states"... - -I can't tell from this example whether the trailing context is variable-length -or fixed-length (it could be the latter if {and} is fixed-length). If it's -variable length, which flex -p will tell you, then this reflects a basic -performance problem, and if you can eliminate it by restructuring your -scanner, you will see significant improvement.
- -> so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about -> 10 patterns and changed the rule to be 5 rules. -> This did compile, but what is the rule of thumb here ? - -The rule is to avoid trailing context other than fixed-length, in which for -a/b, either the 'a' pattern or the 'b' pattern have a fixed length. Use -of the '|' operator automatically makes the pattern variable length, so in -this case '[Ff]oot' is preferred to '(F|f)oot'. - -> 4. I changed a rule that looked like this: -> {and}{bb}/{ROMAN}[^A-Za-z] { BEGIN... -> -> to the next 2 rules: -> {and}{bb}/{ROMAN}[A-Za-z] { ECHO;} -> {and}{bb}/{ROMAN} { BEGIN... -> -> Again, I understand the using [^...] will cause a great performance loss - -Actually, it doesn't cause any sort of performance loss. It's a surprising -fact about regular expressions that they always match in linear time -regardless of how complex they are. - -> but are there any specific rules about it ? - -See the "Performance Considerations" section of the man page, and also -the example in MISC/fastwc/. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-43 -@unnumberedsec unnamed-faq-43 -@example -@verbatim -To: Adoram Rogel -Subject: Re: Flex 2.5.2 performance questions -In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT. -Date: Thu, 19 Sep 96 09:58:00 PDT -From: Vern Paxson - -> a lot about the backing up problem. -> I believe that there lies my biggest problem, and I'll try to improve -> it. - -Since you have variable trailing context, this is a bigger performance -problem. Fixing it is usually easier than fixing backing up, which in a -complicated scanner (yours seems to fit the bill) can be extremely -difficult to do correctly. - -You also don't mention what flags you are using for your scanner. --f makes a large speed difference, and -Cfe buys you nearly as much -speed but the resulting scanner is considerably smaller. 
- -> I have an | operator in {and} and in {pats} so both of them are variable -> length. - --p should have reported this. - -> Is changing one of them to fixed-length is enough ? - -Yes. - -> Is it possible to change the 32,000 states limit ? - -Yes. I've appended instructions on how. Before you make this change, -though, you should think about whether there are ways to fundamentally -simplify your scanner - those are certainly preferable! - - Vern - -To increase the 32K limit (on a machine with 32 bit integers), you increase -the magnitude of the following in flexdef.h: - -#define JAMSTATE -32766 /* marks a reference to the state that always jams */ -#define MAXIMUM_MNS 31999 -#define BAD_SUBSCRIPT -32767 -#define MAX_SHORT 32700 - -Adding a 0 or two after each should do the trick. -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-44 -@unnumberedsec unnamed-faq-44 -@example -@verbatim -To: Heeman_Lee@hp.com -Subject: Re: flex - multi-byte support? -In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT. -Date: Fri, 04 Oct 1996 11:42:18 PDT -From: Vern Paxson - -> I assume as long as my *.l file defines the -> range of expected character code values (in octal format), flex will -> scan the file and read multi-byte characters correctly. But I have no -> confidence in this assumption. - -Your lack of confidence is justified - this won't work. - -Flex has in it a widespread assumption that the input is processed -one byte at a time. Fixing this is on the to-do list, but is involved, -so it won't happen any time soon. In the interim, the best I can suggest -(unless you want to try fixing it yourself) is to write your rules in -terms of pairs of bytes, using definitions in the first section: - - X \xfe\xc2 - ... - %% - foo{X}bar found_foo_fe_c2_bar(); - -etc. Definitely a pain - sorry about that. - -By the way, the email address you used for me is ancient, indicating you -have a very old version of flex. 
You can get the most recent, 2.5.4, from -ftp.ee.lbl.gov. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-45 -@unnumberedsec unnamed-faq-45 -@example -@verbatim -To: moleary@primus.com -Subject: Re: Flex / Unicode compatibility question -In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT. -Date: Tue, 22 Oct 1996 11:06:13 PDT -From: Vern Paxson - -Unfortunately flex at the moment has a widespread assumption within it -that characters are processed 8 bits at a time. I don't see any easy -fix for this (other than writing your rules in terms of double characters - -a pain). I also don't know of a wider lex, though you might try surfing -the Plan 9 stuff because I know it's a Unicode system, and also the PCCT -toolkit (try searching say Alta Vista for "Purdue Compiler Construction -Toolkit"). - -Fixing flex to handle wider characters is on the long-term to-do list. -But since flex is a strictly spare-time project these days, this probably -won't happen for quite a while, unless someone else does it first. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-46 -@unnumberedsec unnamed-faq-46 -@example -@verbatim -To: Johan Linde -Subject: Re: translation of flex -In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST. -Date: Mon, 11 Nov 1996 10:33:50 PST -From: Vern Paxson - -> I'm working for the Swedish team translating GNU program, and I'm currently -> working with flex. I have a few questions about some of the messages which -> I hope you can answer. - -All of the things you're wondering about, by the way, concerning flex -internals - probably the only person who understands what they mean in -English is me! So I wouldn't worry too much about getting them right. -That said ... - -> #: main.c:545 -> msgid " %d protos created\n" -> -> Does proto mean prototype? - -Yes - prototypes of state compression tables. 
- -> #: main.c:539 -> msgid " %d/%d (peak %d) template nxt-chk entries created\n" -> -> Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?) -> However, 'template next-check entries' doesn't make much sense to me. To be -> able to find a good translation I need to know a little bit more about it. - -There is a scheme in the Aho/Sethi/Ullman compiler book for compressing -scanner tables. It involves creating two pairs of tables. The first has -"base" and "default" entries, the second has "next" and "check" entries. -The "base" entry is indexed by the current state and yields an index into -the next/check table. The "default" entry gives what to do if the state -transition isn't found in next/check. The "next" entry gives the next -state to enter, but only if the "check" entry verifies that this entry is -correct for the current state. Flex creates templates of series of -next/check entries and then encodes differences from these templates as a -way to compress the tables. - -> #: main.c:533 -> msgid " %d/%d base-def entries created\n" -> -> The same problem here for 'base-def'. - -See above. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-47 -@unnumberedsec unnamed-faq-47 -@example -@verbatim -To: Xinying Li -Subject: Re: FLEX ? -In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST. -Date: Wed, 13 Nov 1996 19:51:54 PST -From: Vern Paxson - -> "unput()" them to input flow, question occurs. If I do this after I scan -> a carriage, the variable "yy_current_buffer->yy_at_bol" is changed. That -> means the carriage flag has gone. - -You can control this by calling yy_set_bol(). It's described in the manual. - -> And if in pre-reading it goes to the end of file, is anything done -> to control the end of curren buffer and end of file? - -No, there's no way to put back an end-of-file. - -> By the way I am using flex 2.5.2 and using the "-l". - -The latest release is 2.5.4, by the way. 
It fixes some bugs in 2.5.2 and -2.5.3. You can get it from ftp.ee.lbl.gov. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-48 -@unnumberedsec unnamed-faq-48 -@example -@verbatim -To: Alain.ISSARD@st.com -Subject: Re: Start condition with FLEX -In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST. -Date: Mon, 18 Nov 1996 10:41:34 PST -From: Vern Paxson - -> I am not able to use the start condition scope and to use the | (OR) with -> rules having start conditions. - -The problem is that if you use '|' as a regular expression operator, for -example "a|b" meaning "match either 'a' or 'b'", then it must *not* have -any blanks around it. If you instead want the special '|' *action* (which -from your scanner appears to be the case), which is a way of giving two -different rules the same action: - - foo | - bar matched_foo_or_bar(); - -then '|' *must* be separated from the first rule by whitespace and *must* -be followed by a new line. You *cannot* write it as: - - foo | bar matched_foo_or_bar(); - -even though you might think you could because yacc supports this syntax. -The reason for this unfortunate incompatibility is historical, but it's -unlikely to be changed. - -Your problems with start condition scope are simply due to syntax errors -from your use of '|' later confusing flex. - -Let me know if you still have problems. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-49 -@unnumberedsec unnamed-faq-49 -@example -@verbatim -To: Gregory Margo -Subject: Re: flex-2.5.3 bug report -In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST. -Date: Sat, 23 Nov 1996 17:07:32 PST -From: Vern Paxson - -> Enclosed is a lex file that "real" lex will process, but I cannot get -> flex to process it. Could you try it and maybe point me in the right direction? - -Your problem is that some of the definitions in the scanner use the '/' -trailing context operator, and have it enclosed in ()'s.
Flex does not -allow this operator to be enclosed in ()'s because doing so allows undefined -regular expressions such as "(a/b)+". So the solution is to remove the -parentheses. Note that you must also be building the scanner with the -l -option for AT&T lex compatibility. Without this option, flex automatically -encloses the definitions in parentheses. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-50 -@unnumberedsec unnamed-faq-50 -@example -@verbatim -To: Thomas Hadig -Subject: Re: Flex Bug ? -In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST. -Date: Tue, 26 Nov 1996 11:15:05 PST -From: Vern Paxson - -> In my lexer code, i have the line : -> ^\*.* { } -> -> Thus all lines starting with an astrix (*) are comment lines. -> This does not work ! - -I can't get this problem to reproduce - it works fine for me. Note -though that if what you have is slightly different: - - COMMENT ^\*.* - %% - {COMMENT} { } - -then it won't work, because flex pushes back macro definitions enclosed -in ()'s, so the rule becomes - - (^\*.*) { } - -and now that the '^' operator is not at the immediate beginning of the -line, it's interpreted as just a regular character. You can avoid this -behavior by using the "-l" lex-compatibility flag, or "%option lex-compat". - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-51 -@unnumberedsec unnamed-faq-51 -@example -@verbatim -To: Adoram Rogel -Subject: Re: Flex 2.5.4 BOF ??? -In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST. -Date: Wed, 27 Nov 1996 10:56:25 PST -From: Vern Paxson - -> Organization(s)?/[a-z] -> -> This matched "Organizations" (looking in debug mode, the trailing s -> was matched with trailing context instead of the optional (s) in the -> end of the word. - -That should only happen with lex. Flex can properly match this pattern. -(That might be what you're saying, I'm just not sure.) 
- -> Is there a way to avoid this dangerous trailing context problem ? - -Unfortunately, there's no easy way. On the other hand, I don't see why -it should be a problem. Lex's matching is clearly wrong, and I'd hope -that usually the intent remains the same as expressed with the pattern, -so flex's matching will be correct. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-52 -@unnumberedsec unnamed-faq-52 -@example -@verbatim -To: Cameron MacKinnon -Subject: Re: Flex documentation bug -In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST. -Date: Sun, 01 Dec 1996 22:29:39 PST -From: Vern Paxson - -> I'm not sure how or where to submit bug reports (documentation or -> otherwise) for the GNU project stuff ... - -Well, strictly speaking flex isn't part of the GNU project. They just -distribute it because no one's written a decent GPL'd lex replacement. -So you should send bugs directly to me. Those sent to the GNU folks -sometimes find their way to me, but some may drop between the cracks. - -> In GNU Info, under the section 'Start Conditions', and also in the man -> page (mine's dated April '95) is a nice little snippet showing how to -> parse C quoted strings into a buffer, defined to be MAX_STR_CONST in -> size. Unfortunately, no overflow checking is ever done ... - -This is already mentioned in the manual: - -Finally, here's an example of how to match C-style quoted -strings using exclusive start conditions, including expanded -escape sequences (but not including checking for a string -that's too long): - -The reason for not doing the overflow checking is that it will needlessly -clutter up an example whose main purpose is just to demonstrate how to -use flex. - -The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq.
-@node unnamed-faq-53 -@unnumberedsec unnamed-faq-53 -@example -@verbatim -To: tsv@cs.UManitoba.CA -Subject: Re: Flex (reg).. -In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST. -Date: Thu, 06 Mar 1997 15:54:19 PST -From: Vern Paxson - -> [:alpha:] ([:alnum:] | \\_)* - -If your rule really has embedded blanks as shown above, then it won't -work, as the first blank delimits the rule from the action. (It wouldn't -even compile ...) You need instead: - -[:alpha:]([:alnum:]|\\_)* - -and that should work fine - there's no restriction on what can go inside -of ()'s except for the trailing context operator, '/'. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-54 -@unnumberedsec unnamed-faq-54 -@example -@verbatim -To: "Mike Stolnicki" -Subject: Re: FLEX help -In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT. -Date: Fri, 30 May 1997 10:46:35 PDT -From: Vern Paxson - -> We'd like to add "if-then-else", "while", and "for" statements to our -> language ... -> We've investigated many possible solutions. The one solution that seems -> the most reasonable involves knowing the position of a TOKEN in yyin. - -I strongly advise you to instead build a parse tree (abstract syntax tree) -and loop over that instead. You'll find this has major benefits in keeping -your interpreter simple and extensible. - -That said, the functionality you mention for get_position and set_position -have been on the to-do list for a while. As flex is a purely spare-time -project for me, no guarantees when this will be added (in particular, it -for sure won't be for many months to come). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-55 -@unnumberedsec unnamed-faq-55 -@example -@verbatim -To: Colin Paul Adams -Subject: Re: Flex C++ classes and Bison -In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT. 
-Date: Fri, 15 Aug 1997 10:48:19 PDT -From: Vern Paxson - -> #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control -> *parm) -> -> I have been trying to get this to work as a C++ scanner, but it does -> not appear to be possible (warning that it matches no declarations in -> yyFlexLexer, or something like that). -> -> Is this supposed to be possible, or is it being worked on (I DID -> notice the comment that scanner classes are still experimental, so I'm -> not too hopeful)? - -What you need to do is derive a subclass from yyFlexLexer that provides -the above yylex() method, squirrels away lvalp and parm into member -variables, and then invokes yyFlexLexer::yylex() to do the regular scanning. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-56 -@unnumberedsec unnamed-faq-56 -@example -@verbatim -To: Mikael.Latvala@lmf.ericsson.se -Subject: Re: Possible mistake in Flex v2.5 document -In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT. -Date: Fri, 05 Sep 1997 10:01:54 PDT -From: Vern Paxson - -> In that example you show how to count comment lines when using -> C style /* ... */ comments. My question is, shouldn't you take into -> account a scenario where end of a comment marker occurs inside -> character or string literals? - -The scanner certainly needs to also scan character and string literals. -However it does that (there's an example in the man page for strings), the -lexer will recognize the beginning of the literal before it runs across the -embedded "/*". Consequently, it will finish scanning the literal before it -even considers the possibility of matching "/*". - -Example: - - '([^']*|{ESCAPE_SEQUENCE})' - -will match all the text between the ''s (inclusive). So the lexer -considers this as a token beginning at the first ', and doesn't even -attempt to match other tokens inside it. 
- -I think this subtlety is not worth putting in the manual, as I suspect -it would confuse more people than it would enlighten. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-57 -@unnumberedsec unnamed-faq-57 -@example -@verbatim -To: "Marty Leisner" -Subject: Re: flex limitations -In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT. -Date: Mon, 08 Sep 1997 11:38:08 PDT -From: Vern Paxson - -> %% -> [a-zA-Z]+ /* skip a line */ -> { printf("got %s\n", yytext); } -> %% - -What version of flex are you using? If I feed this to 2.5.4, it complains: - - "bug.l", line 5: EOF encountered inside an action - "bug.l", line 5: unrecognized rule - "bug.l", line 5: fatal parse error - -Not the world's greatest error message, but it manages to flag the problem. - -(With the introduction of start condition scopes, flex can't accommodate -an action on a separate line, since it's ambiguous with an indented rule.) - -You can get 2.5.4 from ftp.ee.lbl.gov. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-58 -@unnumberedsec unnamed-faq-58 -@example -@verbatim -To: uocarroll@deagostini.co.uk (Ultan O'Carroll) -Subject: Re: Flex repositries -In-reply-to: Your message of Fri, 12 Sep 1997 15:02:28 PDT. -Date: Fri, 12 Sep 1997 10:31:50 PDT -From: Vern Paxson - -> before I start beavering away I wonder if you know of any -> place/libraries for flex -> desciption files that might already do this or give me a head start ? - -Unfortunately, no, I don't. You might try asking on comp.compilers. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-59 -@unnumberedsec unnamed-faq-59 -@example -@verbatim -To: Adoram Rogel -Subject: Re: Conditional compiling in the definitions section -In-reply-to: Your message of Thu, 25 Sep 1997 11:22:42 PDT.
-Date: Thu, 25 Sep 1997 10:56:31 PDT -From: Vern Paxson - -> I'm trying to combine two large lex files that now differ only in -> about 10 lines in the definitions section. -> I would like to have something like this: -> #ifdef FFF -> it \ -> #else -> it \ -> #endif -> -> Now, I can't add states for these, as I have already too many states -> and the program is very complicated, and I won't be able to handle -> 10 or 20 more states. -> -> Any trick to do this ? - -You might try using m4, or the C preprocessor plus a sed script to -clean up the result (strip out the #line's). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-60 -@unnumberedsec unnamed-faq-60 -@example -@verbatim -To: Steve Antoch -Subject: Re: lex and yacc grammars -In-reply-to: Your message of Mon, 17 Nov 1997 15:31:25 PST. -Date: Mon, 17 Nov 1997 15:27:01 PST -From: Vern Paxson - -> Would you happen to know where I can find grammars for lex and yacc? - -The flex sources have a grammar for (f)lex. Dunno about yacc, - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-61 -@unnumberedsec unnamed-faq-61 -@example -@verbatim -To: Bryan Housel -Subject: Re: Question about Flex v2.5 -In-reply-to: Your message of Tue, 11 Nov 1997 21:30:23 PST. -Date: Mon, 17 Nov 1997 17:12:21 PST -From: Vern Paxson - -> It prints one of those "end of buffer.." messages for each character in the -> token... - -This will happen if your LexerInput() function returns only one character -at a time, which can happen either if your scanner is "interactive", or -if the streams library on your platform always returns 1 for yyin->gcount(). - -Solution: override LexerInput() with a version that returns whole buffers. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq.
-@node unnamed-faq-62 -@unnumberedsec unnamed-faq-62 -@example -@verbatim -To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE -Subject: Re: Flex maximums -In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST. -Date: Mon, 17 Nov 1997 17:16:15 PST -From: Vern Paxson - -> I took a quick look into the flex-sources and altered some #defines in -> flexdefs.h: -> -> #define INITIAL_MNS 64000 -> #define MNS_INCREMENT 1024000 -> #define MAXIMUM_MNS 64000 - -The things to fix are to add a couple of zeroes to: - -#define JAMSTATE -32766 /* marks a reference to the state that always jams */ -#define MAXIMUM_MNS 31999 -#define BAD_SUBSCRIPT -32767 -#define MAX_SHORT 32700 - -and, if you get complaints about too many rules, make the following change too: - - #define YY_TRAILING_MASK 0x200000 - #define YY_TRAILING_HEAD_MASK 0x400000 - -- Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-63 -@unnumberedsec unnamed-faq-63 -@example -@verbatim -To: jimmey@lexis-nexis.com (Jimmey Todd) -Subject: Re: FLEX question regarding istream vs ifstream -In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST. -Date: Mon, 15 Dec 1997 13:21:35 PST -From: Vern Paxson - -> stdin_handle = YY_CURRENT_BUFFER; -> ifstream fin( "aFile" ); -> yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) ); -> -> What I'm wanting to do, is pass the contents of a file thru one set -> of rules and then pass stdin thru another set... It works great if, I -> don't use the C++ classes. But since everything else that I'm doing is -> in C++, I thought I'd be consistent. -> -> The problem is that 'yy_create_buffer' is expecting an istream* as it's -> first argument (as stated in the man page). However, fin is a ifstream -> object. Any ideas on what I might be doing wrong? Any help would be -> appreciated. Thanks!! - -You need to pass &fin, to turn it into an ifstream* instead of an ifstream. 
-Then its type will be compatible with the expected istream*, because ifstream -is derived from istream. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-64 -@unnumberedsec unnamed-faq-64 -@example -@verbatim -To: Enda Fadian -Subject: Re: Question related to Flex man page? -In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST. -Date: Tue, 16 Dec 1997 14:17:09 PST -From: Vern Paxson - -> Can you explain to me what is ment by a long-jump in relation to flex? - -Using the longjmp() function while inside yylex() or a routine called by it. - -> what is the flex activation frame. - -Just yylex()'s stack frame. - -> As far as I can see yyrestart will bring me back to the sart of the input -> file and using flex++ isnot really an option! - -No, yyrestart() doesn't imply a rewind, even though its name might sound -like it does. It tells the scanner to flush its internal buffers and -start reading from the given file at its present location. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-65 -@unnumberedsec unnamed-faq-65 -@example -@verbatim -To: hassan@larc.info.uqam.ca (Hassan Alaoui) -Subject: Re: Need urgent Help -In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST. -Date: Sun, 21 Dec 1997 21:30:46 PST -From: Vern Paxson - -> /usr/lib/yaccpar: In function `int yyparse()': -> /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)' -> -> ld: Undefined symbol -> _yylex -> _yyparse -> _yyin - -This is a known problem with Solaris C++ (and/or Solaris yacc). I believe -the fix is to explicitly insert some 'extern "C"' statements for the -corresponding routines/symbols. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. 
-@node unnamed-faq-66 -@unnumberedsec unnamed-faq-66 -@example -@verbatim -To: mc0307@mclink.it -Cc: gnu@prep.ai.mit.edu -Subject: Re: [mc0307@mclink.it: Help request] -In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST. -Date: Sun, 21 Dec 1997 22:33:37 PST -From: Vern Paxson - -> This is my definition for float and integer types: -> . . . -> NZD [1-9] -> ... -> I've tested my program on other lex version (on UNIX Sun Solaris an HP -> UNIX) and it work well, so I think that my definitions are correct. -> There are any differences between Lex and Flex? - -There are indeed differences, as discussed in the man page. The one -you are probably running into is that when flex expands a name definition, -it puts parentheses around the expansion, while lex does not. There's -an example in the man page of how this can lead to different matching. -Flex's behavior complies with the POSIX standard (or at least with the -last POSIX draft I saw). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-67 -@unnumberedsec unnamed-faq-67 -@example -@verbatim -To: hassan@larc.info.uqam.ca (Hassan Alaoui) -Subject: Re: Thanks -In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST. -Date: Mon, 22 Dec 1997 14:35:05 PST -From: Vern Paxson - -> Thank you very much for your help. I compile and link well with C++ while -> declaring 'yylex ...' extern, But a little problem remains. I get a -> segmentation default when executing ( I linked with lfl library) while it -> works well when using LEX instead of flex. Do you have some ideas about the -> reason for this ? - -The one possible reason for this that comes to mind is if you've defined -yytext as "extern char yytext[]" (which is what lex uses) instead of -"extern char *yytext" (which is what flex uses). If it's not that, then -I'm afraid I don't know what the problem might be. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. 
-@node unnamed-faq-68 -@unnumberedsec unnamed-faq-68 -@example -@verbatim -To: "Bart Niswonger" -Subject: Re: flex 2.5: c++ scanners & start conditions -In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST. -Date: Tue, 06 Jan 1998 19:19:30 PST -From: Vern Paxson - -> The problem is that when I do this (using %option c++) start -> conditions seem to not apply. - -The BEGIN macro modifies the yy_start variable. For C scanners, this -is a static with scope visible through the whole file. For C++ scanners, -it's a member variable, so it only has visible scope within a member -function. Your lexbegin() routine is not a member function when you -build a C++ scanner, so it's not modifying the correct yy_start. The -diagnostic that indicates this is that you found you needed to add -a declaration of yy_start in order to get your scanner to compile when -using C++; instead, the correct fix is to make lexbegin() a member -function (by deriving from yyFlexLexer). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-69 -@unnumberedsec unnamed-faq-69 -@example -@verbatim -To: "Boris Zinin" -Subject: Re: current position in flex buffer -In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST. -Date: Mon, 12 Jan 1998 12:03:15 PST -From: Vern Paxson - -> The problem is how to determine the current position in flex active -> buffer when a rule is matched.... - -You will need to keep track of this explicitly, such as by redefining -YY_USER_ACTION to count the number of characters matched. - -The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-70 -@unnumberedsec unnamed-faq-70 -@example -@verbatim -To: Bik.Dhaliwal@bis.org -Subject: Re: Flex question -In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST. 
-Date: Tue, 27 Jan 1998 22:41:52 PST -From: Vern Paxson - -> That requirement involves knowing -> the character position at which a particular token was matched -> in the lexer. - -The way you have to do this is by explicitly keeping track of where -you are in the file, by counting the number of characters scanned -for each token (available in yyleng). It may prove convenient to -do this by redefining YY_USER_ACTION, as described in the manual. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-71 -@unnumberedsec unnamed-faq-71 -@example -@verbatim -To: Vladimir Alexiev -Subject: Re: flex: how to control start condition from parser? -In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST. -Date: Tue, 27 Jan 1998 22:45:37 PST -From: Vern Paxson - -> It seems useful for the parser to be able to tell the lexer about such -> context dependencies, because then they don't have to be limited to -> local or sequential context. - -One way to do this is to have the parser call a stub routine that's -included in the scanner's .l file, and consequently that has access to -BEGIN. The only ugliness is that the parser can't pass in the state -it wants, because those aren't visible - but if you don't have many -such states, then using a different set of names doesn't seem like -too much of a burden. - -While generating a .h file like you suggest is certainly cleaner, -flex development has come to a virtual stand-still :-(, so a workaround -like the above is much more pragmatic than waiting for a new feature. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-72 -@unnumberedsec unnamed-faq-72 -@example -@verbatim -To: Barbara Denny -Subject: Re: freebsd flex bug? -In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST. -Date: Fri, 30 Jan 1998 12:42:32 PST -From: Vern Paxson - -> lex.yy.c:1996: parse error before `=' - -This is the key, identifying this error.
(It may help to pinpoint -it by using flex -L, so it doesn't generate #line directives in its -output.) I will bet you heavy money that you have a start condition -name that is also a variable name, or something like that; flex spits -out #define's for each start condition name, mapping them to a number, -so you can wind up with: - - %x foo - %% - ... - %% - void bar() - { - int foo = 3; - } - -and the penultimate will turn into "int 1 = 3" after C preprocessing, -since flex will put "#define foo 1" in the generated scanner. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-73 -@unnumberedsec unnamed-faq-73 -@example -@verbatim -To: Maurice Petrie -Subject: Re: Lost flex .l file -In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST. -Date: Mon, 02 Feb 1998 11:15:12 PST -From: Vern Paxson - -> I am curious as to -> whether there is a simple way to backtrack from the generated source to -> reproduce the lost list of tokens we are searching on. - -In theory, it's straight-forward to go from the DFA representation -back to a regular-expression representation - the two are isomorphic. -In practice, a huge headache, because you have to unpack all the tables -back into a single DFA representation, and then write a program to munch -on that and translate it into an RE. - -Sorry for the less-than-happy news ... - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-74 -@unnumberedsec unnamed-faq-74 -@example -@verbatim -To: jimmey@lexis-nexis.com (Jimmey Todd) -Subject: Re: Flex performance question -In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. -Date: Thu, 19 Feb 1998 08:48:51 PST -From: Vern Paxson - -> What I have found, is that the smaller the data chunk, the faster the -> program executes. This is the opposite of what I expected. Should this be -> happening this way? - -This is exactly what will happen if your input file has embedded NULs. 
-From the man page: - -A final note: flex is slow when matching NUL's, particularly -when a token contains multiple NUL's. It's best to write -rules which match short amounts of text if it's anticipated -that the text will often include NUL's. - -So that's the first thing to look for. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-75 -@unnumberedsec unnamed-faq-75 -@example -@verbatim -To: jimmey@lexis-nexis.com (Jimmey Todd) -Subject: Re: Flex performance question -In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. -Date: Thu, 19 Feb 1998 15:42:25 PST -From: Vern Paxson - -So there are several problems. - -First, to go fast, you want to match as much text as possible, which -your scanners don't in the case that what they're scanning is *not* -a tag. So you want a rule like: - - [^<]+ - -Second, C++ scanners are particularly slow if they're interactive, -which they are by default. Using -B speeds it up by a factor of 3-4 -on my workstation. - -Third, C++ scanners that use the istream interface are slow, because -of how poorly implemented istream's are. I built two versions of -the following scanner: - - %% - .*\n - .* - %% - -and the C version inhales a 2.5MB file on my workstation in 0.8 seconds. -The C++ istream version, using -B, takes 3.8 seconds. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-76 -@unnumberedsec unnamed-faq-76 -@example -@verbatim -To: "Frescatore, David (CRD, TAD)" -Subject: Re: FLEX 2.5 & THE YEAR 2000 -In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT. -Date: Wed, 03 Jun 1998 10:22:26 PDT -From: Vern Paxson - -> I am researching the Y2K problem with General Electric R&D -> and need to know if there are any known issues concerning -> the above mentioned software and Y2K regardless of version. - -There shouldn't be, all it ever does with the date is ask the system -for it and then print it out. 
- - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-77 -@unnumberedsec unnamed-faq-77 -@example -@verbatim -To: "Hans Dermot Doran" -Subject: Re: flex problem -In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT. -Date: Tue, 21 Jul 1998 14:23:34 PDT -From: Vern Paxson - -> To overcome this, I gets() the stdin into a string and lex the string. The -> string is lexed OK except that the end of string isn't lexed properly -> (yy_scan_string()), that is the lexer dosn't recognise the end of string. - -Flex doesn't contain mechanisms for recognizing buffer endpoints. But if -you use fgets instead (which you should anyway, to protect against buffer -overflows), then the final \n will be preserved in the string, and you can -scan that in order to find the end of the string. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-78 -@unnumberedsec unnamed-faq-78 -@example -@verbatim -To: soumen@almaden.ibm.com -Subject: Re: Flex++ 2.5.3 instance member vs. static member -In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT. -Date: Tue, 28 Jul 1998 01:10:34 PDT -From: Vern Paxson - -> %{ -> int mylineno = 0; -> %} -> ws [ \t]+ -> alpha [A-Za-z] -> dig [0-9] -> %% -> -> Now you'd expect mylineno to be a member of each instance of class -> yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to -> indicate otherwise; unless I am missing something the declaration of -> mylineno seems to be outside any class scope. -> -> How will this work if I want to run a multi-threaded application with each -> thread creating a FlexLexer instance? - -Derive your own subclass and make mylineno a member variable of it. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-79 -@unnumberedsec unnamed-faq-79 -@example -@verbatim -To: Adoram Rogel -Subject: Re: More than 32K states change hangs -In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT. 
-Date: Tue, 04 Aug 1998 22:28:45 PDT -From: Vern Paxson - -> Vern Paxson, -> -> I followed your advice, posted on Usenet bu you, and emailed to me -> personally by you, on how to overcome the 32K states limit. I'm running -> on Linux machines. -> I took the full source of version 2.5.4 and did the following changes in -> flexdef.h: -> #define JAMSTATE -327660 -> #define MAXIMUM_MNS 319990 -> #define BAD_SUBSCRIPT -327670 -> #define MAX_SHORT 327000 -> -> and compiled. -> All looked fine, including check and bigcheck, so I installed. - -Hmmm, you shouldn't increase MAX_SHORT, though looking through my email -archives I see that I did indeed recommend doing so. Try setting it back -to 32700; that should suffice that you no longer need -Ca. If it still -hangs, then the interesting question is - where? - -> Compiling the same hanged program with a out-of-the-box (RedHat 4.2 -> distribution of Linux) -> flex 2.5.4 binary works. - -Since Linux comes with source code, you should diff it against what -you have to see what problems they missed. - -> Should I always compile with the -Ca option now ? even short and simple -> filters ? - -No, definitely not. It's meant to be for those situations where you -absolutely must squeeze every last cycle out of your scanner. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-80 -@unnumberedsec unnamed-faq-80 -@example -@verbatim -To: "Schmackpfeffer, Craig" -Subject: Re: flex output for static code portion -In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT. -Date: Mon, 17 Aug 1998 23:57:42 PDT -From: Vern Paxson - -> I would like to use flex under the hood to generate a binary file -> containing the data structures that control the parse. - -This has been on the wish-list for a long time. In principle it's -straight-forward - you redirect mkdata() et al's I/O to another file, -and modify the skeleton to have a start-up function that slurps these -into dynamic arrays. 
The concerns are (1) the scanner generation code -is hairy and full of corner cases, so it's easy to get surprised when -going down this path :-( ; and (2) being careful about buffering so -that when the tables change you make sure the scanner starts in the -correct state and reading at the right point in the input file. - -> I was wondering if you know of anyone who has used flex in this way. - -I don't - but it seems like a reasonable project to undertake (unlike -numerous other flex tweaks :-). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-81 -@unnumberedsec unnamed-faq-81 -@example -@verbatim -Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11]) - by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838 - for ; Thu, 20 Aug 1998 00:47:57 -0700 (PDT) -Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2]) - by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694 - for ; Thu, 20 Aug 1998 09:47:55 +0200 -Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200 -From: Georg Rehm -Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de> -Subject: "flex scanner push-back overflow" -To: vern@ee.lbl.gov -Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST) -Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE -X-NoJunk: Do NOT send commercial mail, spam or ads to this address! -X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/ -X-Mailer: ELM [version 2.4ME+ PL28 (25)] -MIME-Version: 1.0 -Content-Type: text/plain; charset=US-ASCII -Content-Transfer-Encoding: 7bit - -Hi Vern, - -Yesterday, I encountered a strange problem: I use the macro processor m4 -to include some lengthy lists into a .l file. Following is a flex macro -definition that causes some serious pain in my neck: - -AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. 
Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...]) - -The complete list contains about 10kB. When I try to "flex" this file -(on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased -some of the predefined values in flexdefs.h) I get the error: - -myflex/flex -8 sentag.tmp.l -flex scanner push-back overflow - -When I remove the slashes in the macro definition everything works fine. -As I understand it, the double quotes escape the slash-character so it -really means "/" and not "trailing context". Furthermore, I tried to -escape the slashes with backslashes, but with no use, the same error message -appeared when flexing the code. - -Do you have an idea what's going on here? - -Greetings from Germany, - Georg --- -Georg Rehm georg@cl-ki.uni-osnabrueck.de -Institute for Semantic Information Processing, University of Osnabrueck, FRG -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-82 -@unnumberedsec unnamed-faq-82 -@example -@verbatim -To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE -Subject: Re: "flex scanner push-back overflow" -In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT. -Date: Thu, 20 Aug 1998 07:05:35 PDT -From: Vern Paxson - -> myflex/flex -8 sentag.tmp.l -> flex scanner push-back overflow - -Flex itself uses a flex scanner. That scanner is running out of buffer -space when it tries to unput() the humongous macro you've defined. When -you remove the '/'s, you make it small enough so that it fits in the buffer; -removing spaces would do the same thing. 
- -The fix is to either rethink how come you're using such a big macro and -perhaps there's another/better way to do it; or to rebuild flex's own -scan.c with a larger value for - - #define YY_BUF_SIZE 16384 - -- Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-83 -@unnumberedsec unnamed-faq-83 -@example -@verbatim -To: Jan Kort -Subject: Re: Flex -In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200. -Date: Sat, 05 Sep 1998 00:59:49 PDT -From: Vern Paxson - -> %% -> -> "TEST1\n" { fprintf(stderr, "TEST1\n"); yyless(5); } -> ^\n { fprintf(stderr, "empty line\n"); } -> . { } -> \n { fprintf(stderr, "new line\n"); } -> -> %% -> -- input --------------------------------------- -> TEST1 -> -- output -------------------------------------- -> TEST1 -> empty line -> ------------------------------------------------ - -IMHO, it's not clear whether or not this is in fact a bug. It depends -on whether you view yyless() as backing up in the input stream, or as -pushing new characters onto the beginning of the input stream. Flex -interprets it as the latter (for implementation convenience, I'll admit), -and so considers the newline as in fact matching at the beginning of a -line, as after all the last token scanned an entire line and so the -scanner is now at the beginning of a new line. - -I agree that this is counter-intuitive for yyless(), given its -functional description (it's less so for unput(), depending on whether -you're unput()'ing new text or scanned text). But I don't plan to -change it any time soon, as it's a pain to do so. Consequently, -you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak -your scanner into the behavior you desire. - -Sorry for the less-than-completely-satisfactory answer. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. 
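The yyless() behavior discussed in unnamed-faq-83 can be modelled with a toy scanner. This is a sketch under a stated assumption: the beginning-of-line flag is taken from the last character of the full match before the action runs, and yyless() does not recompute it. The function `scan`, the `RULES` table, and the `reset_bol_after_yyless` switch are hypothetical names; this is not flex's matching engine, only an illustration of why the report prints "empty line" and what calling `yy_set_bol(0)` in the action would change.

```python
import re

# Toy model of the four rules in the report: (label, regex, needs_bol).
RULES = [
    ("TEST1", r"TEST1\n", False),
    ("empty line", r"\n", True),   # stands in for ^\n
    ("char", r".", False),
    ("new line", r"\n", False),
]


def scan(text, reset_bol_after_yyless):
    """Return the labels of the rules that fire, in order."""
    fired = []
    pos = 0
    at_bol = True  # a scan starts at the beginning of a line
    while pos < len(text):
        best = None
        for label, regex, needs_bol in RULES:
            if needs_bol and not at_bol:
                continue
            m = re.compile(regex).match(text, pos)
            # Longest match wins; the earlier rule wins a tie.
            if m and (best is None or m.end() - pos > best[1]):
                best = (label, m.end() - pos)
        label, length = best
        fired.append(label)
        # Assumed mechanism: the flag comes from the last character of
        # the full match, before the action runs.
        at_bol = text[pos + length - 1] == "\n"
        if label == "TEST1":
            length = 5  # models yyless(5): keep "TEST1", rescan the newline
            if reset_bol_after_yyless:
                at_bol = False  # models calling yy_set_bol(0) in the action
        pos += length
    return fired


print(scan("TEST1\n", reset_bol_after_yyless=False))
print(scan("TEST1\n", reset_bol_after_yyless=True))
```

In this model the first call reproduces the reported sequence (the pushed-back newline is treated as being at the start of a line), while the second call lets the plain newline rule fire instead.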
-@node unnamed-faq-84 -@unnumberedsec unnamed-faq-84 -@example -@verbatim -To: Patrick Krusenotto -Subject: Re: Problems with restarting flex-2.5.2-generated scanner -In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT. -Date: Thu, 24 Sep 1998 23:28:43 PDT -From: Vern Paxson - -> I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately -> trying to make my scanner restart with a new file after my parser stops -> with a parse error. When my compiler restarts, the parser always -> receives the token after the token (in the old file!) that caused the -> parser error. - -I suspect the problem is that your parser has read ahead in order -to attempt to resolve an ambiguity, and when it's restarted it picks -up with that token rather than reading a fresh one. If you're using -yacc, then the special "error" production can sometimes be used to -consume tokens in an attempt to get the parser into a consistent state. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-85 -@unnumberedsec unnamed-faq-85 -@example -@verbatim -To: Henric Jungheim -Subject: Re: flex 2.5.4a -In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST. -Date: Tue, 27 Oct 1998 16:50:14 PST -From: Vern Paxson - -> This brings up a feature request: How about a command line -> option to specify the filename when reading from stdin? That way one -> doesn't need to create a temporary file in order to get the "#line" -> directives to make sense. - -Use -o combined with -t (per the man page description of -o). - -> P.S., Is there any simple way to use non-blocking IO to parse multiple -> streams? - -Simple, no. - -One approach might be to return a magic character on EWOULDBLOCK and -have a rule - - .* // put back .*, eat magic character - -This is off the top of my head, not sure it'll work. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. 
-@node unnamed-faq-86 -@unnumberedsec unnamed-faq-86 -@example -@verbatim -To: "Repko, Billy D" -Subject: Re: Compiling scanners -In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST. -Date: Thu, 14 Jan 1999 00:25:30 PST -From: Vern Paxson - -> It appears that maybe it cannot find the lfl library. - -The Makefile in the distribution builds it, so you should have it. -It's exceedingly trivial, just a main() that calls yylex() and -a yywrap() that always returns 1. - -> %% -> \n ++num_lines; ++num_chars; -> . ++num_chars; - -You can't indent your rules like this - that's where the errors are coming -from. Flex copies indented text to the output file, it's how you do things -like - - int num_lines_seen = 0; - -to declare local variables. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-87 -@unnumberedsec unnamed-faq-87 -@example -@verbatim -To: Erick Branderhorst -Subject: Re: flex input buffer -In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST. -Date: Tue, 09 Feb 1999 21:03:37 PST -From: Vern Paxson - -> In the flex.skl file the size of the default input buffers is set. Can you -> explain why this size is set and why it is such a high number. - -It's large to optimize performance when scanning large files. You can -safely make it a lot lower if needed. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-88 -@unnumberedsec unnamed-faq-88 -@example -@verbatim -To: "Guido Minnen" -Subject: Re: Flex error message -In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST. -Date: Thu, 25 Feb 1999 00:11:31 PST -From: Vern Paxson - -> I'm extending a larger scanner written in Flex and I keep running into -> problems.
More specifically, I get the error message: -> "flex: input rules are too complicated (>= 32000 NFA states)" - -Increase the definitions in flexdef.h for: - -#define JAMSTATE -32766 /* marks a reference to the state that always jams */ -#define MAXIMUM_MNS 31999 -#define BAD_SUBSCRIPT -32767 - -recompile everything, and it should all work. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-89 -@unnumberedsec unnamed-faq-89 -@example -@verbatim -To: John Victor J -Subject: Re: flex---is thread safe -In-reply-to: Your message of Sun, 23 May 1999 12:56:56 +0530. -Date: Sun, 23 May 1999 00:32:53 PDT -From: Vern Paxson - -> I would like to know whether flex is thread safe??? - -I take it you mean the scanners it generates and not flex itself. - -The answer is (still) No, except if you use the -+ option to generate -a C++ scanning class (and if your stream library is thread-safe). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-90 -@unnumberedsec unnamed-faq-90 -@example -@verbatim -To: "Dmitriy Goldobin" -Subject: Re: FLEX trouble -In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT. -Date: Tue, 01 Jun 1999 00:15:07 PDT -From: Vern Paxson - -> I have a trouble with FLEX. Why rule "/*".*"*/" work properly, -> but rule "/*"(.|\n)*"*/" don't work ? - -The second of these will have to scan the entire input stream (because -"(.|\n)*" matches an arbitrary amount of any text) in order to see if -it ends with "*/", terminating the comment. That potentially will overflow -the input buffer. - -> More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error -> 'unrecognized rule'. - -You can't use the '/' operator inside parentheses. It's not clear -what "(a/b)*" actually means. - -> I now use workaround with state , but single-rule is -> better, i think.
- -Single-rule is nice but will always have the problem of either setting -restrictions on comments (like not allowing multi-line comments) and/or -running the risk of consuming the entire input stream, as noted above. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-91 -@unnumberedsec unnamed-faq-91 -@example -@verbatim -Received: from mc-qout4.whowhere.com (mc-qout4.whowhere.com [209.185.123.18]) - by ee.lbl.gov (8.9.3/8.9.3) with SMTP id IAA05100 - for ; Tue, 15 Jun 1999 08:56:06 -0700 (PDT) -Received: from Unknown/Local ([?.?.?.?]) by my-deja.com; Tue Jun 15 08:55:43 1999 -To: vern@ee.lbl.gov -Date: Tue, 15 Jun 1999 08:55:43 -0700 -From: "Aki Niimura" -Message-ID: -Mime-Version: 1.0 -Cc: -X-Sent-Mail: on -Reply-To: -X-Mailer: MailCity Service -Subject: A question on flex C++ scanner -X-Sender-Ip: 12.72.207.61 -Organization: My Deja Email (http://www.my-deja.com:80) -Content-Type: text/plain; charset=us-ascii -Content-Transfer-Encoding: 7bit - -Dear Dr. Paxon, - -I have been using flex for years. -It works very well on many projects. -Most case, I used it to generate a scanner on C language. -However, one project I needed to generate a scanner -on C++ lanuage. Thanks to your enhancement, flex did -the job. - -Currently, I'm working on enhancing my previous project. -I need to deal with multiple input streams (recursive -inclusion) in this scanner (C++). -I did similar thing for another scanner (C) as you -explained in your documentation. - -The generated scanner (C++) has necessary methods: -- switch_to_buffer(struct yy_buffer_state *b) -- yy_create_buffer(istream *is, int sz) -- yy_delete_buffer(struct yy_buffer_state *b) - -However, I couldn't figure out how to access current -buffer (yy_current_buffer). - -yy_current_buffer is a protected member of yyFlexLexer. -I can't access it directly. -Then, I thought yy_create_buffer() with is = 0 might -return current stream buffer. 
But it seems not as far -as I checked the source. (flex 2.5.4) - -I went through the Web in addition to Flex documentation. -However, it hasn't been successful, so far. - -It is not my intention to bother you, but, can you -comment about how to obtain the current stream buffer? - -Your response would be highly appreciated. - -Best regards, -Aki Niimura - ---== Sent via Deja.com http://www.deja.com/ ==-- -Share what you know. Learn what you don't. -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-92 -@unnumberedsec unnamed-faq-92 -@example -@verbatim -To: neko@my-deja.com -Subject: Re: A question on flex C++ scanner -In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT. -Date: Tue, 15 Jun 1999 09:04:24 PDT -From: Vern Paxson - -> However, I couldn't figure out how to access current -> buffer (yy_current_buffer). - -Derive your own subclass from yyFlexLexer. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-93 -@unnumberedsec unnamed-faq-93 -@example -@verbatim -To: "Stones, Darren" -Subject: Re: You're the man to see? -In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT. -Date: Wed, 23 Jun 1999 09:01:40 PDT -From: Vern Paxson - -> I hope you can help me. I am using Flex and Bison to produce an interpreted -> language. However all goes well until I try to implement an IF statement or -> a WHILE. I cannot get this to work as the parser parses all the conditions -> eg. the TRUE and FALSE conditions to check for a rule match. So I cannot -> make a decision!! - -You need to use the parser to build a parse tree (= abstract syntax tree), -and when that's all done you recursively evaluate the tree, binding variables -to values at that time. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-94 -@unnumberedsec unnamed-faq-94 -@example -@verbatim -To: Petr Danecek -Subject: Re: flex - question -In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
-Date: Fri, 02 Jul 1999 16:52:13 PDT -From: Vern Paxson - -> file, it takes an enormous amount of time. It is funny, because the -> source code has only 12 rules!!! I think it looks like an exponential -> growth. - -Right, that's the problem - some patterns (those with a lot of -ambiguity, which yours have because at any given time the scanner can -be in the middle of all sorts of combinations of the different -rules) blow up exponentially. - -For your rules, there is an easy fix. Change the ".*" that comes after -the directory name to "[^ ]*". With that in place, the rules are no -longer nearly so ambiguous, because then once one of the directories -has been matched, no other can be matched (since they all require a -leading blank). - -If that's not an acceptable solution, then you can enter a start state -to pick up the .*\n after each directory is matched. - -Also note that for speed, you'll want to add a ".*" rule at the end, -otherwise text that doesn't match any of the patterns will be matched -very slowly, a character at a time. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-95 -@unnumberedsec unnamed-faq-95 -@example -@verbatim -To: Tielman Koekemoer -Subject: Re: Please help. -In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT. -Date: Thu, 08 Jul 1999 08:20:39 PDT -From: Vern Paxson - -> I was hoping you could help me with my problem. -> -> I tried compiling (gnu)flex on a Solaris 2.4 machine -> but when I ran make (after configure) I got an error. -> -> -------------------------------------------------------------- -> gcc -c -I. -I. -g -O parse.c -> ./flex -t -p ./scan.l >scan.c -> sh: ./flex: not found -> *** Error code 1 -> make: Fatal error: Command failed for target `scan.c' -> ------------------------------------------------------------- -> -> What's strange to me is that I'm only -> trying to install flex now.
I then edited the Makefile to -> and changed where it says "FLEX = flex" to "FLEX = lex" -> ( lex: the native Solaris one ) but then it complains about -> the "-p" option. Is there any way I can compile flex without -> using flex or lex? -> -> Thanks so much for your time. - -You managed to step on the bootstrap sequence, which first copies -initscan.c to scan.c in order to build flex. Try fetching a fresh -distribution from ftp.ee.lbl.gov. (Or you can first try removing -".bootstrap" and doing a make again.) - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-96 -@unnumberedsec unnamed-faq-96 -@example -@verbatim -To: Tielman Koekemoer -Subject: Re: Please help. -In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT. -Date: Fri, 09 Jul 1999 00:27:20 PDT -From: Vern Paxson - -> First I removed .bootstrap (and ran make) - no luck. I downloaded the -> software but I still have the same problem. Is there anything else I -> could try. - -Try: - - cp initscan.c scan.c - touch scan.c - make scan.o - -If this last step tries to first build scan.c from scan.l using ./flex, then -your "make" is broken, in which case compile scan.c to scan.o by hand. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-97 -@unnumberedsec unnamed-faq-97 -@example -@verbatim -To: Sumanth Kamenani -Subject: Re: Error -In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT. -Date: Tue, 20 Jul 1999 00:18:26 PDT -From: Vern Paxson - -> I am getting a compilation error. The error is given as "unknown symbol- yylex". - -The parser relies on calling yylex(), but you're instead using the C++ scanning -class, so you need to supply a yylex() "glue" function that calls an instance -of the scanner (e.g., "scanner->yylex()"). - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq.
-@node unnamed-faq-98 -@unnumberedsec unnamed-faq-98 -@example -@verbatim -To: daniel@synchrods.synchrods.COM (Daniel Senderowicz) -Subject: Re: lex -In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST. -Date: Tue, 23 Nov 1999 15:54:30 PST -From: Vern Paxson - -Well, your problem is the - -switch (yybgin-yysvec-1) { /* witchcraft */ - -at the beginning of lex rules. "witchcraft" == "non-portable". It's -assuming knowledge of the AT&T lex's internal variables. - -For flex, you can probably do the equivalent using a switch on YYSTATE. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-99 -@unnumberedsec unnamed-faq-99 -@example -@verbatim -To: archow@hss.hns.com -Subject: Re: Regarding distribution of flex and yacc based grammars -In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530. -Date: Wed, 22 Dec 1999 01:56:24 PST -From: Vern Paxson - -> When we provide the customer with an object code distribution, is it -> necessary for us to provide source -> for the generated C files from flex and bison since they are generated by -> flex and bison ? - -For flex, no. I don't know what the current state of this is for bison. - -> Also, is there any requrirement for us to neccessarily provide source for -> the grammar files which are fed into flex and bison ? - -Again, for flex, no. - -See the file "COPYING" in the flex distribution for the legalese. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-100 -@unnumberedsec unnamed-faq-100 -@example -@verbatim -To: Martin Gallwey -Subject: Re: Flex, and self referencing rules -In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST. -Date: Sat, 19 Feb 2000 18:33:16 PST -From: Vern Paxson - -> However, I do not use unput anywhere. 
I do use self-referencing -> rules like this: -> -> UnaryExpr ({UnionExpr})|("-"{UnaryExpr}) - -You can't do this - flex is *not* a parser like yacc (which does indeed -allow recursion), it is a scanner that's confined to regular expressions. - - Vern -@end verbatim -@end example - -@c TODO: Evaluate this faq. -@node unnamed-faq-101 -@unnumberedsec unnamed-faq-101 -@example -@verbatim -To: slg3@lehigh.edu (SAMUEL L. GULDEN) -Subject: Re: Flex problem -In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST. -Date: Thu, 02 Mar 2000 23:00:46 PST -From: Vern Paxson - -If this is exactly your program: - -> digit [0-9] -> digits {digit}+ -> whitespace [ \t\n]+ -> -> %% -> "[" { printf("open_brac\n");} -> "]" { printf("close_brac\n");} -> "+" { printf("addop\n");} -> "*" { printf("multop\n");} -> {digits} { printf("NUMBER = %s\n", yytext);} -> whitespace ; - -then the problem is that the last rule needs to be "{whitespace}" ! - - Vern -@end verbatim -@end example -- 2.40.0