From 6260bde22e43d352f949103eb9b54a8191e73e1a Mon Sep 17 00:00:00 2001 From: Will Estes Date: Fri, 9 Aug 2002 20:36:34 +0000 Subject: [PATCH] more faq editing; corrected mistyped nodenames --- flex.texi | 217 ++++++++++++++++++++++-------------------------------- 1 file changed, 86 insertions(+), 131 deletions(-) diff --git a/flex.texi b/flex.texi index 5b0db2b..798f859 100644 --- a/flex.texi +++ b/flex.texi @@ -51,7 +51,7 @@ a tool for generating programs that perform pattern-matching on text. The manual includes both tutorial and reference sections. This edition of the @cite{flex Manual} documents @code{flex} version -@value{VERSION}. It was last updated @value{UPDATED}. +@value{VERSION}. It was last updated on @value{UPDATED}. @menu * Introduction:: @@ -145,10 +145,9 @@ FAQ * How do I execute code only during initialization (only before the first scan)?:: * How do I execute code at termination?:: * Where else can I find help?:: -* Can I include comments in the "rules" section of the file file?:: +* Can I include comments in the "rules" section of the file?:: * I get an error about undefined yywrap().:: * How can I change the matching pattern at run time?:: -* Is there a way to increase the rules (NFA states to a bigger number?):: * How can I expand macros in the input?:: * How can I build a two-pass scanner?:: * How do I match any string not matched in the preceding rules?:: @@ -4766,10 +4765,9 @@ publish them here. 
* How do I execute code only during initialization (only before the first scan)?:: * How do I execute code at termination?:: * Where else can I find help?:: -* Can I include comments in the "rules" section of the file file?:: +* Can I include comments in the "rules" section of the file?:: * I get an error about undefined yywrap().:: * How can I change the matching pattern at run time?:: -* Is there a way to increase the rules (NFA states to a bigger number?):: * How can I expand macros in the input?:: * How can I build a two-pass scanner?:: * How do I match any string not matched in the preceding rules?:: @@ -4918,7 +4916,7 @@ simple technique once you understand the principles.) A side-effect of this parallel matching is that when the input matches more than one rule, @code{flex} scanners pick the rule that matched the @emph{most} text. This -is explained further in the manual, in the section @xref{Mathing}. +is explained further in the manual, in the section @xref{Matching}. If you want @code{flex} to choose a shorter match, then you can work around this behavior by expanding your short @@ -5003,18 +5001,15 @@ available, it blocks until some input is available.) I've used this technique i interpreter I wrote that both reads keyboard input using a @code{flex} scanner and IPC traffic from sockets, and it works fine. -@c faq edit stopped here @node Can I build nested parsers that work with the same input file? @unnumberedsec Can I build nested parsers that work with the same input file? -Can I build nested parsers that work with the same input file? - This is not going to work without some additional effort. The reason is -that flex block-buffers the input it reads from yyin. This means that the -"outermost" yylex(), when called, will automatically slurp up the first 8K -of input available on yyin, and subsequent calls to other yylex()'s won't +that @code{flex} block-buffers the input it reads from @code{yyin}. 
This means that the +``outermost'' @code{yylex()}, when called, will automatically slurp up the first 8K +of input available on yyin, and subsequent calls to other @code{yylex()}'s won't see that input. You might be tempted to work around this problem by -redefining YY_INPUT to only return a small amount of text, but it turns out +redefining @code{YY_INPUT} to only return a small amount of text, but it turns out that that approach is quite difficult. Instead, the best solution is to combine all of your scanners into one large scanner, using a different exclusive start condition for each. @@ -5022,14 +5017,12 @@ exclusive start condition for each. @node How can I match text only at the end of a file? @unnumberedsec How can I match text only at the end of a file? -How can I match text only at the end of a file? - -There is no way to write a rule which is "match this text, but only if -it comes at the end of the file". You can fake it, though, if you happen +There is no way to write a rule which is ``match this text, but only if +it comes at the end of the file''. You can fake it, though, if you happen to have a character lying around that you don't allow in your input. -Then you redefine YY_INPUT to call your own routine which, if it sees -an EOF, returns the magic character first (and remembers to return a -real EOF next time it's called). Then you could write: +Then you redefine @code{YY_INPUT} to call your own routine which, if it sees +an @samp{EOF}, returns the magic character first (and remembers to return a +real @code{EOF} next time it's called). Then you could write: @example @verbatim @@ -5040,11 +5033,9 @@ real EOF next time it's called). Then you could write: @node How can I make REJECT cascade across start condition boundaries? @unnumberedsec How can I make REJECT cascade across start condition boundaries? -How can I make REJECT cascade across start condition boundaries? - -You can do this as follows. 
Suppose you have a start condition A, and -after exhausting all of the possible matches in <A>, you want to try -matches in <INITIAL>. Then you could use the following: +You can do this as follows. Suppose you have a start condition @samp{A}, and +after exhausting all of the possible matches in @samp{<A>}, you want to try +matches in @samp{<INITIAL>}. Then you could use the following: @example @verbatim @@ -5085,13 +5076,11 @@ as fast as possible. Still, it seems reasonable to allow the user to choose to trade off a bit of performance in this area to gain the corresponding flexibility. There might be another reason, though, why fast scanners don't support the -interactive option +interactive option. @node How much faster is -F or -f than -C? @unnumberedsec How much faster is -F or -f than -C? -How much faster is -F or -f than -C? - Much faster (factor of 2-3). @node If I have a simple grammar cant I just parse it with flex? @@ -5100,19 +5089,18 @@ Much faster (factor of 2-3). Is your grammar recursive? That's almost always a sign that you're better off using a parser/scanner rather than just trying to use a scanner alone. + @node Why doesnt yyrestart() set the start state back to INITIAL? @unnumberedsec Why doesn't yyrestart() set the start state back to INITIAL? There are two reasons. The first is that there might be programs that rely on the start state not changing across file changes. -The second is that with flex 2.4, use of yyrestart() is no longer required, +The second is that beginning with @code{flex} version 2.4, use of @code{yyrestart()} is no longer required, so fixing the problem there doesn't solve the more general problem. @node How can I match C-style comments? @unnumberedsec How can I match C-style comments? -How can I match C-style comments? 
- You might be tempted to try something like this: @example @@ -5164,37 +5152,31 @@ A common mistake is to place the grouping parenthesis AFTER an operator, when you really meant to place the parenthesis BEFORE the operator, e.g., you probably want this @code{(foo|bar)+} and NOT this @code{(foo|bar+)}. -The first pattern matches the words @code{foo} or @code{bar} any number of -times, e.g., it matches the text @code{barfoofoobarfoo}. The +The first pattern matches the words @samp{foo} or @samp{bar} any number of +times, e.g., it matches the text @samp{barfoofoobarfoo}. The second pattern matches a single instance of @code{foo} or a single instance of -@code{ba} followed by one or more @samp{r}s, e.g., it matches the text @code{barrrr} . +@code{bar} followed by one or more @samp{r}s, e.g., it matches the text @code{barrrr} . @item -A @samp{.} inside []'s just means a literal@samp{.} (period), -and NOT "any character except newline". +A @samp{.} inside @samp{[]}'s just means a literal@samp{.} (period), +and NOT ``any character except newline''. @item -Remember that @samp{.} matches any character EXCEPT @samp{\n} (and EOF). +Remember that @samp{.} matches any character EXCEPT @samp{\n} (and @samp{EOF}). If you really want to match ANY character, including newlines, then use @code{(.|\n)} ---- Beware that the regex @code{(.|\n)+} will match your entire input! +Beware that the regex @code{(.|\n)+} will match your entire input! @item -Finally, if you want to match a literal @samp{.} (a period), then use [.] or "." +Finally, if you want to match a literal @samp{.} (a period), then use @samp{[.]} or @samp{"."} @end itemize @node Can I get the flex manual in another format? @unnumberedsec Can I get the flex manual in another format? -Can I get the flex manual in another format? - -As of flex 2.5, the manual is distributed in texinfo format. -You can use the "texi2*" tools to convert the manual to any format -you desire (e.g., @samp{texi2html}). 
+The @code{flex} source distribution includes a texinfo manual. You are +free to convert that texinfo into whatever format you desire. The +@code{texinfo} package includes tools for conversion to a number of formats. @node Does there exist a "faster" NDFA->DFA algorithm? @unnumberedsec Does there exist a "faster" NDFA->DFA algorithm? -Does there exist a "faster" NDFA->DFA algorithm? Most standard texts (e.g., -Aho), imply that NDFA->DFA can take exponential time, since there are -exponential number of potential states in NDFA. - There's no way around the potential exponential running time - it can take you exponential time just to enumerate all of the DFA states. In practice, though, the running time is closer to linear, or sometimes @@ -5203,9 +5185,7 @@ quadratic. @node How does flex compile the DFA so quickly? @unnumberedsec How does flex compile the DFA so quickly? -How does flex compile the DFA so quickly? - -There are two big speed wins that flex uses: +There are two big speed wins that @code{flex} uses: @enumerate @item @@ -5224,11 +5204,9 @@ state can be done very quickly, by first comparing hash values. @node How can I use more than 8192 rules? @unnumberedsec How can I use more than 8192 rules? -How can I use more than 8192 rules? - -Flex is compiled with an upper limit of 8192 rules per scanner. -If you need more than 8192 rules in your scanner, you'll have to recompile flex -with the following changes in flexdef.h: +@code{Flex} is compiled with an upper limit of 8192 rules per scanner. +If you need more than 8192 rules in your scanner, you'll have to recompile @code{flex} +with the following changes in @file{flexdef.h}: @example @verbatim @@ -5244,21 +5222,34 @@ This should work okay as long as your C compiler uses 32 bit integers. But you might want to think about whether using such a huge number of rules is the best way to solve your problem. 
+The following may also be relevant: + +With luck, you should be able to increase the definitions in flexdef.h for: + +@example +@verbatim +#define JAMSTATE -32766 /* marks a reference to the state that always jams */ +#define MAXIMUM_MNS 31999 +#define BAD_SUBSCRIPT -32767 +@end verbatim +@end example + +recompile everything, and it'll all work. Flex only has these 16-bit-like +values built into it because a long time ago it was developed on a machine +with 16-bit ints. I've given this advice to others in the past but haven't +heard back from them whether it worked okay or not... + @node How do I abandon a file in the middle of a scan and switch to a new file? @unnumberedsec How do I abandon a file in the middle of a scan and switch to a new file? -How do I abandon a file in the middle of a scan and switch to a new file? - -Just all yyrestart(newfile). Be sure to reset the start state if you want a -"fresh" start, since yyrestart does NOT reset the start state back to INITIAL. +Just call @code{yyrestart(newfile)}. Be sure to reset the start state if you want a +``fresh'' start, since @code{yyrestart} does NOT reset the start state back to @code{INITIAL}. @node How do I execute code only during initialization (only before the first scan)? @unnumberedsec How do I execute code only during initialization (only before the first scan)? -How do I execute code only during initialization (only before the first scan)? - -You can specify an initial action by defining the macro YY_USER_INIT (though -note that yyout may not be available at the time this macro is executed). Or you +You can specify an initial action by defining the macro @code{YY_USER_INIT} (though +note that @code{yyout} may not be available at the time this macro is executed). Or you can add to the beginning of your rules section: @example @@ -5277,69 +5268,41 @@ did_init = 1; @node How do I execute code at termination? @unnumberedsec How do I execute code at termination? 
-How do I execute code at termination (i.e., only after the last scan?) +You can specify an action for the @code{<<EOF>>} rule. -You can specifiy an action for the <<EOF>> rule. @node Where else can I find help? @unnumberedsec Where else can I find help? -Where else can I find help? - -The @code{help-flex} email list is served by GNU. See http://www.gnu.org/ for -details how to subscribe or search the archives. - -@node Can I include comments in the "rules" section of the file file? -@unnumberedsec Can I include comments in the "rules" section of the file file? +The @code{help-flex} email list is served by GNU. See @uref{http://www.gnu.org/} for +details on how to subscribe or search the archives. -Can I include comments in the "rules" section of the file file? +@node Can I include comments in the "rules" section of the file? +@unnumberedsec Can I include comments in the "rules" section of the file? Yes, just about anywhere you want to. See the manual for the specific syntax. @node I get an error about undefined yywrap(). @unnumberedsec I get an error about undefined yywrap(). -I get an error about undefined yywrap(). - -You must supply a yywrap() function of your own, or link to libfl.a +You must supply a @code{yywrap()} function of your own, or link to @file{libfl.a} (which provides one), or use +@example +@verbatim %option noyywrap +@end verbatim +@end example -in your source to say you don't want a yywrap() function. -See the manual page for more details concerning yywrap(). +in your source to say you don't want a @code{yywrap()} function. @node How can I change the matching pattern at run time? @unnumberedsec How can I change the matching pattern at run time? -How can I change the matching pattern at run time? - You can't, it's compiled into a static table when flex builds the scanner. -@node Is there a way to increase the rules (NFA states to a bigger number?) -@unnumberedsec Is there a way to increase the rules (NFA states to a bigger number?) 
- -Is there a way to increase the rules (NFA states to a bigger number?) - -With luck, you should be able to increase the definitions in flexdef.h for: - -@example -@verbatim -#define JAMSTATE -32766 /* marks a reference to the state that always jams */ -#define MAXIMUM_MNS 31999 -#define BAD_SUBSCRIPT -32767 -@end verbatim -@end example - -recompile everything, and it'll all work. Flex only has these 16-bit-like -values built into it because a long time ago it was developed on a machine -with 16-bit ints. I've given this advice to others in the past but haven't -heard back from them whether it worked okay or not... - @node How can I expand macros in the input? @unnumberedsec How can I expand macros in the input? -How can I expand macros in the input? - The best way to approach this problem is at a higher level, e.g., in the parser. However, you can do this using multiple input buffers. @@ -5375,8 +5338,6 @@ From the above though hopefully the idea is clear. @node How can I build a two-pass scanner? @unnumberedsec How can I build a two-pass scanner? -How can I build a two-pass scanner? - One way to do it is to filter the first pass to a temporary file, then process the temporary file on the second pass. You will probably see a performance hit, do to all the disk I/O. @@ -5392,11 +5353,9 @@ residing in memory. @node How do I match any string not matched in the preceding rules? @unnumberedsec How do I match any string not matched in the preceding rules? -How do I match any string not matched in the preceding rules? - One way to assign precedence, is to place the more specific rules first. If two rules would match the same input (same sequence of characters) then the -first rule listed in the flex input wins. e.g., +first rule listed in the @code{flex} input wins. e.g., @example @verbatim @@ -5409,34 +5368,28 @@ bar[a-zA-Z_]+ return BAR_ID; Note that the rule @code{[a-zA-Z_]+} must come *after* the others. 
It will match the same amount of text as the more specific rules, and in that case the -flex scanner will pick the first rule listed in your scanner as the +@code{flex} scanner will pick the first rule listed in your scanner as the one to match. @node I am trying to port code from AT&T lex that uses yysptr and yysbuf. @unnumberedsec I am trying to port code from AT&T lex that uses yysptr and yysbuf. -I am trying to port code from AT&T lex that uses yysptr and yysbuf. - Those are internal variables pointing into the AT&T scanner's input buffer. I -imagine they're being manipulated in user versions of the input() and unput() +imagine they're being manipulated in user versions of the @code{input()} and @code{unput()} functions. If so, what you need to do is analyze those functions to figure out -what they're doing, and then replace input() with an appropriate definition of -YY_INPUT (see the flex man page). You shouldn't need to (and must not) replace -flex's unput() function. +what they're doing, and then replace @code{input()} with an appropriate definition of +@code{YY_INPUT}. You shouldn't need to (and must not) replace +@code{flex}'s @code{unput()} function. @node Is there a way to make flex treat NULL like a regular character? @unnumberedsec Is there a way to make flex treat NULL like a regular character? -Is there a way to make flex treat NULL like a regular character? - -Yes, \0 and \x00 should both do the trick. Perhaps you have an ancient -version of flex. The latest release is version @value{VERSION}. +Yes, @samp{\0} and @samp{\x00} should both do the trick. Perhaps you have an ancient +version of @code{flex}. The latest release is version @value{VERSION}. @node Whenever flex can not match the input it says "flex scanner jammed". @unnumberedsec Whenever flex can not match the input it says "flex scanner jammed". -Whenever flex can not match the input it says "flex scanner jammed". - You need to add a rule that matches the otherwise-unmatched text. 
e.g., @@ -5450,7 +5403,7 @@ e.g., @end verbatim @end example -See %option default for more information. +See @code{%option default} for more information. @node Why doesnt flex have non-greedy operators like perl does? @unnumberedsec Why doesn't flex have non-greedy operators like perl does? @@ -5468,26 +5421,27 @@ Better is to either introduce a separate parser, or to split the scanner into multiple scanners using (exclusive) start conditions. You might have -a separate start state once you've seen the BEGIN. In that state, you -might then have a regex that will match END (to kick you out of the -state), and perhaps (.|\n) to get a single character within the chunk ... +a separate start state once you've seen the @samp{BEGIN}. In that state, you +might then have a regex that will match @samp{END} (to kick you out of the +state), and perhaps @samp{(.|\n)} to get a single character within the chunk ... This approach also has much better error-reporting properties. @node Memory leak - 16386 bytes allocated by malloc. @unnumberedsec Memory leak - 16386 bytes allocated by malloc. @anchor{faq-memory-leak} -UPDATED 2002-07-10: As of flex version 2.5.9, this leak means that you did not -call yylex_destroy(). If you are using an earlier version of flex, then read + +UPDATED 2002-07-10: As of @code{flex} version 2.5.9, this leak means that you did not +call @code{yylex_destroy()}. If you are using an earlier version of @code{flex}, then read on. The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the read-buffer, and -about 40 for struct yy_buffer_state (depending upon alignment). The leak is in +about 40 for @code{struct yy_buffer_state} (depending upon alignment). The leak is in the non-reentrant C scanner only (NOT in the reentrant scanner, NOT in the C++ -scanner). Since flex doesn't know when you are done, the buffer is never freed. +scanner). Since @code{flex} doesn't know when you are done, the buffer is never freed. 
However, the leak won't multiply since the buffer is reused no matter how many -times you call yylex(). +times you call @code{yylex()}. If you want to reclaim the memory when you are completely done scanning, then you might try this: @@ -5500,9 +5454,10 @@ yy_init = 1; @end verbatim @end example -Note: yy_init is an "internal variable", and hasn't been tested in this +Note: @code{yy_init} is an "internal variable", and hasn't been tested in this situation. It is possible that some other globals may need resetting as well. +@c faq edit stopped here @node How do I track the byte offset for lseek()? @unnumberedsec How do I track the byte offset for lseek()? -- 2.40.0