flexdoc \- documentation for flex, fast lexical analyzer generator
.SH SYNOPSIS
.B flex
-.B [\-abcdfhinpstvwBFILTV78+ \-C[efFmr] \-Pprefix \-Sskeleton]
+.B [\-abcdfhilnpstvwBFILTV78+ \-C[efFmr] \-Pprefix \-Sskeleton]
.I [filename ...]
.SH DESCRIPTION
.I flex
or
.B %array
in the first (definitions) section of your flex input. The default is
-.B %pointer.
+.B %pointer,
+unless you use the
+.B -l
+lex compatibility option, in which case
+.B yytext
+will be an array.
The advantage of using
.B %pointer
is substantially faster scanning and no buffer overflow when matching
.I yytext
will have the preserved case (i.e., it will not be folded).
.TP
+.B \-l
+turns on maximum compatibility with the original AT&T
+.I lex
+implementation. Note that this does not mean
+.I full
+compatibility. Use of this option costs a considerable amount of
+performance, and it cannot be used with the
+.B \-+, -f, -F, -Cf,
+or
+.B -CF
+options. For details on the compatibilities it provides, see the section
+"Incompatibilities With Lex And POSIX" below.
+.TP
.B \-n
is another do-nothing, deprecated option included only for
POSIX compliance.
causes the generated scanner to
.I bypass
use of the standard I/O library (stdio) for input. Instead of calling
-.B fread(),
+.B fread()
+or
+.B getc(),
the scanner will use the
.B read()
system call, resulting in a performance gain which varies from system
.fi
.SH INCOMPATIBILITIES WITH LEX AND POSIX
.I flex
-is a rewrite of the Unix
+is a rewrite of the AT&T Unix
.I lex
tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which
are of concern to those who wish to write scanners acceptable
-to either implementation. At present, the POSIX
+to either implementation. The POSIX
.I lex
-draft is
-close to the original
+specification is closer to
+.I flex's
+behavior than that of the original
.I lex
-implementation, so some of these
-incompatibilities are also in conflict with the POSIX draft. But
-the intent is that ultimately
+implementation, but there also remain some incompatibilities between
.I flex
-will be fully POSIX-conformant. Please bear in
-mind that all the comments which follow are with regard to the POSIX
-.I draft
-of Spring 1990 (draft 10), and not the final document (or subsequent
-drafts); they are included so
+and POSIX. The intent is that ultimately
.I flex
-users can be aware of the standardization issues and those areas where
-.I flex
-may in the near future undergo changes incompatible with
-its current definition.
+will be fully POSIX-conformant. In this section we discuss all of
+the known areas of incompatibility.
+.PP
+.I flex's
+.B \-l
+option turns on maximum compatibility with the original AT&T
+.I lex
+implementation, at the cost of a major loss in the generated scanner's
+performance. We note below which incompatibilities can be overcome
+using the
+.B \-l
+option.
.PP
.I flex
is fully compatible with
.I lex
scanner internal variable
.B yylineno
-is not supported. It is difficult to support this option efficiently,
-since it requires examining every character scanned and reexamining
-the characters when the scanner backs up.
-Things get more complicated when the end of buffer or file is reached or a
-NUL is scanned (since the scan must then be restarted with the proper line
-number count), or the user uses the yyless(), unput(), or REJECT actions,
-or the multiple input buffer functions.
-.IP
-The fix is to add rules which, upon seeing a newline, increment
-yylineno. This is usually an easy process, though it can be a drag if some
-of the patterns can match multiple newlines along with other characters.
+is not supported unless
+.B \-l
+is used.
.IP
-yylineno is not part of the POSIX draft.
+yylineno is not part of the POSIX specification.
.IP -
The
.B input()
.I flex
restriction that
.B input()
-cannot be redefined is in accordance with the POSIX draft, but
-.B YY_INPUT
-has not yet been accepted into the draft (and probably won't; it looks
-like the draft will simply not specify any way of controlling the
+cannot be redefined is in accordance with the POSIX specification,
+which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
-.I yyin).
+.I yyin.
.IP -
.I flex
scanners are not as reentrant as
.I stdout).
.IP
.B output()
-is not part of the POSIX draft.
+is not part of the POSIX specification.
.IP -
.I lex
does not support exclusive start conditions (%x), though they
-are in the current POSIX draft.
+are in the POSIX specification.
.IP -
When definitions are expanded,
.I flex
.I flex
definition.
.IP
-The POSIX draft interpretation is the same as
-.I flex's.
-.IP -
-To specify a character class which matches anything but a left bracket (']'),
-in
+Using
+.B \-l
+results in the
.I lex
-one can use "[^]]" but with
-.I flex
-one must use "[^\\]]". The latter works with
-.I lex,
-too.
+behavior of no parentheses around the definition.
+.IP
+The POSIX specification is that the definition be enclosed in parentheses.
.IP -
The
.I lex
.B %r
(generate a Ratfor scanner) option is not supported. It is not part
-of the POSIX draft.
+of the POSIX specification.
.IP -
After a call to
.B unput(),
.B %array.
This is not the case with
.I lex
-or the present POSIX draft.
+or the POSIX specification. The
+.B \-l
+option does away with this incompatibility.
.IP -
The precedence of the
.B {}
.I flex
interprets it as "match 'ab'
followed by one, two, or three occurrences of 'c'". The latter is
-in agreement with the current POSIX draft.
+in agreement with the POSIX specification.
.IP -
The precedence of the
.B ^
or 'bar' anywhere", whereas
.I flex
interprets it as "match either 'foo' or 'bar' if they come at the beginning
-of a line". The latter is in agreement with the current POSIX draft.
+of a line". The latter is in agreement with the POSIX specification.
.IP -
.I yyin
is
scanners,
.I yyin
does not have a valid value until the scanner has been called.
+.IP
+The
+.B \-l
+option does away with this incompatibility.
.IP -
The special table-size declarations such as
.B %a
.I flex
features are not included in
.I lex
-or the POSIX draft standard:
+or the POSIX specification:
.nf
yyterminate()
%% fread()/read() definition of YY_INPUT goes here unless we're doing C++
%+ C++ definition
if ( (result = LexerInput( (char *) buf, max_size )) < 0 ) \
-%*
YY_FATAL_ERROR( "input in flex scanner failed" );
+%*
#endif
/* No semi-colon after return; correct usage is to write "yyterminate();" -
*/
int yy_n_chars;
+ /* Whether this is an "interactive" input source; if so, and
+ * if we're using stdio for input, then we want to use getc()
+ * instead of fread(), to make sure we stop fetching input after
+ * each newline.
+ */
+ int is_interactive;
+
/* Whether we've seen an EOF on this buffer. */
int yy_eof_status;
#define EOF_NOT_SEEN 0
int yyleng;
-FILE *yyin = (FILE *) 0, *yyout = (FILE *) 0;
+%% yyin/yyout and (if -l option) yylineno definition & initialization goes here
/* Points to current character in buffer. */
static char *yy_c_buf_p = (char *) 0;
}
#endif
+%% code for yylineno update goes here, if -l option
+
do_action: /* This label is used only to access EOF actions. */
%% debug code goes here
*--yy_cp = (char) c;
+%% update yylineno here, if doing -l
+
/* Note: the formal parameter *must* be called "yy_bp" for this
* macro to now work correctly.
*/
b->yy_buf_pos = &b->yy_ch_buf[1];
+ b->is_interactive = file ? isatty( fileno(file) ) : 0;
+
b->yy_eof_status = EOF_NOT_SEEN;
}
* spprdflt - if true (-s), suppress the default rule
* interactive - if true (-I), generate an interactive scanner
* caseins - if true (-i), generate a case-insensitive scanner
+ * lex_compat - if true (-l), maximize compatibility with AT&T lex
* useecs - if true (-Ce flag), use equivalence classes
* fulltbl - if true (-Cf flag), don't compress the DFA state table
* usemecs - if true (-Cm flag), use meta-equivalence classes
*/
extern int printstats, syntaxerror, eofseen, ddebug, trace, nowarn, spprdflt;
-extern int interactive, caseins, useecs, fulltbl, usemecs;
+extern int interactive, caseins, lex_compat, useecs, fulltbl, usemecs;
extern int fullspd, gen_line_dirs, performance_report, backing_up_report;
extern int C_plus_plus, long_align, use_read, yytext_is_array, csize;
extern int yymore_used, reject, real_reject, continued_action;
set_indent( 2 );
gen_find_action();
+ skelout();
+ if ( lex_compat )
+ {
+ indent_puts( "if ( yy_act != YY_END_OF_BUFFER )" );
+ indent_up();
+ indent_puts( "{" );
+ indent_puts( "int yyl;" );
+ indent_puts( "for ( yyl = 0; yyl < yyleng; ++yyl )" );
+ indent_up();
+ indent_puts( "if ( yytext[yyl] == '\\n' )" );
+ indent_up();
+ indent_puts( "++yylineno;" );
+ indent_down();
+ indent_down();
+ indent_puts( "}" );
+ indent_down();
+ }
+
skelout();
if ( ddebug )
{
skelout();
gen_NUL_trans();
+ skelout();
+ if ( lex_compat )
+ { /* update yylineno inside of unput() */
+ indent_puts( "if ( c == '\\n' )" );
+ indent_up();
+ indent_puts( "--yylineno;" );
+ indent_down();
+ }
+
skelout();
/* Copy remainder of input to output. */
/* these globals are all defined and commented in flexdef.h */
int printstats, syntaxerror, eofseen, ddebug, trace, nowarn, spprdflt;
-int interactive, caseins, useecs, fulltbl, usemecs;
+int interactive, caseins, lex_compat, useecs, fulltbl, usemecs;
int fullspd, gen_line_dirs, performance_report, backing_up_report;
int C_plus_plus, long_align, use_read, yytext_is_array, csize;
int yymore_used, reject, real_reject, continued_action;
if ( performance_report > 0 )
{
+ if ( lex_compat )
+ {
+ fprintf( stderr,
+"-l AT&T lex compatibility option entails a large performance penalty\n" );
+ fprintf( stderr,
+" and may be the actual source of other reported performance penalties\n" );
+ }
+
if ( performance_report > 1 )
{
if ( interactive )
putc( 'd', stderr );
if ( caseins )
putc( 'i', stderr );
+ if ( lex_compat )
+ putc( 'l', stderr );
if ( performance_report > 0 )
putc( 'p', stderr );
if ( performance_report > 1 )
char *arg, *mktemp();
printstats = syntaxerror = trace = spprdflt = caseins = false;
+ lex_compat = false;
C_plus_plus = backing_up_report = ddebug = fulltbl = fullspd = false;
long_align = nowarn = yymore_used = continued_action = reject = false;
yytext_is_array = yymore_really_used = reject_really_used = false;
caseins = true;
break;
+ case 'l':
+ lex_compat = true;
+ break;
+
case 'L':
gen_line_dirs = false;
break;
interactive = true;
}
+ if ( lex_compat )
+ {
+ if ( C_plus_plus )
+ flexerror( "Can't use -+ with -l option" );
+
+ if ( fulltbl || fullspd )
+ flexerror( "Can't use -f or -F with -l option" );
+
+ /* Don't rely on detecting use of yymore() and REJECT,
+ * just assume they'll be used.
+ */
+ yymore_really_used = reject_really_used = true;
+
+ yytext_is_array = true;
+ use_read = false;
+ }
+
if ( (fulltbl || fullspd) && usemecs )
- flexerror( "full table and -Cm don't make sense together" );
+ flexerror( "-f/-F and -Cm don't make sense together" );
if ( (fulltbl || fullspd) && interactive )
- flexerror( "full table and -I are incompatible" );
+ flexerror( "-f/-F and -I are incompatible" );
if ( fulltbl && fullspd )
- flexerror( "full table and -F are mutually exclusive" );
+ flexerror( "-f and -F are mutually exclusive" );
if ( ! use_stdout )
{
if ( ! C_plus_plus )
{
if ( use_read )
+ {
printf(
"\tif ( (result = read( fileno(yyin), (char *) buf, max_size )) < 0 ) \\\n" );
+ printf(
+ "\t\tYY_FATAL_ERROR( \"input in flex scanner failed\" );\n" );
+ }
+
else
{
printf(
-"\tif ( ((result = fread( (char *) buf, 1, max_size, yyin )) == 0) && \\\n" );
+ "\tif ( yy_current_buffer->is_interactive ) \\\n" );
+ printf(
+ "\t\tresult = (buf[0] = getc( yyin )) == EOF ? 0 : 1; \\\n" );
printf(
-"\t ferror( yyin ) ) \\\n" );
+"\telse if ( ((result = fread( (char *) buf, 1, max_size, yyin )) == 0)\\\n" );
+ printf( "\t\t && ferror( yyin ) ) \\\n" );
+ printf(
+ "\t\tYY_FATAL_ERROR( \"input in flex scanner failed\" );\n" );
}
}
skelout();
+ if ( lex_compat )
+ {
+ printf( "FILE *yyin = stdin, *yyout = stdout;\n" );
+ printf( "extern int yylineno;\n" );
+ printf( "int yylineno = 1;\n" );
+ }
+ else
+ printf( "FILE *yyin = (FILE *) 0, *yyout = (FILE *) 0;\n" );
+
+ skelout();
+
if ( C_plus_plus )
printf( "\n#include \"FlexLexer.h\"\n" );
headcnt = 0;
}
- if ( varlength && headcnt == 0 )
+ if ( lex_compat || (varlength && headcnt == 0) )
{ /* variable trailing context rule */
/* Mark the first part of the rule as the
* accepting "head" part of a trailing
headcnt = 0;
}
- if ( varlength && headcnt == 0 )
+ if ( lex_compat || (varlength && headcnt == 0) )
{
/* Again, see the comment in the rule for
* "re2 re" above.
if ( trlcontxt )
{
- if ( varlength && headcnt == 0 )
+ if ( lex_compat || (varlength && headcnt == 0) )
/* Both head and trail are
* variable-length.
*/
return SECTEND;
}
-^"%pointer".*{NL} ++linenum; yytext_is_array = false;
+^"%pointer".*{NL} {
+ if ( lex_compat )
+ warn( "%pointer incompatible with -l option" );
+ else
+ yytext_is_array = false;
+ ++linenum;
+ }
^"%array".*{NL} {
if ( C_plus_plus )
warn( "%array incompatible with -+ option" );
{ /* push back name surrounded by ()'s */
int len = strlen( nmdefptr );
- if ( nmdefptr[0] == '^' ||
+ if ( lex_compat || nmdefptr[0] == '^' ||
(len > 0 && nmdefptr[len - 1] == '$') )
- {
+ { /* don't use ()'s after all */
PUT_BACK_STRING(nmdefptr, 0);
if ( nmdefptr[0] == '^' )
}
-<FIRSTCCL>"^"/[^-\n] BEGIN(CCL); return '^';
-<FIRSTCCL>"^"/- return '^';
+<FIRSTCCL>"^"/[^-\]\n] BEGIN(CCL); return '^';
+<FIRSTCCL>"^"/("-"|"]") return '^';
<FIRSTCCL>. BEGIN(CCL); RETURNCHAR;
<CCL>-/[^\]\n] return '-';
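
Note on the -l yylineno bookkeeping added above: the gen.c hunks emit two pieces of code into the generated scanner. One walks yytext after each match and bumps yylineno for every newline (skipped when yy_act is YY_END_OF_BUFFER); the other decrements yylineno inside unput() when a newline is pushed back. The following is a minimal standalone sketch of that logic, not the generated scanner itself. The names sketch_lineno, sketch_count_match, and sketch_unput are hypothetical stand-ins chosen for illustration; only the starting value of 1 and the two newline checks are taken from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for yylineno; the patch initializes it to 1
 * when -l is given.
 */
int sketch_lineno = 1;

/* Mirrors the loop emitted after gen_find_action(): every newline in
 * the matched text advances the line counter by one.
 */
void sketch_count_match(const char *text, size_t leng)
{
    size_t i;

    for (i = 0; i < leng; ++i)
        if (text[i] == '\n')
            ++sketch_lineno;
}

/* Mirrors the check emitted inside unput(): pushing a newline back
 * onto the input undoes one line increment; other characters leave
 * the counter alone.
 */
void sketch_unput(int c)
{
    if (c == '\n')
        --sketch_lineno;
}
```

Under these assumptions, matching the four characters "a\nb\n" would move the counter from 1 to 3, and a subsequent sketch_unput('\n') would bring it back to 2. Because the loop touches every matched character, this also illustrates why the man page text describes -l as costing a considerable amount of performance.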