6. Start Conditions in Lex
flex provides a mechanism for
conditionally activating rules. Any rule whose pattern is prefixed
with "<sc>" will only be active when the scanner is
in the start condition named "sc". For example,
<STRING>[^"]*        { /* eat up the string body ... */
            ...
            }
will be active only when the scanner is in the
"STRING" start condition, and
<INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
            ...
            }
will be active only when the current start
condition is either "INITIAL", "STRING", or
"QUOTE".
Start conditions are declared in
the definitions (first) section of the input using unindented lines
beginning with either %s or %x followed by a list of names. The
former declares inclusive start conditions, the latter exclusive
start conditions.
A start condition is activated
using the BEGIN action. Until the next BEGIN action is executed,
rules with the given start condition will be active and rules
with other start conditions will be inactive.
If the start condition is
inclusive, then rules with no start conditions at all will also be
active. If it is exclusive, then only rules qualified with the start
condition will be active. A set of rules contingent on the same
exclusive start condition describes a scanner which is independent of
any of the other rules in the flex input. Because of this,
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically
different from the rest (e.g., comments).
For example, the set of rules
%s example
%%
<example>foo   do_something();
bar            something_else();
is equivalent to
%x example
%%
<example>foo do_something();
<INITIAL,example>bar something_else();
Also note that the special start-condition specifier <*> matches every start condition. Thus, the above example could also have been written:
%x example
%%
<example>foo do_something();
<*>bar something_else();
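The activation rules described so far can be modelled in a few lines of C. This is only a toy sketch to make the logic concrete, not how flex actually implements start conditions; the names StartCond, Rule and rule_active are made up for illustration. A rule with no start condition is active only while the current condition is inclusive (INITIAL counts as inclusive), while a qualified rule is active when its list names the current condition or is <*>.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of flex's rule-activation logic (not flex's real internals).
 * StartCond, Rule and rule_active are hypothetical names. */
typedef struct {
    const char *name;
    bool exclusive;              /* declared with %x rather than %s */
} StartCond;

typedef struct {
    const char *const *conds;    /* names inside <...>; NULL/0 means "no start condition" */
    size_t n_conds;
} Rule;

/* Is rule r eligible to match while the scanner is in start condition sc? */
static bool rule_active(const Rule *r, const StartCond *sc)
{
    if (r->n_conds == 0)
        return !sc->exclusive;   /* unqualified rules: inclusive conditions only */
    for (size_t i = 0; i < r->n_conds; i++) {
        if (strcmp(r->conds[i], "*") == 0 || strcmp(r->conds[i], sc->name) == 0)
            return true;         /* <*> or an explicit match */
    }
    return false;
}
```

With these definitions the unqualified rule bar is active in example when example is declared with %s but not with %x, which is why the %x version has to spell it <INITIAL,example>bar or <*>bar.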
BEGIN(0) returns to the original
state, where only rules with no start condition (along with any rules
qualified with <INITIAL> or <*>) are active.
This state can also be referred to as the start condition "INITIAL",
so BEGIN(INITIAL) is equivalent to BEGIN(0). (The parentheses around
the start condition name are not required but are considered good
style.)
BEGIN actions can also be given as
indented code at the beginning of the rules section. For
example, the following will cause the scanner to enter the "SPECIAL"
start condition whenever yylex() is called and the global variable
enter_special is true:
        int enter_special;
%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);
<SPECIAL>blahblahblah
...more rules follow...
To illustrate the uses of
start conditions, here is a scanner which provides two different
interpretations of a string like "123.456". By default
it will treat it as three tokens, the integer "123", a
dot ('.'), and the integer "456". But if the string
is preceded earlier in the line by the string "expect-floats"
it will treat it as a single token, the floating-point number
123.456:
%{
#include <stdlib.h>   /* atof(), atoi() */
%}
%s expect
%%
expect-floats        BEGIN(expect);
<expect>[0-9]+"."[0-9]+   {
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            }
<expect>\n           {
            /* that's the end of the line, so
             * we need another "expect-floats"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            }
[0-9]+      {
            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            }
"."         printf( "found a dot\n" );
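For comparison, here is the same idea written by hand in C, with an ordinary boolean standing in for the expect start condition. It is a simplified sketch, not what flex generates: the function name scan and the one-letter token codes (I for an integer, F for a float, D for a dot) are invented for this illustration, and unmatched characters are skipped rather than echoed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical hand-written analogue of the "%s expect" scanner.
 * Token codes written to out: 'I' integer, 'F' float, 'D' dot.
 * Other characters are skipped (flex would ECHO them). Requires outsz > 0. */
static void scan(const char *p, char *out, size_t outsz)
{
    bool expect = false;              /* stands in for start condition "expect" */
    size_t n = 0;

    while (*p != '\0' && n + 1 < outsz) {
        if (strncmp(p, "expect-floats", 13) == 0) {
            expect = true;            /* BEGIN(expect); */
            p += 13;
        } else if (isdigit((unsigned char)*p)) {
            const char *q = p;
            while (isdigit((unsigned char)*q))
                q++;
            /* the float rule is only tried in the expect state */
            if (expect && q[0] == '.' && isdigit((unsigned char)q[1])) {
                q++;
                while (isdigit((unsigned char)*q))
                    q++;
                out[n++] = 'F';
            } else {
                out[n++] = 'I';       /* [0-9]+ */
            }
            p = q;
        } else if (*p == '.') {
            out[n++] = 'D';           /* "." */
            p++;
        } else {
            if (*p == '\n')
                expect = false;       /* end of line: BEGIN(INITIAL) */
            p++;
        }
    }
    out[n] = '\0';
}
```

Feeding it "123.456" yields IDI (integer, dot, integer), while "expect-floats 123.456" yields a single F.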
Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.
%x comment
%%
        int line_num = 1;
"/*"                    BEGIN(comment);
\n                      { ++line_num; ECHO; }
<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
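The exclusive comment condition is essentially a two-state machine. As a rough hand-written equivalent (a sketch, not flex output; strip_comments is a hypothetical name), the function below copies everything outside comments into a buffer and counts input lines, including newlines that fall inside a comment.

```c
#include <stddef.h>

/* Hypothetical hand-written analogue of an exclusive "comment" condition.
 * Copies src to out with C block comments removed and returns the number
 * of input lines (newlines + 1), counting newlines in both states.
 * An unterminated comment simply runs to the end of the input. */
static int strip_comments(const char *src, char *out, size_t outsz)
{
    enum { CODE, COMMENT } state = CODE;   /* INITIAL vs. <comment> */
    int line_num = 1;
    size_t n = 0;

    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '\n')
            ++line_num;
        if (state == CODE) {
            if (p[0] == '/' && p[1] == '*') {
                state = COMMENT;           /* comment opener: BEGIN(comment) */
                p++;                       /* also skip the '*' */
            } else if (n + 1 < outsz) {
                out[n++] = *p;             /* like flex's default ECHO */
            }
        } else if (p[0] == '*' && p[1] == '/') {
            state = CODE;                  /* comment closer: BEGIN(INITIAL) */
            p++;                           /* also skip the '/' */
        }
        /* any other character inside a comment is discarded */
    }
    if (outsz > 0)
        out[n] = '\0';
    return line_num;
}
```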
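The following program prints every header file that the C source file 'test.c' includes in the angle-bracket form (#include <...>). The %option noyywrap line lets the program build without linking the flex library (-lfl).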
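%option noyywrap
%x head
%%
"#include"       BEGIN(head);
<head>[ \t]*<    /* skip blanks and the opening '<' */
<head>[^<>]*     printf("header-%s\n", yytext);
<head>[>]        BEGIN(0);
"#"              /* discard a '#' that does not start "#include" */
[^#]*            /* discard everything else */
%%
int main(void)
{
    yyin = fopen("test.c", "r");
    if (yyin == NULL) {          /* could not open the input file */
        perror("test.c");
        return 1;
    }
    yylex();                     /* scan until end of file */
    fclose(yyin);
    return 0;
}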
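The same job can be sketched in plain C, with strstr() doing the search that "#include" BEGIN(head) does in the scanner. This is an illustrative approximation, not a translation of flex's output: list_headers is a hypothetical name, results go into a caller-supplied buffer instead of stdout, and only the angle-bracket form is handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of the "head" scanner: for every "#include <name>"
 * in src, append "header-name\n" to out. Quoted includes are skipped.
 * Requires outsz > 0; stops early if out fills up. */
static void list_headers(const char *src, char *out, size_t outsz)
{
    size_t n = 0;
    const char *p = src;

    out[0] = '\0';
    while ((p = strstr(p, "#include")) != NULL) {
        p += strlen("#include");               /* "#include" BEGIN(head); */
        while (*p == ' ' || *p == '\t')        /* <head>[ \t]*< */
            p++;
        if (*p != '<')
            continue;                          /* not the <...> form: keep looking */
        const char *start = ++p;
        while (*p != '\0' && *p != '<' && *p != '>')
            p++;                               /* <head>[^<>]* */
        if (*p != '>')
            break;                             /* malformed or unterminated: give up */
        int w = snprintf(out + n, outsz - n, "header-%.*s\n",
                         (int)(p - start), start);
        if (w < 0 || (size_t)w >= outsz - n)
            break;                             /* out of space (output truncated) */
        n += (size_t)w;
        p++;                                   /* <head>[>] BEGIN(0); */
    }
}
```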
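The following program removes all /* ... */ comments (including those spanning several lines) from the C program 't.c' and writes the remaining text to standard output, relying on flex's default rule, which echoes any text that no rule matches.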
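%option noyywrap
%x comm
%%
"/*"            { BEGIN(comm); }
<comm>[^*]*     /* discard comment text up to the next '*' */
<comm>"*"+[^/]  /* discard '*'s (and the next character) that do not close the comment */
<comm>"*"+"/"   BEGIN(INITIAL);
%%
int main(void)
{
    yyin = fopen("t.c", "r");
    if (yyin == NULL) {          /* could not open the input file */
        perror("t.c");
        return 1;
    }
    yylex();
    fclose(yyin);
    return 0;
}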