2. Format of the lex input file with Examples
The flex input file consists of three sections, separated by a line
with just '%%' in it:
**********************************************************************
definitions
%%
rules
%%
user code
********************************************************************************
definitions section
The definitions section contains declarations of simple name
definitions to simplify the scanner specification, and declarations
of start conditions.
Name definitions have the form
name definition
The "name" is a word beginning with a
letter or an underscore ('_') followed by zero or more letters,
digits, '_', or '-' (dash). The definition is taken to begin at the
first non-white-space character following the name and continuing to
the end of the line. The definition can subsequently be referred to
using "{name}", which will expand to "(definition)".
For example,
DIGIT [0-9]
defines "DIGIT" to be a regular expression which matches a single
digit. A subsequent reference to
{DIGIT}+"."{DIGIT}*
is identical to
([0-9])+"."([0-9])*
and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.
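The substitution described above is purely textual, and it can be sketched in plain C. This is only an illustration of the expansion rule, not how flex implements it; the function name expand_name and its parameters are hypothetical. The added parentheses keep the definition together as one unit when an operator such as + or * follows the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of flex): copy `pattern` into `out`,
 * replacing every occurrence of "{name}" with "(definition)".
 * Returns 0 on success, -1 if `out` is too small. */
int expand_name(const char *pattern, const char *name,
                const char *definition, char *out, size_t outsz)
{
    assert(pattern && name && definition && out);
    size_t nlen = strlen(name);
    size_t dlen = strlen(definition);
    size_t used = 0;

    while (*pattern != '\0') {
        /* Is the text at this position exactly "{name}"? */
        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, nlen) == 0 &&
            pattern[1 + nlen] == '}') {
            if (used + dlen + 2 >= outsz)
                return -1;             /* no room for "(definition)" + NUL */
            out[used++] = '(';
            memcpy(out + used, definition, dlen);
            used += dlen;
            out[used++] = ')';
            pattern += nlen + 2;       /* skip past "{name}" */
        } else {
            if (used + 1 >= outsz)
                return -1;             /* no room for this char + NUL */
            out[used++] = *pattern++;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With DIGIT defined as [0-9], calling expand_name on {DIGIT}+"."{DIGIT}* produces ([0-9])+"."([0-9])*, the expanded form given above.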
rules section
The rules section of the flex input contains a series of rules of
the form:
pattern action
where the pattern must be unindented and the action must begin on
the same line.
The pattern is usually a regular expression. The action is C code,
for example a single statement or a return statement that returns a
token to the caller.
Example
[a-zA-Z] printf("alphabet");
user code section
The user code section is simply copied to
`lex.yy.c' verbatim. It is used for companion routines which
call or are called by the scanner. The presence of this section is
optional; if it is missing, the second `%%' in the input
file may be skipped, too.
In the definitions and rules sections, any indented text or text
enclosed in `%{' and `%}' is copied verbatim to the output (with the
`%{' and `%}' removed). The `%{' and `%}' must appear unindented on
lines by themselves.
In the rules section, any indented or %{ %} text appearing before
the first rule may be used to declare variables which are local to
the scanning routine and (after the declarations) code which is to be
executed whenever the scanning routine is entered. Other indented or
%{ %} text in the rules section is still copied to the output, but
its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for POSIX compliance; see below for
other such features).
In the definitions section (but not in the rules
section), an unindented comment (i.e., a line beginning with "/*")
is also copied verbatim to the output up to the next "*/".
Example: Count number of lines and number of characters
%{
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("No of lines = %d, No of chars = %d\n", num_lines, num_chars);
    return 0;
}
This scanner counts the number of characters and
the number of lines in its input (it produces no output other than
the final report on the counts). The first line declares two globals,
"num_lines" and "num_chars", which are accessible
both inside `yylex()' and in the `main()'
routine declared after the second "%%". There are two
rules, one which matches a newline ("\n") and increments
both the line count and the character count, and one which matches
any character other than a newline (indicated by the "."
regular expression).
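To see what the two rules compute, here is a hand-written C sketch of the same counting without flex. The names count_text and struct counts are hypothetical, and the sketch takes a string, whereas the generated scanner reads its input from standard input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Mirror the two rules of the scanner:
 *   \n   ++num_lines; ++num_chars;
 *   .    ++num_chars;
 */
struct counts count_text(const char *text)
{
    assert(text != NULL);
    struct counts c = {0, 0};
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++c.num_lines;   /* first rule: a newline also counts a line */
        ++c.num_chars;       /* both rules count the character itself */
    }
    return c;
}
```

For the input "ab\ncd\n" this gives 2 lines and 6 characters, since each newline counts towards both totals.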
Note: Save the lex program in lwc.lex. Run lex lwc.lex; this generates the lexical analyser lex.yy.c. Compile it using gcc lex.yy.c -lfl. This produces the default output file a.out, which you can run by typing ./a.out
The following program recognizes identifiers
(a lowercase letter followed by lowercase letters or digits)
ID [a-z][a-z0-9]*
%%
{ID}    printf("identifier");
.*      printf("invalid");
%%
int main(void)
{
    yylex();
    return 0;
}
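As a cross-check on the ID pattern, the same test can be written by hand in C. This sketch assumes the definition [a-z][a-z0-9]* given above and checks one whole string at a time; the function name is_identifier is hypothetical and not part of flex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if `s` matches [a-z][a-z0-9]*
 * in its entirety, 0 otherwise. */
int is_identifier(const char *s)
{
    assert(s != NULL);
    if (!(*s >= 'a' && *s <= 'z'))
        return 0;                    /* must start with a lowercase letter */
    for (++s; *s != '\0'; ++s) {
        int lower = (*s >= 'a' && *s <= 'z');
        int digit = (*s >= '0' && *s <= '9');
        if (!lower && !digit)
            return 0;                /* any other character is rejected */
    }
    return 1;
}
```

Uppercase letters and underscores are rejected because the ID definition lists only a-z and 0-9; widening the character classes in the definition would change that.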
Counting number of identifiers
digit [0-9]
letter [A-Za-z]
%{
int count = 0;
%}
%%
    /* match identifier (comments in the rules section must be indented) */
{letter}({letter}|{digit})*    count++;
.*                             ;
%%
int main(void)
{
    yylex();
    printf("number of identifiers = %d\n", count);
    return 0;
}