2. Format of the lex input file with examples


The flex input file consists of three sections, separated by lines containing only '%%':
**********************************************************************
definitions
%%
rules
%%
user code
********************************************************************************

definitions section

The definitions section contains declarations of simple name definitions to simplify the scanner specification, and declarations of start conditions.
Name definitions have the form
name definition
The "name" is a word beginning with a letter or an underscore ('_') followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-white-space character following the name and continuing to the end of the line. The definition can subsequently be referred to using "{name}", which will expand to "(definition)". For example,

DIGIT [0-9]

defines "DIGIT" to be a regular expression which matches a single digit.
A subsequent reference to

{DIGIT}+"."{DIGIT}* 
is identical to
([0-9])+"."([0-9])* 
and matches one-or-more digits followed by a '.' followed by zero-or-more digits.
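
To show the name definition in a complete input file, here is a minimal sketch. The surrounding scanner (the printf action, the catch-all rule, and main) is an illustrative assumption and not part of the original example; only the DIGIT definition and the pattern come from the text above.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
DIGIT    [0-9]
%%
{DIGIT}+"."{DIGIT}*    printf("number: %s\n", yytext);
.|\n                   ;   /* discard everything that is not part of a match */
%%
int main(void)
{
    yylex();   /* scan standard input until end of file */
    return 0;
}
```

Given the input "12.5 abc 7.", this sketch should report 12.5 and 7. as numbers. A bare integer such as 42 has no '.', so it does not match the pattern and is discarded by the catch-all rule.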

rules section
The rules section of the flex input contains a series of rules of the form:
 
pattern action 

where the pattern must be unindented and the action must begin on the same line.
The pattern is usually a regular expression. The action is ordinary C code, for example a print statement or a return statement that returns a token.

Example:
[a-zA-Z] printf("alphabet");
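
An action can also return a token instead of printing. The following is a minimal sketch of that idea; the token codes TOK_LETTER and TOK_NUMBER are hypothetical names introduced only for illustration (a real project would typically take them from a parser-generated header).

```lex
%{
#include <stdio.h>
/* Hypothetical token codes, assumed for this sketch only */
enum { TOK_LETTER = 1, TOK_NUMBER = 2 };
%}
%option noyywrap
%%
[a-zA-Z]    return TOK_LETTER;
[0-9]+      return TOK_NUMBER;
.|\n        ;   /* skip anything else */
%%
int main(void)
{
    int tok;
    /* yylex() returns 0 at end of input, otherwise the value from the action */
    while ((tok = yylex()) != 0)
        printf("token %d: %s\n", tok, yytext);
    return 0;
}
```

Each call to yylex() resumes scanning where the previous call stopped, so the caller receives one token per call.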

user code section
The user code section is simply copied to `lex.yy.c' verbatim. It is used for companion routines which call or are called by the scanner. This section is optional; if it is missing, the second `%%' in the input file may be omitted, too.
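
As a small illustration of leaving the user code section out, the sketch below has no second '%%' and no main(). It assumes the program is linked with the flex library (-lfl), which is expected to supply a default main() that calls yylex() and a default yywrap().

```lex
%%
[0-9]+    printf("<num>");
```

Text that matches no rule is copied to the output unchanged, so this scanner would replace every run of digits in its input with <num> and leave the rest as it is.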
In the definitions and rules sections, any indented text or text enclosed in `%{' and `%}' is copied verbatim to the output (with the `%{' and `%}' removed). The `%{' and `%}' must appear unindented on lines by themselves.
In the rules section, any indented or `%{' ... `%}' text appearing before the first rule may be used to declare variables which are local to the scanning routine and (after the declarations) code which is to be executed whenever the scanning routine is entered. Other indented or `%{' ... `%}' text in the rules section is still copied to the output, but its meaning is not well-defined and it may well cause compile-time errors (this feature is present for POSIX compliance).
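
The effect of indented text before the first rule can be sketched as follows. The counter name words_seen and the per-line report are assumptions made for this illustration. Because the declaration is local to yylex(), it is initialized again on every call.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
    /* Indented text before the first rule: copied into yylex() itself */
    int words_seen = 0;   /* hypothetical counter, reset on each yylex() call */
[a-zA-Z]+    ++words_seen;
\n           { printf("%d word(s) on this line\n", words_seen); return 1; }
.            ;
%%
int main(void)
{
    /* each call scans one line; yylex() returns 0 at end of input */
    while (yylex() != 0)
        ;
    return 0;
}
```

Since the newline rule returns, the next call to yylex() starts with a fresh words_seen, which is why the count is per line rather than for the whole input.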

In the definitions section (but not in the rules section), an unindented comment (i.e., a line beginning with "/*") is also copied verbatim to the output up to the next "*/".

Example: Count the number of lines and the number of characters
%{
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("No of lines = %d, No of chars = %d\n", num_lines, num_chars);
    return 0;
}
This scanner counts the number of characters and the number of lines in its input (it produces no output other than the final report on the counts). The first line declares two globals, "num_lines" and "num_chars", which are accessible both inside `yylex()' and in the `main()' routine declared after the second "%%". There are two rules, one which matches a newline ("\n") and increments both the line count and the character count, and one which matches any character other than a newline (indicated by the "." regular expression).

Note: Save the lex program in lwc.lex. Run lex lwc.lex; this generates the lexical analyser lex.yy.c. Compile it using gcc lex.yy.c -lfl. This produces the default output file a.out, which you can run by typing ./a.out

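The following program recognizes identifiers (a lowercase letter followed by zero or more lowercase letters or digits). Because ".*" matches the rest of the line, each input line is classified as a whole.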
ID [a-z][a-z0-9]*
%%
{ID}         printf("identifier");
.*              printf("invalid");
%%
int main(void)
{
    yylex();
    return 0;
}

Counting the number of identifiers
digit    [0-9]
letter   [A-Za-z]
%{
    int count = 0;
%}
%%
    /* match identifier */
{letter}({letter}|{digit})*  count++;
.|\n       ;   /* ignore every other character, one at a time */
%%
int main(void)
{
    yylex();
    printf("number of identifiers = %d\n", count);
    return 0;
}