2. Format of the lex input file with Examples

The flex input file consists of three sections, separated by a line with just
'%%' in it:

**********************************************************************

definitions
%%
rules
%%
user code

********************************************************************************

definitions section

The definitions section contains declarations of simple name definitions to simplify the scanner specification, and declarations of start conditions.

Name definitions have the form

name definition

The "name" is a word beginning with a letter or an underscore ('_') followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-white-space character following the name and continuing to the end of the line. The definition can subsequently be referred to using "{name}", which will expand to "(definition)". For example,

DIGIT [0-9]

defines "DIGIT" to be a regular expression which matches a single digit.

A subsequent reference to

{DIGIT}+"."{DIGIT}*

is identical to
([0-9])+"."([0-9])*

and matches one-or-more digits followed by a '.' followed by zero-or-more digits.
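As a small sketch of how a name definition is used in a full specification, the following flex input prints every decimal number of the form described above. The messages and the choice to silently discard all other characters are assumptions made for illustration.

```lex
DIGIT   [0-9]

%option noyywrap
%%
{DIGIT}+"."{DIGIT}*   printf("number: %s\n", yytext);
.|\n                  ;   /* ignore everything else */
%%
int main(void)
{
    yylex();   /* scan standard input until end of file */
    return 0;
}
```

Given the input "pi is 3.14", this scanner should print "number: 3.14". A plain integer such as "42" is skipped, because the pattern requires the '.' character.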

rules section

The rules section of the flex input contains a series of rules of the form:

pattern action

where the pattern must be unindented and the action must begin on the same line.

The pattern is usually a regular expression. The action is C code, often a simple statement or a return statement which returns a token to the caller of the scanner.

Example

[a-zA-Z] printf("alphabet");
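The other kind of action mentioned above, a return statement that hands a token to the caller, can be sketched as follows. The token codes NUMBER and WORD are hypothetical names invented for this example; a real project would normally take them from a parser header.

```lex
%{
/* Hypothetical token codes, chosen only for this sketch. */
enum { NUMBER = 1, WORD = 2 };
%}

%option noyywrap
%%
[0-9]+      return NUMBER;
[a-zA-Z]+   return WORD;
.|\n        ;   /* skip anything else */
%%
int main(void)
{
    int token;
    /* yylex() returns 0 when it reaches end of input */
    while ((token = yylex()) != 0)
        printf("token %d: %s\n", token, yytext);
    return 0;
}
```

Each call to yylex() resumes scanning where the previous call stopped, so the loop in main() sees the tokens one at a time.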

user code section

The user code section is simply copied to `lex.yy.c' verbatim. It is used for companion routines which call or are called by the scanner. The presence of this section is optional; if it is missing, the second `%%' in the input file may be skipped, too.
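A minimal sketch of an input file with no user code section and no second `%%' is shown here. It assumes flex's "%option main", which asks flex to supply a default main() that calls yylex() and which also implies noyywrap; the upper-casing rule itself is only an illustration.

```lex
%option main
%%
[a-z]   putchar(yytext[0] - 'a' + 'A');   /* print lowercase letters in upper case */
```

Characters that match no rule are copied to the output unchanged by the default rule, so this scanner behaves like a simple upper-case filter.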

In the definitions and rules sections, any indented text or text enclosed in `%{' and `%}' is copied verbatim to the output (with the `%{' and `%}' removed). The `%{' and `%}' must each appear unindented on a line by itself.

In the rules section, any indented or `%{' `%}' text appearing before the first rule may be used to declare variables which are local to the scanning routine and (after the declarations) code which is to be executed whenever the scanning routine is entered. Other indented or `%{' `%}' text in the rules section is still copied to the output, but its meaning is not well-defined and it may well cause compile-time errors (this feature is present for POSIX compliance).

In the definitions section (but not in the rules section), an unindented comment (i.e., a line beginning with "/*") is also copied verbatim to the output up to the next "*/".
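The copying rules above can be sketched in one file. The variable names total_words and words_this_call are hypothetical, and the decision to return from yylex() at each newline is an assumption made so that the local variable is visibly reset on every entry.

```lex
%{
/* Copied verbatim near the top of lex.yy.c: a global counter. */
#include <stdio.h>
int total_words = 0;
%}

%option noyywrap
%%
%{
    /* Text before the first rule: runs each time yylex() is entered. */
    int words_this_call = 0;   /* local to yylex() */
%}
[a-zA-Z]+   { ++words_this_call; ++total_words; }
\n          { printf("words on this line: %d\n", words_this_call); return 1; }
.           ;
%%
int main(void)
{
    while (yylex() != 0)
        ;   /* each call scans up to the next newline */
    printf("total words: %d\n", total_words);
    return 0;
}
```

The global declared in the definitions section is visible in main(), while the variable declared before the first rule exists only inside yylex() and starts again at zero on every call.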

Example: Count the number of lines and the number of characters

    int num_lines = 0, num_chars = 0;

%option noyywrap
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("No of lines = %d, No of chars = %d\n", num_lines, num_chars);
    return 0;
}

This scanner counts the number of characters and the number of lines in its input (it produces no output other than the final report on the counts). The first line, which must be indented (or enclosed in `%{' and `%}') so that flex copies it to the output instead of reading it as a name definition, declares two globals, "num_lines" and "num_chars", which are accessible both inside `yylex()' and in the `main()' routine declared after the second "%%". There are two rules, one which matches a newline ("\n") and increments both the line count and the character count, and one which matches any character other than a newline (indicated by the "." regular expression).

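Note: save the lex program as lwc.lex and run lex lwc.lex; this generates the lexical analyser lex.yy.c. Compile it using gcc lex.yy.c -lfl. This produces the default output file a.out, which you can run by typing ./a.out. The -lfl option links the flex library, which supplies a default yywrap() and main() for programs that do not define their own.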

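The following program recognizes identifiers (a lowercase letter followed by zero or more lowercase letters or digits). Because flex prefers the longest match, a line is reported as an identifier only when the whole line is one; any other non-empty line is reported as invalid.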

ID [a-z][a-z0-9]*
%%
{ID} printf("identifier");
.* printf("invalid");
%%
int main(void)
{
    yylex();
    return 0;
}

Counting the number of identifiers
digit    [0-9]
letter [A-Za-z]
%{
    int count;
%}
%%
    /* match identifier */
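{letter}({letter}|{digit})* count++;
{digit}({letter}|{digit})*  ;   /* a word starting with a digit is not an identifier */
.|\n                        ;   /* skip every other character */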
%%
int main(void)
{
    yylex();
    printf("number of identifiers = %d\n", count);
    return 0;
}

KTU Compiler Lab Semester 7 CSL 411 - Dr Binu V P
