12.Writing yacc programs
... definitions ...
%%
... rules ...
%%
... subroutines ...
Input to yacc is divided into three sections. The
definitions section consists of token declarations
and C code
bracketed by “%{“ and “%}”. The BNF grammar is placed in the
rules section and
user subroutines are added in the subroutines
section.
This is best illustrated by constructing a small
calculator that can add and subtract numbers. We’ll
begin by
examining the linkage between lex and yacc.
Here is the definitions section for the yacc
input file:
%token INTEGER
This definition declares an INTEGER token. Yacc
generates a parser in file y.tab.c and an
include file, y.tab.h:
Lex includes this file and utilizes the
definitions for token values. To obtain tokens yacc calls
yylex.
Function yylex has a return type of int that returns a token. Values
associated with the
token are returned by lex in variable yylval.
For example,
[0-9]+
{
yylval =
atoi(yytext);
return INTEGER;
}
would store the value of the integer in yylval,
and return token INTEGER to yacc. The type of
yylval is determined
by YYSTYPE. Since the default type is integer this works well in this
case.
Token values 0-255 are reserved for character values. For
example, if you had a rule such as
[-+]
return *yytext;
/* return operator */
the character value for minus or
plus is returned. Note that we placed the minus sign first so that
it wouldn’t be mistaken for a range designator. Generated token
values typically start around 258
because lex reserves several
values for end-of-file and error processing. Here is the complete
lex
input specification for our calculator:
%{
#include <stdlib.h>
#include "y.tab.h"
%}
%option noyywrap
%%
[0-9]+ {
yylval=atoi(yytext);
return INTEGER;
}
[-+\n] return *yytext;
[ \t] ; /* skip whitespace */
. printf("invalid character");
%%
Internally yacc maintains two
stacks in memory; a parse stack and a value stack. The parse stack
contains terminals and nonterminals that represent the current
parsing state. The value stack is
an array of YYSTYPE elements and
associates a value with each element in the parse stack. For
example
when lex returns an INTEGER token yacc shifts this token to the parse
stack. At the
same time the corresponding yylval is shifted to the
value stack. The parse and value stacks are
always synchronized so
finding a value related to a token on the stack is easily
accomplished.
Here is the yacc input specification for our
calculator:
%{
#include <stdio.h>
%}
%token INTEGER
%%
program:program expr '\n' { printf("%d\n",
$2); }
|
;
expr : INTEGER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;
%%
yyerror(char *s)
{
printf("%s\n",s);
}
main()
{
yyparse();
}
The rules section resembles the
BNF grammar discussed earlier. The left-hand side of a
production,
or nonterminal, is entered left-justified and followed by a colon.
This is followed by the
right-hand side of the production. Actions
associated with a rule are entered in braces.
With left-recursion, we
have specified that a program consists of zero or more expressions.
Each
expression terminates with a newline. When a newline is
detected we print the value of the
expression. When we apply the
rule
expr: expr '+' expr
{ $$ = $1 +
$3; }
we replace the right-hand side of
the production in the parse stack with the left-hand side of the
same production. In this case we pop “expr '+' expr” and push
“expr”. We have reduced the
stack by popping three terms off the
stack and pushing back one term. We may reference
positions in the
value stack in our C code by specifying “$1” for the first term
on the right-hand
side of the production, “$2” for the second,
and so on. “$$” designates the top of the stack after
reduction
has taken place. The above action adds the value associated with two
expressions,
pops three terms off the value stack, and pushes back a
single sum. As a consequence the parse
and value stacks remain
synchronized.
Numeric values are initially
entered on the stack when we reduce from INTEGER to expr. After INTEGER is shifted to the stack we
apply the rule
expr: INTEGER { $$ = $1; }
The INTEGER token is popped off
the parse stack followed by a push of expr. For the value
stack we
pop the integer value off the stack and then push it back on again.
In other words we do
nothing. In fact this is the default action and
need not be specified. Finally, when a newline is
encountered, the
value associated with expr is printed.
In the event of syntax errors yacc
calls the user-supplied function yyerror. If you need to modify
the
interface to yyerror then alter the canned file that yacc includes to
fit your needs. The last
function in our yacc specification is main
which calls yypase() function. This
example still has an ambiguous
grammar. Although yacc will issue shift-reduce warnings it will
still process the grammar using shift as the default operation.
Program Execution
1.Save the lex program in file
calc.lex
2.Save the yacc program in file
calc.y
3.Do yacc -d calc.y , this
will generate y.tab.c and y.tab.h
( if you are using bison -d
calc.y , this will generate calc.tab.c and calc.tab.h)
4.Do lex calc.lex ,
this will generate lex.yy.c
5.Compile
using gcc y.tab.c lex.yy.c -o calc ,
this will generate output file in calc
6.Now
run the program by typing ./calc
Comments
Post a Comment