Skip to main content

Posts

Precedence and conflict resolution in yacc

We had seen that , when you feed yacc a grammar that is not LALR(1), it shows a warning and resolve the conflicts automatically. For a shift/reduce conflict, yacc will choose the shift. In a reduce/reduce conflict, it will reduce using the rule declared first in the file. We can control what happens by explicitly declaring precedence and associativity for operators. Rather than re-writing the grammar to implicitly control the precedence and associativity with a bunch of intermediate non-terminals, we can directly indicate the precedence so that yacc will know how to break ties. In the definitions section, we can add any number of precedence levels, one per line, from lowest to highest, and indicate the associativity (either left, right, or nonassoc). Several terminals can be on the same line to assign them equal precedence. Example: %token T_Int %left '+' %left '*' %right '^' %% E : E '+' E |E '*' E  |E '^' E  …
Recent posts

Writing yacc programs

... definitions ... %% ... rules ... %% ... subroutines ... Input to yacc is divided into three sections. The definitions section consists of token declarations and C code bracketed by “%{“ and “%}”. The BNF grammar is placed in the rules section and user subroutines are added in the subroutines section. This is best illustrated by constructing a small calculator that can add and subtract numbers. We’ll begin by examining the linkage between lex and yacc. Here is the definitions section for the yacc input file: %token INTEGER This definition declares an INTEGER token. Yacc generates a parser in file y.tab.c and an include file, y.tab.h: Lex includes this file and utilizes the definitions for token values. To obtain tokens yacc calls yylex. Function yylex has a return type of int that returns a token. Values associated with the token are returned by lex in variable yylval. For example, [0-9]+  {              yylval = atoi(yytext);              return INTEGER;      …

Grammar specification in yacc

Grammars for yacc are described using a variant of Backus Naur Form (BNF). This technique, pioneered by John Backus and Peter Naur, was used to describe ALGOL60. A BNF grammar can be used to express context-free languages. Most constructs in modern programming languages can be represented in BNF. For example, the grammar for an expression that multiplies and adds numbers is E -> E + E E -> E * E E -> id Three productions have been specified. Terms that appear on the left-hand side (lhs) of a production, such as E (expression) are nonterminals. Terms such as id (identifier) are terminals (tokens returned by lex) and only appear on the right-hand side (rhs) of a production. This grammar specifies that an expression may be the sum of two expressions, the product of two expressions, or an identifier. We can use this grammar to generate expressions: E -> E*E (r2) E->E*z (r3) E- >E+E*z (r1) E->E+y*z (r3) E->x+y*z (r3) At each step we expa…

Introduction to Yacc

Yacc is a parser generator. A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language. Historically, they are also called compiler-compilers YACC (yet another compiler-compiler) is an LALR(1) (LookAhead, Left-to-right,Rightmost derivation producer with 1 lookahead token) parser generator. YACC was originally designed for being complemented by Lex. An updated version, bison can also be used. yacc is designed for use with C code and generates a parser written in C. The parser is configured for use in conjunction with a lex -generated scanner and relies on standard shared features (token types, yylval , etc.) and calls the function yylex as a scanner co-routine. You provide a grammar specification file, which is traditionally named using a .y extension. You invoke yacc on the .y file and it creates the y.tab.h and y.tab.c files containing a thousand or so lines of intense C code that implements a…

Interfacing flex with yacc

One of the main uses of flex is as a companion to the yacc parser-generator. yacc parsers expect to call a routine named yylex() to find the next input token. The routine is supposed to return the type of the next token as well as putting any associated value in the global yylval. To use flex with yacc, one specifies the -d option to yacc to instruct it to generate the file y.tab.h containing definitions of all the %tokens appearing in the yacc input. This file is then included in the flex scanner. For example, if one of the tokens is "TOK_NUMBER", part of the scanner might look like: %{ #include "y.tab.h" %} %% [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;

Reading and Writing using Files with examples

By default yyin and yyout points to the stdin and stdout.We can set yyin to point to a different file so that flex will read the contents from the file pointed by yyin.Similarly yyout can point to output file.

The following program will find number of lines,words and characters in a text file "text.dat"( similar to wc command in linux)

%{
      int nchar, nword, nline;
%}
%%
\n             { nline++; nchar++; }
[^  \t\n]+ { nword++, nchar += yyleng; }
.               { nchar++; }
%%
int main(void)
{
yyin=fopen("text.dat","r");
yylex();
printf("%d\t%d\t%d\n",nline,nword,nchar);
return 0;
}

The program below reads a text file passed as argument and display each line with line number.
%{
int lineno;
%}
%%
^(.*)\n printf("%4d\t%s", ++lineno, yytext);
%%
int main(int argc, char *argv[])
{
yyin = fopen(argv[1], "r");
yylex();
flose(yyin);
}
The program below reads a data file num.dat wh…

Values available to the user in action section

This section summarizes the various values available to the user in the rule actions. char *yytext holds the text of the current token. It may be modified but not lengthened (you cannot append characters to the end). If the special directive %array appears in the first section of the scanner description, then yytext is instead declared char yytext[YYLMAX], where YYLMAX is a macro definition that you can redefine in the first section if you don’t like the default value (generally 8KB). Using %array results in somewhat slower scanners, but the value of yytext becomes immune to calls to input() and unput(), which potentially destroy its value when yytext is a character pointer. The opposite of %array is %pointer, which is the default. int yyleng holds the length of the current token. FILE *yyin is the file which by default flex reads from. It may be redefined but doing so only makes sense before scanning begins or after an EOF has bee…