10.Introduction to Yacc
Yacc is a parser generator. A parser generator is
a program that takes as input a specification of a syntax and
produces as output a procedure for recognizing that language.
Historically, they are also called
compiler-compilers
YACC (yet another compiler-compiler) is an
LALR(1) (LookAhead, Left-to-right,Rightmost derivation producer with 1 lookahead
token) parser generator.
YACC was originally designed for being
complemented by Lex. An updated version, bison can also be
used.
yacc is designed for use with C
code and generates a parser written in C. The parser is configured
for use in conjunction with a lex -generated scanner and relies on
standard shared features (token types, yylval , etc.) and calls the
function yylex as a scanner co-routine.
You provide a grammar
specification file, which is traditionally named using a .y
extension. You invoke yacc on the .y file and it creates the y.tab.h
and y.tab.c files containing a thousand or so lines of intense
C code that implements an efficient LALR(1) parser for your grammar,
including the code for the actions you specified. The file provides
an extern function yyparse.y that will attempt to successfully parse
a valid sentence. You compile that C file normally, link with the
rest of your code, and you have a parser! By default, the parser
reads from stdin and writes to stdout , just like a lex generated
scanner does.
The patterns in the below
diagram(Figure 1) is a file you create with a text editor. Lex will
read your patterns and generate C code for a lexical analyzer or
scanner. The lexical analyzer matches strings in the input, based on
your patterns, and converts the strings to tokens. Tokens are
numerical representations of strings, and simplify processing.When
the lexical analyzer finds identifiers in the input stream it enters
them in a symbol table.The symbol table may also contain other
information such as data type (integer or real) and location of each
variable in memory. All subsequent references to identifiers refer to
the appropriate symbol table index.
The grammar in the below
diagram is a text file you create with a text edtior. Yacc will read
your grammar and generate C code for a syntax analyzer or parser. The
syntax analyzer uses grammar rules that allow it to analyze tokens
from the lexical analyzer and create a syntax tree.The syntax tree
imposes a hierarchical structure of the tokens. For example,
operator precedence and associativity are apparent in the syntax
tree. The next step, code generation, does a depth-first walk of the
syntax tree to generate code. Some compilers produce machine code,
while others, as shown below, output assembly language.
Figure 2 illustrates the file
naming conventions used by lex and yacc. We'll assume our goal is to
write a BASIC compiler(bas). First, we need to specify all pattern
matching rules for lex (bas.l) and
grammar rules for yacc (bas.y).
Commands to create our compiler, bas.exe, are listed below:
$yacc –d bas.y
# create
y.tab.h, y.tab.c
$lex bas.l
-lfl # create
lex.yy.c
$cc lex.yy.c y.tab.c –o bas.exe
# compile/link
and generate o/p file
Yacc reads the grammar
descriptions in bas.y and generates a syntax analyzer (parser), that
includes function yyparse, in file y.tab.c. The –d
option causes
yacc to generate definitions for tokens and place them in file
y.tab.h. Lex reads the
pattern descriptions in bas.l, includes file
y.tab.h, and generates a lexical analyzer, that includes
function
yylex, in file lex.yy.c.
Finally, the lexer and parser are compiled and
linked together to create executable bas.exe.
From main we call yyparse to run the compiler.
Function yyparse automatically calls yylex to
obtain each token.
Comments
Post a Comment