10.Introduction to Yacc

Yacc is a parser generator. A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language.
Historically, they are also called compiler-compilers
YACC (yet another compiler-compiler) is an LALR(1) (LookAhead, Left-to-right,Rightmost derivation producer with 1 lookahead token) parser generator.
YACC was originally designed for being complemented by Lex. An updated version, bison can also be used.
yacc is designed for use with C code and generates a parser written in C. The parser is configured for use in conjunction with a lex -generated scanner and relies on standard shared features (token types, yylval , etc.) and calls the function yylex as a scanner co-routine.

You provide a grammar specification file, which is traditionally named using a .y extension. You invoke yacc on the .y file and it creates the y.tab.h and y.tab.c files containing a thousand or so lines of intense C code that implements an efficient LALR(1) parser for your grammar, including the code for the actions you specified. The file provides an extern function yyparse.y that will attempt to successfully parse a valid sentence. You compile that C file normally, link with the rest of your code, and you have a parser! By default, the parser reads from stdin and writes to stdout , just like a lex generated scanner does.
The patterns in the below diagram(Figure 1) is a file you create with a text editor. Lex will read your patterns and generate C code for a lexical analyzer or scanner. The lexical analyzer matches strings in the input, based on your patterns, and converts the strings to tokens. Tokens are numerical representations of strings, and simplify processing.When the lexical analyzer finds identifiers in the input stream it enters them in a symbol table.The symbol table may also contain other information such as data type (integer or real) and location of each variable in memory. All subsequent references to identifiers refer to the appropriate symbol table index.

The grammar in the below diagram is a text file you create with a text edtior. Yacc will read your grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree.The syntax tree imposes a hierarchical structure of the tokens. For example, operator precedence and associativity are apparent in the syntax tree. The next step, code generation, does a depth-first walk of the syntax tree to generate code. Some compilers produce machine code, while others, as shown below, output assembly language.


Figure 2 illustrates the file naming conventions used by lex and yacc. We'll assume our goal is to write a BASIC compiler(bas). First, we need to specify all pattern matching rules for lex (bas.l) and grammar rules for yacc (bas.y). Commands to create our compiler, bas.exe, are listed below:

$yacc –d bas.y # create y.tab.h, y.tab.c
$lex bas.l -lfl # create lex.yy.c
$cc lex.yy.c y.tab.c –o bas.exe # compile/link and generate o/p file

Yacc reads the grammar descriptions in bas.y and generates a syntax analyzer (parser), that includes function yyparse, in file y.tab.c. The –d option causes yacc to generate definitions for tokens and place them in file y.tab.h. Lex reads the pattern descriptions in bas.l, includes file y.tab.h, and generates a lexical analyzer, that includes function yylex, in file lex.yy.c.
Finally, the lexer and parser are compiled and linked together to create executable bas.exe.
From main we call yyparse to run the compiler. Function yyparse automatically calls yylex to obtain each token.



Comments

Popular posts from this blog

KTU Compiler Lab CSL411 - Dr Binu V P

lexical analyzer for a c program

count frequency of occurrence of a word - lex program