
13.Precedence and conflict resolution in yacc

We have seen that when you feed yacc a grammar that is not LALR(1), it prints a warning and resolves the conflicts automatically. For a shift/reduce conflict, yacc chooses the shift. For a reduce/reduce conflict, it reduces using the rule declared first in the file. We can control what happens by explicitly declaring precedence and associativity for operators. Rather than re-writing the grammar to encode precedence and associativity implicitly with a bunch of intermediate non-terminals, we can declare the precedence directly so that yacc knows how to break ties. In the definitions section, we can add any number of precedence levels, one per line, from lowest to highest, and indicate the associativity of each (%left, %right, or %nonassoc). Several terminals can be listed on the same line to give them equal precedence. Example:

%token T_Int
%left '+'
%left '*'
%right '^'
%%
E : E '+' E
  | E '*' E
  | E '^' E
  | T_Int
  ;
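To see the declarations in action, here is a rough sketch of a complete grammar built around them (the input rule, the actions, and the hand-written yylex stub are illustrative additions, not part of the original example). Because '+' is declared %left, 1+2+3 groups as (1+2)+3; because '^' is declared %right, 2^3^2 groups as 2^(3^2):

%{
#include <stdio.h>
#include <math.h>
int yylex(void);
int yyparse(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token T_Int
%left '+'
%left '*'
%right '^'
%%
input : E                 { printf("= %d\n", $1); }
      ;
E     : E '+' E           { $$ = $1 + $3; }
      | E '*' E           { $$ = $1 * $3; }
      | E '^' E           { $$ = (int) pow($1, $3); }
      | T_Int
      ;
%%
/* Hand-written stand-in for a lex-generated scanner: single digits
   become T_Int tokens, operators are returned as themselves. */
int yylex(void)
{
    int c;
    while ((c = getchar()) == ' ' || c == '\t' || c == '\n')
        ;                                   /* skip whitespace */
    if (c >= '0' && c <= '9') { yylval = c - '0'; return T_Int; }
    if (c == EOF) return 0;
    return c;
}
int main(void) { return yyparse(); }

With the precedence lines in place, yacc accepts this ambiguous grammar without reporting any shift/reduce conflicts.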

12.Writing yacc programs

... definitions ...
%%
... rules ...
%%
... subroutines ...

Input to yacc is divided into three sections. The definitions section consists of token declarations and C code bracketed by "%{" and "%}". The BNF grammar is placed in the rules section, and user subroutines are added in the subroutines section. This is best illustrated by constructing a small calculator that can add and subtract numbers. We'll begin by examining the linkage between lex and yacc. Here is the definitions section for the yacc input file:

%token INTEGER

This definition declares an INTEGER token. Yacc generates a parser in file y.tab.c and an include file, y.tab.h. Lex includes this file and utilizes the definitions for token values. To obtain tokens, yacc calls yylex. Function yylex has a return type of int and returns the token type. Values associated with the token are returned by lex in the variable yylval. For example,

[0-9]+      {
                yylval = atoi(yytext);
                return INTEGER;
            }

stores the value of the number in yylval and returns the token INTEGER to yacc.
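A rough sketch of how the two files might fit together (the exact listings from the original post are not reproduced here; the rule layout and the file names calc.y and calc.l are illustrative):

calc.y:

%{
#include <stdio.h>
int yylex(void);
int yyparse(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token INTEGER
%%
program : program expr '\n'     { printf("%d\n", $2); }
        |                       /* empty */
        ;
expr    : INTEGER
        | expr '+' INTEGER      { $$ = $1 + $3; }
        | expr '-' INTEGER      { $$ = $1 - $3; }
        ;
%%
int main(void) { return yyparse(); }

calc.l:

%{
/* matching scanner: numbers become INTEGER tokens, operators and
   newlines are returned as themselves, blanks are skipped */
#include <stdlib.h>
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+      { yylval = atoi(yytext); return INTEGER; }
[-+\n]      { return yytext[0]; }
[ \t]       ;   /* skip blanks and tabs */
%%
int yywrap(void) { return 1; }

Built with something like yacc -d calc.y && lex calc.l && cc y.tab.c lex.yy.c -o calc, it reads a line such as 2+3-1 and prints 4.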

11.Grammar specification in yacc

Grammars for yacc are described using a variant of Backus Naur Form (BNF). This technique, pioneered by John Backus and Peter Naur, was used to describe ALGOL60. A BNF grammar can be used to express context-free languages. Most constructs in modern programming languages can be represented in BNF. For example, the grammar for an expression that multiplies and adds numbers is

E -> E + E    (r1)
E -> E * E    (r2)
E -> id       (r3)

Three productions have been specified. Terms that appear on the left-hand side (lhs) of a production, such as E (expression), are nonterminals. Terms such as id (identifier) are terminals (tokens returned by lex) and only appear on the right-hand side (rhs) of a production. This grammar specifies that an expression may be the sum of two expressions, the product of two expressions, or an identifier. We can use this grammar to generate expressions:

E -> E * E        (r2)
  -> E * z        (r3)
  -> E + E * z    (r1)
  -> E + y * z    (r3)
  -> x + y * z    (r3)

At each step the rightmost nonterminal was replaced, so this is a rightmost derivation of the sentence x + y * z.
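Written in yacc's rule syntax (a sketch; the token name ID stands in for the id terminal returned by lex), the same grammar would appear in the rules section as:

%token ID
%%
E : E '+' E
  | E '*' E
  | ID
  ;

Fed to yacc as-is, this ambiguous grammar triggers shift/reduce conflict warnings; resolving them with precedence declarations is covered in the precedence post above.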

10.Introduction to Yacc

Yacc is a parser generator. A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language. Historically, such programs have also been called compiler-compilers. YACC (yet another compiler-compiler) is an LALR(1) parser generator: it performs a Left-to-right scan, produces a Rightmost derivation in reverse, and uses 1 token of look-ahead. YACC was originally designed to be complemented by Lex. An updated version, Bison, can also be used. yacc is designed for use with C code and generates a parser written in C. The parser is configured for use in conjunction with a lex-generated scanner and relies on standard shared features (token types, yylval, etc.), calling the function yylex as a scanner co-routine. You provide a grammar specification file, which is traditionally named with a .y extension. You invoke yacc on the .y file and it creates y.tab.c (and, with the -d option, y.tab.h), containing a thousand or so lines of intense C code that implement the parser.
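As a sketch of the typical build steps (the file names calc.y and calc.l are illustrative; on systems without a yacc command, bison -y -d calc.y produces the same y.tab.* files):

yacc -d calc.y                      # writes y.tab.c and y.tab.h
lex calc.l                          # writes lex.yy.c
cc y.tab.c lex.yy.c -o calc -lfl    # -lfl supplies yywrap() if the scanner does not define it
./calc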

9.Interfacing flex with yacc

One of the main uses of flex is as a companion to the yacc parser-generator. yacc parsers expect to call a routine named yylex() to find the next input token. The routine is supposed to return the type of the next token as well as putting any associated value in the global yylval. To use flex with yacc, one specifies the -d option to yacc to instruct it to generate the file y.tab.h containing definitions of all the %tokens appearing in the yacc input. This file is then included in the flex scanner. For example, if one of the tokens is "TOK_NUMBER", part of the scanner might look like:

%{
#include "y.tab.h"
%}

%%

[0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
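The yacc side of this pairing would declare the token so that -d writes its numeric code into y.tab.h (a sketch; the rule wrapped around the token is illustrative):

%token TOK_NUMBER
%%
value : TOK_NUMBER
      ;

Running yacc -d on such a file leaves a definition along the lines of #define TOK_NUMBER 257 in y.tab.h, which is exactly what the #include in the scanner above picks up.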

8.Reading and Writing using Files with examples

By default, yyin and yyout point to stdin and stdout. We can set yyin to point to a different file so that flex will read its input from the file pointed to by yyin. Similarly, yyout can point to an output file. The following program finds the number of lines, words and characters in a text file "text.dat" (similar to the wc command in Linux):

%{
    int nchar, nword, nline;
%}
%%
\n          { nline++; nchar++; }
[^ \t\n]+   { nword++; nchar += yyleng; }
.           { nchar++; }
%%
int main(void)
{
    yyin = fopen("text.dat", "r");
    yylex();
    printf("%d\t%d\t%d\n", nline, nword, nchar);
    return 0;
}

The program below reads a text file passed as an argument and displays each line with its line number.

%{
    int lineno;
%}
%%
^(.*)\n     printf("%4d\t%s", ++lineno, yytext);
%%
int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    yylex();
    fclose(yyin);
    return 0;
}

The program below reads a data f

7.Values available to the user in action section

This section summarizes the various values available to the user in the rule actions. char *yytext holds the text of the current token. It may be modified but not lengthened (you cannot append characters to the end). If the special directive %array appears in the first section of the scanner description, then yytext is instead declared char yytext[YYLMAX], where YYLMAX is a macro definition that you can redefine in the first section if you don't like the default value (generally 8KB). Using %array results in somewhat slower scanners, but the value of yytext becomes immune to calls to input() and unput(), which potentially destroy its value when yytext is a character pointer. The opposite of %array is %pointer, which is the default. int yyleng holds the length of the current token. FILE *yyin is the file which, by default, flex reads from. It may be redefined, but doing so only makes sense before scanning begins.
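A small scanner that does nothing but report these values for each token (a sketch; the two patterns and the messages are illustrative):

%{
#include <stdio.h>
%}
%%
[a-zA-Z]+   { printf("word   '%s' (yyleng = %d)\n", yytext, yyleng); }
[0-9]+      { printf("number '%s' (yyleng = %d)\n", yytext, yyleng); }
.|\n        { /* discard everything else */ }
%%
int main(void)
{
    yylex();        /* reads from yyin, which defaults to stdin */
    return 0;
}
int yywrap(void) { return 1; }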

6.Start Conditions in Lex

flex provides a mechanism for conditionally activating rules. Any rule whose pattern is prefixed with "<sc>" will only be active when the scanner is in the start condition named "sc". For example,

<STRING>[^"]*   { /* eat up the string body ... */
                  ...
                }

will be active only when the scanner is in the "STRING" start condition, and

<INITIAL,STRING,QUOTE>\.   { /* handle an escape ... */
                             ...
                           }

will be active only when the current start condition is either "INITIAL", "STRING", or "QUOTE". Start conditions are declared in the definitions (first) section of the input using unindented lines beginning with either %s or %x followed by a list of names. The former declares inclusive start conditions, the latter exclusive start conditions.
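A sketch of how a start condition is declared and entered with the BEGIN macro (the string-printing behaviour here is illustrative):

%{
#include <stdio.h>
%}
%x STRING
%%
\"              { BEGIN(STRING); }      /* opening quote: enter the STRING condition */
<STRING>[^"]+   { printf("string body: %s\n", yytext); }
<STRING>\"      { BEGIN(INITIAL); }     /* closing quote: back to the default condition */
.|\n            { /* ignore everything outside strings */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }

Because STRING is declared with %x (exclusive), the ordinary rules at the bottom are not active while the scanner is inside a string.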

0.How To Install Flex and Bison Under Ubuntu

Flex and Bison are Unix utilities that can help you write very fast parsers for arbitrary file formats. If they are not already installed on your system, you can install Flex and Bison with a few simple terminal commands. Open a terminal by pressing [CTRL] + [ALT] + T, then type the following:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install flex bison
which flex      # sanity check to make sure flex is installed
which bison     # sanity check to make sure bison is installed

5.Actions in Lex File

Each pattern in a rule has a corresponding action, which can be any arbitrary C statement. The pattern ends at the first non-escaped whitespace character; the remainder of the line is its action. If the action is empty, then when the pattern is matched the input token is simply discarded. For example, here is the specification for a program which deletes all occurrences of "zap me" from its input:

%%
"zap me"

(It will copy all other characters in the input to the output, since they will be matched by the default rule.) Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:

%%
[ \t]+      putchar( ' ' );
[ \t]+$     /* ignore this token */

An action consisting solely of a vertical bar ('|') means "same as the action for the next rule." Actions can include arbitrary C code.
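Fleshed out into a scanner that can actually be built and run (a sketch; the main() and yywrap() at the bottom are additions), the blank-compressing example becomes:

%{
#include <stdio.h>
%}
%%
[ \t]+      putchar( ' ' );
[ \t]+$     /* ignore trailing whitespace */
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }

Built with something like flex squeeze.l && cc lex.yy.c -o squeeze, it echoes its input with every run of blanks and tabs reduced to a single space; all other characters pass through via the default rule.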

4.Input Matching in lex

BASIC PRINCIPLES.
(1) When the generated scanner is run, it analyses its input looking for strings which match any of its patterns.
(2) If the current input can be matched by several expressions, the ambiguity is resolved by the following rules:
     (2.1) The longest match is preferred.
     (2.2) Among rules that match the same amount of text, the rule given first is preferred.
(3) Once the match is determined:
     (3.1) the text corresponding to it is available in the global character pointer yytext, its length in yyleng, and the current line number in yylineno;
     (3.2) the action corresponding to the matched pattern is executed;
     (3.3) then the remaining input is scanned for another match.
Note that yytext can be defined in two different ways: either as a character pointer or as a character array. You can control which definition flex uses by including one of the special directives %pointer or %array in the first (definitions) section of your flex input.
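Both tie-breaking rules can be seen in a small sketch like the following (the patterns and messages are illustrative):

%{
#include <stdio.h>
%}
%%
"for"        { printf("KEYWORD %s\n", yytext); }   /* same length as the rule below: first rule wins */
[a-zA-Z]+    { printf("IDENT   %s\n", yytext); }   /* "format" is a longer match, so it becomes an IDENT */
"<"          { printf("LT\n"); }
"<="         { printf("LE\n"); }                   /* on "<=", the longer match wins even though "<" is listed first */
[ \t\n]+     { /* skip whitespace */ }
.            { /* ignore anything else */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }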

3.Lex Regular Expressions

A lex regular expression is built from text characters (letters of the alphabet, digits, ...) and operators:

" \ { } [ ] ^ $ < > ? . * + | ( ) /

An operator can be used as a text character if it is preceded by the escape operator (backslash). The quotation marks indicate that whatever is contained between a pair of quotes is to be taken as text characters. For instance, xyz"++" matches the string xyz++. A character class is a class of characters specified using the operator pair [ ]. The expression [ab] matches the string a or b. Within square brackets most operators lose their special meaning, except for the three characters \ - ^ which are used as follows: (a) the escape character \, as above; (b) the minus character -, which is used for ranges, as in [0-9] for a digit; (c) the caret ^, which, as the first character after the opening square bracket, complements the class, as in [^abc] (any character except a, b, or c). Repeated expressions: repetitions of patterns are indicated by the operators * and +; a* matches zero or more consecutive a characters (including none), while a+ matches one or more.
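A few of these constructs in a scanner you can actually run (a sketch; the messages are illustrative):

%{
#include <stdio.h>
%}
%%
xyz"++"     { printf("literal xyz++\n"); }        /* quoted operators taken as text */
[0-9]+      { printf("digits: %s\n", yytext); }   /* character class with a range, repeated */
[^ \t\n]+   { printf("word:   %s\n", yytext); }   /* complemented class */
[ \t\n]+    { /* skip whitespace */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }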

2.Format of the lex input file with Examples

The flex input file consists of three sections, separated by a line with just '%%' on it:

definitions
%%
rules
%%
user code

definitions section

The definitions section contains declarations of simple name definitions to simplify the scanner specification, and declarations of start conditions. Name definitions have the form

name definition

The "name" is a word beginning with a letter or an underscore ('_') followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-whitespace character following the name and to continue to the end of the line. The definition can subsequently be referred to using "{name}", which will expand to "(definition)". For example,

DIGIT    [0-9]

defines "DIGIT" to be a regular expression which matches a single digit.
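A minimal but complete input file using such a definition (a sketch; the float pattern and the messages are illustrative):

%{
#include <stdio.h>
#include <stdlib.h>
%}
DIGIT   [0-9]
%%
{DIGIT}+                { printf("integer: %s (%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*     { printf("float:   %s\n", yytext); }
.|\n                    { /* discard everything else */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }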

1.Introduction to lex/flex

lex (lexical analyzer) and flex (fast lexical analyzer) are tools for generating scanners: programs which recognize lexical patterns in text. Lex can generate analyzers in either C or Ratfor, a language which can be translated automatically to portable Fortran. It is available on the PDP-11 UNIX, Honeywell GCOS, and IBM OS systems. However, we will only discuss generating analyzers in C on UNIX-based systems. flex reads the given input files (.lex), or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file, lex.yy.c, which defines a routine yylex(). This file is compiled and linked with the -lfl library to produce an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code. Let's create a sample lex file, log.lex:

%% use