Module 2
A lexical analyzer can identify tokens with the help of regular expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given sentence, because of the limitations of regular expressions. Regular expressions cannot check balanced tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata.
1. Terminals are the basic symbols from which strings are formed. The term "token
name" is a synonym for "terminal" and frequently we will use the word "token" for
terminal when it is clear that we are talking about just the token name.
2. Nonterminals are syntactic variables that denote sets of strings. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language that is useful for both syntax analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of
strings it denotes is the language generated by the grammar. Conventionally, the
productions for the start symbol are listed first.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of:
a. A nonterminal called the head or left side of the production; this production defines some of the strings denoted by the head.
b. The symbol →.
c. A body or right side consisting of zero or more terminals and nonterminals.
EXAMPLE:
The grammar with the following productions defines simple arithmetic expressions:
expression → expression + term | expression - term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id
Notational Conventions
To avoid always having to state that "these are the terminals," "these are the nonterminals,"
and so on, the following notational conventions for grammars will be used.
Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols.
Unless stated otherwise, the head of the first production is the start symbol.
Using these conventions, the grammar for arithmetic expression can be rewritten as:
E→ E + T | E - T | T
T→ T * F | T / F | F
F→ ( E ) | id
Beginning with the start symbol, each rewriting step replaces a nonterminal by the body of one of its productions. For example, consider the grammar:
E → E + E | E * E | - E | ( E ) | id
The production E → - E signifies that if E denotes an expression, then - E must also denote an expression. The replacement of a single E by - E will be described by writing E => - E, which is read, "E derives - E."
The production E → ( E ) can be applied to replace any instance of E in any string of grammar symbols by ( E ), e.g., E * E => ( E ) * E or E * E => E * ( E ).
We can take a single E and repeatedly apply productions in any order to get a sequence of
replacements. For example, E => - E => - (E) => - (id)
We call such a sequence of replacements a derivation of - (id) from E. This derivation provides
a proof that the string - (id) is one particular instance of an expression.
Example
Let the production rules in a CFG be
X → X+X | X*X | X | a
over the alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be
X => X+X => a+X => a+X*X => a+a*X => a+a*a
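The derivation above is just repeated string rewriting; a minimal sketch in Python (the helper name apply_leftmost is illustrative, not from the text):

```python
# Grammar: X -> X+X | X*X | X | a. A sentential form is a plain string in
# which "X" is the only nonterminal.
def apply_leftmost(sentential, body):
    """Replace the leftmost occurrence of the nonterminal X with a body."""
    i = sentential.index("X")          # position of the leftmost nonterminal
    return sentential[:i] + body + sentential[i + 1:]

steps = ["X"]
for body in ["X+X", "a", "X*X", "a", "a"]:   # the choices made at each step
    steps.append(apply_leftmost(steps[-1], body))

print(" => ".join(steps))
# X => X+X => a+X => a+X*X => a+a*X => a+a*a
```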
Parse Tree
A parse tree is a hierarchical structure that represents the derivation of the grammar used to yield an input string. The root node of the parse tree is the start symbol of the given grammar, from where the derivation proceeds.
If A → xyz is a production, then the parse tree will have A as an interior node whose children are x, y and z, from left to right.
The figure above represents the parse tree for the string id + id * id. The string id + id * id is the yield of the parse tree depicted in the figure.
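The notion of yield can be made concrete with a small tree structure; a sketch in Python, where Node and yield_ are illustrative names and the node labels follow the grammar E → E + E | E * E | id:

```python
class Node:
    """One node of a parse tree: a label and an ordered list of children."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def yield_(self):
        """Concatenate the leaves from left to right -- the tree's yield."""
        if not self.children:
            return self.label
        return " ".join(c.yield_() for c in self.children)

# Parse tree for id + id * id (the '+' at the root, '*' below it).
tree = Node("E", [
    Node("E", [Node("id")]),
    Node("+"),
    Node("E", [Node("E", [Node("id")]), Node("*"), Node("E", [Node("id")])]),
])
print(tree.yield_())  # id + id * id
```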
2.1.1.2 AMBIGUITY
An ambiguous grammar is one that produces more than one leftmost or more than
one rightmost derivation for the same sentence.
For most parsers, it is desirable that the grammar be made unambiguous, for if it is
not, we cannot uniquely determine which parse tree to select for a sentence.
EXAMPLE
Two distinct leftmost derivations for the sentence id + id * id:

E ===> E + E            E ===> E * E
  ===> id + E             ===> E + E * E
  ===> id + E * E         ===> id + E * E
  ===> id + id * E        ===> id + id * E
  ===> id + id * id       ===> id + id * id

The corresponding parse trees:

        E                      E
      / | \                  / | \
     E  +  E                E  *  E
     |    /|\              /|\     |
    id   E * E            E + E    id
         |   |            |   |
        id  id           id  id
Top Down Parsing
In top down parsing, the parse tree is constructed from the top (root) to the bottom (leaves). In bottom up parsing, the parse tree is constructed from the bottom (leaves) to the top (root).
It can be viewed as an attempt to construct a parse tree for the input starting from the
root and creating the nodes of parse tree in preorder.
Pre-order traversal means: 1. Visit the root 2. Traverse left subtree 3. Traverse right
subtree.
Top down parsing can be viewed as an attempt to find a leftmost derivation for an input string (that is, expanding the leftmost non-terminal at every step). It may involve backtracking, that is, making repeated scans of the input, to obtain the correct expansion of the leftmost non-terminal. Unless the grammar is ambiguous or left-recursive, it finds a suitable parse tree.
EXAMPLE
S → cAd
A → ab | a
Consider the input string w = cad. To construct a parse tree for this string top down, we initially create a tree consisting of a single node labelled S.
An input pointer points to c, the first symbol of w. S has only one production, so we
use it to expand S and obtain the tree as:
The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the
input pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Now, we expand A using the first alternative A → ab to obtain the tree as:
We have a match for the second input symbol, a, so we advance the input pointer to
d, the third input symbol, and compare d against the next leaf, labeled b.
Since b does not match d, we report failure and go back to A to see whether there is
another alternative for A that has not been tried, but that might produce a match.
In going back to A, we must reset the input pointer to position 2, the position it had when we first came to A, which means that the procedure for A must store the input pointer in a local variable.
The second alternative for A produces the tree as:
The leaf a matches the second symbol of w and the leaf d matches the third symbol.
Since we have produced a parse tree for w, we halt and announce successful
completion of parsing. (that is the string parsed completely and the parser stops).
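The backtracking procedure above can be sketched as a small recognizer; a minimal sketch in Python for the grammar S → cAd, A → ab | a (the function names are illustrative):

```python
def parse_S(w):
    """Recognize w against S -> c A d, backtracking over A's alternatives."""
    if not w.startswith("c"):
        return False
    for alt in ("ab", "a"):                 # try A -> ab first, then A -> a
        i = 1                               # reset the input pointer to
                                            # position 2, as the text says
        if w[i:i + len(alt)] == alt and w[i + len(alt):] == "d":
            return True                     # A's alternative and the final
                                            # d both matched: success
    return False                            # every alternative failed

print(parse_S("cad"))   # True  (uses A -> a, after A -> ab fails)
print(parse_S("cabd"))  # True  (uses A -> ab)
print(parse_S("cd"))    # False
```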
The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform a grammar in two ways:
1. Eliminate left recursion, and
2. Perform left factoring.
These rules eliminate most common causes for backtracking, although they do not guarantee completely backtrack-free parsing (called LL(1), as we will see later).
Left Recursion
A grammar is said to be left-recursive if it has a non-terminal A such that there is a derivation A =>+ Aα, for some string α.
EXAMPLE
A → Aα
A → β
It recognizes the regular expression βα*. The problem is that if we use the first production for top-down derivation, we will fall into an infinite derivation chain. This is called left recursion.
Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed. The left-recursive pair of productions A → Aα | β could be replaced by two non-recursive productions:
A → βA'
A' → αA' | ε
E → E + T | T
T → T * F | F
F → ( E ) | id
Eliminating the immediate left recursion from the productions for E and then for T, we obtain
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
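The transformation can be sketched in a few lines of Python; a minimal sketch, assuming a grammar represented as a dict mapping a nonterminal to a list of bodies, where each body is a tuple of symbols and the empty tuple denotes ε:

```python
def eliminate_left_recursion(nt, bodies):
    """Split A -> A a1 | ... | b1 | ... into A -> b A' and A' -> a A' | eps."""
    recursive = [b[1:] for b in bodies if b and b[0] == nt]    # the alphas
    others = [b for b in bodies if not b or b[0] != nt]        # the betas
    if not recursive:
        return {nt: bodies}                                    # nothing to do
    nt2 = nt + "'"                                             # fresh A'
    return {
        nt:  [b + (nt2,) for b in others],                     # A  -> b A'
        nt2: [a + (nt2,) for a in recursive] + [()],           # A' -> a A' | eps
    }

# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | eps
g = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
print(g)
```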
Left Factoring
Left factoring is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing.
The basic idea is that when it is not clear which of two alternative productions to use
to expand a non-terminal A, we may be able to rewrite the A-productions to defer the
decision until we have seen enough of the input to make the right choice
For example, if
A → αβ1 | αβ2
are two A-productions, and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ1 or to αβ2.
However, we may defer the decision by expanding A to αB. Then, after seeing the input derived from α, we may expand B to β1 or β2.
A → αB
B → β1 | β2
stmt → if cond then stmt else stmt | if cond then stmt
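Left factoring of two alternatives can be sketched mechanically; a minimal sketch, assuming the same tuple-of-symbols representation for bodies, with "B" as the fresh nonterminal and the empty tuple denoting ε:

```python
def left_factor(nt, b1, b2, fresh="B"):
    """Factor A -> a b1 | a b2 into A -> a B and B -> b1 | b2."""
    i = 0
    while i < min(len(b1), len(b2)) and b1[i] == b2[i]:
        i += 1                                # length of the common prefix a
    if i == 0:
        return {nt: [b1, b2]}                 # no common prefix to factor
    return {
        nt: [b1[:i] + (fresh,)],              # A -> a B
        fresh: [b1[i:], b2[i:]],              # B -> b1 | b2 (() = epsilon)
    }

# The dangling-else productions share the prefix "if cond then stmt".
g = left_factor("stmt",
                ("if", "cond", "then", "stmt", "else", "stmt"),
                ("if", "cond", "then", "stmt"))
print(g)
```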
The key problem during predictive parsing is that of determining the production to
be applied for a nonterminal.
Requirements
1. Stack
2. Parsing Table
3. Input Buffer
4. Parsing program
Input buffer - contains the string to be parsed, followed by $ (used to indicate the end of the input string).
The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. If M[X, a] is a production, the parser replaces X on top of the stack by the body of that production; if M[X, a] is an error entry, the parser calls an error routine.
METHOD: Initially, the parser is in a configuration in which it has $S on the stack with S, the start symbol of G, on top, and w$ in the input buffer. The program that utilizes the predictive parsing table M to produce a parse for the input is shown below.
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X → Y1 Y2 ... Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 ... Yk
        end
        else error()
until X = $ /* stack is empty */
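The parsing program can be sketched as a short Python loop; a minimal sketch for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id, where M below is the standard LL(1) table for that grammar (written out by hand here, since the table figure is not shown):

```python
M = {
    ("E",  "id"): ["T", "E'"],   ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"],
    ("E'", ")"):  [],            ("E'", "$"): [],      # [] = epsilon body
    ("T",  "id"): ["F", "T'"],   ("T",  "("): ["F", "T'"],
    ("T'", "+"):  [],            ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"):  [],            ("T'", "$"): [],
    ("F",  "id"): ["id"],        ("F",  "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Return True if tokens (ending with $) derive from E, else False."""
    stack = ["$", "E"]                       # $S on the stack, S on top
    ip = 0
    while stack:
        X = stack.pop()
        a = tokens[ip]
        if X not in NONTERMINALS:            # X is a terminal or $
            if X != a:
                return False                 # mismatch: error
            if X == "$":
                return True                  # X = a = $: success
            ip += 1                          # X = a: match, advance ip
        else:
            body = M.get((X, a))
            if body is None:
                return False                 # blank entry: error
            stack.extend(reversed(body))     # push body, leftmost on top
    return False

print(parse(["id", "+", "id", "*", "id", "$"]))  # True
print(parse(["id", "+", "*", "id", "$"]))        # False
```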
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Construction of the predictive parsing table requires two functions: FIRST and FOLLOW.
FIRST
If α is any string of grammar symbols, then FIRST(α) is the set of terminals that begin the strings derived from α. If α =>* ε, then add ε to FIRST(α). FIRST is defined for both terminals and non-terminals.
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 ... Yn is a production, then put a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
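The FIRST rules amount to a fixed-point iteration; a minimal sketch in Python for the grammar above, assuming productions are stored as a dict of body lists, with the empty list denoting ε and the string "eps" standing for ε in the result sets:

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] denotes the epsilon body
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
EPS = "eps"

def first_of_string(symbols, first, grammar):
    """FIRST of a string of grammar symbols (rule 3, applied left to right)."""
    result = set()
    for s in symbols:
        if s not in grammar:              # terminal: it begins the string
            result.add(s)
            return result
        result |= first[s] - {EPS}
        if EPS not in first[s]:
            return result                 # s cannot vanish, stop here
    result.add(EPS)                       # every symbol can derive epsilon
    return result

def first_sets(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                add = first_of_string(body, first, grammar)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

F = first_sets(GRAMMAR)
print(F["E"], F["E'"], F["T'"])
```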
FOLLOW
FOLLOW is defined only for the non-terminals of the grammar G. It can be defined as the set of terminals of grammar G which can immediately follow the non-terminal in some sentential form derived from the start symbol. In other words, if A is a nonterminal, then FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form.
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
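FOLLOW can be computed by a similar fixed-point pass; a minimal sketch in Python for the grammar above, with the FIRST sets written in as literals ("eps" stands for ε, "$" for the end marker):

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] denotes the epsilon body
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"}}

def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                     # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                trailer = follow[A]            # what may follow the tail
                for s in reversed(body):       # scan each body right to left
                    if s in grammar:
                        if not trailer <= follow[s]:
                            follow[s] |= trailer
                            changed = True
                        if "eps" in first[s]:  # s can vanish: keep trailer
                            trailer = trailer | (first[s] - {"eps"})
                        else:
                            trailer = first[s] - {"eps"}
                    else:
                        trailer = {s}          # terminal: only s can precede
    return follow

FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
print(sorted(FOLLOW["E"]))  # ['$', ')']
```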
METHOD: To construct the predictive parsing table M, for each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A); if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
Parsing Table (for the expression grammar above):

        id          +            *            (          )          $
E     E → TE'                              E → TE'
E'               E' → +TE'                            E' → ε     E' → ε
T     T → FT'                              T → FT'
T'               T' → ε      T' → *FT'                T' → ε     T' → ε
F     F → id                               F → (E)
Blank entries are error states. For example, E cannot derive a string starting with ‘+’
2.2.3 LL(1) GRAMMARS
LL(1) grammars are the class of grammars from which predictive parsers can be constructed automatically.
A context-free grammar G = (VT, VN, P, S) whose parsing table has no multiple entries is said to be LL(1).
In the name LL(1), the first L stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step to make parsing action decisions.
An LL(1) grammar is neither ambiguous nor left-recursive.
EXAMPLE
S → i E t S S' | a
S' → e S | ε
E → b
This grammar abstracts the dangling-else problem (i stands for if, t for then, e for else). Since e is in FOLLOW(S'), the entry M[S', e] contains both S' → e S and S' → ε; the table has a multiply-defined entry, so the grammar is not LL(1).
**********