Chapter 3 - Syntax Analysis
Basic Topics of Chapter Three
Syntax analysis creates the syntactic structure of the given source program.
Parser: a program that takes tokens and a grammar (CFG) as input and produces a parse tree as output.
The syntax analyzer (parser) checks whether the given source program satisfies the rules implied by a context-free grammar.
[Figure: position of the parser in the compiler model. The lexical analyzer reads the source program and returns a token on each getNextToken request from the parser; the parser produces a parse tree for the rest of the front end, which emits the intermediate representation. Both components interact with the symbol table.]
The Main Responsibility of Syntax Analysis
The parser obtains a stream of tokens from the lexical analyzer and verifies that the stream of token names can be generated by the grammar for the source language.
We expect the parser to report any syntax error and to recover from commonly occurring errors so that it can continue processing the rest of the program.
The parser constructs the parse tree and passes it to the rest of the front end.
Syntax Error Handling
Planning the error handling right from the start can both
o simplify the structure of a compiler and improve its handling of errors.
Common error-recovery strategies:
a. Panic-mode recovery
b. Phrase-level recovery
c. Error productions, and
d. Global correction.
i. Panic-Mode Recovery
Once an error is found, the parser discards input symbols one at a time until it finds one of a designated set of synchronizing tokens.
Synchronizing tokens are delimiters, such as a semicolon or }, whose role in the source program is clear.
When the parser finds an error in a statement, it ignores the rest of the statement by not processing the remaining input.
Basically, there are a number of types of grammar, but for compiler design the context-free grammar (CFG) is used.
Just as an English grammar validates a sentence such as "I am going.", a CFG validates the sentences of a programming language.
In a programming language, suppose we write a sentence of this form:
int a, b, c;
i.e., a data-type keyword, then variable names separated by commas, terminated by a semicolon.
Therefore, a CFG is used to check the syntax of a programming language.
Formal Definition of a CFG
A grammar is a set of rules that validates the correctness of the sentences in a language, i.e., the grammar defines the rules.
A context-free grammar (grammar for short) consists of a four-tuple:
G = (T, N, S, P), where
i. T is a finite set of terminals (in our case, this will be the set of tokens)
ii. N is a finite set of non-terminals (syntactic-variables)
iii. S is a start symbol (one of the non-terminal symbol)
iv. P is a finite set of production rules of the form:
A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)
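The four-tuple can be written down directly as data. A minimal sketch in Python (the encoding and names are illustrative, not from the text):

```python
# A CFG G = (T, N, S, P) for simple arithmetic expressions.
T = {"+", "*", "(", ")", "id"}      # terminals
N = {"E"}                           # non-terminals
S = "E"                             # start symbol
# Productions: each head (a non-terminal) maps to a list of bodies;
# a body is a tuple of terminals and non-terminals.
P = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)],
}

# Sanity checks: the start symbol is a non-terminal, every head is a
# non-terminal, and every body symbol is a known terminal or non-terminal.
assert S in N
for head, bodies in P.items():
    assert head in N
    for body in bodies:
        assert all(sym in T or sym in N for sym in body)
```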
i. Terminals
Terminals are the basic symbols from which strings are formed.
The term "token name" is a synonym for "terminal" and frequently we will use the word
"token" for terminal when it is clear that we are talking about just the token name.
We assume that the terminals are the first components of the tokens output by the lexical
analyzer.
Notation Conventions
• Terminals:
– Lowercase letters
– Operator symbols
– Punctuation symbols, such as parentheses and commas
– The digits
– Boldface strings, such as id or if
ii. Non-terminals
Nonterminals are syntactic variables that denote sets of strings.
The sets of strings denoted by non-terminals help define the language generated by the
grammar.
Non-terminals impose a hierarchical structure on the language that is key to syntax
analysis and translation.
Non-terminals:
– Uppercase letters in the alphabet
– The letter S which, when it appears, is usually the start symbol
– Lowercase, italic names such as expr or stmt
For example, non-terminals for expressions, terms, and factors are often represented
by E, T, and F, respectively.
iii. Start Symbol
In a grammar, one non-terminal is distinguished as the start symbol, where derivations begin; the set of strings it denotes is the language generated by the grammar.
Conventionally, the productions for the start symbol are listed first.
iv. Productions
The productions of a grammar specify the manner in which the terminals and non-
terminals can be combined to form strings.
Each Production Consists of:
a. A non-terminal called the head or left side of the production;
this production defines some of the strings denoted by the head.
b. A body or right side consisting of zero or more terminals and non-terminals.
Example #1: Simple Arithmetic Expressions
G: E → E+E | E-E | E*E | E/E | (E) | id
Valid sentence: id+id*id
(this sentence can be derived from the above grammar, hence it is valid)
Invalid sentence: id++id*id
(this sentence cannot be generated using the grammar above, hence it is invalid)
NB: the job of a grammar is to validate the correctness of a sentence or to locate the error in it.
Example #4: CFG ("if-else" Grammar)
• S → if expression then Statement
G: E → E + T | T
T→T*F|F
F → (E) | id
The expression grammar above belongs to the class of LR grammars, which are suitable for bottom-up parsing.
This grammar can be adapted to handle additional operators and additional levels of precedence.
However, it cannot be used for top-down parsing because it is left-recursive.
Cont’d
The following non-left-recursive variant of the expression grammar below will be
used for top-down parsing:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
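This non-left-recursive grammar is exactly what a hand-written top-down (recursive-descent) parser needs: one function per non-terminal, choosing a production by the next token. A sketch in Python (token handling is simplified; "id" stands for any identifier token):

```python
# Predictive recursive-descent parser for:
#   E -> T E'     E' -> + T E' | ε
#   T -> F T'     T' -> * F T' | ε
#   F -> ( E ) | id

def parse(tokens):
    toks = list(tokens) + ["$"]   # $ marks the end of input
    pos = 0

    def peek():
        return toks[pos]

    def eat(t):
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f"expected {t!r}, got {toks[pos]!r}")
        pos += 1

    def E():
        T(); Eprime()

    def Eprime():
        if peek() == "+":
            eat("+"); T(); Eprime()   # E' -> + T E'
        # otherwise E' -> ε: consume nothing

    def T():
        F(); Tprime()

    def Tprime():
        if peek() == "*":
            eat("*"); F(); Tprime()   # T' -> * F T'
        # otherwise T' -> ε

    def F():
        if peek() == "(":
            eat("("); E(); eat(")")
        else:
            eat("id")

    E()
    eat("$")
    return True

assert parse(["id", "+", "id", "*", "id"])
```

Because the grammar is non-left-recursive and each alternative is chosen by one token of look-ahead, no backtracking is needed.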
Cont’d
The following grammar treats + and * alike, so it is useful for illustrating techniques for handling ambiguities during parsing:
E → E + E | E * E | (E) | id
Derivations
Derivation is a sequence of production rules.
It is used to get the input string through these productions rules.
We have to decide:
• which non-terminal to replace
• Production rule by which the Non-terminals will be replaced
If there is a production A → α, then we say that A derives α, written A ⇒ α.
αAβ ⇒ αγβ if A → γ is a production.
If α1 ⇒ α2 ⇒ … ⇒ αn, then α1 ⇒* αn.
Given a grammar G and a string w of terminals in L(G), we can write S ⇒* w.
If S ⇒* α, where α is a string of terminals and non-terminals of G, then we say that α is a sentential form of G.
There are two options for Derivation
a. Left-Most Derivation (LMD) and b. Right-Most Derivation (RMD)
a. Left-Most Derivations (LMD)
• If we always choose the left-most non-terminal in each derivation step, this
derivation is called as left-most derivation.
• In LMD, the input string is scanned and replaced with production rules from left to right.
• If, in each sentential form, only the leftmost non-terminal is replaced, the derivation is a leftmost derivation.
LMD: Example #1
G: E → E+E | id    Input string: id+id
E+E derives from E, so we can replace E by E+E:
E ⇒ E+E ⇒ id+E ⇒ id+id

RMD: Example #2
G: E → E+E | (E) | -E | id    Input string: -(id+id)
Right-most derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
Parse Tree
A parse tree shows graphically how the start symbol of a grammar derives a string in the language.

Parse Tree: Example #2
Construct the parse tree for the given grammar:
G: list → list + digit | list - digit | digit    Input string: 9-5+2
[Parse tree figure omitted.]
Exercise #2
1. G: T → T+T
      | T*T
      | a | b | c    Input string: a*b+c
2. G: S → XYZ    Input string: abd
   X → a
   Y → b
   Z → c | d
3.7. Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Alternatively, an ambiguous grammar is one that produces:
o more than one leftmost derivation (LMD), or
o more than one rightmost derivation (RMD), for the same sentence.
Drawback of Ambiguity:
o Parsing complexity
o Affects other phases
Ambiguity : Example #1
Consider grammar:
G: E E+E|E*E|id Input string: id+id*id
Two distinct leftmost derivations (and hence two parse trees):
1. E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
2. E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree figures omitted: in the first tree * is nested under +; in the second + is nested under *.]
Ambiguity : Example #2
string → string + string
       | string - string
       | 0 | 1 | … | 9
• String 9-5+2 has two parse trees
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.
An unambiguous grammar fixes a unique structure: the higher-precedence operator gets its operands before operators with lower precedence.
In all programming languages with if-then-else statements of this form, the first parse tree is preferred.
Hence the general rule is: match each else with the closest previous unmatched then.
This disambiguating rule can be incorporated directly into a grammar by using the
following observations.
Eliminating Ambiguity(cont ’d)
A statement appearing between a then and an else must be matched (otherwise there will be an ambiguity).
Thus statements are split into two kinds: matched and unmatched.
A matched statement is
either an if-then-else statement containing no unmatched statements,
or any statement which is not an if-then-else statement and not an if-then statement.
This yields the unambiguous grammar:
stmt → matched | unmatched
matched → if expr then matched else matched | other
unmatched → if expr then stmt | if expr then matched else unmatched
Eliminating Left Recursion
A grammar is left-recursive if it has a production of the form A → Aα | β.
– We may replace it with
A → β A'
A' → α A' | ε
where A' is a new non-terminal.
Eliminate left recursion: Example #1
Consider the following grammar, which generates arithmetic expressions:
E →E + T | T
T→T*F | F
F→(E) | id
has two left-recursive productions. Applying the above transformation leads to
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
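The A → βA' transformation above is mechanical enough to sketch as a small function. A Python sketch (the grammar encoding is my own; only immediate left recursion is handled, with "ε" marking the empty body):

```python
# Eliminate immediate left recursion:
#   A -> A a1 | ... | A am | b1 | ... | bn
# becomes
#   A  -> b1 A' | ... | bn A'
#   A' -> a1 A' | ... | am A' | ε

def eliminate_immediate_left_recursion(head, bodies):
    # Split the alternatives into left-recursive (A α) and the rest (β).
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}   # nothing to do
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [("ε",)],
    }

g = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
assert g["E"] == [("T", "E'")]
assert g["E'"] == [("+", "T", "E'"), ("ε",)]
```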
Elimination of left recursion(cont ’d)
The Case of Several Left-Recursive A-productions.
Assume that the set of all A-productions has the form
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
Replace these with
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Elimination of left recursion: Example #2
Indirect left recursion
Let us consider the following grammar.
– S → Aa | b
– A → Ac | Sd | ɛ
The non-terminal S is left-recursive since S ⇒ Aa ⇒ Sda, but the recursion is indirect.
Elimination of left recursion: Example #3
Recursion which is neither left recursion nor right recursion is called general recursion.
Example: S → aSb | ε
Left factoring
Left factoring is a process by which the grammar with common prefixes is
transformed to make it useful for Top down parsers.
If the RHS of more than one production starts with the same symbol, then such a
grammar is called as grammar with common prefixes.
• Ex: A → αβ1 | αβ2 | αβ3 (grammar with common prefixes)
This kind of grammar creates a problematic situation for Top down parsers.
Top down parsers can not decide which production must be chosen to parse the string in
hand.
In left factoring, we make one production for the common prefix, and the rest of each alternative is moved into new productions.
The grammar obtained after the process of left factoring is called a left-factored grammar.
Example: A → αβ1 | αβ2 | αβ3 becomes
A → αA'
A' → β1 | β2 | β3
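The left-factoring transformation can also be sketched in code. A simplified Python sketch (my own encoding; it factors only the single longest prefix common to all alternatives, whereas full left factoring groups alternatives by shared prefix):

```python
# Left-factor A -> α β1 | α β2 | ...  into  A -> α A' ; A' -> β1 | β2 | ...

def common_prefix(bodies):
    """Longest sequence of symbols shared by the front of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) == 1:
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)

def left_factor(head, bodies):
    prefix = common_prefix(bodies)
    if not prefix:
        return {head: list(bodies)}   # no common prefix, grammar unchanged
    new = head + "'"
    # What remains of each body after the prefix; ε if nothing remains.
    suffixes = [b[len(prefix):] or ("ε",) for b in bodies]
    return {head: [prefix + (new,)], new: suffixes}

g = left_factor("A", [("a", "b1"), ("a", "b2"), ("a", "b3")])
assert g["A"] == [("a", "A'")]
assert g["A'"] == [("b1",), ("b2",), ("b3",)]
```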
A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.
It can also be viewed as finding a leftmost derivation for an input string.
Example:
S → cAd
A → ab | a
For the input cad, backtracking is needed: the parser first tries A → ab, fails to match d, and then backs up and retries with A → a.
Recursive descent parsing: Example #2
It is a top-down parser.
FIRST(A) is the set of terminals that begin the strings derived from A.
FOLLOW(A), for a non-terminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
Calculating FIRST(Y1Y2Y3): add FIRST(Y1) − {ε}; if Y1 ⇒* ε, also add FIRST(Y2) − {ε}; if Y2 ⇒* ε as well, add FIRST(Y3); if all of them derive ε, add ε.
Example: #2
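The FIRST rules can be run as a fixed-point computation: keep applying them until no set grows. A Python sketch (the grammar encoding and names are my own; "ε" marks the empty body):

```python
# Compute FIRST for every symbol of a grammar given as
# {head: [body, ...]} with bodies as tuples of symbols.

def compute_first(grammar):
    first = {}

    def F(sym):
        # A terminal's FIRST set is just itself; non-terminals start empty.
        return first.setdefault(sym, set() if sym in grammar else {sym})

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(F(head))
                for sym in body:
                    if sym == "ε":
                        F(head).add("ε")
                        break
                    F(head).update(F(sym) - {"ε"})
                    if "ε" not in F(sym):
                        break            # sym is not nullable: stop here
                else:
                    F(head).add("ε")     # every symbol was nullable
                if len(F(head)) != before:
                    changed = True
    return first

g = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ("ε",)],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ("ε",)],
    "F":  [("(", "E", ")"), ("id",)],
}
first = compute_first(g)
assert first["E"] == {"(", "id"}
assert first["E'"] == {"+", "ε"}
```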
Rules of Computing FOLLOW
Rules in computing FOLLOW(X), where X is a non-terminal:
1) If X is immediately followed by a terminal in some production body, that terminal is in FOLLOW(X). For example:
A → Xa; then FOLLOW(X) = { a }
2) If X is the start symbol for a grammar, for example:
X → AB
A→a
B → b;
then add $ to FOLLOW (X); FOLLOW(X)= { $ }
Rules of Computing FOLLOW(cont ’d)
3) If X in a production body is followed by a non-terminal, add the FIRST set of that succeeding non-terminal (excluding ε) to FOLLOW(X).
Ex: A → XD, D → aB; then FOLLOW(X) ⊇ FIRST(D) = { a }
• Before calculating the first and follow functions, eliminate Left Recursion
from the grammar, if present.
• We calculate the follow function of a non-terminal by looking where it is
present on the RHS of a production rule.
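The FOLLOW rules above can likewise be run to a fixed point: scan every occurrence of a non-terminal on the RHS of a production and grow its set. A Python sketch (FIRST sets are supplied precomputed here to keep the example short; encoding is my own):

```python
# Compute FOLLOW for the non-left-recursive expression grammar.
g = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ("ε",)],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ("ε",)],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "ε"}, "T": {"(", "id"},
         "T'": {"*", "ε"}, "F": {"(", "id"}}

def first_of_seq(seq):
    """FIRST of a sequence of symbols; contains ε iff all are nullable."""
    out = set()
    for sym in seq:
        f = {"ε"} if sym == "ε" else FIRST.get(sym, {sym})
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")
    return out

def compute_follow(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")               # rule: $ goes into FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue          # terminals have no FOLLOW set
                    tail = first_of_seq(body[i + 1:])
                    before = len(follow[sym])
                    follow[sym] |= tail - {"ε"}
                    if "ε" in tail:       # tail nullable: FOLLOW(head) too
                        follow[sym] |= follow[head]
                    if len(follow[sym]) != before:
                        changed = True
    return follow

follow = compute_follow(g, "E")
assert follow["E"] == {")", "$"}
assert follow["T"] == {"+", ")", "$"}
```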
Computing FOLLOW: Example 1&2
Computing FIRST and FOLLOW: Exercise #1
Consider the following grammar G:
S→ABCDE
A → a | ε
B → b | ε
C → c
D → d | ε
E → e | ε
A grammar G is LL(1) if and only if whenever A→α|β are two distinct productions of
G, the following conditions hold:
– For no terminal a do α and β both derive strings beginning with a.
– At most one of α and β can derive the empty string.
– If α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).
How LL(1) Parser works?…
input buffer
– our string to be parsed.
– We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the
string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S.
$S is the initial stack.
– When the stack is emptied (i.e., only $ is left on the stack), parsing is complete.
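The stack discipline described above can be sketched as a table-driven loop. A Python sketch (the LL(1) table for the non-left-recursive expression grammar is written out by hand; empty tuples stand for ε-productions; all names are my own):

```python
# Table-driven LL(1) parsing: stack starts as [$, S]; at each step the top
# of the stack is matched against the lookahead (if a terminal) or expanded
# via the table M[non-terminal, terminal].

g_table = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, table, start="E"):
    stack = ["$", start]
    toks = list(tokens) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        a = toks[i]
        if top in NONTERMS:
            if (top, a) not in table:
                raise SyntaxError(f"no rule for ({top}, {a})")
            stack.extend(reversed(table[(top, a)]))  # push body right-to-left
        elif top == a:
            i += 1                # match a terminal (or the end marker $)
        else:
            raise SyntaxError(f"expected {top!r}, got {a!r}")
    return i == len(toks)

assert ll1_parse(["id", "+", "id", "*", "id"], g_table)
```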
Predictive Parsing Tables Construction
The general idea is to use the FIRST AND FOLLOW to construct the parsing tables.
For each production A → α, the production is entered in the table at M[A, a] for every terminal a in FIRST(α).
If FIRST(α) contains ε, the production is also entered at M[A, b] for every terminal b in FOLLOW(A).
• We start from a sentence and apply production rules in reverse in order to reach the start symbol.
Bottom-up parsing is the process of "reducing" a token string to the start symbol of the grammar.
At each reduction, the token string matching the RHS of a production is replaced by the non-terminal at its head.
The key decisions during bottom-up parsing are when to reduce and which production to apply.
• After entering this state or configuration, the parser halts and announces the successful
completion of parsing.
• Error: This is the situation in which the parser can neither perform shift action nor reduce
action and not even accept action.
Solution 1
Shift-reduce parsing: Example #1
Grammar: E → E+E | E*E | id    Input: id*id+id

Stack	Input string	Action
$	id*id+id$	shift id
$id	*id+id$	reduce by E → id
$E	*id+id$	shift *
$E*	id+id$	shift id
$E*id	+id$	reduce by E → id
$E*E	+id$	reduce by E → E*E
$E	+id$	shift +
$E+	id$	shift id
$E+id	$	reduce by E → id
$E+E	$	reduce by E → E+E
$E	$	accept
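The shift-reduce idea can be sketched with a deliberately naive strategy: shift one token, then reduce greedily whenever the top of the stack matches some production body. A Python sketch (my own simplification; it ignores precedence, which is exactly the decision a real parsing table settles):

```python
# Naive shift-reduce sketch for the grammar E -> E+E | E*E | id.
PRODS = [("E", ("E", "+", "E")), ("E", ("E", "*", "E")), ("E", ("id",))]

def shift_reduce(tokens):
    stack, trace = [], []
    for tok in list(tokens) + ["$"]:
        # Reduce as many times as possible before the next shift.
        reduced = True
        while reduced:
            reduced = False
            for head, body in PRODS:
                n = len(body)
                if tuple(stack[-n:]) == body:   # body matches top of stack
                    del stack[-n:]
                    stack.append(head)
                    trace.append(f"reduce by {head} -> {' '.join(body)}")
                    reduced = True
                    break
        if tok == "$":
            break
        stack.append(tok)
        trace.append(f"shift {tok}")
    # Accept iff the whole input reduced to the start symbol.
    return stack == ["E"], trace

ok, trace = shift_reduce(["id", "*", "id", "+", "id"])
assert ok
```

On this input the greedy strategy happens to reduce E*E before shifting +, mirroring the trace above; in general, deciding between shifting and reducing is the hard part that LR tables solve.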
Shift reduce parsing: Exercise #1
E → E + E | E * E | id is an operator grammar.
PRECEDENCE TABLE
Then the input string id+id*id with the precedence relations inserted will be: $ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $
Basic principle
Scan the input string from left to right and try to detect ⋗; put a pointer on its location.
$ ⋖ id+id*id$ Shift
$E ⋖ +id*id$ Shift
$E+E*id ⋗ $ Reduce by E → id
$E+E*E ⋗ $ Reduce by E → E * E
$E+E ⋗ $ Reduce by E → E + E
$E	$	Accept
Example #2
Q1. Construct operator precedence parsing table
Grammar:
E → E+E| E*E | id with input string: id+id+id $
Q2. Consider the following grammar and construct the operator precedence parser.
E → EAE | id
A → + | *
Then parse the following string: id+id*id
Advantages and Disadvantages of Operator Precedence
Parsing
Advantages:
simple
powerful enough for expressions in programming languages
Disadvantages:
It cannot handle the unary minus (the lexical analyzer should handle the
unary minus).
Operator precedence parsers use precedence functions that map terminal symbols to
integers.
1. Create functions fa and ga for each grammar terminal a and for the end-of-string symbol.
2. Partition the symbols into groups so that fa and gb are in the same group if a ≐ b (there can be symbols in the same group even if they are not connected by this relation).
3. Create a directed graph whose nodes are the groups; then for each pair of symbols a and b: place an edge from the group of gb to the group of fa if a ⋖ b, and an edge from the group of fa to the group of gb if a ⋗ b.
4. If the constructed graph has a cycle, then no precedence functions exist.
5. When there are no cycles, let fa and gb be the lengths of the longest paths starting from the groups of fa and gb, respectively.
Consider the following table:
We can make the look-ahead parameter explicit and discuss LR(k) parsers, where k is the number of input symbols of look-ahead used in making parsing decisions.
– Table driven
– Can be constructed to recognize all programming language constructs
for which CFG can be written
– Most general non-backtracking shift-reduce parsing method
– Can detect a syntactic error as soon as it is possible to do so
– Class of grammars for which we can construct LR parsers are superset
of those which we can construct LL parsers.
LR(k) Parsers
LR(k): we are mostly interested in parsers with k ≤ 1.
LR(k) parsers are of interest in that they are the most powerful class of deterministic bottom-up parsers using at most k look-ahead tokens.
Deterministic parsers must uniquely determine the correct parsing action at each step; they cannot back up or retry parsing actions.
LL vs. LR
LL	LR
Does a leftmost derivation.	Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack.	Ends with the root nonterminal on the stack.
Ends when the stack is empty.	Starts with an empty stack.
Builds the parse tree top-down.	Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack, and pushes the corresponding right-hand side.	Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.
Expands the non-terminals.	Reduces the non-terminals.
Reads the terminals when it pops one off the stack.	Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree.	Post-order traversal of the parse tree.
Model of an LR parser
How LR parser works?
Model of an LR parser…
At each step, the parser performs one of four actions:
1) shift (S),
2) reduce (R),
3) accept (A) the source code, or
4) signal a syntactic error (E).
LR Parsers (Cont.)
An LR parser makes shift-reduce decisions by maintaining states to keep track of
where we are in a parse.
States represent sets of items.
LR(k) Parsers:
4 types of LR(k) parsers:
i. LR(0)
ii. SLR(1) –Simple LR
iii. LALR(1) – Look Ahead LR and
iv. CLR(1) – Canonical LR
LR Parsers (Cont.)
To construct the parsing tables of LR(0) and SLR(1), we use the canonical collection of LR(0) items.
To construct the parsing tables of LALR(1) and CLR(1), we use the canonical collection of LR(1) items.
i. LR(0) Item
LR(0) parsing, and all other LR-style parsing, is based on the idea of an item of the form:
A → X1…Xi . Xi+1…Xj
The dot symbol . in an item may appear anywhere in the right-hand side of a
production.
It marks how much of the production has already been matched.
An LR(0) item (item for short) of a grammar G is a production of G with a dot at
some position of the RHS.
The production A → XYZ yields the four items:
A → .XYZ this means at RHS we have not seen anything
A → X . YZ this means at RHS we have seen X
A → XY . Z this means at RHS we have seen X and Y
A → XYZ . this means at RHS we have seen everything
The production A → λ generates only one item, A → .
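Generating the items of a production is mechanical: slide a dot through the body, one position at a time. A Python sketch (encoding is my own; the dot is represented as the string "."):

```python
# All LR(0) items of a production, obtained by placing the dot at every
# position of the body (len(body) + 1 positions in total).

def items_of(head, body):
    return [(head, body[:i] + (".",) + body[i:]) for i in range(len(body) + 1)]

its = items_of("A", ("X", "Y", "Z"))
assert its[0] == ("A", (".", "X", "Y", "Z"))   # nothing matched yet
assert its[-1] == ("A", ("X", "Y", "Z", "."))  # everything matched
assert len(its) == 4
# A -> λ (empty body) yields exactly one item: A -> .
assert items_of("A", ()) == [("A", (".",))]
```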
Constructing Canonical LR(0) item sets
• Augmented Grammar
• If G is a grammar with start symbol S, then G' (the augmented grammar for G) is G with a new start symbol S' and the production S' → S.
– The purpose of this new starting production is to indicate to the parser when it should stop
parsing and announce acceptance of input.
– Let a grammar be
E→BB
B→ cB | d
Closure of a state
Closure of a state adds items for all productions whose LHS occurs in an item
• Closure operation
– Let I be a set of items for a grammar G. CLOSURE(I) is computed by two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, add the item B → .γ if it is not already there; repeat until no more new items can be added.
• Example
• If I is { E' → E. , E → E. + T }, then goto(I, +) is
E → E + .T
T → .T * F
T →.F
F →.(E)
F → .id
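CLOSURE and goto can be sketched directly from the definitions. A Python sketch for the expression grammar (an item is a (head, body, dot-position) triple; the encoding is my own), reproducing the goto(I, +) example above:

```python
# Grammar: E' -> E, E -> E+T | T, T -> T*F | F, F -> (E) | id
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            # If a non-terminal follows the dot, add its productions
            # with the dot at the left end.
            if dot < len(body) and body[dot] in GRAMMAR:
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

def goto(items, X):
    # Advance the dot past X in every item where X follows the dot,
    # then close the resulting set.
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved)

I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}   # { E'->E. , E->E.+T }
J = goto(I, "+")
assert ("E", ("E", "+", "T"), 2) in J    # E -> E + . T
assert ("T", ("T", "*", "F"), 0) in J    # T -> . T * F
assert ("F", ("id",), 0) in J            # F -> . id
assert len(J) == 5
```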
Steps to construct LR(0) items
Step 1. Augment the given grammar.
Step 2. Draw the canonical collection of LR(0) items (apply closure and goto).
Step 3. Number the productions.

Example: E → BB ; B → cB | d    Input string: ccdd$
Step 1. Augment the given grammar:
E' → E
E → BB
B → cB | d
Step 2. Draw the canonical collection of LR(0) items.
Step 3. Number the productions.
Find CLOSURE(I).
Constructing canonical LR(0) item sets: Example #2 (cont.)
I0 = CLOSURE({ E' → .E })
First, E' → .E is put in CLOSURE(I) by rule 1.
Then, since there is an E immediately to the right of a dot, we add the E-productions with dots at the left end: E → .E + T and E → .T.
Now there is a T immediately to the right of a dot in E → .T, so we add T → .T * F and T → .F.
Next, T → .F forces us to add F → .(E) and F → .id.
Result: I0 = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
Goto Next State
Given an item set (state) s, we can compute its next state s' under a symbol X: take every item in s with the dot immediately before X, advance the dot past X, and apply CLOSURE to the result.
For the augmented expression grammar:
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
SLR(1) Parsing Table
ii. Simple LR(1), SLR(1), Parsing
Few number of states, hence very small table.
Simple and fast construction.
Works on smallest class of grammar.
SLR(1) parsers can parse a larger number of grammars than LR(0).
SLR(1) has the same Transition Diagram and Goto table as LR(0)
BUT with different Action table because it looks ahead 1 token.
SLR(1) Look-ahead
SLR(1) parsers are built by first constructing:
• the transition diagram,
• then computing FOLLOW sets as the SLR(1) look-aheads.
Exercise:
S → (L) | a
L → L, S | S
Parse the input string (a,(a,a)) using shift-reduce parsing.