Unit - III
GRAMMAR FORMALISM
Definition of Grammar:
A phrase-structure grammar (or simply a grammar) is a 4-tuple (V, T, P, S), where
(i) V is a finite nonempty set whose elements are called variables.
(ii) T is a finite nonempty set, disjoint from V, whose elements are called terminals.
Computer Science & Engineering Formal Languages and Automata Theory
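(iii) S ∈ V is a special variable called the start symbol.
(iv) P is a finite set whose elements are productions α → β, where α and β are strings over V ∪ T and α contains at least one variable.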
Example:
1. Construct a regular grammar for the regular expression r = 0(10)*.
Sol: Right-Linear Grammar:
S → 0A
A → 10A | ε
Left-Linear Grammar:
S → S10 | 0
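As a quick sanity check, which is not part of the original solution, the sketch below enumerates every string of length at most 7 that each grammar derives. It then compares both sets with the strings matching 0(10)* under Python's `re` module. It assumes single uppercase letters are variables and writes ε as the empty string.

```python
import re
from itertools import product

# Grammars from the notes; "" stands for ε, uppercase letters are variables.
RIGHT = {"S": ["0A"], "A": ["10A", ""]}   # S -> 0A, A -> 10A | ε
LEFT = {"S": ["S10", "0"]}                # S -> S10 | 0

def language(grammar, start="S", max_len=7):
    """Terminal strings of length <= max_len derivable from start.

    Only valid for linear grammars: every sentential form has at most
    one variable, so we simply keep rewriting that variable.
    """
    result, frontier = set(), [start]
    while frontier:
        form = frontier.pop()
        var = next((c for c in form if c.isupper()), None)
        if var is None:
            result.add(form)
            continue
        for rhs in grammar[var]:
            new = form.replace(var, rhs, 1)
            # Terminals are never erased, so over-long forms can be pruned.
            if sum(not c.isupper() for c in new) <= max_len:
                frontier.append(new)
    return result

def regex_language(max_len=7):
    """All binary strings of length <= max_len that match 0(10)*."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product("01", repeat=n))
    return {w for w in words if re.fullmatch(r"0(10)*", w)}

print(language(RIGHT) == language(LEFT) == regex_language())  # True
```

Both grammars yield {0, 010, 01010, 0101010} up to length 7. This is a finite check only, not a proof that the languages are equal.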
2. Context-Free Grammar:
A context-free grammar (CFG or just grammar) is defined formally as
G=(V,T,P,S)
Where
V: a finite set of variables (“non-terminals”); e.g., A, B, C, …
T: a finite set of symbols (“terminals”), e.g., a, b, c, …
P: a set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ T)*
S: a start non-terminal; S ∈ V
Eg: E → E+E
E → E*E
E → (E)
E → id
In the above example, grammar tuples are defined as follows:
G = ({E}, {+, *, (, ), id}, {E → E+E, E → E*E, E → (E), E → id}, E).
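The 4-tuple above can be written down directly as data. The following is a minimal sketch that encodes G = (V, T, P, S) in Python and checks the conditions of the definition: A ∈ V, α ∈ (V ∪ T)*, S ∈ V, with V and T disjoint. The class name `CFG` and the method `check` are hypothetical. Bodies are tuples so that the two-character terminal `id` stays a single symbol.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    """G = (V, T, P, S); each production is (head, body), body a tuple."""
    V: frozenset
    T: frozenset
    P: tuple
    S: str

    def check(self):
        """Raise ValueError if G violates the definition, else return True."""
        if not self.V.isdisjoint(self.T):
            raise ValueError("V and T must be disjoint")
        if self.S not in self.V:
            raise ValueError("start symbol S must be in V")
        for head, body in self.P:
            if head not in self.V:                           # A ∈ V
                raise ValueError(f"head {head!r} is not a variable")
            if any(x not in self.V | self.T for x in body):  # α ∈ (V ∪ T)*
                raise ValueError(f"body {body!r} uses an unknown symbol")
        return True

# E -> E+E | E*E | (E) | id
G = CFG(
    V=frozenset({"E"}),
    T=frozenset({"+", "*", "(", ")", "id"}),
    P=(("E", ("E", "+", "E")), ("E", ("E", "*", "E")),
       ("E", ("(", "E", ")")), ("E", ("id",))),
    S="E",
)
print(G.check())  # True
```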
In this chapter we use the following conventions regarding grammars.
1) The capital letters A,B,C,D,E and S denote variables; S is the start
symbol unless otherwise stated.
2) The lower-case letters a, b, c, d, e, digits, special symbols and boldface strings
are terminals.
3) The capital letters X, Y and Z denote symbols that may be either
terminals or variables.
4) The lower-case letters u, v, w, x, y and z denote strings of terminals.
5) The lower-case Greek letters α, β and γ denote strings of variables and
terminals.
Examples:
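1. Construct a CFG to generate the set of palindromes over the alphabet {a, b}.
Even-length palindromes: S → aSa | bSb | ε
Odd-length palindromes: S → aSa | bSb | a | b
All palindromes: S → aSa | bSb | a | b | ε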
Solution: L = {ab, aabb, aaabbb, …} = {aⁿbⁿ | n ≥ 1}
S → aSb | ab
Solution: L = {ε, 00, 11, 0110, 1001, 010010, …}
S → 0S0 | 1S1 | ε
Solution: L = {#, 0#0, 1#1, 01#10, 10#01, 010#010, …}
S → 0S0 | 1S1 | #
Solution: L = {ε, a, b, aa, ab, ba, bb, aaa, aab, bbb, bba, …}
S → aS | bS | ε
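One way to gain confidence in grammars like these is to generate all their short strings and compare them with the intended language. The sketch below is a generic brute-force generator that always rewrites the leftmost variable. It assumes uppercase letters are variables, "" stands for ε, and every recursive production adds at least one terminal. That last condition holds for the grammars above and lets pruning on the terminal count guarantee termination.

```python
from itertools import product

def generate(grammar, start="S", max_len=5):
    """Terminal strings of length <= max_len derivable from start."""
    result, stack, seen = set(), [start], set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                      # no variables: a sentence
            result.add(form)
            continue
        for rhs in grammar[form[i]]:       # rewrite the leftmost variable
            new = form[:i] + rhs + form[i + 1:]
            if sum(not c.isupper() for c in new) <= max_len:
                stack.append(new)
    return result

def strings(alphabet, max_len):
    """Every string over alphabet of length <= max_len."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)}

PAL = {"S": ["aSa", "bSb", "a", "b", ""]}   # all palindromes over {a, b}
ANBN = {"S": ["aSb", "ab"]}                 # {a^n b^n | n >= 1}

print(generate(PAL) == {w for w in strings("ab", 5) if w == w[::-1]})  # True
print(sorted(generate(ANBN, max_len=6), key=len))  # ['ab', 'aabb', 'aaabbb']
```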
2.1 DERIVATION
if necessary.
Example: 011AS ⇒ 0110A1S (applying A → 0A1)
Types of Derivations:
Derivation/Parse Trees:
Sol: Before constructing the parse tree, first derive the given input string from the
CFG.
Parse tree:
S → AS | ε
A → 0A1 | A1 | 01
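In the above CFG, the string 00111 has the following two leftmost derivations
from S:
(1) S ⇒ AS ⇒ 0A1S ⇒ 0A11S ⇒ 00111S ⇒ 00111
(2) S ⇒ AS ⇒ A1S ⇒ 0A11S ⇒ 00111S ⇒ 00111
Since 00111 has two distinct leftmost derivations, the grammar is ambiguous.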
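Ambiguity can also be detected mechanically for short strings by counting leftmost derivations. The sketch below counts them for the grammar S → AS | ε, A → 0A1 | A1 | 01. It assumes uppercase letters are variables and "" stands for ε. It also assumes every leftmost expansion eventually adds terminals, which is true for this grammar, so pruning on the terminal prefix and the terminal count keeps the search finite. A count above 1 means the grammar is ambiguous.

```python
# Grammar from the notes: S -> AS | ε, A -> 0A1 | A1 | 01.
GRAMMAR = {"S": ["AS", ""], "A": ["0A1", "A1", "01"]}

def count_leftmost(form, target, grammar=GRAMMAR):
    """Number of leftmost derivations form =>* target."""
    i = next((k for k, c in enumerate(form) if c.isupper()), None)
    if i is None:                          # no variables left
        return 1 if form == target else 0
    if not target.startswith(form[:i]):    # terminals left of i are final
        return 0
    if sum(not c.isupper() for c in form) > len(target):  # never shrink
        return 0
    return sum(count_leftmost(form[:i] + rhs + form[i + 1:], target, grammar)
               for rhs in grammar[form[i]])

print(count_leftmost("S", "00111"))  # 2, so the grammar is ambiguous
```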
Sol:
[Figure: parse tree rooted at S; node labels include S, a and b]
Example:
If all productions of a CFG are of the form A → xB or A → x, where x is a
terminal string, then L(G) is a regular set.
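A standard way to see why this holds is to build an NFA whose states are the variables plus one extra accepting state. A → xB becomes a chain of transitions spelling x from A to B, and A → x becomes a chain ending in the accepting state. The sketch below follows that construction and simulates the NFA. The state names `ACCEPT` and `q1`, `q2`, … are hypothetical. Variables are assumed to be single uppercase letters, and ε is written as "".

```python
ACCEPT = "ACCEPT"  # the extra accepting state (hypothetical name)

def build_nfa(grammar, start="S"):
    """NFA (trans, eps, start) for a grammar with bodies xB or x."""
    trans, eps, fresh = {}, {}, 0
    for head, bodies in grammar.items():
        for body in bodies:
            if body and body[-1].isupper():      # A -> xB
                x, target = body[:-1], body[-1]
            else:                                # A -> x
                x, target = body, ACCEPT
            if not x:                            # A -> B or A -> ε: ε-move
                eps.setdefault(head, set()).add(target)
                continue
            state = head
            for k, sym in enumerate(x):          # chain of states spelling x
                if k == len(x) - 1:
                    nxt = target
                else:
                    fresh += 1
                    nxt = f"q{fresh}"
                trans.setdefault((state, sym), set()).add(nxt)
                state = nxt
    return trans, eps, start

def eps_closure(states, eps):
    """All states reachable from `states` by ε-moves alone."""
    seen, stack = set(states), list(states)
    while stack:
        for t in eps.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    """Standard subset simulation of the NFA on input w."""
    trans, eps, start = nfa
    current = eps_closure({start}, eps)
    for sym in w:
        step = set()
        for s in current:
            step |= trans.get((s, sym), set())
        current = eps_closure(step, eps)
    return ACCEPT in current

nfa = build_nfa({"S": ["0A"], "A": ["10A", ""]})  # right-linear grammar for 0(10)*
print([w for w in ["0", "010", "01010", "", "01", "0100"] if accepts(nfa, w)])
# ['0', '010', '01010']
```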
3. SIMPLIFICATION OF CFG
In a CFG, not every symbol or production may be needed to derive sentences. So we
eliminate the symbols and productions in G that are not useful.
Examples:
1. Grammar G:
S → aS | bA
A → aA | ε; from this grammar, eliminate ε-productions.
Solution:
S → aS is unchanged; S → bA gives S → bA and S → b.
A → aA gives A → aA and A → a.
After eliminating the ε-productions, the final grammar is
S → aS | bA | b
A → aA | a
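The procedure used in these examples has two parts. First, find the nullable variables (those that can derive ε) by fixed-point iteration. Then, for every production, add each variant with some nullable symbols omitted, and drop the ε-productions. The sketch below assumes uppercase letters are variables and "" stands for ε. As usual, if the start symbol itself is nullable, ε is lost from the language.

```python
from itertools import product

def nullable(grammar):
    """Variables that can derive ε (fixed-point iteration)."""
    null, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            # A body is nullable if every symbol in it is a nullable variable.
            if head not in null and any(all(c in null for c in b) for b in bodies):
                null.add(head)
                changed = True
    return null

def remove_epsilon(grammar):
    """Drop ε-productions, adding every variant with nullable symbols omitted."""
    null = nullable(grammar)
    new = {}
    for head, bodies in grammar.items():
        out = set()
        for body in bodies:
            # Each nullable symbol may be kept or dropped.
            choices = [(c, "") if c in null else (c,) for c in body]
            for pick in product(*choices):
                variant = "".join(pick)
                if variant:                  # never re-add an ε-production
                    out.add(variant)
        new[head] = sorted(out)
    return new

print(remove_epsilon({"S": ["aS", "bA"], "A": ["aA", ""]}))
# {'S': ['aS', 'b', 'bA'], 'A': ['a', 'aA']}
```

The same function reproduces the first step of the next example: there both A and B are nullable, and A ends up with no productions at all.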
2. Grammar G:
S → AaB | aaB
A → ε
B → bbA | ε; from this grammar, eliminate ε-productions and
then eliminate useless symbols.
Solution:
After eliminating the ε-productions, the grammar is
S → AaB | aaB | aB | Aa | a | aa
B → bbA | bb
In this grammar the variable A has no productions, so it cannot derive any
terminal string and can be eliminated.
The remaining productions are
S → aaB | aB | a | aa
B → bb
In this grammar no symbol is useless, so the final productions are
S → aaB | aB | a | aa
B → bb
Here A, B and C each appear as a single variable on the right-hand side of a
production (unit productions). The resulting grammar is S → d. This is called the chain rule.
Example:
1. Eliminate unit productions from the following grammar.
S → A | bb
A → B | b
B → S | a
Solution:
In the given grammar, the unit productions are S → A, A → B and B → S.
S → A gives S → b.
S → A → B gives S → B, which gives S → a.
A → B gives A → a.
A → B → S gives A → S, which gives A → bb.
B → S gives B → bb.
B → S → A gives B → A, which gives B → b.
The new productions are
S → bb | b | a
A → b | a | bb
B → a | bb | b
It has no unit productions. To get the reduced CFG, we also have to
eliminate the useless symbols. Since A and B are no longer reachable from S,
the A and B productions can be eliminated.
Then the resulting grammar is S → bb | b | a.
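The chain reasoning above (S → A → B gives S → B, and so on) can be made systematic. First compute all unit pairs (A, B) such that A ⇒* B using only unit productions. Then give A every non-unit body of B. The sketch below assumes ε-productions were already removed, uppercase letters are variables, and every variable has an entry in the dictionary.

```python
def unit_pairs(grammar):
    """All pairs (A, B) with A =>* B using only unit productions."""
    pairs = {(a, a) for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # A unit body is exactly one variable.
                if len(body) == 1 and body.isupper() and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs

def remove_units(grammar):
    """Replace unit productions by the non-unit bodies they lead to."""
    new = {a: set() for a in grammar}
    for a, b in unit_pairs(grammar):
        for body in grammar[b]:
            if not (len(body) == 1 and body.isupper()):
                new[a].add(body)
    return {a: sorted(bodies) for a, bodies in new.items()}

g = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(remove_units(g))
# {'S': ['a', 'b', 'bb'], 'A': ['a', 'b', 'bb'], 'B': ['a', 'b', 'bb']}
```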
Examples:
1. Eliminate useless symbols from the grammar
S → AB | a
A → a
Solution:
Here no terminal string is derivable from B, so B is eliminated together with
the production S → AB.
The remaining productions are
S → a
A → a
By rule 2, A is not reachable from the start symbol S, so we can eliminate A → a.
The final production is
S → a
2. Eliminate useless symbols from the grammar
S → aS | A | C
A → a
B → aa
C → aCb
Solution:
By rule 2, B is not reachable from the start symbol S, so we can
eliminate B → aa.
The remaining productions are
S → aS | A | C
A → a
C → aCb
By rule 1, C does not derive any terminal string, so we can
eliminate the productions S → C and C → aCb.
The final productions are
S → aS | A
A → a
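The two rules can be sketched as two fixed-point computations. Rule 1 keeps only the variables that derive some terminal string. Rule 2 then keeps only the variables reachable from the start symbol. Applying them in that order is what guarantees that no useless symbol remains. The sketch assumes uppercase letters are variables, and a variable with no entry has no productions.

```python
def generating(grammar):
    """Variables that derive some terminal string (rule 1)."""
    gen, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in gen and any(
                all(not c.isupper() or c in gen for c in b) for b in bodies
            ):
                gen.add(head)
                changed = True
    return gen

def reachable(grammar, start="S"):
    """Variables reachable from the start symbol (rule 2)."""
    seen, stack = {start}, [start]
    while stack:
        for body in grammar.get(stack.pop(), []):
            for c in body:
                if c.isupper() and c not in seen:
                    seen.add(c)
                    stack.append(c)
    return seen

def remove_useless(grammar, start="S"):
    # Rule 1: drop non-generating variables and every production using one.
    gen = generating(grammar)
    g = {h: [b for b in bodies if all(not c.isupper() or c in gen for c in b)]
         for h, bodies in grammar.items() if h in gen}
    # Rule 2: keep only variables reachable from the start symbol.
    reach = reachable(g, start)
    return {h: bodies for h, bodies in g.items() if h in reach}

print(remove_useless({"S": ["AB", "a"], "A": ["a"]}))  # {'S': ['a']}
print(remove_useless({"S": ["aS", "A", "C"], "A": ["a"],
                      "B": ["aa"], "C": ["aCb"]}))
# {'S': ['aS', 'A'], 'A': ['a']}
```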
6. NORMAL FORMS
In a context-free grammar, the right-hand side of a production can be
any string of variables and terminals. When the productions in G satisfy certain
restrictions, G is said to be in a normal form.
There are two widely used normal forms of CFG. They are
i. Chomsky Normal Form (CNF)
ii. Greibach Normal Form ( GNF )
Chomsky Normal Form: a CFG is in CNF if every production is of the form
A → BC or
A → a
where 'a' is a terminal, A, B, C are non-terminals, and B, C may not be the start
variable (the axiom).
Note:
1. In CNF, the number of symbols on the right side of a production is strictly
limited: either two variables or one terminal.
2. The rule S → ε, where S is the start variable, is not excluded from a CFG
in Chomsky normal form.
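The definition translates directly into a checker. The sketch below follows the definition given here, so B and C may not be the start variable and S → ε is allowed. It assumes single uppercase letters are variables and "" stands for ε.

```python
def is_cnf(grammar, start="S"):
    """True if every production is A -> BC, A -> a, or S -> ε."""
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                ok = head == start                          # only S -> ε
            elif len(body) == 1:
                ok = not body.isupper()                     # A -> a
            elif len(body) == 2:
                ok = body.isupper() and start not in body   # A -> BC
            else:
                ok = False                                  # too many symbols
            if not ok:
                return False
    return True

print(is_cnf({"S": ["AB", ""], "A": ["a"], "B": ["b"]}))              # True
print(is_cnf({"S": ["aX", "Yb"], "X": ["S", ""], "Y": ["bY", "b"]}))  # False
```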
Conversion to Chomsky normal form:
Theorem: For every CFG, there is an equivalent grammar G in Chomsky
Normal Form.
Proof:
Construction of grammar in CNF.
Step 1:
Eliminate null productions and unit productions.
Step 2:
Eliminate terminals on the right-hand side of productions as follows.
i. All the productions in P of the form A → a and A → BC are
included unchanged.
ii. Consider A → w1w2…wn (n ≥ 2) with some terminal on the right-hand side. If wi
is a terminal, say ai, add a new variable Cai with the production Cai → ai, and
replace ai by Cai in the body. Repeat the same for all terminals.
Step 3:
Restrict the number of variables on the RHS as follows:
i. All the productions that are already in the required form are kept.
ii. Consider A → A1A2A3…Am (m ≥ 3); then we introduce the new productions
A → A1C1
C1 → A2C2
C2 → A3C3
…
Cm-2 → Am-1Am
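Steps 2 and 3 can be sketched as follows, assuming Step 1 has already removed ε- and unit productions. Bodies are tuples of symbols, and a symbol is treated as a variable if it starts with an uppercase letter. The new variable names `C_a` (for terminal a) and `C1`, `C2`, … (for the chain in Step 3) are hypothetical naming choices for this sketch.

```python
def is_var(sym):
    """Variables start with an uppercase letter (an assumption of this sketch)."""
    return sym[0].isupper()

def cnf_steps_2_3(grammar):
    """Apply Steps 2 and 3 to a grammar with no ε- or unit productions."""
    result, term_var, counter = {}, {}, 0

    def var_for(t):
        # Step 2 (ii): one new variable C_t with C_t -> t per terminal t.
        if t not in term_var:
            term_var[t] = f"C_{t}"
            result[term_var[t]] = [(t,)]
        return term_var[t]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:           # Step 2 (i): A -> a is kept as is
                result.setdefault(head, []).append(body)
                continue
            syms = [s if is_var(s) else var_for(s) for s in body]
            lhs = head
            while len(syms) > 2:         # Step 3 (ii): A -> A1 C1, C1 -> A2 C2, ...
                counter += 1
                new = f"C{counter}"
                result.setdefault(lhs, []).append((syms[0], new))
                lhs, syms = new, syms[1:]
            result.setdefault(lhs, []).append(tuple(syms))
    return result

g = {"S": [("a", "B", "c", "D")], "B": [("b",)], "D": [("d",)]}
for head, bodies in cnf_steps_2_3(g).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

For S → aBcD this produces S → C_a C1, C1 → B C2 and C2 → C_c D, together with C_a → a and C_c → c. This is the chain A → A1C1, C1 → A2C2, …, Cm-2 → Am-1Am with m = 4.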
Example:
Convert the following CFG to Chomsky Normal Form (CNF):
S → aX | Yb
X → S | ε
Y → bY | b
Solution:
Step 1 - Kill all ε-productions:
By inspection, the only nullable non-terminal is X.
Delete all ε-productions and add new productions, with all possible
combinations of the nullable X removed.
Example: Convert the following grammar G into Greibach Normal Form (GNF).
S → XA | BB
B → b | SB
X → b
A → a
Solution:
To write the above grammar G in GNF, we follow these steps:
1. Rewrite G in Chomsky Normal Form (CNF)
It is already in CNF.
2. Re-label the variables
S with A1
X with A2
A with A3
B with A4
After re-labeling the grammar looks like:
A1 → A2A3 | A4A4
A4 → b | A1A4
A2 → b
A3 → a
3. Identify all productions that do not conform to any of the types listed
below:
Ai → Ajxk such that j > i
Zi → Ajxk such that j ≤ n
Ai → axk such that xk ∈ V* and a ∈ T
4. A4 → A1A4 is identified.
5. A4 → A1A4 | b.
To eliminate A1, we use the substitution rule A1 → A2A3 | A4A4.
Therefore, we have A4 → A2A3A4 | A4A4A4 | b.
The productions A4 → A2A3A4 and A4 → A4A4A4 still do not conform to any of
the types in step 3.
Substituting A2 → b gives
A4 → bA3A4 | A4A4A4 | b
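The substitution steps above are mechanical. A leading Aj in a body of Ai is replaced by each body of Aj. The remaining production A4 → A4A4A4 is left-recursive. The standard rule for the Zi variables mentioned in step 3 turns A → Aα | β into A → β | βZ and Z → α | αZ. The sketch below reproduces the substitutions from the notes and then applies that left-recursion rule. The function names are hypothetical, and bodies are tuples of symbols. Further substitutions are still needed afterwards before the grammar is fully in GNF.

```python
# Relabelled grammar from the notes; bodies are tuples of symbols.
G = {
    "A1": [("A2", "A3"), ("A4", "A4")],
    "A4": [("b",), ("A1", "A4")],
    "A2": [("b",)],
    "A3": [("a",)],
}

def substitute(grammar, head, var):
    """Replace a leading `var` in every body of `head` by each body of `var`."""
    new_bodies = []
    for body in grammar[head]:
        if body and body[0] == var:
            new_bodies.extend(sub + body[1:] for sub in grammar[var])
        else:
            new_bodies.append(body)
    return {**grammar, head: new_bodies}

def remove_left_recursion(grammar, head, z):
    """A -> Aα | β  becomes  A -> β | βZ  and  Z -> α | αZ  (Z is new)."""
    alphas = [b[1:] for b in grammar[head] if b and b[0] == head]
    betas = [b for b in grammar[head] if not (b and b[0] == head)]
    g = dict(grammar)
    g[head] = betas + [b + (z,) for b in betas]
    g[z] = alphas + [a + (z,) for a in alphas]
    return g

step5 = substitute(G, "A4", "A1")      # A4 -> b | A2A3A4 | A4A4A4
step6 = substitute(step5, "A4", "A2")  # A4 -> b | bA3A4 | A4A4A4
step7 = remove_left_recursion(step6, "A4", "Z4")
for head in ("A4", "Z4"):
    print(head, "->", " | ".join("".join(b) for b in step7[head]))
# A4 -> b | bA3A4 | bZ4 | bA3A4Z4
# Z4 -> A4A4 | A4A4Z4
```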
Context-free languages are closed under the following operations:
o substitution
o union
o concatenation
o Kleene star
o homomorphism
o reversal
o intersection with a regular set
o inverse homomorphism
Context-free languages are not closed under the following operations:
o intersection
L1 = {aⁿbⁿcⁱ | n, i ≥ 0} and L2 = {aⁱbⁿcⁿ | n, i ≥ 0} are CFLs. But
L = L1 ∩ L2 = {aⁿbⁿcⁿ | n ≥ 0} is not a CFL.
o complement
Suppose comp(L) were context free whenever L is context free. Since
L1 ∩ L2 = comp(comp(L1) ∪ comp(L2)) and CFLs are closed under union, this would
imply that CFLs are closed under intersection, a contradiction.
o difference
Suppose L1 – L2 were context free whenever L1 and L2 are context free.
If L is a CFL over Σ, then comp(L) = Σ* – L would be context
free (Σ* is itself a CFL), contradicting non-closure under complement.
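The identity L1 ∩ L2 = {aⁿbⁿcⁿ | n ≥ 0} used in the intersection argument can be checked by brute force on short strings. The sketch below tests every string over {a, b, c} of length at most 7. It only illustrates the set identity; it does not prove that aⁿbⁿcⁿ is not context free, which is the job of the pumping lemma.

```python
import re
from itertools import product

def in_L1(w):
    """w ∈ {a^n b^n c^i | n, i >= 0}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_L2(w):
    """w ∈ {a^i b^n c^n | n, i >= 0}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_L(w):
    """w ∈ {a^n b^n c^n | n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

words = ("".join(p) for k in range(8) for p in product("abc", repeat=k))
print(all((in_L1(w) and in_L2(w)) == in_L(w) for w in words))  # True
```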
The pumping lemma for CFLs is used to show that certain languages are not
context free. There are three forms of the pumping lemma.
Example:
1. The language L = {aⁿbⁿcⁿ | n ≥ 0} is not context free.
Solution: Refer Class Notes
The proof is by contradiction. Assume L is context free. Then by the
pumping lemma there is a constant n associated with L such that every z in L
with |z| ≥ n can be written as z = uvwxy such that
1. vx ≠ ε,
2. |vwx| ≤ n, and
3. for all i ≥ 0, the string uvⁱwxⁱy is in L.
Consider the string z = aⁿbⁿcⁿ.
From condition (2), vwx cannot contain both a's and c's, since the last a and
the first c are n + 1 positions apart.
o Two cases arise:
1. vwx has no c's. Then uwy (the case i = 0) still has n c's but, since at least
one of v or x is nonempty, fewer than n a's or fewer than n b's, so uwy is not in L.
2. vwx has no a's. Symmetrically, uwy has n a's but fewer than n b's or
fewer than n c's, so again uwy is not in L.
o In both cases we have a contradiction, so we must conclude that L
cannot be context free.
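The case analysis can be checked exhaustively for small values of the pumping constant. For z = aᵖbᵖcᵖ, the sketch below tries every split z = uvwxy with |vwx| ≤ p and vx ≠ ε, and confirms that pumping down (i = 0) always leaves L, exactly as both cases of the proof claim. This checks only finitely many p (here 1 to 6), so it illustrates the argument rather than replacing it. The function name is hypothetical.

```python
def in_L(w):
    """w ∈ {a^n b^n c^n | n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def pumping_down_always_fails(p):
    """True if, for z = a^p b^p c^p, every valid split gives uwy ∉ L."""
    z = "a" * p + "b" * p + "c" * p
    for s in range(len(z)):
        for e in range(s + 1, min(s + p, len(z)) + 1):  # vwx = z[s:e], |vwx| <= p
            for i in range(s, e + 1):                    # v = z[s:i]
                for j in range(i, e + 1):                # w = z[i:j], x = z[j:e]
                    if i == s and j == e:                # v and x both empty
                        continue
                    if in_L(z[:s] + z[i:j] + z[e:]):     # u w y (i = 0)
                        return False
    return True

print(all(pumping_down_always_fails(p) for p in range(1, 7)))  # True
```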