ATCD PPT Module-5


21CS51 – AUTOMATA THEORY AND COMPILER DESIGN

MODULE -5
Turing Machine and Undecidability
Other Phases of Compilers: Syntax Directed Translation- Syntax-
Directed Definitions, Evaluation Orders for SDD’s. Intermediate-Code
Generation- Variants of Syntax Trees, Three-Address Code. Code
Generation- Issues in the Design of a Code Generator

Algorithms and Decision Procedures for CFLs
The Decidable Questions:
Membership:
Given a context-free language L and a string w, there exists a decision
procedure that answers the question "is w in L?". There are two
approaches:

1) Find a context-free grammar that generates L.

2) Find a PDA that accepts L.
Algorithms and Decision Procedures for CFLs
Using a Grammar to decide:
• Using facts about every derivation that is produced by a grammar in
Chomsky Normal Form (CNF), we can construct an algorithm that explores
a finite number of derivation paths and finds one that derives a
particular string w, if such a path exists.
Algorithms and Decision Procedures for CFLs
Using a Grammar to decide:
decideCFLusingGrammar (L: CFL, w: string) =
If w = ε then if SG is nullable then accept, else reject.
If w ≠ ε then
  Construct G' in CNF such that L(G') = L(G) - {ε}.
  If G' derives w, it does so in at most 2|w| - 1 steps.
  Try all derivations in G' of up to 2|w| - 1 steps.
  If one of them derives w, accept; otherwise reject.
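The bounded search above can be sketched directly. A minimal Python sketch, assuming a CNF grammar is represented as a dict mapping each nonterminal to a list of right-hand sides (tuples of symbols); any symbol that is not a key is treated as a terminal. The grammar G below for {0^n 1^n | n ≥ 1} is an illustrative assumption, not from the slides.

```python
def derives(productions, start, w):
    """Try all leftmost derivations of at most 2|w| - 1 steps (CNF)."""
    limit = 2 * len(w) - 1
    def search(form, steps):
        if len(form) > max(len(w), 1):   # CNF forms never shrink: prune
            return False
        if all(s not in productions for s in form):
            return form == list(w)       # all terminals: compare with w
        if steps == limit:
            return False
        # Expand the leftmost nonterminal and try every body for it.
        i = next(j for j, s in enumerate(form) if s in productions)
        return any(search(form[:i] + list(body) + form[i + 1:], steps + 1)
                   for body in productions[form[i]])
    return search([start], 0)

# CNF grammar for {0^n 1^n | n >= 1}: S -> AB | AC, C -> SB, A -> 0, B -> 1
G = {"S": [("A", "B"), ("A", "C")], "C": [("S", "B")],
     "A": [("0",)], "B": [("1",)]}
print(derives(G, "S", list("0011")))  # True
print(derives(G, "S", list("011")))   # False
```

The pruning step relies on the CNF property used in the slides: every production either keeps the sentential form the same length or grows it, so any form longer than w can be abandoned.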
Algorithms and Decision Procedures for CFLs
Using a PDA to decide:
We take a two-step approach.
• We first show that, for every context-free language L, it is possible to
build a PDA that accepts L - {ε} and that has no ε-transitions.
• Then we show that every PDA with no ε-transitions is guaranteed to
halt.
Algorithms and Decision Procedures for CFLs
Emptiness and Finiteness:
Decidability of Emptiness and Finiteness:
Given a context-free language L, there exists a decision procedure that
answers each of the following questions:
1. Given a context-free language L, is L = ∅?
2. Given a context-free language L, is L infinite?
Algorithms and Decision Procedures for CFLs
Decidability of Emptiness:

decideCFLempty(G: context-free grammar) =
1. Let G' = removeunproductive(G).
2. If the start symbol S is not present in G' then return True, else return False.
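The unproductive-symbol computation behind removeunproductive can be sketched as follows; the grammar representation (a dict of productions, with non-key symbols treated as terminals) is an assumption for illustration.

```python
def decide_cfl_empty(productions, start):
    """L(G) is empty iff the start symbol is unproductive."""
    productive = set()
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for head, bodies in productions.items():
            if head in productive:
                continue
            for body in bodies:
                # A symbol is productive if some body uses only
                # terminals and already-productive nonterminals.
                if all(s in productive or s not in productions for s in body):
                    productive.add(head)
                    changed = True
                    break
    return start not in productive

G1 = {"S": [("a", "S", "b"), ("a", "b")]}   # L(G1) = {a^n b^n}
G2 = {"S": [("S", "S")]}                     # no terminating production
print(decide_cfl_empty(G1, "S"))  # False: L(G1) is nonempty
print(decide_cfl_empty(G2, "S"))  # True: L(G2) is empty
```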
Algorithms and Decision Procedures for CFLs
Decidability of Finiteness :

decideCFLinfinite(G: context-free grammar) =
1. Lexicographically enumerate all strings in Σ* of length greater than
b^n and less than or equal to b^(n+1) + b^n (where n is the number of
nonterminals of a CNF grammar for L and b is the branching factor, the
length of its longest right-hand side).
2. If, for any such string w, decideCFL(L, w) returns True, then return
True: L is infinite.
3. If, for all such strings w, decideCFL(L, w) returns False, then return
False: L is not infinite.
Algorithms and Decision Procedures for CFLs
The Undecidable Questions:
• Given a context-free language L, is the complement of L context-free?
• Given a context-free language L, is L regular?
• Given two context-free languages L1 and L2, is L1 = L2?
• Given a context-free language L, is L inherently ambiguous?
Algorithms and Decision Procedures for CFLs
Algorithms for Context Free Languages:
• removeunproductive(G: context-free grammar)
• removeunreachable(G: context-free grammar)
• removeEps(G: context-free grammar)
• removeUnits(G: context-free grammar)
• converttoChomsky(G: context-free grammar)
• converttoGreibach(G: context-free grammar)
• cfgtoPDA(G: context-free grammar):
Algorithms and Decision Procedures for CFLs
Decision procedures for Context Free Languages:
• decideCFLusingPDA(L: CFL, w: string)
• decideCFLusingGrammar(L: CFL, w: string)
• decideCFL(L: CFL, w: string)
• decideCFLempty(G: context-free grammar)
• decideCFLinfinite(G: context-free grammar)
Turing Machine Model
The Turing machine can be thought of as a finite control connected to a
R/W (read/write) head.
It has one tape, which is divided into a number of cells.
Turing Machine Model
In one move, the machine examines the present symbol under the R/W
head on the tape and the present state of the automaton to determine:
(i) a new symbol to be written on the tape in the cell under the R/W
head,
(ii) a motion of the R/W head along the tape: either the head moves
one cell left (L) or one cell right (R),
(iii) the next state of the automaton, and
(iv) whether to halt or not.
Turing Machine Definition
A Turing machine M is a 7-tuple, namely (Q, Σ, Γ, δ, q0, b, F), where

Q is a finite nonempty set of states,
Σ is a nonempty set of input symbols, a subset of Γ with b ∉ Σ,
Γ is a finite nonempty set of tape symbols,
δ is the transition function mapping (q, x) onto (q', y, D), where D
denotes the direction of movement of the R/W head: D = L or R according
as the movement is to the left or right,
q0 ∈ Q is the initial state,
b ∈ Γ is the blank symbol, and
F ⊆ Q is the set of final states.
Transition function
• One move (denoted by |---) in a TM does the following:
• δ(q, X) = (p, Y, D)
  • q is the current state
  • X is the current tape symbol pointed to by the tape head
• After the move:
  • the state changes from q to p
  • X is replaced with the symbol Y
  • if D = "L", the tape head moves left by one position;
    alternatively, if D = "R", the tape head moves right by one position
• On an arc of a transition diagram this move is written X / Y,D.
Way to check for Membership
• Is a string w accepted by a TM?
• Initial condition:
  • The (whole) input string w is present on the tape, preceded and followed by infinite
    blank symbols
• Final acceptance:
  • Accept w if the TM enters a final state and halts
  • If the TM halts in a non-final state, then reject
Representation Of Turing Machines
(i) Instantaneous descriptions using move-relations,
(ii) transition table, and
(iii) transition diagram (transition graph).
Example: L = {0^n 1^n | n ≥ 1}

• Strategy for w = 000111: repeatedly mark a 0 as X, mark its matching 1
as Y, and return. The successive tape configurations are:

  q0: … B B 0 0 0 1 1 1 B B …
  ⊢* q0: … B B X 0 0 Y 1 1 B B …
  ⊢* q0: … B B X X 0 Y Y 1 B B …
  ⊢* q0: … B B X X X Y Y Y B B …
  ⊢* Accept
TM for {0^n 1^n | n ≥ 1}

1. Mark the next unread 0 with X and move right
   (q0: 0 / X,R → q1).
2. Move to the right all the way to the first unread 1, and mark it with Y
   (q1: 0 / 0,R and Y / Y,R; on 1: 1 / Y,L → q2).
3. Move back (to the left) all the way to the last marked X, and then
   move one position to the right
   (q2: 0 / 0,L and Y / Y,L; on X: X / X,R → q0).
4. If the next position is 0, then go to step 1.
   Else move all the way to the right over the Ys to ensure there are no
   excess 1s (q0: Y / Y,R → q3; q3: Y / Y,R); then move right to the next
   blank symbol and stop & accept (q3: B / B,R → q4).

*state diagram representation preferred

TM for {0^n 1^n | n ≥ 1}

                        Current tape symbol
Curr. state    0          1          X          Y          B
q0          (q1,X,R)      -          -       (q3,Y,R)      -
q1          (q1,0,R)   (q2,Y,L)      -       (q1,Y,R)      -
q2          (q2,0,L)      -       (q0,X,R)   (q2,Y,L)      -
q3             -          -          -       (q3,Y,R)   (q4,B,R)
*q4            -          -          -          -          -

Table representation of the state diagram
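The table above can be executed directly. Below is a minimal single-tape TM simulator sketch in Python (the dict-based tape defaulting to blank 'B' and the step bound are implementation assumptions, not from the slides); delta transcribes the transition table.

```python
def run_tm(delta, accept, w, max_steps=10_000):
    """Simulate a deterministic single-tape TM on input w."""
    tape = {i: c for i, c in enumerate(w)}   # unwritten cells are blank
    state, head = "q0", 0
    for _ in range(max_steps):
        if state in accept:
            return True
        sym = tape.get(head, "B")
        if (state, sym) not in delta:
            return False                     # no move defined: halt and reject
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return False                             # did not halt within the bound

# Transition table for {0^n 1^n | n >= 1}
delta = {("q0", "0"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
         ("q1", "0"): ("q1", "0", "R"), ("q1", "1"): ("q2", "Y", "L"),
         ("q1", "Y"): ("q1", "Y", "R"), ("q2", "0"): ("q2", "0", "L"),
         ("q2", "X"): ("q0", "X", "R"), ("q2", "Y"): ("q2", "Y", "L"),
         ("q3", "Y"): ("q3", "Y", "R"), ("q3", "B"): ("q4", "B", "R")}

print(run_tm(delta, {"q4"}, "000111"))  # True
print(run_tm(delta, {"q4"}, "00111"))   # False: an excess 1
```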


Variants Of Turing Machines
The Turing machine we have introduced has a single tape. δ(q, a) is
either a single triple (p, y, D), where D = R or L, or is not defined. We
introduce two new models of TM:
(i) a TM with more than one tape,
(ii) a TM where δ(q, a) = {(p1, y1, D1), (p2, y2, D2), …, (pr, yr, Dr)}.
The first model is called a multitape TM and the second a
nondeterministic TM.
Multi-tape Turing Machines

• TM with multiple tapes, each tape with a separate head

• Each head can move independently of the others

(Figure: a single finite control with k separate heads, one over each of
Tape 1 … Tape k.)
Non-deterministic TMs

• A TM can have non-deterministic moves:
  δ(q, X) = { (q1,Y1,D1), (q2,Y2,D2), … }

• Simulation using a multitape deterministic TM: the input tape is kept
intact, a scratch tape holds the IDs generated so far (ID1, ID2, ID3,
ID4, …), and a marker tape (* * * *) marks the ID currently being
expanded.
DECIDABILITY

• A recursive language is a formal language for which there exists


a Turing machine that, when presented with any finite input
string, halts and accepts if the string is in the language, and
halts and rejects otherwise.
DECIDABILITY

• A recursively enumerable language is a formal language for


which there exists a Turing machine (or other computable
function) that will halt and accept when presented with any
string in the language as input but may either halt and reject or
loop forever when presented with a string not in the language.
DECIDABILITY
• A language L is decidable if it is a recursive language. All
decidable languages are recursive languages and vice versa.
• A language is undecidable if it is not decidable.
DECIDABILITY
Definition:
A language L ⊆ Σ* is recursively enumerable if there exists a TM M
such that L = T(M).

Definition:
A language L ⊆ Σ* is recursive if there exists some TM M that satisfies
the following two conditions:

(i) If w ∈ L then M accepts w (that is, reaches an accepting state on
processing w) and halts.
(ii) If w ∉ L then M eventually halts, without reaching an accepting
state.
DECIDABLE LANGUAGES
Theorem: ADFA is decidable.
Proof:
To prove the theorem, we have to construct a TM that always halts
and also accepts ADFA. Note that a DFA B always ends in some state of B after
n transitions for an input string of length n.

We define a TM M as follows:
1. Let B be a DFA and w an input string. (B, w) is an input for the Turing
machine M.
2. Simulate B on input w in the TM M.
3. If the simulation ends in an accepting state of B, then M accepts w. If it ends
in a nonaccepting state of B, then M rejects w.
DECIDABLE LANGUAGES
Theorem: ACFG is decidable.
Proof:
We convert the CFG into Chomsky Normal Form. Then any derivation of a
string w of length k requires exactly 2k - 1 steps if the grammar is in CNF.
Now we design a TM M that halts as follows.
1. Let G be a CFG in Chomsky Normal Form and w an input string. (G, w) is an
input for M.
2. If k = 0, list all the single-step derivations. If k ≠ 0, list all the derivations with
2k - 1 steps.
3. If any of the derivations in step 2 generates the given string w, M accepts (G,
w). Otherwise M rejects.
3. If any of the derivations in step 2 generates the given string w, M accepts (G,
w). Otherwise M rejects.
UNDECIDABLE LANGUAGES
• There exist languages that are not recursively enumerable; we then
address the undecidability of recursively enumerable languages.

• Theorem: There exists a language over Σ that is not recursively enumerable.

• Proof: A language L is recursively enumerable if there exists a TM M such
that L = T(M). As Σ is finite, Σ* is countable.
• As a Turing machine M is a 7-tuple (Q, Σ, Γ, δ, q0, b, F), and each member of
the 7-tuple is a finite set, M can be encoded as a string. So the set of all
TMs is countable, and hence the recursively enumerable languages can be
written as a sequence {L1, L2, L3, ... }.
UNDECIDABLE LANGUAGES

• But the set of all languages over Σ (all subsets of Σ*) is uncountable,
so it has members not corresponding to any TM in this sequence. This
proves the existence of a language over Σ that is not recursively
enumerable.
UNDECIDABLE LANGUAGES
• Theorem: ATM is undecidable.
• Proof: First, ATM is recursively enumerable. Construct a
TM U as follows:
• (M, w) is an input to U. Simulate M on w. If M enters an accepting
state, U accepts (M, w). Hence ATM is recursively enumerable.
• We prove that ATM is undecidable by contradiction. We assume that
ATM is decidable by a TM H that eventually halts on all inputs. Then:
UNDECIDABLE LANGUAGES
• We construct a new TM D with H as a subroutine. D calls H to
determine what M does when it receives the input <M>, the encoded
description of M as a string.
• Based on the received information on (M, <M>), D rejects M if M
accepts <M>, and accepts M if M rejects <M>.
• Running D on its own description <D> now yields a contradiction: D
accepts <D> exactly when D rejects <D>. Hence H cannot exist and ATM
is undecidable.
Chomsky Language Hierarchy
• Other Phases of Compilers: Syntax Directed Translation- Syntax-
Directed Definitions, Evaluation Orders for SDD’s. Intermediate-Code
Generation- Variants of Syntax Trees, Three-Address Code. Code
Generation- Issues in the Design of a Code Generator
Attributes in Syntax Directed Definition
• Each nonterminal (an intermediate node in the parse
tree) has associated properties or attributes
• Example attributes could be:
  • name, type,
  • memory location, font-size, etc.
• For each separate instance of the grammar symbol,
there will be a separate instance of the attribute.
• A node in the parse tree for the grammar symbol will
be a record holding these attributes as fields
• The value of an attribute is defined by the semantic rule
associated with the production used at that node
Types of Attributes

1. Synthesized Attributes
• An attribute of a grammar symbol is synthesized if it is computed from the
attributes of its children in the parse tree
• Example: E → E1 + num, with the rule
  E.val = E1.val + num.val (the val of the parent E is computed from the
  val of the child E1 and of num)
• Used extensively in LR parsing practice
• A syntax-directed definition that uses only synthesized attributes is known as an
"S-attributed definition" (S-attributed definitions are based on LR grammars)
2. Inherited Attributes

• An attribute of a grammar symbol is inherited if it is
computed from the attributes of its parent and
siblings in the parse tree
• D → T {$2.type := $1.type} L
• L → id , {$3.type := $1.type} L
• Typical for LL parsing (top-down).
• Convenient for expressing the dependence of a
programming language construct on the context in
which it appears
• More natural to use with a Syntax Directed Definition
(although it is possible to write a syntax directed
definition using only synthesized attributes)
Semantic Rules
• Define the value of an attribute associated with a
grammar symbol
• Associated with the grammar rule used at that node

Production        Semantic Rule
L → E n           print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val × F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

▪ Associates an integer-valued synthesized attribute called val with
each nonterminal E, T, & F
▪ The token digit has a synthesized attribute lexval whose value will be
supplied by the Lexical Analyzer
S-attributed definition Example
• The parse tree for 3 * 5 + 4 using the production rules below is:

Production
L → E n
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → digit

(Parse tree: L derives E n; E derives E + T; the left E derives
T → T * F, with 3 and 5 reached via F → digit; the right T derives
F → digit → 4.)
S-attributed definition Example (Contd.)

Semantic Rules applied:
print(E.val)
E.val = E1.val + T.val
E.val = T.val
T.val = T1.val × F.val
T.val = F.val
F.val = digit.lexval

(Decorated parse tree: the digit.lexval values 3, 5, 4 propagate up as
F.val = 3, F.val = 5, F.val = 4; then T.val = 3, T.val = 3 × 5 = 15,
T.val = 4; E.val = 15 and finally E.val = 15 + 4 = 19, which L prints.)
Attribute grammar Example
• CFG for signed binary numbers:
• (1) Number → Sign List , (2) Sign → + | –
• (3) List → List Bit | Bit , (4) Bit → 0 | 1
◼ Derivation for string "- 1":
  Number → Sign List
         → Sign Bit
         → Sign 1
         → – 1
◼ Derivation for string "- 101":
  Number → Sign List
         → Sign List Bit
         → Sign List 1
         → Sign List Bit 1
         → Sign List 0 1
         → Sign Bit 0 1
         → Sign 1 0 1
         → – 101
Attribute Grammar Example (Contd.)
◼ Attribute grammar for the CFG of signed binary numbers:
(Figure: a table pairing each production with semantic rules over the
attributes Neg, Pos and Val, as decorated in the parse trees below.)
Attribute Grammar Example (Contd.)
(Decorated parse tree for – 101: Neg = True; the bits 1, 0, 1 at
Pos = 2, 1, 0 contribute Val = 4, 0, 1; the List values accumulate to
Val = 5, and the signed result at the root is Val = –5.)
Attribute Grammar Example (Contd.)
• The example on the side shows the decorated parse tree with attribute
dependencies shown with arrows.
• If the parse tree is taken out, what is left is a DEPENDENCY GRAPH.
Dependency Graph
• A directed graph that depicts the interdependencies
among the inherited and synthesized attributes at the
nodes in a parse tree
• Example:
• Consider the grammar production E → E1 + E2 and the
corresponding semantic rule E.val := E1.val + E2.val
• In the parse tree, E has children E1, + and E2; in the dependency
graph, shown separately, there is an edge to E.val from each of
E1.val and E2.val.
Dependency Graph
• A directed graph that depicts the interdependencies
among the inherited and synthesized attributes at
the nodes in a parse tree
• The order in which the semantic rules are evaluated
follows the dependency graph, from earlier nodes to
later nodes.
• The dependency graph is sorted topologically:
  • the grammar is used to construct the parse tree from the
    input,
  • we then obtain an order of evaluation of the semantic rules,
  • evaluating the semantic rules in this order yields the
    translation of the input string.
Algorithm For constructing Dependency graph
for each node n in the parse tree do
  for each attribute a of the grammar symbol at n do
    construct a node in the dependency graph for a;
for each node n in the parse tree do
  for each semantic rule b := f(c1, c2, …, ck) associated
  with the production used at n do
    for i := 1 to k do
      construct an edge from the node for ci to the node for b;
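The construction above, followed by a topological sort, can be sketched with the standard-library graphlib module. The attribute names and edges below transcribe the real id1, id2, id3 declaration example; the dict encoding (attribute → attributes it depends on) is an illustrative assumption.

```python
from graphlib import TopologicalSorter

# Dependency graph for: real id1, id2, id3  (D -> T L grammar).
# Each key depends on the attributes listed for it.
edges = {
    "L1.in": ["T.type"],          # L.in := T.type    (D -> T L)
    "L2.in": ["L1.in"],           # L1.in := L.in     (L -> L1 , id)
    "L3.in": ["L2.in"],
    "addtype(id3)": ["L1.in"],    # addtype(id.entry, L.in)
    "addtype(id2)": ["L2.in"],
    "addtype(id1)": ["L3.in"],
}

# static_order() yields an evaluation order in which every attribute
# appears after all attributes it depends on.
order = list(TopologicalSorter(edges).static_order())
print(order)
```

T.type, the only attribute with no dependencies, always comes first; each L.in follows the one it inherits from, matching the top-down flow of type information.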
Abstract Syntax Tree (AST)
• A condensed form of a parse tree useful in representing language
constructs of a programming language
• An intermediate representation that allows translation to be
decoupled from parsing
• For example, the production S → if B then S1 else S2
will have a syntax tree with root if-then-else and children B, S1 and
S2. Similarly, 3 * 5 + 4 will result in a tree with + at the root,
whose children are * (over 3 and 5) and 4.
AST versus Parse Tree
• While a parse tree keeps track of all productions used to build
the parse tree, an AST is a condensed form of this with just the
information needed in later stages of compilation or
interpretation.
AST Construction for Expressions
• Similar to the translation of expression into postfix form
• Sub-trees are constructed for the sub-expressions by
creating a node for each operator & operand
• The children of the Operator node are the nodes
representing the sub-expressions constituting the operands
of the operator
• In a node for an operator, one field identifies the operator
(often called the label of the node) and the rest contain pointers
to the nodes for its operands
• Each node in the syntax tree can be represented as a
record with several fields
Constructing AST (Contd.)
• The following functions are used to create the nodes of the AST (each
function returns a pointer to the newly created node):
1. mknode (op, left, right)
   Creates an operator node with label op and two fields
   containing the pointers left and right.
2. mkleaf (id, entry)
   Creates an identifier node with label id and a field containing entry,
   a pointer to the symbol table entry.
3. mkleaf (num, val)
   Creates a number node with label num and a field containing val,
   the value of the number.
Constructing AST – An Example
Sentence: a – 4 + c
The tree is built bottom-up using the function call sequence:
p1 := mkleaf (id, entry for a)
p2 := mkleaf (num, 4)
p3 := mknode ('–', p1, p2)
p4 := mkleaf (id, entry for c)
p5 := mknode ('+', p3, p4)
(Figure: the resulting tree has + at the root with children – and the
leaf for c; – has children the leaf for a and the leaf 4.)
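The call sequence above can be transcribed directly. A minimal sketch using tuples as nodes; the symbol-table "entries" are placeholder strings, an assumption for illustration.

```python
def mknode(op, left, right):
    """Operator node: label plus pointers to the two operand nodes."""
    return (op, left, right)

def mkleaf(label, value):
    """Leaf node: label ('id' or 'num') plus entry pointer or value."""
    return (label, value)

# Bottom-up construction of the AST for  a - 4 + c :
p1 = mkleaf("id", "entry-a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "entry-c")
p5 = mknode("+", p3, p4)
print(p5)  # ('+', ('-', ('id', 'entry-a'), ('num', 4)), ('id', 'entry-c'))
```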
A SDD for Constructing AST
PRODUCTION      SEMANTIC RULE
E → E1 + T      E.nptr := mknode ('+', E1.nptr, T.nptr)
E → E1 – T      E.nptr := mknode ('–', E1.nptr, T.nptr)
E → T           E.nptr := T.nptr
T → ( E )       T.nptr := E.nptr
T → id          T.nptr := mkleaf (id, id.entry)
T → num         T.nptr := mkleaf (num, num.val)
• Uses the underlying productions of the grammar to schedule
calls to the functions mknode and mkleaf
NOTE:
• The synthesized attribute nptr for E and T keeps track of
the pointers returned by the function calls
A SDD for Constructing AST
(Figure: the decorated parse tree for a – 4 + c, with each E.nptr and
T.nptr attribute pointing into the AST nodes built by mknode and
mkleaf; the id leaves point to the symbol table entries for a and c.)
Directed Acyclic Graph (DAG)
• Identifies common sub-expressions
• Like a syntax tree,
  • it has interior nodes representing an operator with
    two children as operands: left & right.
• The only difference is that a node in a DAG
representing a common sub-expression has more than
one parent; in a syntax tree,
  • the common sub-expression would be
    represented as a duplicated sub-tree
• Can also be used to represent a set of expressions
DAG Usage – An Example
• Consider the expression: a := b * -c + b * -c
• In the syntax tree the sub-expression b * -c appears twice; in the
DAG it is a single shared node with two parents.

Three-address code from the syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Three-address code from the DAG:
t1 := - c
t2 := b * t1
t3 := t2 + t2
a := t3
Translators for SDD
• A translator for an arbitrary SDD can be difficult to
build
• However:
1. Large classes of syntax-directed definitions exist for
which it is easier to build a translator
2. SDDs with synthesized attributes (a.k.a. S-attributed
definitions) can be evaluated using a bottom-up
evaluation technique
3. SDDs with inherited attributes (a.k.a. L-attributed
definitions) are evaluated using a top-down method
known as "depth-first order"
Bottom-Up Evaluation of SDD’s
• A translator for an S-attributed definition can often
be implemented with the help of an LR-Parser
generator.

• From the S-attributed definition, the parser


generator can construct a translator that can
evaluate attributes as it parses the input.

• One can have extra fields in the parser stack to hold


the values of the synthesized attributes value.

• One can straight away build code segments.


Syntax Directed Definition Summary
1. An SDD is a generalization of the Context Free Grammar.
2. Each symbol can have 2 kinds of attributes:
   i. synthesized & inherited
3. The value of an attribute is defined by the semantic rule
associated with the production
4. Synthesized attributes are computed from the children of the node,
and inherited attributes from the siblings and parent of the node.
5. Semantic rules set up the dependencies of the attributes,
which are represented as a graph known as the

Dependency Graph ………. Contd.


Syntax Directed Definition Summary
6. The Dependency Graph decides the order of evaluation of
the semantic rules.
7. The evaluation defines the values of the attributes at the
nodes in the parse tree
8. A parse tree showing the attributes at each node is called an
Annotated Parse Tree, and the process of evaluation is called
decorating the tree.
9. An SDD with only synthesized attributes is called an
S-attributed definition and can always be implemented
bottom-up
Syntax Directed Definition –
Another example
• In this example, an inherited attribute distributes type
information to the various identifiers in the declaration:

Production     Semantic Rule
D → T L        L.in := T.type
T → int        T.type := integer
T → real       T.type := real
L → L1 , id    L1.in := L.in ; addtype (id.entry, L.in)
L → id         addtype (id.entry, L.in)

◼ Notes
◼ The nonterminal T has a synthesized attribute type
whose value is determined by the type declaration
◼ The productions for L call the procedure addtype to
add the type of each identifier to its entry in the symbol table
Syntax Directed Definition – Another example
• The corresponding decorated parse tree for the sentence
real id1, id2, id3:
(T.type = real is passed down as L.in = real at each L node in the
chain L → L1 , id; each L calls addtype for its id, so id1, id2 and
id3 all receive type real.)
Dependency Graph Example
Dependency graph for the construct: real id1, id2, id3
(Figure: nodes 1, 2, 3 are the entry attributes of id1, id2, id3;
nodes 4-7 and 9 are the type and in attributes; 8 and 10 are link
nodes for the addtype calls. Edges run from T.type (node 4) down the
chain of L.in nodes (5, 6, 7) and from each L.in, together with the
corresponding entry node, into the addtype call for its identifier.)
Desk Calculator SDD with LR parser

Grammar: L → E n, E → E1 + T, E → T, T → T1 * F, T → F, F → ( E ),
F → digit. Trace for input a * b + c n:

Input     Stack   Val     Production Rule
a*b+cn    -       -
*b+cn     a       a
*b+cn     F       a       F → digit
*b+cn     T       a       T → F
b+cn      T*      a
+cn       T*b     a,b
+cn       T*F     a,b     F → digit
+cn       T       ab      T → T1 * F
+cn       E       ab      E → T
cn        E+      ab
n         E+c     ab,c
n         E+F     ab,c    F → digit
n         E+T     ab,c    T → F
n         E       ab+c    E → E1 + T
          En      ab+c    L → E n
L – Attributed Definitions
• A syntax-directed definition where each inherited
attribute of Xi, for 1 <= i <= n, on the right side of
A → X1 X2 … Xn depends only on
  • the attributes of the symbols X1, X2, …, Xi-1 to the left
    of Xi in the production, and
  • the inherited attributes of A
• Note: every S-attributed definition is L-attributed, since the
two restrictions above apply only to inherited
attributes.
• The order of evaluation of the attributes is the order in
which the nodes of the parse tree are "created" by the
parsing method (depth-first order)
DEPTH – FIRST ORDER
• Depth-first evaluation order for attributes:

procedure dfvisit (n : node);
begin
  for each child m of n, from left to right do begin
    evaluate inherited attributes of m;
    dfvisit (m);
  end;
  evaluate synthesized attributes of n;
end
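The procedure can be sketched concretely on the D → T L declaration example (real id1, id2, id3): inherited attributes are passed down before visiting a child, and the addtype side effect fires at each L node. The dict-based node shapes are an assumption for illustration, not from the slides.

```python
symtab = {}   # symbol table filled in by addtype

def dfvisit(n, inherited=None):
    """Depth-first evaluation: inherited attrs on the way down."""
    if n["sym"] == "D":                        # D -> T L
        t, l = n["children"]
        dfvisit(t)                             # compute T.type first
        dfvisit(l, inherited=t["type"])        # L.in := T.type
    elif n["sym"] == "T":
        n["type"] = n["lexeme"]                # T.type := real / integer
    elif n["sym"] == "L":                      # L -> L1 , id  |  id
        symtab[n["id"]] = inherited            # addtype(id.entry, L.in)
        if "child" in n:
            dfvisit(n["child"], inherited)     # L1.in := L.in

# Parse tree for: real id1, id2, id3
tree = {"sym": "D", "children": [
    {"sym": "T", "lexeme": "real"},
    {"sym": "L", "id": "id3", "child":
        {"sym": "L", "id": "id2", "child":
            {"sym": "L", "id": "id1"}}}]}

dfvisit(tree)
print(symtab)  # every identifier ends up with type 'real'
```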
Parse Tree for 9 – 5 + 2
GRAMMAR
E → T R
R → addop T { print (addop.lexeme) } R | ε
T → num { print (num.val) }

(Parse tree with actions: E derives T R; T prints '9'; R derives
– T R, with T printing '5' and the action printing '–'; the inner R
derives + T R, with T printing '2' and the action printing '+'; the
final R derives ε. The emitted output is the postfix form 95–2+.)
Example 9 - 5 + 2
Production           Semantic rule
E → E1 + T           E.t := E1.t || T.t || '+'
E → E1 - T           E.t := E1.t || T.t || '-'
E → T                E.t := T.t
T → 0 | 1 | … | 9    T.t := '0' | '1' | … | '9'

(Annotated tree: T.t = 9, T.t = 5, T.t = 2; E.t = 9, then
E.t = 95-, then E.t = 95-2+ at the root.)
Problem
Consider the problem of translating decimal numbers between 0 and 99
into their English words/phrases.

Number    word/phrase
0         zero
1         one
10        ten
11        eleven
…
19        nineteen
20        twenty
30        thirty
31        thirty one
Grammar
1. N := D | D P
2. P := D
3. D := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

PARSE TREE and ANNOTATED PARSE TREE
(For input 68: the parse tree has N over D and P, with D → 6 and
P → D → 8. In the annotated tree, D.val flows into P.in as an
inherited attribute, and N.trans is computed from P.trans.)
Grammar
1. N := D | D P
2. P := D
3. D := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Production    Semantic rule
N := D        N.trans := spell (D.val)
N := D P      P.in := D.val
              N.trans := P.trans
P := D        P.trans := if D.val = 0 then decade (P.in)
              else if P.in <= 1 then spell (10*P.in + D.val)
              else decade (P.in) || spell (D.val)
D := 0        D.val := 0
…

spell (1) = one, … ; decade (1) = ten, …, decade (9) = ninety
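The semantic rules above translate directly into code. A minimal sketch; the word tables are the standard English ones, and the digit-list input format is an assumption for illustration.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen",
        "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
        "nineteen"]
TENS = [None, "ten", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

def spell(v):            # spell(D.val) or spell(10*P.in + D.val)
    return ONES[v]

def decade(v):           # decade(P.in)
    return TENS[v]

def translate(digits):   # N := D | D P, digits given most significant first
    if len(digits) == 1:
        return spell(digits[0])          # N.trans := spell(D.val)
    p_in, d = digits                     # P.in := D.val
    if d == 0:
        return decade(p_in)              # 20, 30, ..., 90
    if p_in <= 1:
        return spell(10 * p_in + d)      # 1..19 are spelled directly
    return decade(p_in) + " " + spell(d)

print(translate([6, 8]))   # sixty eight
print(translate([1, 1]))   # eleven
print(translate([3, 0]))   # thirty
```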
Intermediate Representation (IR) – What?
• An abstract machine language that can express target-machine
operations without too much machine-specific detail.
• A compile-time data structure
• Encodes the compiler's knowledge of the program
• The front end is not bothered with machine-specific details; the
back end is not bothered with information specific to the source
language.
IR – Where used ?

• Front end - produces an intermediate representation


(IR) ( E.g. Parse tree)
• Middle end - transforms the IR into an equivalent IR
that runs more efficiently ( E.g. AST, DAG etc..)
• Back end - transforms the IR into native code
• Middle end usually consists of several passes
IR – How used ?
• There is no standard Intermediate Representation. IR is a step in
expressing a source program so that machine understands it
• As the translation takes place, IR is repeatedly analyzed and
transformed
• Compiler users want analysis and translation to be fast and correct
• Compiler writers want optimizations to be simple to write, easy to
understand and easy to extend
• IR should be simple and light weight while allowing easy expression of
optimizations and transformations.
IR – Why Used ?
• Good software engineering technique
• Break the compiler into manageable pieces (Modularity)
• Isolates back end from front end (Flexibility to manage / change them
independently)
• Enables machine independent optimization
• Lets compiler consider more than one option
• Simplifies retargeting to new host
• Allow a complete pass before code is emitted
IR – How created ?
• More wizardry than science
• Each compiler uses 2-3 IRs. Compiler writers have
tried to define universal IRs and have failed (UNCOL
in 1958).
• HIR (high-level IR) preserves loop structure & array
bounds
• MIR (medium-level IR) reflects the range of features in a set of
source languages
  • language independent
  • good for code generation for one or more architectures
  • appropriate for most optimizations
• LIR (low-level IR) is low level, similar to the target machine
Important Properties of IR
1. Ease of generation
2. Ease of manipulation
3. Cost of manipulation
4. Level of abstraction
5. Freedom of expression
6. Size of typical procedure
Decisions in IR design affect the speed and efficiency of the compiler.
Categories of IR
1. Structural IRs (Examples: AST, DAG)
• Graphically oriented notations
• Heavily used in source-to-source translators
• Have a large number of nodes & edges
2. Linear IRs (Examples: three-address code, stack machine code)
• Pseudocode for some abstract machine
• Large variation in level of abstraction
• Simple, compact data structures
• Easier to rearrange
3. Hybrids (Example: Control Flow Graph (CFG))
• Combination of graphs & linear code
• Attempt to take the best of each
Stack Machine Code
• Originally used for stack-based computers (famous
example: Burroughs 5000 series)
• Now used for Java, C# (MSIL)
• Advantages
  – Compact; mostly 0-address opcodes
  – Easy to generate
  – Simple to translate to naïve machine code,
    but need to do better in production compilers
  – Useful where code is transmitted over slow
    communication links

Example: x – 2 * y becomes
push x
push 2
push y
multiply
subtract
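The 0-address code above can be run by a few lines of interpreter. A minimal sketch; the instruction strings and the env dict for variable values are assumptions for illustration.

```python
def run(code, env):
    """Evaluate 0-address stack machine code against variable bindings."""
    stack = []
    for instr in code:
        op, *arg = instr.split()
        if op == "push":
            # an immediate integer, or a variable looked up in env
            stack.append(int(arg[0]) if arg[0].isdigit() else env[arg[0]])
        elif op == "multiply":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "subtract":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack.pop()

# x - 2 * y  with x = 10, y = 3
code = ["push x", "push 2", "push y", "multiply", "subtract"]
print(run(code, {"x": 10, "y": 3}))  # 4
```

Note that subtract pops the right operand first; operand order matters for the non-commutative operators.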
Precedence values (for infix-to-postfix conversion):

Symbol     Input (outside the stack)   Stack (inside the stack)
+ , -      1                           2
/ , *      3                           4
(          9                           0
)          0                           -
#          -                           -1
operand    7                           8
Three-Address Code
• A term used to describe a variety of representations
• In general they allow statements of the form
  x ← y op z
  with a single operator and at most three names
• A simpler form of expression
• Advantages
  • compact form, direct naming
  • adds names for intermediate values
  • can include forms of prefix or postfix code
  • often easy to rearrange

Example: z ← x – 2 * y becomes
t1 ← 2 * y
t2 ← x – t1
z ← t2
Statements in Three – address code
• Typical statements allowed include:
1. Assignment like x := y op z and x:= op y
2. Copy statements like x := y
3. Unconditional Jumps like Goto L
4. Conditional Jumps like if x relop y goto L
5. Indexed assignments like x := y[j] or s[j] := z
6. Address and pointer assignments (for C) like
• x := &y, x := *p; *x := y
7. Procedure call statements like
• param x; call p, n; return y (for calls)
SDT To Produce 3-address Code for assignment
E.place — the location that holds the value of E
E.code — the sequence of 3-address statements evaluating E

PRODUCTION     SEMANTIC RULES
S → id := E    S.code := E.code || gen(id.place ':=' E.place)
E → E1 + E2    E.place := newtemp;
               E.code := E1.code || E2.code ||
                         gen(E.place ':=' E1.place '+' E2.place)
E → E1 * E2    E.place := newtemp;
               E.code := E1.code || E2.code ||
                         gen(E.place ':=' E1.place '*' E2.place)
E → - E1       E.place := newtemp;
               E.code := E1.code ||
                         gen(E.place ':=' 'uminus' E1.place)
E → ( E1 )     E.place := E1.place;
               E.code := E1.code
E → id         E.place := id.place;
               E.code := ''
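The E.place / E.code scheme above can be sketched over an expression tree. The nested-tuple AST encoding and the t1, t2, … temp naming are assumptions for illustration.

```python
count = 0

def newtemp():
    """Fresh temporary name, as in the semantic rules."""
    global count
    count += 1
    return f"t{count}"

def gen(node):
    """Return (place, code) for an expression tree."""
    if not isinstance(node, tuple):              # id or constant leaf
        return str(node), []
    if len(node) == 2:                           # ('-', E1): unary minus
        p1, c1 = gen(node[1])
        t = newtemp()
        return t, c1 + [f"{t} := uminus {p1}"]
    op, left, right = node
    p1, c1 = gen(left)
    p2, c2 = gen(right)
    t = newtemp()
    # E.code := E1.code || E2.code || gen(E.place := E1.place op E2.place)
    return t, c1 + c2 + [f"{t} := {p1} {op} {p2}"]

# z := x - 2 * y
place, code = gen(("-", "x", ("*", 2, "y")))
code.append(f"z := {place}")                     # S -> id := E
print("\n".join(code))
```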
Generating code for a while statement
S → while E do S1

Code layout:
S.begin:
    E.code
    if E.place = 0 goto S.after
    S1.code
    goto S.begin
S.after:

SEMANTIC RULES
S.begin := newlabel;
S.after := newlabel;
S.code := gen(S.begin ':') || E.code ||
          gen('if' E.place '=' '0' 'goto' S.after) ||
          S1.code ||
          gen('goto' S.begin) || gen(S.after ':')
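The rule above can be exercised with a small label generator. A minimal sketch; the condition and body code sequences are taken as given lists of strings, an assumption for illustration.

```python
count = 0

def newlabel():
    """Fresh label name, as in the semantic rules."""
    global count
    count += 1
    return f"L{count}"

def gen_while(e_code, e_place, s1_code):
    """Assemble S.code for: while E do S1."""
    begin, after = newlabel(), newlabel()
    return ([f"{begin}:"] + e_code
            + [f"if {e_place} = 0 goto {after}"]
            + s1_code
            + [f"goto {begin}", f"{after}:"])

# while (i < n) do i := i + 1, with the condition already in t1
code = gen_while(["t1 := i < n"], "t1", ["i := i + 1"])
print("\n".join(code))
```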
SDT for Boolean Expressions
• Boolean expressions have two primary purposes in programming languages:
  • computation of logical values
  • conditional expressions in flow-of-control statements
• Composed of the boolean operators AND, OR and NOT
• Relational expressions are of the form E1 relop E2
• The grammar for boolean expressions:
  E → E or E | E and E | not E | ( E ) | id relop id | true |
  false
• Two techniques are used for translating boolean
expressions:
1. Numerical Representation
2. Implicit Representation
………. Contd.
SDT for Boolean Expressions
1. Numerical representation
• Use 1 to represent true, use 0 to represent false
• For three-address code store this result in a temporary
• For stack machine code store this result in the stack
2. Implicit representation
• Boolean expressions used in flow-of-control
statements (such as if-statements, while-
statements, etc.) do not have to compute a value
explicitly; they just need to branch to the right
instruction
• Generate code for boolean expressions which
branches to the appropriate instruction based on
the result of the boolean expression
………. Contd.
Numerical representation
Attributes : E.place: location that holds the value of expression E
E.code: sequence of instructions that are generated for E
id.place: location for id
relop.func: the type of relational function
………. Contd.
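A small sketch of the numerical representation (the instruction-numbering scheme is an assumption for illustration): a relational test jumps over the "false" assignment to set the temporary to 1:

```python
# Numerical representation of 'temp := x relop y': leave 1 (true) or
# 0 (false) in temp using a conditional jump over the false branch.

def gen_relop(x, relop, y, temp, i):
    """Translate 'temp := x relop y' into numbered TAC lines,
    starting at instruction index i."""
    return [
        f"{i}: if {x} {relop} {y} goto {i + 3}",
        f"{i + 1}: {temp} := 0",
        f"{i + 2}: goto {i + 4}",
        f"{i + 3}: {temp} := 1",
    ]

# t1 := a < b, starting at instruction 100
for line in gen_relop("a", "<", "b", "t1", 100):
    print(line)
```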
Implicit Representation
E.code: sequence of instructions that are generated for E
E.false: instruction to branch to if E evaluates to false
E.true: instruction to branch to if E evaluates to true
(E.code is synthesized whereas E.true and E.false are inherited)
id.place: location for id
Three – Address Code Implementation I
1. Quadruples
• Naïve representation of three-address code
• Simple record structure of the form (op, arg1, arg2, result)
• Table of k * 4 small integers
• Easy to reorder
• Explicit names

For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3

the corresponding quadruple entries are:

op      arg1   arg2   res
load    y              1
loadI   2              2
mult    2      1       3
load    x              4
sub     4      3       5
Three – Address Code Implementation II
2. Triples
• Temporary values are referred to by the position of the
statement that computes them – the index serves as an
implicit name and takes no space
• Consumes about 25% less space than quads
• Much harder to reorder

For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3

the corresponding triple entries are:

        op      arg1   arg2
(1)     load    y
(2)     loadI   2
(3)     mult    (1)    (2)
(4)     load    x
(5)     sub     (4)    (3)
Three – Address Code Implementation III
3. Indirect Triples
• Lists pointers to triples, rather than listing the triples themselves
• Uses an array of pointers to statements
• Uses more space, but easier to reorder

For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3

the corresponding indirect triple entries are:

stmt list          triples
0: (11)        (11) load    y
1: (12)        (12) loadI   2
2: (13)        (13) mult    (11) (12)
3: (14)        (14) load    x
4: (15)        (15) sub     (14) (13)
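The three layouts can be contrasted in a short sketch (the tuple shapes and names are assumptions for illustration, not the slides' exact tables):

```python
# Contrasting the three layouts for: load y; loadI 2; mult; load x; sub.

# Quadruples: (op, arg1, arg2, result) -- the result is named explicitly,
# so statements can be reordered freely.
quads = [
    ("load",  "y",  None, "t1"),
    ("loadI", 2,    None, "t2"),
    ("mult",  "t2", "t1", "t3"),
    ("load",  "x",  None, "t4"),
    ("sub",   "t4", "t3", "t5"),
]

# Triples: (op, arg1, arg2) -- a result is named by its statement index,
# so integer arguments are references to earlier triples.
triples = [
    ("load",  "y", None),   # (0)
    ("loadI", 2,   None),   # (1)
    ("mult",  1,   0),      # (2) refers to triples (1) and (0)
    ("load",  "x", None),   # (3)
    ("sub",   3,   2),      # (4)
]

# Indirect triples: a separate list of indices into the triple table;
# reordering permutes this list without renumbering any references.
stmt_order = [0, 1, 2, 3, 4]

# Swap the two independent loads: only the order list changes.
stmt_order[0], stmt_order[1] = stmt_order[1], stmt_order[0]
for i in stmt_order:
    print(i, triples[i])
```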
Hybrid IR – Control Flow graph (CFG)
• Models the transfer of control in the procedure
• Nodes in the graph are basic blocks (maximal-length
sequential code)
• Blocks can be represented with quads or any other linear
notation
• Edges in the graph represent control flow
Which one to use when ?
• Common choice: all(!)
1. AST or other structural representation built by
parser and used in early stages of the compiler
• Closer to source code
• Good for semantic analysis
• Facilitates some higher-level optimizations
2. Flatten to linear IR for later stages of compiler
• Closer to machine code
• Exposes machine-related optimizations
3. Hybrid forms in optimization phases
A summary of IRs
• Many kinds of IR are used in practice
• Best choice depends on application
• There is no widespread agreement on this subject
• A compiler may need several different IRs
• Choose IR with right level of detail
• Keep manipulation costs in mind
Example

void main()
{
    int A = 10, B = 12, C = 10, D = 8, E, H, R;
    E = A + B * C;
    H = E + B / A;
    R = H * E / 20;
    printf("%d %d %d", E, H, R);
}

Three-address code:
1. A = 10
2. B = 12
3. C = 10
4. D = 8
5. T1 = B * C
6. E = A + T1
7. T2 = B / A
8. H = E + T2
9. T3 = H * E
10. R = T3 / 20
Code Generator
• The final phase of the compiler
• Code Optimization is the optional preceding phase
• Code generator should
• Effectively use the features of target machine and
• Itself should run efficiently
Source Code → Front End → IR → Code Optimizer → IR → Code Generator → Target Code
                         (all phases consult the Symbol Table)
Optimization versus Generation
• Code generation:
• usually transforms from one representation to another
• however, inasmuch as a code generator selects among
possibilities, it is also doing a restricted form of
optimization.
• Optimization:
• transforms a computation to an equivalent but ‘‘better’’
computation in the same representation language
or
• annotates the representation with additional
information about the computation.
Code Generator Issues
1. Input to code generator – Intermediate
Representations
• Several choices, including graphical ones
• Assumption: the prior stage has done type checking, and
the job to be done is allocating the target-machine
variable types
• Further assumption: error checking has already been done
2. Target Code
• Could be absolute machine code, relocatable code, or
assembly language code
3. Memory management
• Mapping of variables to memory locations (absolute
or relative)
Code Generator Issues (Contd.)
4. Instruction Selection
• Decided by the nature of the instruction set, supported
data types, instruction speeds, machine idioms, etc. of
the target machine
• Each three-address code line can be individually
translated into target machine code, but this could
result in poor code
5. Register Allocation
• Instructions involving registers (RR) tend to go fast
• Effective use of variables in registers speeds up the
program execution
• Limited number of registers in the system calls for
efficient use of these register locations
Code Generation – Phases
• Instruction Selection
• Mapping IR into assembly code
• Assumes a fixed storage mapping & code shape
• Combining operations, using address modes
• Instruction scheduling
• Reordering operations to hide latencies
• Assumes a fixed program (set of operations)
• Changes demand for registers
• Register allocation
• Deciding which values will reside in registers
• Changes the storage mapping, may add false sharing
• Concerns about placement of data & memory operations
Code Generation – Step by Step
1. Study the target machine
• registers, data formats, addressing modes, instructions, instruction
formats, ...
2. Design the run-time data structures
• layout of stack frames, layout of the global data area, layout of heap
objects, layout of the constant pool, ...
3. Implement the code buffer
• instruction encoding, instruction patching, ...
4. Implement register allocation
• Optimal use of registers speeds up execution
Code Generation – Step by Step
(Contd.)
5. Implement code generation routines (in the
following order)
1. Load values and addresses into registers (or onto the
stack)
2. Process designators (x.y, a[i], ...)
3. Translate expressions
4. Manage labels and jumps
5. Translate statements
6. Translate methods and parameter passing
Code generation methods
• Several Alternatives exist: ( We look at Four of them )
1. Macro-expansion of internal form
• Each quadruple is translated to one or more instructions
• very simple but poor quality code (slow & needs much
memory)
• poor use of registers
2. "A simple code generation algorithm” using address and
register descriptors
• Uses following:
1. reg(R): register descriptor, specifies the content of
register R
2. adr(A): address descriptor, specifies where the value of
A is (possibly in both register and memory)
"A simple code generation algorithm”
• Code for the quadruple x := y op z is generated using the
following algorithm:
1. Invoke a function getreg to determine the location L where the
result of the computation will be stored
2. Consult the address descriptor for y to locate y' – one of the
current location(s) of y – preferably a register. If y is not in a
register, generate the instruction MOV y', L to put y in L
3. Generate the instruction OP z', L where z' is the current location of
z (again, preferably a register), and update the address
descriptor of x to indicate that x is in location L. If L is a register,
update its descriptor to indicate that it contains the value of x,
and remove x from all other register descriptors
4. If the current values of y and z have no next uses and are in
registers, alter the register descriptors to indicate that y and z
are no longer in those registers
Function getreg
• Used for getting a register to hold the result (returns the
location L); involves:

if a register holds y and no other value,
    then update the address descriptor of y to indicate that
    y is no longer in that register, and return it as L
else if an empty register is available, return it as L
else if x is used later in the block, find an occupied
    register, save its contents into memory (spill), and
    return it as L
else select a memory location and return it as L
Example
• Generate code for the following quadruples:
1. t1 := m + n
2. t2 := a + b
3. t3 := k + t2
4. z := t1 - t3

• Assumptions:
• Only two registers, R0 and R1, are available
• All variables are live at the end of the block, but not the temps
• Show the contents of registers and finally the cost

Generated code:
MOV m, R0    ; R0 ← m
ADD n, R0    ; t1 in R0
MOV a, R1    ; R1 ← a
ADD b, R1    ; t2 in R1
MOV R0, t1   ; empty R0, it is needed for something else now!
MOV k, R0    ; t3 will be in R0 after the next add
ADD R1, R0   ; free R1, as it is not used again in the block
MOV t1, R1   ; t1 is in memory, load it into a register
SUB R0, R1   ; calculate t1 - t3, the result is in R1
MOV R1, z    ; end of block => save R1 in z's memory address

Total cost of the block = 10 instructions
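A much-simplified sketch of the descriptor-based algorithm (the spill policy and liveness handling are deliberately naive, and the class and method names are assumptions, not the slides' notation):

```python
# Descriptor-based code generation for quadruples x := y op z.
# reg maps each register to the value it currently holds (or None);
# adr maps each name to its current location (register or memory).

class CodeGen:
    def __init__(self, registers):
        self.reg = {r: None for r in registers}   # register descriptors
        self.adr = {}                             # address descriptors
        self.code = []

    def getreg(self):
        """Return a free register, spilling the first occupied one
        to memory if none is free (a deliberately naive policy)."""
        for r, held in self.reg.items():
            if held is None:
                return r
        r = next(iter(self.reg))
        victim = self.reg[r]
        self.code.append(f"MOV {r}, {victim}")    # save victim to memory
        self.adr[victim] = victim
        return r

    def emit_op(self, x, y, op, z):
        """Generate code for x := y op z (destructive two-address ops;
        assumes y has no later use once its register is overwritten)."""
        loc_y = self.adr.get(y, y)
        if loc_y.startswith("R"):                 # y already in a register
            L = loc_y
        else:
            L = self.getreg()
            self.code.append(f"MOV {y}, {L}")
        loc_z = self.adr.get(z, z)
        self.code.append(f"{op} {loc_z}, {L}")
        self.reg[L] = x                           # L now holds x
        self.adr[x] = L

cg = CodeGen(["R0", "R1"])
cg.emit_op("t1", "m", "ADD", "n")
cg.emit_op("t2", "a", "ADD", "b")
print("\n".join(cg.code))
```

With two registers the first two quadruples of the slide's example come out as MOV m, R0; ADD n, R0; MOV a, R1; ADD b, R1, matching the hand-generated code above.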
Code generation using AST
• Involves 2 phases:
1. Calculate the need for registers for each sub-tree
2. Traverse the tree and generate code (the register need
guides the traversal)
• Register needs are computed using syntax-directed
translation with bottom-up parsing (postorder traversal)
• Code generation proceeds depending on 5 possible
options we encounter in the traversal (details not
covered)
Codegen with Pattern matching
• The target machine is described using a set of code templates.
• Corresponding instructions are attached to each code template.
• The tree is matched with the code templates top-down:
• If there is a ''complicated'' code template which
corresponds to the whole tree, write the corresponding
instructions.
• Otherwise, match with the children of the node.

1. IF (A - B) = 0 THEN A := B   { CMP A, B }              -- Test
2. D := (A - B) * C             { MOV A, R0 ; SUB B, R0 } -- Value
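A toy sketch of top-down template matching (the templates and OPMAP are invented for illustration): if the "complicated" template for (a - b) * c matches the whole tree, its instructions are emitted; otherwise we recurse on the children:

```python
# Top-down template matching over a tuple tree ("op", left, right).
# Only one "complicated" whole-tree template is provided, plus a
# fallback that generates the left child and combines with the right.

OPMAP = {"+": "ADD", "-": "SUB", "*": "MUL"}

def codegen(tree):
    """Emit instructions for tree, matching templates top-down."""
    if isinstance(tree, str):                    # leaf: load into R0
        return [f"MOV {tree}, R0"]
    op, left, right = tree
    # 'Complicated' template: (a - b) * c with simple operands.
    if (op == "*" and isinstance(left, tuple) and left[0] == "-"
            and isinstance(left[1], str) and isinstance(left[2], str)
            and isinstance(right, str)):
        return [f"MOV {left[1]}, R0", f"SUB {left[2]}, R0",
                f"MUL {right}, R0"]
    # No whole-tree template matched: recurse on the left child,
    # then combine with the right operand (assumed to be a leaf).
    return codegen(left) + [f"{OPMAP[op]} {right}, R0"]

# D := (A - B) * C
print(codegen(("*", ("-", "A", "B"), "C")))
```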