ATCD PPT Module-5
MODULE -5
Turing Machine and Undecidability
Other Phases of Compilers: Syntax Directed Translation- Syntax-
Directed Definitions, Evaluation Orders for SDD’s. Intermediate-Code
Generation- Variants of Syntax Trees, Three-Address Code. Code
Generation- Issues in the Design of a Code Generator
Algorithms and Decision Procedures for CFLs
The Decidable Questions:
Membership:
Given a context-free language L (defined by a grammar G) and a string w, there exists a decision procedure that answers the question "is w in L?". There are two approaches:
Using the grammar:
If w ≠ ε then
  Construct G' in CNF such that L(G') = L(G) - {ε}.
  If G' derives w, it does so in 2|w| - 1 steps. Try all derivations in G' of 2|w| - 1 steps.
  If one of them derives w, accept; otherwise reject.
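A more practical form of the same membership test for a CNF grammar is the CYK dynamic-programming algorithm; a minimal sketch (the dictionary encoding of the grammar via unit_rules/pair_rules is an assumption for illustration):

```python
def cyk(w, start, unit_rules, pair_rules):
    """CYK membership test for a grammar in Chomsky Normal Form.
    unit_rules: dict terminal -> set of variables A with A -> terminal
    pair_rules: dict (B, C) -> set of variables A with A -> B C
    """
    n = len(w)
    if n == 0:
        return False  # the CNF grammar generates L(G) - {epsilon}
    # table[i][j] = set of variables deriving the substring w[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # start position
            for split in range(1, length):  # split point
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= pair_rules.get((b, c), set())
    return start in table[0][n - 1]

# a^n b^n (n >= 1) in CNF: S -> A T | A B, T -> S B, A -> a, B -> b
unit = {'a': {'A'}, 'b': {'B'}}
pairs = {('A', 'T'): {'S'}, ('A', 'B'): {'S'}, ('S', 'B'): {'T'}}
print(cyk("aaabbb", 'S', unit, pairs))  # True
print(cyk("aab", 'S', unit, pairs))     # False
```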
Algorithms and Decision Procedures for CFLs
Using a PDA to decide:
We take a two-step approach.
• We first show that, for every context-free language L, it is possible to build a PDA that accepts L - {ε} and that has no ε-transitions.
• Then we show that every PDA with no ε-transitions is guaranteed to halt.
Algorithms and Decision Procedures for CFLs
Decidability of Emptiness and Finiteness:
• Emptiness is decidable: check whether the start symbol of G is productive, i.e. can derive some terminal string; L(G) = ∅ iff it is not.
• Finiteness is decidable: after removing useless symbols, L(G) is infinite iff some variable can derive a string that contains that same variable.

Turing Machines: acceptance of an input
• Initial condition:
• The (whole) input string w is present on the TM tape, preceded and followed by infinitely many blank symbols
• Final acceptance:
• Accept w if the TM enters a final state and halts
• If the TM halts in a non-final state, then reject
Representation Of Turing Machines
(i) Instantaneous descriptions using move-relations.
(ii) Transition table, and
(iii) Transition diagram (transition graph).
Example: L = {0^n 1^n | n ≥ 1}
• Strategy for w = 000111: repeatedly mark a 0 with X and the matching 1 with Y.
[Figure: successive tape configurations, starting from
B B 0 0 0 1 1 1 B B in state q0, through
B B X 0 0 1 1 1 B B,
B B X 0 0 Y 1 1 B B,
B B X X 0 Y 1 1 B B,
B B X X 0 Y Y 1 B B,
B B X X X Y Y 1 B B,
to B B X X X Y Y Y B B, ending in the accepting state.]
Accept
TM for {0^n 1^n | n ≥ 1}: strategy
1. Mark the next unread 0 with X and move right.
2. Move to the right, all the way to the first unread 1, and mark it with Y.
3. Move back (to the left), all the way to the last marked X, and then move one position to the right.
4. If the next position is 0, then go to step 1. Else move all the way to the right over the Ys to ensure there are no excess 1s; on reaching the next blank symbol, stop & accept.

[Transition diagram: q0 --0/X,R--> q1; q1 loops on 0/0,R and Y/Y,R; q1 --1/Y,L--> q2; q2 loops on 0/0,L and Y/Y,L; q2 --X/X,R--> q0; q0 --Y/Y,R--> q3; q3 loops on Y/Y,R; q3 --B/B,R--> q4 (accept).]
*state diagram representation preferred
TM for {0^n 1^n | n ≥ 1}: transition table

Curr.        Next Tape Symbol
State   0         1         X         Y         B
q0      (q1,X,R)  -         -         (q3,Y,R)  -
q1      (q1,0,R)  (q2,Y,L)  -         (q1,Y,R)  -
q2      (q2,0,L)  -         (q0,X,R)  (q2,Y,L)  -
q3      -         -         -         (q3,Y,R)  (q4,B,R)
*q4     -         -         -         -         -
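The transition table can be executed directly; a minimal single-tape TM simulator (an illustrative sketch; the dictionary encoding of the transition function and the step bound are assumptions, not from the slides):

```python
def run_tm(delta, start, accept, w, blank='B', max_steps=10_000):
    """Simulate a deterministic single-tape TM given as a dict:
    delta[(state, symbol)] = (new_state, write_symbol, 'L' or 'R')."""
    tape = dict(enumerate(w))          # sparse tape; blanks elsewhere
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            return True
        sym = tape.get(head, blank)
        if (state, sym) not in delta:
            return False               # no move defined: halt and reject
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == 'R' else -1
    return False

# The transition table above for L = {0^n 1^n | n >= 1}
delta = {
    ('q0', '0'): ('q1', 'X', 'R'), ('q0', 'Y'): ('q3', 'Y', 'R'),
    ('q1', '0'): ('q1', '0', 'R'), ('q1', 'Y'): ('q1', 'Y', 'R'),
    ('q1', '1'): ('q2', 'Y', 'L'),
    ('q2', '0'): ('q2', '0', 'L'), ('q2', 'Y'): ('q2', 'Y', 'L'),
    ('q2', 'X'): ('q0', 'X', 'R'),
    ('q3', 'Y'): ('q3', 'Y', 'R'), ('q3', 'B'): ('q4', 'B', 'R'),
}
print(run_tm(delta, 'q0', 'q4', '000111'))  # True
print(run_tm(delta, 'q0', 'q4', '00111'))   # False
```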
Multi-tape TMs
[Figure: a single finite control with k separate heads, one per tape (Tape 1 … Tape k), each tape infinite in both directions.]
Non-deterministic TMs
[Figure: simulating a non-deterministic TM on a multi-tape deterministic TM: an input tape, a tape holding the queue of IDs (ID1 ID2 ID3 ID4) separated by markers (*), and a scratch tape.]
DECIDABILITY
Definition:
A language L ⊆ Σ* is recursive (decidable) if there exists some TM M that satisfies the following two conditions:
1. If w ∈ L, then M halts and accepts w.
2. If w ∉ L, then M halts and rejects w.

Theorem: ADFA (the acceptance problem for DFAs) is decidable.
Proof: We define a TM M as follows:
1. Let B be a DFA and w an input string. (B, w) is an input for the Turing machine M.
2. Simulate B on input w in the TM M.
3. If the simulation ends in an accepting state of B, then M accepts w. If it ends in a non-accepting state of B, then M rejects w.
DECIDABLE LANGUAGES
Theorem: ACFG is decidable.
Proof:
We convert the CFG into Chomsky Normal Form. In CNF, any derivation of a string w of length k takes exactly 2k - 1 steps. Now we design a TM M that always halts, as follows.
1. Let G be a CFG in Chomsky Normal Form and w an input string of length k. (G, w) is an input for M.
2. If k = 0, list all the derivations with one step. If k ≠ 0, list all the derivations with 2k - 1 steps.
3. If any of the derivations in step 2 generates the given string w, M accepts (G, w). Otherwise M rejects.
UNDECIDABLE LANGUAGES
• There exist languages that are not recursively enumerable; we also address the undecidability of certain recursively enumerable languages.
1. Synthesized Attributes
• An attribute of a grammar symbol is synthesized if it is computed from the attributes of its children in the parse tree
• Example: E → E1 + num | num
• E.val = E1.val + num.val (the val of E is computed from the val of E1 and the val of num)
• Used extensively in LR parsing practice
• A syntax-directed definition that uses only synthesized attributes is known as an "S-attributed definition" (S-attributed definitions are based on LR grammars)
Semantic Rules
• Define the value of an attribute associated with a grammar symbol
• Associated with the grammar rule used at that node

Productions:
L → E n
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → digit

[Figure: parse tree for an expression of the form digit * digit + digit, built from these productions.]
S-attributed definition Example (Contd.)

Production    Semantic Rule
L → E n       print (E.val)
E → E1 + T    E.val = E1.val + T.val
E → T         E.val = T.val
T → T1 * F    T.val = T1.val x F.val
T → F         T.val = F.val
F → ( E )     F.val = E.val
F → digit     F.val = digit.lexval

[Annotated parse tree for 3 * 5 + 4: digit.lexval = 3, 5 and 4 give F.val = 3, 5 and 4; T.val = 3 x 5 = 15; E.val = 15 + 4 = 19; the root prints E.val = 19.]
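The bottom-up computation of val can be sketched as a post-order walk over a tuple-encoded parse tree (the tuple encoding is an assumption for illustration):

```python
# Each node carries a synthesized attribute val, computed bottom-up
# from the vals of its children, exactly as the semantic rules state.

def val(node):
    """node is ('digit', n) or (op, left, right) with op in {'+', '*'}."""
    if node[0] == 'digit':
        return node[1]                      # F -> digit: F.val = digit.lexval
    op, left, right = node
    l, r = val(left), val(right)            # children first (post-order)
    return l + r if op == '+' else l * r    # E.val = E1.val + T.val, etc.

# Tree for 3 * 5 + 4: a + node over a * node (3, 5) and the leaf 4
tree = ('+', ('*', ('digit', 3), ('digit', 5)), ('digit', 4))
print(val(tree))  # 19
```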
Attribute grammar Example
• CFG for signed binary numbers:
• (1) Number →Sign List , (2) Sign → + | –
• (3) List → List Bit | Bit , (4) Bit → 0 | 1
◼ Parse (derivation) for string “-1” is:
1. Number → Sign List
2. → Sign Bit
3. → Sign 1
4. → – 1

◼ Parse (derivation) for string “-101” is:
1. Number → Sign List
2. → Sign List Bit
3. → Sign List 1
4. → Sign List Bit 1
5. → Sign List 0 1
6. → Sign Bit 0 1
7. → Sign 1 0 1
8. → – 101
Attribute Grammar Example (Contd.)
◼ Attribute Grammar for the CFG of signed binary numbers is:
[Attribute-grammar table: each production is paired with rules computing Sign.neg, Bit.val from the inherited Bit.pos, List.val and the inherited List.pos, and Number.val from Sign.neg and List.val.]
Attribute Grammar Example (Contd.)
[Decorated parse tree for “-101”: Sign gives Neg = True; the Bits at Pos = 2, 1, 0 have Val = 4, 0, 1; the Lists accumulate Val = 4, 4, 5; the root has Val = -5.]
Attribute Grammar Example (Contd.)
[The same decorated parse tree for “-101” (root Val = -5), with attribute dependencies drawn as arrows.]
• The example shows the decorated parse tree with attribute dependencies shown with arrows.
• If the parse tree is taken out, what is left is a DEPENDENCY GRAPH.
Dependency Graph
• A directed graph that depicts the interdependencies
among the inherited and synthesized attributes at the
nodes in a parse tree
• Example:
• Consider the grammar production E → E1 + E2 and the
Corresponding semantic rule E.val : = E1.val+ E2.val
• The parse tree, with the dependency graph shown separately, would be:
[Figure: parse tree for E → E1 + E2 with a val attribute attached to E, E1 and E2; dependency edges run from E1.val and E2.val to E.val.]
Dependency Graph
• A directed graph that depicts the interdependencies
among the inherited and synthesized attributes at
the nodes in a parse tree
• The order in which the semantic rules are evaluated follows the edges of the graph: an attribute is computed only after all the attributes it depends on.
• The dependency graph is sorted topologically;
• The grammar is used to construct the parse tree from the input
• We then obtain an order of evaluation of the semantic rules
• Evaluating the semantic rules in this order yields the translation of the input string
Algorithm for constructing the dependency graph
for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a;
for each node n in the parse tree do
    for each semantic rule b := f(c1, c2, …, ck) associated with the production used at n do
        for i := 1 to k do
            construct an edge from the node for ci to the node for b;
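The evaluation order implied by the dependency graph is a topological sort of its nodes; a minimal sketch (attribute instances named by strings are an assumption for illustration):

```python
from collections import defaultdict, deque

def topo_order(edges, nodes):
    """Return an evaluation order for attribute instances: every edge
    (c, b) means b depends on c, so c must be evaluated first."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for c, b in edges:
        succ[c].append(b)
        indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle: attributes cannot be evaluated")
    return order

# E.val depends on E1.val and E2.val (rule E.val := E1.val + E2.val)
deps = [('E1.val', 'E.val'), ('E2.val', 'E.val')]
print(topo_order(deps, ['E.val', 'E1.val', 'E2.val']))
# ['E1.val', 'E2.val', 'E.val']
```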
Abstract Syntax Tree (AST)
• A condensed form of a parse tree useful in representing language
constructs of a programming language
• An intermediate representation that allows translation to be
decoupled from parsing
• For example: the production S → if B then S1 else S2 will have a syntax tree with an if-then-else node whose three children are the sub-trees for B, S1 and S2. Similarly, 3 * 5 + 4 results in a + node whose children are a * node (over 3 and 5) and the leaf 4.
AST versus Parse Tree
• While a parse tree keeps track of all productions used to build
the parse tree, an AST is a condensed form of this with just the
information needed in later stages of compilation or
interpretation.
AST Construction for Expressions
• Similar to the translation of expression into postfix form
• Sub-trees are constructed for the sub-expressions by
creating a node for each operator & operand
• The children of the Operator node are the nodes
representing the sub-expressions constituting the operands
of the operator
• In an operator node, one field identifies the operator (often called the label of the node) and the rest contain pointers to the nodes of the operands
• Each node in the syntax tree can be represented as a
record with several fields
Constructing AST (Contd.)
• The following functions are used to create the nodes of the AST. (Each function returns a pointer to the newly created node.)
1. mknode (op, left, right)
   Creates an operator node with label op and two fields containing the pointers left and right.
2. mkleaf (id, entry)
   Creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry.
3. mkleaf (num, val)
   Creates a number node with label num and a field containing val, the value of the number.
Constructing AST – An Example
Sentence: a – 4 + c
[Figure: AST with + at the root; its left child is – (whose children are id, pointing to the entry for a, and num 4) and its right child is id, pointing to the entry for c.]
The tree is built bottom-up using the function call sequence:
p1 := mkleaf (id, entry for a)
p2 := mkleaf (num, 4)
p3 := mknode (‘-’, p1, p2)
p4 := mkleaf (id, entry for c)
p5 := mknode (‘+’, p3, p4)
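The call sequence can be sketched directly; here mknode/mkleaf are modelled with tuples, and the strings 'entry-a'/'entry-c' stand in for symbol-table pointers (both assumptions for illustration):

```python
# Minimal mknode/mkleaf sketch: each node is a tuple whose first
# element is the label, matching the record layout described above.

def mknode(op, left, right):
    return (op, left, right)          # operator node with two children

def mkleaf_id(entry):
    return ('id', entry)              # entry: symbol-table pointer stand-in

def mkleaf_num(val):
    return ('num', val)               # val: the value of the number

# Bottom-up construction for a - 4 + c, as in the call sequence above:
p1 = mkleaf_id('entry-a')
p2 = mkleaf_num(4)
p3 = mknode('-', p1, p2)
p4 = mkleaf_id('entry-c')
p5 = mknode('+', p3, p4)
print(p5)
# ('+', ('-', ('id', 'entry-a'), ('num', 4)), ('id', 'entry-c'))
```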
A SDD for Constructing AST
PRODUCTION    SEMANTIC RULE
E → E1 + T    E.nptr := mknode ( ‘+’, E1.nptr, T.nptr )
E → E1 – T    E.nptr := mknode ( ‘–’, E1.nptr, T.nptr )
E → T         E.nptr := T.nptr
T → ( E )     T.nptr := E.nptr
T → id        T.nptr := mkleaf ( id, id.entry )
T → num       T.nptr := mkleaf ( num, num.val )
• Uses underlying productions of the grammar to schedule
calls to the functions mknode and mkleaf
NOTE:
• The synthesized attribute nptr for E and T keeps track of
the pointers returned by function calls
A SDD for Constructing AST
[Figure: decorated parse tree for a – 4 + c; each E.nptr / T.nptr attribute points into the AST being built, with leaves pointing to the symbol-table entries for a and c and to the number 4.]

PRODUCTION    SEMANTIC RULE
E → E1 + T    E.nptr := mknode ( ‘+’, E1.nptr, T.nptr )
E → E1 – T    E.nptr := mknode ( ‘–’, E1.nptr, T.nptr )
E → T         E.nptr := T.nptr
T → ( E )     T.nptr := E.nptr
T → id        T.nptr := mkleaf ( id, id.entry )
T → num       T.nptr := mkleaf ( num, num.val )
Directed Acyclic Graph (DAG)
• Identifies common sub-expressions
• Like a syntax tree,
• it has interior nodes representing operators, with two children as operands: left & right.
• The only difference is that a node representing a common sub-expression has more than one parent in a DAG; in a syntax tree the common sub-expression would be represented as a duplicated sub-tree
• Can also be used to represent a set of expressions
DAG Usage – An Example
• Consider the expression: a := b * -c + b * -c
[Figure: the syntax tree duplicates the sub-tree for b * (uminus c); in the corresponding DAG the two * nodes, the two uminus nodes and the leaves b and c are each shared.]
• Three-address code from the syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
• Three-address code from the DAG:
t1 := - c
t2 := b * t1
t3 := t2 + t2
a := t3
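DAG construction can be sketched with hashing (value numbering): a node is created only once per (operator, operands) combination, so the second b * -c reuses the first. The tuple keys and integer node ids below are assumptions for illustration:

```python
nodes = {}   # (op, left, right) or ('leaf', name) -> node id

def node(key):
    """Return the existing node for key, creating it on first use."""
    if key not in nodes:
        nodes[key] = len(nodes)
    return nodes[key]

def leaf(name):
    return node(('leaf', name))

def op(o, l, r=None):
    return node((o, l, r))

# a := b * -c + b * -c
t1 = op('uminus', leaf('c'))
t2 = op('*', leaf('b'), t1)
t3 = op('uminus', leaf('c'))   # same node as t1: shared, not duplicated
t4 = op('*', leaf('b'), t3)    # same node as t2: shared, not duplicated
t5 = op('+', t2, t4)
print(t1 == t3, t2 == t4)  # True True
```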
Translators for SDD
• A translator for an arbitrary SDD can be difficult to
build
• However:
1. Large classes of syntax-directed definitions exist for which it is easier to build a translator
2. SDD’s with synthesized attributes (a.k.a. S-attributed definitions) can be evaluated using a bottom-up evaluation technique
3. SDD’s with inherited attributes (a.k.a. L-attributed definitions) are evaluated using a top-down method known as “depth-first order”
Bottom-Up Evaluation of SDD’s
• A translator for an S-attributed definition can often
be implemented with the help of an LR-Parser
generator.
[Figure: parse tree for 9 - 5 + 2 with embedded print actions: print(‘9’), print(‘5’), print(‘-’), print(‘2’), print(‘+’); executing the actions emits the postfix form 95-2+.]
Example: 9 - 5 + 2
• Production          • Semantic rule
E → E1 + T            E.t := E1.t || T.t || ’+’
E → E1 - T            E.t := E1.t || T.t || ’-’
E → T                 E.t := T.t
T → 0 | 1 | … | 9     T.t := ’0’ | ’1’ | … | ’9’
[Annotated parse tree for 9 - 5 + 2: T.t = 9, 5 and 2; E.t = 9, then 95-, then 95-2+.]
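The scheme above can be sketched as a bottom-up evaluation of E.t on a tuple-encoded parse tree (the encoding is an assumption for illustration):

```python
# E.t is synthesized: the translations of the subexpressions are
# concatenated and the operator is appended, giving postfix form.

def postfix(node):
    if isinstance(node, str):            # T -> digit: T.t = the digit
        return node
    op, e1, t = node                     # E -> E1 op T
    return postfix(e1) + postfix(t) + op # E.t = E1.t || T.t || op

# Parse of 9 - 5 + 2 (left associative): ((9 - 5) + 2)
tree = ('+', ('-', '9', '5'), '2')
print(postfix(tree))  # 95-2+
```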
Problem
Consider the problem of translating decimal numbers between 0 and 99 into their English words / phrases.
Number    word / phrase
0         zero
1         one
…
10        ten
11        eleven
…
19        nineteen
20        twenty
30        thirty
31        thirty one
Grammar
1. N := D | D P
2. P := D
3. D := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
PARSE TREE and ANNOTATED PARSE TREE
[Figure: parse tree for 68: N with children D (digit 6) and P, where P has child D (digit 8). In the annotated parse tree, N.trans depends on P.trans, P.in is inherited from D.val, and P.trans depends on P.in and D1.val.]
Grammar
1. N := D | D P
2. P := D
3. D := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Production    Semantic rule
N := D        N.trans := spell (D.val)
N := D P      P.in := D.val
              N.trans := P.trans
P := D        P.trans := if D1.val = 0 then decade (P.in)
              else if P.in <= 1 then spell (10 * P.in + D1.val)
              else decade (P.in) || spell (D1.val)
D := 0        D.val := 0
…

spell (1) = one, …, spell (19) = nineteen
decade (1) = ten, …, decade (9) = ninety
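The semantic rules can be executed directly; a minimal sketch where the ONES and DECADES lookup tables are my own illustrative stand-ins for spell and decade (the slides only list a few of their entries):

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
DECADES = [None, "ten", "twenty", "thirty", "forty", "fifty", "sixty",
           "seventy", "eighty", "ninety"]

def translate(n):
    """N := D | D P with the rules above; n is 0..99."""
    if n < 10:
        return ONES[n]                  # N := D: spell(D.val)
    d, d1 = divmod(n, 10)               # P.in = d, D1.val = d1
    if d1 == 0:
        return DECADES[d]               # decade(P.in)
    if d <= 1:
        return ONES[10 * d + d1]        # spell(10 * P.in + D1.val)
    return DECADES[d] + " " + ONES[d1]  # decade(P.in) || spell(D1.val)

print(translate(68))  # sixty eight
print(translate(17))  # seventeen
```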
Intermediate Representation (IR) –
What ?
• An abstract machine language that can express target-machine
operations without too much machine-specific detail.
• A compile time data structure
• Encodes the compiler’s knowledge of the program
SDT for Boolean Expressions
• Boolean expressions have two primary purposes in programming languages:
• Computation of logical values
• Conditional expressions in flow-of-control statements
• Composed of the boolean operators AND, OR and NOT
• Relational expressions are of the form E1 relop E2
• The grammar for boolean expressions:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
• Two techniques are used for translating boolean expressions:
1. Numerical Representation
2. Implicit Representation
………. Contd.
SDT for Boolean Expressions
1. Numerical representation
• Use 1 to represent true, use 0 to represent false
• For three-address code store this result in a temporary
• For stack machine code store this result in the stack
2. Implicit representation
• For the boolean expressions which are used in flow-of-
control statements (such as if-statements, while-
statements etc.) boolean expressions do not have to
explicitly compute a value, they just need to branch to
the right instruction
• Generate code for boolean expressions which branches to the appropriate instruction based on the result of the boolean expression
………. Contd.
Numerical representation
Attributes : E.place: location that holds the value of expression E
E.code: sequence of instructions that are generated for E
id.place: location for id
relop.func: the type of relational function
………. Contd.
Implicit Representation
E.code: sequence of instructions that are generated for E
E.false: instruction to branch to if E evaluates to false
E.true: instruction to branch to if E evaluates to true
(E.code is synthesized whereas E.true and E.false are inherited)
id.place: location for id
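Under the implicit representation, the code for id1 relop id2 is just a conditional jump to E.true and an unconditional jump to E.false; a minimal sketch of such an emitter (the function name and label strings are assumptions for illustration):

```python
# Emit the standard two-instruction jump pattern for a relational
# expression: fall into the conditional branch, else jump to false.

def relop_code(x, op, y, true_label, false_label):
    return [f"if {x} {op} {y} goto {true_label}",
            f"goto {false_label}"]

print(relop_code('a', '<', 'b', 'L.true', 'L.false'))
# ['if a < b goto L.true', 'goto L.false']
```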
Three – Address Code Implementation I
1. Quadruples
• Naïve representation of three-address code
• Simple record structure of the form (op, arg1, arg2, result)
• Table of k x 4 small integers
• Easy to reorder
• Explicit names
• For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3
the corresponding quadruple entries are:

op     arg1  arg2  res
load   y           1
loadI  2           2
mult   2     1     3
load   x           4
sub    4     3     5
Three – Address Code Implementation II
2. Triples
• Temporary values are referred to by the position (index) of the statement that computes them
• About 25% less space consumed than quads
• Much harder to reorder
• For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3
the corresponding triple entries are:

     op     arg1  arg2
(1)  load   y
(2)  loadI  2
(3)  mult   (1)   (2)
(4)  load   x
(5)  sub    (4)   (3)

• The index serves as an implicit name and takes no space
Three – Address Code Implementation III
3. Indirect Triples
• Lists pointers to triples, rather than the triples themselves
• Uses an array to list pointers to statements
• Uses more space, but easier to reorder
• For the following code:
load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3
the corresponding indirect-triple entries are:

pointer list: 0 -> (11), 1 -> (12), 2 -> (13), 3 -> (14), 4 -> (15)

      op     arg1  arg2
(11)  load   y
(12)  loadI  2
(13)  mult   (11)  (12)
(14)  load   x
(15)  sub    (14)  (13)
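The three encodings can be compared side by side; a sketch using plain Python tuples (the field layout, and the use of string temporaries t1…t5 in the quads where the tables above use bare indices, are assumptions for illustration):

```python
# Quadruples: (op, arg1, arg2, result) -- explicit result names.
quads = [
    ('load',  'y',  None, 't1'),
    ('loadI', 2,    None, 't2'),
    ('mult',  't2', 't1', 't3'),
    ('load',  'x',  None, 't4'),
    ('sub',   't4', 't3', 't5'),
]

# Triples: (op, arg1, arg2) -- a result is named by its index,
# so integer arguments refer to earlier triples.
triples = [
    ('load',  'y', None),   # (0)
    ('loadI', 2,   None),   # (1)
    ('mult',  1,   0),      # (2)
    ('load',  'x', None),   # (3)
    ('sub',   3,   2),      # (4)
]

# Indirect triples: a separate pointer list gives statement order,
# so reordering touches only the pointer list, never the triples.
order = [0, 1, 2, 3, 4]
reordered = [3, 0, 1, 2, 4]          # e.g. hoist the load of x
print([triples[i][0] for i in reordered])
```

Reordering quads renames nothing, since results have explicit names; reordering raw triples would invalidate every index that refers to a moved statement, which is exactly why the indirect form exists.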
Hybrid IR – Control Flow Graph (CFG)
• Models the transfer of control in the procedure
• Nodes in the graph are basic blocks (maximal-length straight-line code sequences)
• Blocks can be represented with quads or any other linear notation
• Edges in the graph represent control flow
Which one to use when ?
• Common choice: all(!)
1. AST or other structural representation built by
parser and used in early stages of the compiler
• Closer to source code
• Good for semantic analysis
• Facilitates some higher-level optimizations
2. Flatten to linear IR for later stages of compiler
• Closer to machine code
• Exposes machine-related optimizations
3. Hybrid forms in optimization phases
A summary of IRs
• Many kinds of IR are used in practice: trees, linear codes and hybrid forms, usually maintained alongside a symbol table
Optimization versus Generation
• Code generation:
• usually transforms from one representation to another
• However, in as much as a code generator selects among
possibilities, it is also doing a restricted form of
optimization.
• Optimization:
• transforms a computation to an equivalent but ‘‘better’’
computation in the same representation language
or
• annotates the representation with additional
information about the computation.
Code Generator Issues
1. Input to the code generator – Intermediate Representations
• Several choices, including graphical ones
• The assumption is that the prior stage has done type checking, and the job to be done is mapping variables to the target machine's data types
• A further assumption is that error checking has already been done
2. Target Code
• Could be absolute machine code, relocatable code or assembly-language code
3. Memory management
• Mapping of variables to memory locations (absolute or relative)
Code Generator Issues (Contd.)
4. Instruction Selection
• Decided by the nature of the instruction set, supported data types, instruction speeds, machine idioms, etc. of the target machine
• Each three-address code line can be individually translated into target machine code; but this could result in poor code
5. Register Allocation
• Instructions involving registers (RR) tend to go fast
• Effective use of variables in registers speeds up the
program execution
• Limited number of registers in the system calls for
efficient use of these register locations
Code Generation – Phases
• Instruction Selection
• Mapping IR into assembly code
• Assumes a fixed storage mapping & code shape
• Combining operations, using address modes
• Instruction scheduling
• Reordering operations to hide latencies
• Assumes a fixed program (set of operations)
• Changes demand for registers
• Register allocation
• Deciding which values will reside in registers
• Changes the storage mapping, may add false sharing
• Concerns about placement of data & memory operations
Code Generation – Step by Step
1. Study the target machine
• registers, data formats, addressing modes, instructions, instruction
formats, ...
2. Design the run-time data structures
• layout of stack frames, layout of the global data area, layout of heap
objects, layout of the constant pool, ...
3. Implement the code buffer
• instruction encoding, instruction patching, ...
4. Implement register allocation
• Optimal use of registers speeds up execution
Code Generation – Step by Step
(Contd.)
5. Implement code generation routines (in the
following order)
1. Load values and addresses into registers (or onto the
stack)
2. Process designators (x.y, a[i], ...)
3. Translate expressions
4. Manage labels and jumps
5. Translate statements
6. Translate methods and parameter passing
Code generation methods
• Several Alternatives exist: ( We look at Four of them )
1. Macro-expansion of internal form
• Each quadruple is translated to one or more instructions
• very simple but poor quality code (slow & needs much
memory)
• poor use of registers
2. "A simple code generation algorithm” using address and
register descriptors
• Uses following:
1. reg(R): register descriptor, specifies the content of
register R
2. adr(A): address descriptor, specifies where the value of
A is (possibly in both register and memory)
"A simple code generation algorithm”
• Code for quadruple: x:= y op z is generated using the
following algorithm:
1. Invoke a function getreg to determine the location L where the result of the computation will be stored
2. Consult the address descriptor for y to locate y' – one of the current location(s) of y – preferably a register. If y is not in L, generate the instruction MOV y', L to put y in L
3. Generate the instruction OP z', L, where z' is the current location of z (again, preferably a register), and update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors
4. If the current values of y and z have no next uses and are in registers, alter the register descriptors to indicate that y and z are no longer in those registers
Function getreg
• Used for getting a location to hold the result (returns the location L); involves:
if a register holds y and no other value, and y has no next use,
    then update the address descriptor of y to indicate that y is no longer in that register, and return that register as L
else if an empty register is available, return it as L
else if x has a next use in the block, find an occupied register, save (spill) it into memory and return it as L
else select a memory location and return it as L
Example
• Generate code for the following quadruples:
1. t1 := m + n
2. t2 := a + b
3. t3 := k + t2
4. z := t1 - t3
• Assumptions:
• Only two registers, R0 and R1, are available
• All variables are live at the end of the block, but not the temps
• Show the contents of registers and finally the cost

Generated code:
MOV m, R0    ; m in R0
ADD n, R0    ; t1 in R0
MOV a, R1    ; a in R1
ADD b, R1    ; t2 in R1
MOV R0, t1   ; empty R0, it is needed for something else now!
MOV k, R0    ; t3 will be in R0 after the next add
ADD R1, R0   ; free R1, as it is not used again in the block
MOV t1, R1   ; t1 is in memory, load it into a register
SUB R0, R1   ; calculate t1 - t3, the result is in R1
MOV R1, z    ; end of block => save R1 in z's memory address

Total cost of the block = 10 instructions
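The first quadruple of the example (t1 := m + n) run through a stripped-down version of the algorithm; only the two easy getreg cases are handled, and the descriptor data structures are assumptions for illustration:

```python
# reg_desc maps register -> variable it holds (None if empty);
# addr_desc maps variable -> set of locations currently holding it.

def getreg(y, reg_desc):
    for r, v in reg_desc.items():
        if v == y:                 # a register already holding y can be reused
            return r
    for r, v in reg_desc.items():
        if v is None:              # otherwise take an empty register
            return r
    raise NotImplementedError("spilling not shown in this sketch")

def gen_quad(x, op, y, z, reg_desc, addr_desc, code):
    """Generate code for x := y op z, updating the descriptors."""
    L = getreg(y, reg_desc)
    if L not in addr_desc.get(y, set()):
        code.append(f"MOV {y}, {L}")       # bring y into L if needed
    code.append(f"{op} {z}, {L}")          # L := L op z
    reg_desc[L] = x
    addr_desc[x] = {L}                     # x now lives only in L

code = []
regs = {'R0': None, 'R1': None}
addrs = {}
gen_quad('t1', 'ADD', 'm', 'n', regs, addrs, code)
print(code)  # ['MOV m, R0', 'ADD n, R0']
```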
Code generation using AST
• Involves 2 phases :
1. Calculate the need for registers for each sub-tree
2. Traverse the tree and generate code. (The register need
guides the traversal.)
• Register needs are computed using syntax-directed translation with bottom-up parsing (a postorder traversal).
• Code generation then proceeds depending on 5 possible options we encounter in the traversal. (Details not covered.)
Codegen with Pattern matching
• The target machine is described using a set of code templates.
• Corresponding instructions are attached to each code template.
• The tree is matched with the code templates top down.
• If there is a “complicated” code template which corresponds to the whole tree, write the corresponding instructions.
• Otherwise match the templates against the children of the node.