Compiler Design Concepts, Worked Out Examples and Mcqs For Net/Set
Compiler Design Concepts, Worked Out Examples and Mcqs For Net/Set
Compiler Design Concepts, Worked Out Examples and Mcqs For Net/Set
net/publication/316560026
CITATIONS READS
0 28,688
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Annal Ezhil Selvi S on 23 March 2021.
INTRODUCTION TO COMPILER
Computers are a balanced mix of software and hardware. Hardware is just a piece of
mechanical device and its functions are being controlled by a compatible software.
Hardware understands instructions in the form of electronic charge, which is the
counterpart of binary language in software programming. Binary language has only two
alphabets, 0 and 1. To instruct the machine, the hardware codes must be written in
binary format, which is simply a series of 1s and 0s. It would be a difficult and
cumbersome task for computer programmers to write such codes that is why we have
compilers to write such codes.
1
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few
other tools that work closely with compilers.
1.1.1. Preprocessor
A preprocessor, generally considered as a part of compiler, is a tool that produces
input for compilers. It deals with macro-processing, augmentation, file inclusion, language
extension, etc.
1.1.2. Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine
language. The difference lies in the way they read the source code or input. A compiler
reads the whole source code at once, creates tokens, checks semantics, generates
intermediate code, executes the whole program and may involve many passes. In contrast,
an interpreter reads a statement from the input, converts it to an intermediate code,
executes it, then takes the next statement in sequence. If an error occurs, an interpreter
stops execution and reports it; whereas a compiler reads the whole program even if it
encounters several errors.
1.1.3. Assembler
An assembler translates assembly language programs into machine code. The output
of an assembler is called an object file, which contains a combination of machine
instructions as well as the data required to place these instructions in memory.
1.1.4 Linker
Linker is a computer program that links and merges various object files together in
order to make an executable file. All these files might have been compiled by separate
assemblers. The major task of a linker is to search and locate referenced module/routines
in a program and to determine the memory location where these codes will be loaded,
making the program instruction to have absolute references.
1.1.5 Loader: Loader is a part of operating system and is responsible for loading
executable files into memory and execute them. It calculates the size of a program
(instructions and data) and creates memory space for it. It initializes various registers to
initiate execution.
1.2. Cross-Compiler
A compiler that runs on platform and is capable of generating executable code for
platform is called a cross-compiler.
2
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
3
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
4
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
the semantic analyzer keeps track of identifiers, their types and expressions; whether
identifiers are declared before use or not, etc. The semantic analyzer produces an
annotated syntax tree as an output.
1.6.4 Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source
code for the target machine. It represents a program for some abstract machine. It is in
between the high-level language and the machine language. This intermediate code should
be generated in such a way that it makes it easier to be translated into the target machine
code. The intermediate code may be a Three Address code or Assembly code.
1.6.5 Code Optimization
The next phase does code optimization, it is an optional phase. Optimization can be
assumed as something that removes unnecessary code lines, and arranges the sequence
of statements in order to speed up the program execution without wasting resources like
CPU, memory. The output of this phase is an optimized intermediate code.
1.6.6 Code Generation
In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of re-locatable machine code. Sequence
of instructions of machine code performs the task as the intermediate code would do.
1.6.7 Symbol Table
Symbol Table is also known as Book Keeping. It is a data-structure maintained
throughout all the phases of a compiler. All the identifiers‟ names along with their
information like type, size, etc., are stored here. The symbol table makes it easier for the
compiler to quickly search and retrieve the identifier‟s record. The symbol table is also
used for scope management.
1.6.8 Error Hander
A parser should be able to detect and report any error in the program. It is expected
that when an error is encountered, the parser should be able to handle it and do parsing
with the rest of the inputs. Mostly it is expected from the parser to check for errors. But
errors may be encountered at various stages of the compilation process.
Summary
A compiler is a program that converts high-level language to assembly language.
A linker tool is used to link all the parts of the program together for execution.
A loader loads all of them into memory and then the program is executed.
A compiler that runs on machine and produces executable code for another machine is
called a cross-compiler.
A Compiler divided into two parts namely Analysis and Synthesis.
The compilation process is done in various phases.
Two or more phases can be combined to form a pass.
A parser should be able to detect and report any error in the program.
Questions
1. Write a short note on Compiler Writing tools.
2. Differentiate Linker and Loader
3. Explain Bootstrapping.
4. Differentiate Analysis phase and Synthesis phase.
5. Describe the phases of the Compiler.
5
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
LEXICAL ANALYSIS
2.1. Introduction
Lexical analysis is the first phase of a compiler. It takes the modified source code from
language preprocessors that are written in the form of sentences. The lexical analyzer
breaks these sentences into a series of tokens, by removing any whitespace or comments
in the source code. If the lexical analyzer finds a token as invalid, it generates an error.
The lexical analyzer works closely with the syntax analyzer. It reads character streams
from the source code, checks for legal tokens, and passes the data to the syntax analyzer
when it demands.
2.2 Tokens
Lexemes are said to be a sequence of characters (alphanumeric) which is also called as
tokens. There are some predefined rules for every lexeme to be identified as a valid token.
These rules are defined by grammar rules, by means of a pattern. A pattern explains what
can be a token, and these patterns are defined by means of regular expressions.
In programming language, keywords, constants, identifiers, strings, numbers,
operators, and punctuations symbols can be considered as tokens.
For example, in C language, the variable declaration line
int value = 100;
Contains the tokens:
1) int (keyword)
2) value (identifier)
3) = (operator)
4) 100 (constant)
5) ; (symbol)
2.2.1.1 Alphabets
Any finite set of symbols {0,1} is a set of binary alphabets,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of Hexadecimal alphabets, {a-z, A-Z} is a set of
English language alphabets.
2.2.1.2 Strings
Any finite sequence of alphabets is called a string. Length of the string is the total
number of alphabets in the string, e.g., the string S is “INDIA”, the length of the string, S
is 5 and is denoted by |S|= 5. A string having no alphabets, i.e. a string of zero length is
known as an empty string and is denoted by ε (epsilon).
6
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
2.3 Language
A language is considered as a finite set of strings over some finite set of alphabets.
Computer languages are considered as finite sets, and mathematically set operations can
be performed on them. Finite languages can be described by means of regular
expressions.
7
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
8
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
9
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example 1: We assume FA accepts any three digit binary value ending in digit 1.
FA = {Q(q0, qf), Σ(0,1), q0, qf, δ}
Example 2:
Regular Expression R= (a|b)*abb (FA accepts the string which is ending with abb)
10
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
11
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Each state D is a set of state which N could be in after reading some sequence of input
symbols. Thus D is able to simulate in parallel all possible moves N can make on a given
input string.
Let us define the function - closure(s) to be the set of states of N built by applying the
following rules:
1) S is added to -closure(s).
2) If t is in -closure(s) and there is an edge labeled from t to u, repeated until no more
states can be added to -closure(s).
Example: 1
Solution:
E-closure(0) = {0, 1, 2, 4, 7} = A
Move(A,a) = {3, 8} 2 reads „a‟ goes to 3 & 8
Move(A, b) = {5}
E-closure(Move(A,a)) = {3, 6, 1, 2, 4, 7, 8}
Ie. {3, 8} = {1, 2, 3, 4, 6, 7, 8} = B
E-closure(Move(A, b)) = {5, 6, 1, 2, 4, 5, 7}
Ie. {5} = {1, 2, 4, 5, 6, 7} = C
Move (B, a) = {3, 8}
Move (B, b) = {5, 9}
Move (C, a) = {3, 8}
Move (C, b) = {5}
E-closure (Move (B, a)) = {1, 2, 3, 4, 6, 7, 8} = B
{3, 8}
E-closure (Move (B, b)) = {1, 2, 4, 5, 6, 7, 9} = D
{5, 9}
E-closure (Move (C, a)) = {1, 2, 3, 4, 6, 7, 8} = B
{3, 8}
E-closure (Move (C, b)) = {1, 2, 4, 5, 6, 7} = C
{5}
Move (D, a) = {3, 8}
Move (D, b) = {5, 10}
E-closure(Move(D, a)) = {1, 2, 3, 4, 6, 7, 8} = B
{3, 8}
12
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Minimizing DFA:
= {A, B, C, D, E}
new = {A, B, C, D} {E}
= new
= {A, B, C, D} {E}
new = {E} {A, B, C} {D} =
= {E} {D} {A, C} {B}
Transition table:
States Input System
a b
A B A
B B D
D B E
E B A
13
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
2.7.1 Implementation
LEX can build from its input, a lexical analyzer that behaves roughly like a finite
automaton
The idea is to construct a NFA „N‟ for each token pattern P in the translation rules.
Constructing an NFA from a regular expression and then link these NFA‟s together
with a new start state.
14
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Summary
Lexical Analyzer is also known as Scanner.
Lexical Analyzer and parser can be in same pass.
Lexical Analysis simplifies the overall design.
Preliminary scanning is also done in the first phase.
Extra Buffering is needed for preliminary scanning.
LEX is the tool to construct lexical analyzer.
The output of lexical analyzer is a stream of tokens.
Finite Automata is also called recognizer.
Regular expressions is needed to define the tokens.
15
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Questions
1. Explain the need and role of the lexical analyzer.
2. Describe regular expressions.
3. Construct an NFA for (i) R=(a/b)* a (a/b)
(ii) R= (a/b)* (a/b)*
4. Convert it into minimized DFA
(i) aa*/ bb*
(ii) (a/b)(a/b)(a/b)
(iii) (a/b)*abb (a/b)*
(iv) 001*(1|0)*11
(v) (00)*|(11)*
16
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
SYNTAX ANALYSIS
3.1 Introduction
Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall
learn the basic concepts used in the construction of a parser.
We have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. Due to the limitations of regular expressions the lexical
analyzer cannot check the syntax of a given sentence. Regular expressions cannot check
balancing tokens, such as parenthesis. Therefore, this phase uses context-free grammar
(CFG), which is recognized by push-down automata.
17
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
The strings are derived from the start symbol by repeatedly replacing a non-terminal
(initially the start symbol) by the right side of a production, for that non-terminal.
Example
We take the problem of palindrome language, which cannot be described by means of
Regular Expression. That is, L = { w | w = w R } is not a regular language. But it can be
described by means of CFG, as illustrated below:
G = (N, T, P, S) or G = (Vn, Vt, P, S )
Where,
N or Vn = { Q, Z, N }
T or Vt= { 0, 1 }
P = { Q → Z | Q → N | Q → ℇ | Z → 0Q0 | N → 1Q1 }
S={Q}
This grammar describes palindrome language, such as: 1001, 11100111, 00100,
1010101, 11111, etc.
In this way, the parser accomplishes two tasks, i.e., parsing the code and looking for
errors. Finally a parse tree is generated as the output of this phase.
Parsers are expected to parse the whole code even if some errors exist in the program.
Parsers use error recovering strategies, which we will learn later in Error Handling
Chapter.
3.4 Derivation
A derivation is basically a sequence of production rules, in order to get the input
string. During parsing, we take two decisions for some sentential form of input:
1) Deciding the non-terminal which is to be replaced.
2) Deciding the production rule, by which, the non-terminal will be replaced.
To decide which non-terminal to be replaced with production rule, we can have two
options.
18
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example
Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id
19
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
In a parse tree:
1. All leaf nodes are terminals.
2. All interior nodes are non-terminals.
3. In-order traversal gives original input string.
A parse tree depicts associativity and precedence of operators. The deepest sub-tree is
traversed first, therefore the operator in that sub-tree gets precedence over the operator
which is in the parent nodes.
Exercise 1:
Consider the grammar
S iCtS
S iCtSeS
Sa
Sb
20
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
The following example explains bottom up approach of the leftmost derivations for the
sentence W=ibtibtaea
S aAcBe; AAb; A b; Bd and the input string is abbcde, we have to reduce it to
S.
Abbcde abbcBe
aAbcBe
aAcBe
S
3.6 Ambiguity
A grammar G is said to be an ambiguous if it has more than one parse tree (left or
right derivation) for at least one string.
Example
E→E+E
E→E–E
E → id
For the string id + id – id, the above grammar generates two parse trees:
When the non-terminal on the right side of given production depends on the non-
terminal on the left side of the same production, the grammar thus formed is called
inherently Ambiguous. From the above example, the language generated by an
ambiguous grammar is said to be inherently ambiguous. Ambiguity in grammar is not
good for a compiler construction. No method can detect and remove ambiguity
automatically, but it can be removed by either re-writing the whole grammar without
ambiguity, or by setting and following associativity and precedence constraints.
3.6.1 Associativity
If an operand has operators on both sides, the side on which the operator takes this
operand is decided by the associativity of those operators. If the operation is left-
associative, then the operand will be taken by the left operator; or if the operation is right-
associative, the right operator will take the operand.
Example
Operations such as Addition, Multiplication, Subtraction, and Division are left
associative. If the expression contains:
id op id op id
it will be evaluated as:
(id op id) op id
For example, (id + id) + id
21
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Operations like Exponentiation are right associative, i.e., the order of evaluation in the
same expression will be:
id op (id op id)
For example, id ^ (id ^ id)
3.6.2 Precedence
If two different operators share a common operand, the precedence of operators
decides which will take the operand. That is, 2+3*4 can have two different parse trees, one
corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting precedence
among operators, this problem can be easily removed. As in the previous example,
mathematically * (multiplication) has precedence over + (addition), so the expression
2+3*4 will always be interpreted as:
2 + (3 * 4)
These methods decrease the chances of ambiguity in a language or its grammar.
22
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
A top-down parser will first parse A, which in-turn will yield a string consisting of A
itself and the parser may go into a loop forever.
Example
The production set
S => Aα | β
A => Sd
after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd
and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε
Now none of the production has either direct or indirect left recursion.
Example
Production Rule No With Left Recursion Without Left Recursion
If the production is A Aα | β AβA'
A' => αA' | ε
1 EE+T | T E TE‟
E‟ +TE‟ | ε
2 TT*F | F TFT‟
T‟ *FT‟ | ε
3 F(E) | id No need for Left Recursion.
23
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example
The above productions can be written as
A => αA‟
A‟=> β | |…
Now the parser has only one production per prefix which makes it easier to take
decisions.
24
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then
tries to transform the start symbol to the input, it is called top-down parsing.
Recursive descent parsing: It is a common form of top-down parsing. It is called
recursive, as it uses recursive procedures to process the input. Recursive descent
parsing suffers from backtracking.
Backtracking: It means, if one derivation of a production fails, the syntax analyzer
restarts the process using different rules of same production. This technique may
process the input string more than once to determine the right production.
Bottom-up Parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to
construct the parse tree up to the start symbol.
Note:
In both the cases the input to the parser is being scanned from left to right, one
symbol at a time.
The bottom-up parsing method is called “Shift Reduce” parsing. The top-down parsing
is called “Recursive Decent” parsing.
An operator-precedence parser is one kind of shift reduce parser and predictive parser
is one kind of recursive descent parser.
Example:
Input string : a + b * c
Production rules:
S→E
E→E+T
E → E *T
E→T
T → id
Let us start bottom-up parsing.
a+b*c
Read the input and check if any production matches with the input:
a+b*c
T+b*c
E+b*c
E+T*c
E*c
E*T
E
S
3.9.1 TOP-DOWN PARSING
We have learnt in the last chapter that the top-down parsing technique parses the
input, and starts constructing a parse tree from the root node gradually moving down to
the leaf nodes. The types of top-down parsing are depicted below:
25
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string
against the production rules to replace them (if matched). To understand this, take the
following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser, will behave like this:
It will start with S from the production rules and will match its yield to the left-most
letter of the input, i.e. „r‟. The very production of S (S → rXd) matches with it. So the top-
down parser advances to the next input letter (i.e. „e‟). The parser tries to expand non-
terminal „X‟ and checks its production from the left (X → oa). It does not match with the
next input symbol. So the top-down parser backtracks to obtain the next production rule
of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is
accepted.
26
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
In recursive descent parsing, the parser may have more than one production to choose
from for a single instance of input; whereas in predictive parser, each step has at most
one production to choose. There might be instances where there is no production
matching the input string, making the parsing procedure to fail.
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict
which production is to be used to replace the input string. The predictive parser does not
suffer from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points
to the next input symbols. To make the parser back-tracking free, the predictive parser
puts some constraints on the grammar and accepts only a class of grammar known as
LL(k) grammar.
27
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Predictive parsing uses a stack and a parsing table to parse the input and generate a
parse tree. Both the stack and the input contains an end symbol $ to denote that the
stack is empty and the input is consumed. The parser refers to the parsing table to take
any decision on the input and stack element combination.
First Set
This set is created to know what terminal symbol is derived in the first position by a
non-terminal.
To compute FIRST(X) for all grammar symbols X, apply the following rules until no
more terminals or ε can be added to any FIRST set.
For example,
A→ t β
That is, A derives t (terminal) in the very first position. So, t ∈ FIRST(A).
28
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Follow Set
Likewise, we calculate what terminal symbol immediately follows a non-terminal A in
production rules. We do not consider what the non-terminal can generate but instead, we
see what would be the next terminal symbol that follows the productions of a non-
terminal.
To compute follow (A) for all non-terminals A, apply the following rules until nothing
can be added to any FOLLOW set.
29
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Then
FIRST (E) = FIRST (T) =FIRST (F) = {(, id}
FIRST (E‟) = {+, ε}
FIRST (T‟) = {*, ε}
FOLLOW (E) =FOLLOW (E‟) = { ), $}
FOLLOW (T) =FOLLOW (T‟) = { ), +, $}
FOLLOW (F) = { ), *, +, $}
30
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Shift step
The shift step refers to the advancement of the input pointer to the next input symbol,
which is called the shifted symbol. This symbol is pushed onto the stack. The shifted
symbol is treated as a single node of the parse tree.
Reduce step
When the parser finds a complete grammar rule (RHS) and replaces it to (LHS), it is
known as reduce-step. This occurs when the top of the stack contains a handle. To
reduce, a POP function is performed on the stack which pops off the handle and replaces
it with LHS non-terminal symbol.
Reducing a string W to the start symbol S of a grammar.
At each step a string matching the right side of a production is replaced by the symbol
on the left.
Example:
S aAcBe; AAb; A b; Bd and the string is abbcde, we have to reduce it to S.
Abbcde abbcBe
aAbcBe
aAcBe
S
Each replacement of the right side of the production the left side in the process above
is called reduction .by reverse of a right most derivation is called Handle
S*αAwαβw,then Aβ in partition following is a handle of αβw. The string w to the
right of the handle contains only terminal symbol.
A rightmost derivation in reverse often called a canonical reduction sequence, is
obtained by “Handle Pruning”.
Example:
E E+E
E E*E
E(E)
Eid
Input: id1+id2*id3E
Right Sentential Form Handle Reducing production
id1+id2*id3 id1 Eid
E+id2*id3 id2 Eid
E+E*id3 id3 Eid
E+E*E E*E EE*E
E+E E+E EE+E
E
Ie. This example is the reverse of the sequential in the rightmost derivations.
31
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
4) Error
1) In shift action, the next input symbol is shifted to the top of the stack.
2) In the reduce action the parser knows
Right end of the handle.
To locate left end of the handle within the stack.
Decide what non-terminal to replace the handle.
3) In an accept action, the parser announces successful completion of parsing.
4) In an error action, parser looks for syntax error and calls an error recovery routine.
Example:
E E+E
E E*E
E(E)
Eid
Input: id1+id2*id3
Stack Input Action
$ Id1+id2*id3$ Shift
$id1 +id2*id3$ Reduce by Eid
$E +id2*id3$ Shift
$E+ id2*id3$ Shift
$E+id2 *id3$ Reduce by Eid
$E+E *id3$ Shift
$E+E* id3$ Shift
$E+E* id3 $ Reduce by Eid
$E+E*E $ Reduce by EE*E
$E+E $ Reduce by EE+E
$E $ Accept
32
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Definition:
An operator precedence is an € free operator grammar in which the precedence relations
<. ,=. &.> constructed are disjoint
Note:
LEADING(A)={a/A=+> γaδ, where γ is € or a single non terminal}
TRAILING(A)={a/A=+> γaδ, where δ is € or a single non terminal}
Precedence Function
Precedence table can be encoded by a precedence functions f & g.
We select f & g for symbol a & b
f(a)<g(b) whenever a<.b
f(a)=g(b) whenever a=.b
f(a)>g(b) whenever a.>b
33
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example
Id + * $
3.10 LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide class
of context-free grammar which makes it the most efficient syntax analysis technique. LR
parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the
input stream; R stands for the construction of right-most derivation in reverse, and k
denotes the number of look ahead symbols to make decisions.
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar
but with some restrictions to get the simplified version, in order to achieve easy
implementation. LL grammar can be implemented by means of both algorithms, namely,
recursive-descent or table-driven.
LL parser is denoted as LL(k). The first L in LL(k) is parsing the input from left to right,
the second L in LL(k) stands for left-most derivation and k itself represents the number of
look aheads. Generally k = 1, so LL(k) may also be written as LL(1).
34
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
There are three widely used algorithms available for constructing an LR parser:
I. SLR(1) – Simple LR Parser:
1. Works on smallest class of grammar
2. Few number of states, hence very small table
3. Simple and fast construction
II. LR(1) – LR Parser:
1. Works on complete set of LR(1) Grammar
2. Generates large table and large number of states
3. Slow construction
III. LALR(1) Look-Ahead LR Parser:
i. Works on intermediate size of grammar
ii. Number of states are same as in SLR(1)
35
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example:
Closure (E‟.E) = E‟.E
E.E+T
E.T
T.T*F I0
T.F
F. (E)
F.id
Step 3: perform GOTO (I, X) (I is CLOURE set and X is all Grammar symbol)
This is computed for a set I on a grammar symbol X. GOTO (I, X) is defined to be the
closure of the set of all items [AαX.β] such that [Aα.Xβ] is in I.
Consider an item in I‟s production is maybe like this means,
A.XBY then
GOTO (I, X) will be performed based on the following Rules:
1. If AX.BY where B is a Terminal, including this item only in the CLOSURE (X) Item.
2. If AX.BY where B is a Non-Terminal including this item along with B‟s CLOSURE (B).
Example:
I0 : E‟.E
E.E+T
E.T
T.T*F
T.F
F. (E)
F.id
GOTO (I1, E) =GOTO (I1, T) = GOTO (I1, F) =GOTO (I1,*) = GOTO (I1,
36
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
I 2: ET.
TT.*F GOTO(I2,X)
GOTO(I2,E)=GOTO(I2,T)=GOTO(I2,F)=GOTO(I2,+)=GOTO (I2,() = GOTO (I2, ))
=GOTO (I2,id)=Null
GOTO(I2,*)=TT*.F // Rule 2 in
F.(E) step 3
F.id I7
T.T*F
T.F
F. (E)
F.id GOTO(I4,X) is,
GOTO(I4,E)= F(E.)
EE.+T I8
GOTO(I4, T)= TT.*F
E T . I2
GOTO(I4,F)= TF. I3
GOTO(I4,+)=Null ; GOTO(I4,*)=null ; GOTO(I4,))=Null .
GOTO(I4,()= F(.E)
E.E+T
E.T
T.T*F I4
T.F
F(.E)
F.id
GOTO(I4,id)=Fid. I5
I5: Fid.
GOTO(I5,E)= GOTO(I5,T)= GOTO(I5,F)= GOTO(I5,+)= GOTO(I5,*)=
37
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
GOTO(I5,()=GOTO(I5,))= GOTO(I5,ID)=Null
I6: EE+.T
T.T*F
T.F
F. (E)
F.id
GOTO (I6,E)=GOTO(I6,+)=GOTO(I6,*)= GOTO(I6,))= Null
GOTO (I6,T)=EE+T.
TT.*F I9
GOTO (I6,F)=TF. I3
GOTO (I6,()=F(.E)
E.E+T
E.T
T.T*F I4
T.F
F. (E)
F.id
GOTO(I6,id)=Fid I5
I7 : TT*.F
F.(E)
F.id
GOTO(I7,E)=GOTO(I7,T)=GOTO(I7,+)= GOTO(I7,*)= GOTO(I7,))=Null
GOTO(I7,E)=TT*F. I10
GOTO(I7,() =F(.E)
E.E+T
E.T
T.T*F I4
T.F
F.(E)
F.id
GOTO(I7,id)=Fid. I5
I8: F(E.)
EE.+T
GOTO(I8,E)=GOTO(I8,T)=GOTO(I8,F)=GOTO(I8,*)= GOTO(I8,()=
GOTO(I8,id)=Null
GOTO(I8,+)= EE+.T
T.T*F
T.F I6
F.(E)
F.id
GOTO (I8,)) =F(E). I11
I9: EE+T.
38
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
TT.*F
GOTO(I9,E)= GOTO(I9,T)= GOTO(I9,F)= GOTO(I9,+)= GOTO(I9,()= GOTO(I9,))=
GOTO(I9,id)=Null
GOTO(I9,*)= TT*.F
F.(E) I7
Fid
I10: TT*F.
GOTO(I10,E)=GOTO(I10,T)=GOTO(I10,F)=GOTO(I10,+)= GOTO(I10,*)=
GOTO(I10,()= GOTO(I10,))= GOTO(I10,id)=Null
I11: F (E).
GOTO(I11,E)=GOTO(I11,T)=GOTO(I11,F)=GOTO(I11,+)= GOTO(I11,*)=
GOTO(I11,()= GOTO(I11,))= GOTO(I11,id)=Null
LR (0) items:
I 0: E‟.E
E.E+T
E.T
T.T*F
T.F
F.(E)
Fid.
I 1: E‟E.
EE. +T
I 2: ET.
TT.*F
I3: TF.
I4: F (.E)
E.E+T
E.T
T.T*F
T.F
F. (E)
F.id
I5: Fid.
I6: EE+ .T
T.T*F
T.F
F. (E)
39
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
F.id
I7: TT* .F
F. (E)
F.id
I8: F (E.)
TE. +T
I9: EE+T.
TT. *F
I10: TT*F.
I11: F (E).
Example:
1) EE+T. is in I9 ,FOLLOW (T) ={+,$,)}
2) ET. is in I2 ,FOLLOW (T) ={+,$,)}
3) TT*F. is in I10 ,FOLLOW (F) ={+,*,$,)}
4) TF. is in I3 ,FOLLOW (F) ={+,*,$,)}
5) F(E). is in I11 ,FOLLOW ( ) ) =FOLLOW(F)={+,*,$,)}
6) Fid. is in I5 ,FOLLOW (id) =FOLLOW(F)={+,*,$,)}
40
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Id + * ( ) $ E T F
0 S5 S4 1 2 3
1 S6 A
2 r2 S7 r2 r2
3 r4 r4 r4 r4
4 S5 S4 8 2 3
5 r6 r6 r6 r6
6 S5 S4 9 3
7 S5 S4 10
8 S6 S11
9 r1 S7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5
41
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
LL vs. LR
LL LR
Starts with the root nonterminal on the Ends with the root nonterminal on the stack.
stack.
Uses the stack for designating what is still Uses the stack for designating what is already
to be expected. seen.
Builds the parse tree top-down. Builds the parse tree bottom-up.
Continuously pops a nonterminal off the Tries to recognize a right hand side on the
stack, and pushes the corresponding right stack, pops it, and pushes the corresponding
hand side. nonterminal.
Expands the non-terminals. Reduces the non-terminals.
Reads the terminals when it pops one off Reads the terminals while it pushes them on
the stack. the stack.
Pre-order traversal of the parse tree. Post-order traversal of the parse tree.
Summary
Syntax analyzer is also called parser.
Context free grammar is used as recognizer.
Parse tree is an output of Parser
Precedence rules are used to operate the operators.
Two types of parsing, Bottom up parsing and Top Down parsing.
Recursive descent parsing is a kind of Top Down parsing.
Shift reduce parsing is a kind of Bottom Up parsing.
Predictive Parser is a kind of Recursive Decent parser.
Questions
1. Explain the types of Parser.
2. Construct a Parsing Table for the Grammar
EE+T/T
TT*F/F
F(E)/id
42
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
ERROR HANDLING
4.1 Introduction
A parser should be able to detect and report any error in the program. It is expected
that when an error is encountered, the parser should be able to handle it and carry on
parsing the rest of the input. Mostly it is expected from the parser to check for errors but
errors may be encountered at various stages of the compilation process. A program may
have the following kinds of errors at various stages:
43
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
44
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
If watched closely, we find most of the leaf nodes are single child to their parent nodes.
This information can be eliminated before feeding it to the next phase. By hiding extra
information, we can obtain a tree as shown below:
ASTs are important data structures in a compiler with least unnecessary information.
ASTs are more compact than a parse tree and can be easily used by a compiler.
Summary
Errors may be encountered at various stages of the compilation process.
The parser may discover Syntactic errors.
Syntactic occurs when there no precedence errors.
AST is a data structure more compact than the parse tree.
Augmented grammar is used to generate erroneous constructs.
Global error correction may done.
Questions
1. Explain various types of errors.
2. Describe error handling with neat diagram.
3. Write note on panic mode.
4. Differentiate parse tree and abstract syntax tree.
45
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
SEMANTIC ANALYSIS
5.1 Introduction
We have learnt how a parser constructs parse trees in the syntax analysis phase. The
plain parse-tree constructed in that phase is generally of no use for a compiler, as it does
not carry any information of how to evaluate the tree. The productions of context-free
grammar, which makes the rules of the language, do not accommodate how to interpret
them.
For example:
E→E+T
The above CFG production has no semantic rule associated with it, and it cannot help
in making any sense of the production.
Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax
structure. Semantics help interpret symbols, their types, and their relations with each
other. Semantic analysis judges whether the syntax structure constructed in the source
program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions
For example:
int a = “value”;
Should not issue an error in lexical and syntax analysis phase, as it is lexically and
structurally correct, but it should generate a semantic error as the type of the assignment
differs. These rules are set by the grammar of the language and evaluated in semantic
analysis. The following tasks should be performed in semantic analysis:
Scope resolution
Type checking
Array-bound checking
The semantic action is enclosed in braces, and it appears after the production.
This translation is not suitable for a compiler, but for a “DESK CALCULATOR”
program that actually evaluates expressions rather than generating code for them.
46
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example :
S E $
E E+E
E E*E
E (E)
E (I)
I I
I I Digit
[Note I – for integer]
To implement this syntax – directed translation scheme, we need to construct a lexical
analyzer and a bottom-up parser.
47
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
A compiler – compiler would tie the parser and the semantic action program fragments
together, producing one module.
PRODUCTION PROGRAM FRAGMENT
1. SE$ Print VAL [TOP]
2. EE+E VAL[TOP]:=VAL[TOP]+VAL[TOP-2]
3. EE*E VAL[TOP]:=VAL[TOP]*VAL[TOP-2]
4. E(E) VAL[TOP]:=VAL[TOP-1]
5. EI none
6. II digit VAL[TOP]:=10*VAL[TOP]+LEXVAL
7. Idigit VAL[TOP]:=LEXVAL
Input : 23*5+4$
NO INPUT STATE VAL PRODUCTION USED
1. 23*5+4$ -- --
2. 3*5+4$ 2 --
3. 3*5+4$ I 2 I digit
4. *5+4$ I3 2-
5. *5+4$ I (23) I I digit
6. *5+4$ E (23) EI
7. 5+4$ E* (23)-
8. +4$ E*5 (23)- -
9. +4$ E*I (23)-5 I digit
10. +4$ E*E (23)-5 EI
11. +4$ E (115) EE*E
12. 4$ E+ (115)-
13. $ E+4 (115)- -
14. $ E+I (115)-4 I digit
48
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rules that specify how the grammar
should be interpreted. Here, the values of non-terminals E and T are added together and
the result is copied to the non-terminal E.
Semantic attributes may be assigned to their values from their domain at the time of
parsing and evaluated at the time of assignment or conditions. Based on the way the
attributes get their values, they can be broadly divided into two categories : synthesized
attributes and inherited attributes.
49
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
A can get values from S, B, and C. B can take values from S, A, and C. Likewise, C can
take values from S, A, and B.
Expansion: When a non-terminal is expanded to terminals as per a grammatical rule.
Reduction
When a terminal is reduced to its corresponding non-terminal according to grammar
rules. Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we
apply its corresponding semantic rules (actions).
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax
analysis).
Semantic analyzer attaches attribute information with AST, which are called
Attributed AST.
Attributes are two tuple value, <attribute name, attribute value>
For example:
int value = 5;
<type, “integer”>
<presentvalue, “5”>
For every production, we attach a semantic rule.
S-attributed SDT
If an SDT uses only synthesized attributes, it is called as S-attributed SDT. These
attributes are evaluated using S-attributed SDTs that have their semantic actions written
after the production (right hand side).
50
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
L-attributed SDT
This form of SDT uses both synthesized and inherited attributes with restriction of not
taking values from right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling
nodes. As in the following production,
S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can
take values from S and A. C can get values from S, A, and B. No non-terminal can get
values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated by depth-first and left-to-right parsing
manner.
51
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Whenever a procedure is executed, its activation record is stored on the stack, also
known as control stack. When a procedure calls another procedure, the execution of the
caller is suspended until the called procedure finishes execution. At this time, the
activation record of the called procedure is stored on the stack.
We assume that the program control flows in a sequential manner and when a
procedure is called, its control is transferred to the called procedure. When a called
procedure is executed, it returns the control back to the caller. This type of control flow
makes it easier to represent a series of activations in the form of a tree, known as the
activation tree.
To understand this concept, we take a piece of code as an example:
printf(“Enter Your Name: “); scanf(“%s”, username); show_data(username);
printf(“Press any key to continue…”);
...
int show_data(char *user)
{
printf(“Your name is %s”, username); return 0;
}
Given below is the activation tree of the code:
52
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Now we understand that procedures are executed in depth-first manner, thus stack
allocation is the best suitable form of storage for procedure activations.
SUMMARY
Syntax directed translation is a context free grammar.
Syntax directed translation is helps to generates Intermediate code.
Sematic actions is enclosed in curly braces.
The translation of a non-terminal on the Right side of the production is defined in
terms of non-terminal on the Left side is called inherited translation.
Input and output mapping is described using Syntax directed translator.
The parse tree having values in the node is called Annotated Parse tree.
Questions
1. Explain syntax directed translation.
2. Define sematic actions.
3. Write note on decency graph.
4. Generate a sematic action for production S E$
EE + E/ E*E/(E)/(I)
II/Idigit (Where I is a Integer)
53
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
6. SYMBOL TABLE
8.1 Introduction
Symbol table is an important data structure created and maintained by compilers in
order to store information about the occurrence of various entities such as variable
names, function names, objects, classes, interfaces, etc. Symbol table is used by both the
analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the language in hand:
To store the names of all entities in a structured form at one place.
To verify if a variable has been declared.
To implement type checking, by verifying assignments and expressions in the source
code are semantically correct.
To determine the scope of a name (scope resolution).
8.2 Symbol Tables
A compiler needs to collect and use information about the names appearing in the
source program. This information is entered into a data structure called a Symbol
Table.
Thus, the information collected about the name includes the string of characters by
which it is denoted, its type (e.g. Integer, real, string), its form (e.g. simple variable a
structure), its location in memory and other attributes depending on the language
Each entry in the symbol table is a pair of the form (name and information)
Each time a name is encountered, the symbol table is searched to see whether that
name has been seen previously. If it is new, then it is entered into the table.
Information about the name is entered into the symbol table during lexical and
syntactic analysis.
Symbol table is used in the several stages of the compiler.
54
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
55
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Self-Organizing Lists
Needs little extra space, save a little bit time.
Three fields, NAME1, DATA1, and LINK1 are there.
56
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Hashing Method
Consist of a fixed away of m pointers to table entries.
Table entries organized into „m‟ separate linked list called buckets
The hash table consists of K words, numbered a, 1…. K-1. There are pointers to
the storage table.
Hash function h such that h (NAME) is an integer value between 0 and k-1, is used to
find whether NAME is in the symbol table.
57
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Characteristic
Uniform Distribution
8.4 Operations
A symbol table, either linear or hash, should provide the following operations.
8.4.1 insert()
This operation is more frequently used by analysis phase, i.e., the first half of the
compiler where tokens are identified and names are stored in the table. This operation is
used to add information in the symbol table about unique names occurring in the source
code. The format or structure in which the names are stored depends upon the compiler
in hand.
An attribute for a symbol in the source code is the information associated with that
symbol. This information contains the value, state, scope, and type about the symbol. The
insert() function takes the symbol and its attributes as arguments and stores the
information in the symbol table.
For example:
int a;
should be processed by the compiler as:
insert(a, int);
8.4.2 lookup()
lookup () operation is used to search a name in the symbol table to determine:
1. If the symbol exists in the table.
2. If it is declared before it is being used.
3. If the name is used in the scope.
4. If the symbol is initialized.
5. If the symbol declared multiple times.
The format of lookup() function varies according to the programming language. The
basic format should match the following:
lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If the
symbol exists in the symbol table, it returns its attributes stored in the table.
{
int one_3; inner scope 1
58
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
int one_4;
}
int one_5;
{
int one_6; inner scope 2
int one_7;
}
}
void pro_two()
{
int two_1;
int two_2;
{
int two_3; inner scope 3
int two_4;
}
int two_5;
}
...
The above program can be represented in a hierarchical structure of symbol tables:
The global symbol table contains names for one global variable (int value) and two
procedure names, which should be available to all the child nodes shown above. The
names mentioned in the pro_one symbol table (and all its child tables) are not available for
pro_two symbols and its child tables.
This symbol table data structure hierarchy is stored in the semantic analyzer and
whenever a name needs to be searched in a symbol table, it is searched using the
following algorithm:
1. first a symbol will be searched in the current scope, i.e., current symbol table,
2. if a name is found, then search is completed, else it will be searched in the parent
symbol table until,
3. either the name is found or the global symbol table has been searched for the name.
8.5 Implementation of A simple stack – Allocation Scheme
Consider an implementation of the UNIX programming language C.
59
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Data in C can be global, meaning it is allocated as static storage and available to any
procedure or local, meaning it can be accessed only by the procedure in which it is
declared.
60
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Procedures : Their text part is static but they are called in a random manner. That is
why, stack storage is used to manage procedure calls and activations.
Variables : Variables are known at the runtime only, unless they are global or
constant. Heap memory allocation scheme is used for managing allocation and de-
allocation of memory for variables in runtime.
6.9.1 Static Allocation
In this allocation scheme, the compilation data is bound to a fixed location in the
memory and it does not change when the program executes. As the memory requirement
and storage locations are known in advance, runtime support package for memory
allocation and de-allocation is not required.
6.9.2 Stack Allocation
Procedure calls and their activations are managed by means of stack memory
allocation. It works in last-in-first-out (LIFO) method and this allocation strategy is very
useful for recursive procedure calls.
6.9.3 Heap Allocation
Variables local to a procedure are allocated and de-allocated only at runtime. Heap
allocation is used to dynamically allocate memory to the variables and claim it back when
the variables are no more required. Except statically allocated memory area, both stack
and heap memory can grow and shrink dynamically and unexpectedly. Therefore, they
cannot be provided with a fixed amount of memory in the system.
Parameter Passing
The communication medium among procedures is known as parameter passing. The
values of the variables from a calling procedure are transferred to the called procedure by
some mechanism. Before moving ahead, first go through some basic terminologies
pertaining to the values in a program.
r-value
The value of an expression is called its r-value. The value contained in a single variable
also becomes an r-value if it appears on the right-hand side of the assignment operator. r-
values can always be assigned to some other variable.
l-value
The location of memory (address) where an expression is stored is known as the l-
value of that expression. It always appears at the left hand side of an assignment operator.
61
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and variables
like day, week, month, and year, all have r-values. Only variables have l-values, as they
also represent the memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
Formal Parameters
Variables that take the information passed by the caller procedure are called formal
parameters. These variables are declared in the definition of the called function.
Actual Parameters
Variables whose values or addresses are being passed to the called procedure are
called actual parameters. These variables are specified in the function call as arguments.
Example:
fun_one()
{
int actual_parameter = 10;
call fun_two(int actual_parameter);
}
fun_two(int formal_parameter)
{
print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending upon the
parameter passing technique used. It may be a value or an address.
Pass by Value
In pass by value mechanism, the calling procedure passes the r-value of actual
parameters and the compiler puts that into the called procedure‟s activation record.
Formal parameters then hold the values passed by the calling procedure. If the values
held by the formal parameters are changed, it should have no impact on the actual
parameters.
Pass by Reference
In pass by reference mechanism, the l-value of the actual parameter is copied to the
activation record of the called procedure. This way, the called procedure now has the
address (memory location) of the actual parameter and the formal parameter refers to the
same memory location. Therefore, if the value pointed by the formal parameter is changed,
the impact should be seen on the actual parameter, as they should also point to the same
value.
62
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Pass by Copy-restore
This parameter passing mechanism works similar to „pass-by-reference‟ except that
the changes to actual parameters are made when the called procedure ends. Upon
function call, the values of actual parameters are copied in the activation record of the
called procedure. Formal parameters, if manipulated, have no real-time effect on actual
parameters (as l-values are passed), but when the called procedure ends, the l-values of
formal parameters are copied to the l-values of actual parameters.
Example:
int y; calling_procedure()
{
y = 10;
copy_restore(y); //l-value of y is passed printf y; //prints 99
}
copy_restore(int x)
{
x = 99; // y still has value 10 (unaffected) y = 0; // y is now 0
}
When this function ends, the l-value of formal parameter x is copied to the actual
parameter y. Even if the value of y is changed before the procedure ends, the l-value of x is
copied to the l-value of y, making it behave like call by reference.
Pass by Name
Languages like Algol provide a new kind of parameter passing mechanism that works
like preprocessor in C language. In pass by name mechanism, the name of the procedure
being called is replaced by its actual body. Pass-by-name textually substitutes the
argument expressions in a procedure call for the corresponding parameters in the body of
the procedure so that it can now work on actual parameters, much like pass-by-reference.
Summary
Symbol table also called Book Keeping.
Each entry in the Symbol table is a pair of the form Name and Information.
Information are entered into the Symbol table during Lexical and Syntactic phases.
AVAILABLE is a pointer which indicates the beginning of the empty portion of the
Array.
Stack pointer is used to point a particular position an activation record.
LIFO mechanism is using in handling temporaries.
Symbol tables are mostly implemented as hash tables.
The source code symbol itself is treated as a key for the hash function.
The return value from hash table is the information about the symbol.
Question
1. Explain the contents of Symbol Table.
2. Describe the data structures for Symbol table.
3. Explain the Stack allocation scheme.
4. Write note on Storage Allocation in Block structure Language.
5. Briefly explain
a. Common Data area.
b. Equivalence Data area.
63
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
If a compiler translates the source language to its target machine language without
having the option for generating intermediate code, then for each new machine, a full
native compiler is required.
Intermediate code eliminates the need of a new full compiler for every unique machine
by keeping the analysis portion same for all the compilers. The second part of compiler,
synthesis, is changed according to the target machine.
It becomes easier to apply the source code modifications to improve code performance
by applying code optimization techniques on the intermediate code.
64
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Quadruples are close to machine instruction. But they are not machine instructions.
Some programming languages have well defined intermediate languages.
Java java virtual machine.
Prolog warren obstruct machine.
In fact, there are byte code emulators to execute instructions in these intermediate
languages.
8.8.1 Postfix: (Reverse Polish or Postfix Polish)
Places the operator at the right end.
Examples:
S.No Infix Postfix
1 a+b ab+
2 (a+b)*c ab+c*
3 a*(b+c) abc+*
4 (a+b)*(c+d) ab+cd+*
5 If a then if c-d then a+c else a*c else a+b acd-ac+ac*? ab+?
The syntax directed translation scheme for a simple grammar is given by,
65
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
66
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
usually the “Three address code” contains address two for the operand and one for the
result
Additional 3-address statements:
1. Assignment statements of the form A:=B op C (binary arithmetic logical operator)
2. Assignment instruction, A:=op B (Unary operator) (A:=B B is assigned to A)
3. Unconditional: goto L.
4. Conditional : if A reop B goto L (relational op)
5. param A and call p,n
Eg: Procedure call:
Param A1
-
-
Param An
Call p,n
6. Indexed assignment : A:=B[I] (Location (array))
7. Pointer and address assignments,
A: = addr B( Address of B), A= *B(Value at B) and *A = B
8.8.3.2 Triples
Used to avoid temporary names into the symbol table.
Here only 3 Fields are used
67
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Parenthesized numbers are used to represent pointers into the triple structure
E.g.:
A: = -B * (C+D)
A: = T3
A: = -B * (C+D)
Example:
A -> id: = E Consists of code to evaluate E into sometimes operators
E -> E+E | E*E | -E | (E)| id
A means assignment statement.
68
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
69
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
70
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
71
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
CODE OPTIMIZATION
8.11 Introduction
Optimization is a program transformation technique, which tries to improve the code
by making it consume less resources (i.e. CPU, Memory) and deliver high speed.
1. The term “code optimization” refers to techniques, a compiler can employ in an
attempt to produce a better object language program than the most obvious for a given
source program.
2. The quality of the object program is generally measured by its size (for small
computation) or its running time (for large computation).
3. It is theoretically impassible for a compiler to produce the best possible object program
for every source program under any reasonable cast function.
4. The accurate term for “code optimization” is ”code improvement”.
5. There are many aspects to code optimization.
(i) Cast
(ii) Quick & straight forward translation (time).
72
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
1. At the beginning, users can change/rearrange the code or use better algorithms to
write the code.
2. After generating intermediate code, the compiler can modify the intermediate code by
address calculations and improving loops.
3. While producing the target machine code, the compiler can make use of memory
hierarchy and CPU registers.
Optimization can be categorized broadly into two types : machine independent and
machine dependent.
73
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
A program can have various constructs as basic blocks, like IF-THEN-ELSE, SWITCH-
CASE conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL,
etc.
8.13.3.1 Basic Block Identification
We may use the following algorithm to find the basic blocks in a program:
1. Search header statements of all the basic blocks from where a basic block starts:
i. First statement of a program.
ii. Statements that are target of any branch (conditional/unconditional).
iii. Statements that follow any branch statement.
2. Header statements and the statements following them form a basic block.
3. A basic block does not include any header statement of any other basic block.
Basic blocks are important concepts from both code generation and optimization point
of view.
Basic blocks play an important role in identifying variables, which are being used more
than once in a single basic block. If any variable is being used more than once, the
register memory allocated to that variable need not be emptied unless the block finishes
execution.
74
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
75
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
The above control flow graph depicts a chunk of program where variable „a‟ is used to
assign the output of expression „x * y‟. Let us assume that the value assigned to „a‟ is
never used inside the loop. Immediately after the control leaves the loop, „a‟ is assigned the
value of variable „z‟, which would be used later in the program. We conclude here that the
assignment code of „a‟ is never used anywhere, therefore it is eligible to be eliminated.
Likewise, the picture above depicts that the conditional statement is always false,
implying that the code, written in true case, will never be executed, hence it can be
removed.
76
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
If (condition)
{
...
tmp = y OP z;
a = tmp;
...
}
else
{
...
tmp = y OP z;
}
c = tmp;
Here, whether the condition is true or false; y OP z should be computed only once.
77
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Summary
Code optimization are called code improvement.
The aspect to Code is cost and Time
Constant timing is used for later optimization.
Optimization is mainly depends on the algorithm
Data structure to analyze basic blocks DAG.
DAG is the Direct Graph with no cycles.
DAG is useful to determine common sub expressions.
Questions
1. Explain the principle sources of Optimization.
2. Describe Loop Optimization.
3. Write note on reduction and Strength.
4. Describe DAG representation.
5. State the properties and uses of DAG.
78
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
9. CODE GENERATION
9.1 Introduction
Code generation can be considered as the final phase of compilation. Through post
code generation, optimization process can be applied on the code, but that can be seen as
a part of code generation phase itself. The code generated by the compiler is an object
code of some lower-level programming language, for example, assembly language. We have
seen that the source code written in a higher-level language is transformed into a lower-
level language that results in a lower-level object code, which should have the following
minimum properties:
1. It should carry the exact meaning of the source code.
2. It should be efficient in terms of CPU usage and memory management.
We will now see how the intermediate code is transformed into target object code
(assembly code, in this case).
9.4 Descriptor
The code generator has to track both the registers (for availability) and addresses
(location of values) while generating the code. For both of them, the following two
descriptors are used
Register descriptor: Register descriptor is used to inform the code generator about the
availability of registers. Register descriptor keeps track of values stored in each register.
Whenever a new register is required during code generation, this descriptor is consulted
for register availability.
79
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Address descriptor: Values of the names (identifiers) used in the program might be stored
at different locations while in execution. Address descriptors are used to keep track of
memory locations where the values of identifiers are stored. These locations may include
CPU registers, heaps, stacks, memory or a combination of the mentioned locations.
Code generator keeps both the descriptor updated in real-time. For a load statement, LD
R1, x, the code generator:
updates the Register Descriptor R1 that has value of x and
updates the Address Descriptor (x) to show that one instance of x is in R1.
Register Allocation & Assignment:
There are various strategies for deciding what names in a program should reside in
registers, i.e. register allocation & in which register each should reside i.e. register
assignment.
Advantage: Simplifies the design of a compiler.
Disadvantage: When strictly handled, it uses registers inefficiently.
80
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
MOV y‟, L
OP z‟, L
Other code constructs like loops and conditional statements are transformed into
assembly language in general assembly way.
Example:
t0 = a + b
t1 = t0 + c
d = t0 + t1
81
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
T1:=A+B;
T2:=C+D;
T3:=E-T2;
T4:=T1-T3;
Called a heuristic ordering for DAG‟S.
2. Optimal ordering of trees:
The DAG representation of a quadruple is a tree and if should be in a designed
register.
Using labeling algorithm- we can find the node and the leaf.
Multi register and algebraic properties are made.
Sub expressions are made i.e. partition of trees into sub trees.
Example:
void add_ten(int x)
82
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
{
return x + 10;
printf(“value of x is %d”, x);
}
In this code segment, the printf statement will never be executed, as the program control
returns back before it can execute, hence printf can be removed.
Strength Reduction
There are operations that consume more time and space. Their „strength‟ can be
reduced by replacing them with other operations that consume less time and space, but
produce the same result.
For example, x * 2 can be replaced by x << 1, which involves only one left shift. Though
the output of a * a and a2 is same, a2 is much more efficient to implement.
Summary
Code generation can be considered as the final phase of compilation.
The output of the code generator is machine code.
The address descriptor is used to track the location in the memory.
DAG representation made the code generation simple.
If the value of a name is found at more than one place, the register‟s value will be
preferred over the cache and main memory.
Redundancy in loads and stores are reduced in peephole optimization.
Question
1. Explain register descriptor and address descriptor.
2. Write note on GETREG ().
83
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
84
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
85
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
86
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
87
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
88
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
91. ------------- is a tree in which each leaf represents an operand and each interior node
an operator.
a) Parser Tree b)Semantic Tree c)Syntax Tree d)Structured Tree Ans:c
92. The properties of an entity are called as --------------.
a)values b)attributes c)numbers d)digits Ans:b
93. Usually the “Three address code” contains address two for the ---------- and one for the
result.
a) operand b)operator c)result d) statement Ans:a
94. The --------------- statement is an abstract form of intermediate code.
a)2-address b)3-address c)Intermediatecode d)address Ans:b
95. Which is not the way of implement the 3-address statement.
a) Quadruples b) Triples c) Indirect Triples d) Parse Tree Ans:d
96. -------------- record structure has 4 fields.
a) Quadruples b) Triples c) Indirect Triples d) Parse Tree Ans:a
97. Parenthesized numbers are used to represent ----------- into the triple structure.
a)pointer b)stack c)queue d)value Ans:a
98. ---------------- Triples are listing pointers to triples, rather than listing the triples
themselves.
a)Direct b)Indirect c)Multiple d)New Ans:b
99. ---------------- refers to the location to store the value for a symbol.
a) value b)place c)code d)number Ans:b
100. ----------- refers to the expression or expressions in the form of three address codes.
a) value b)place c)code d)number Ans:c
101. ----------- is associating the attributes with the grammar symbols.
a)rotation b)translation c)transformation d)evolving Ans:b
102. In 3-address code for array reference we assume static allocation of arrays, where
subscripts range from 1 to some limit known at ------------------ time.
a) compile b) run c) execution d) process Ans:a
103. In Triples uses only 3 ----------------.
a) fields b) operator c) operand d) instruction Ans:a
104. _____________ is used in the several stages of the compiler.
a)Table b) Symbol Table c) Records d) Program. Ans:b
105. Information about the name is entered into the symbol table during ____________
and ___________.
a) lexical and syntactic analysis b) lexical and code generation
c) lexical and error handler d) lexical and code optimization. Ans:a
106. Each entry in the symbol table is a pair of the form ______ and _______.
a) Name and information. b) Name and function.
c) Name and Data. d) Name and procedures. Ans:a
107. A compiler needs to collect and use information about the names appearing in the
source program. This information is entered into a data structure called a
______________.
a)Symbol Table b) Lexical analysis
c) Syntactic analysis d) Records. Ans:a
108. Undeclared name and type incompatibilities in _____________.
a)Syntactic errors b) Semantic errors
c) Lexical Phase errors d) Reporting errors. Ans:b
89
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
90
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
91
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
Exercise
1. If W is a string of terminals and A, b are two non-terminals, then which of the
following are right-linear grammars?
(a) A Bw
(b) A Bw|w
(c) Awb|w
(d) None of the above
2. If a is a terminal and S, A, B are three non-terminals, then which of the following are
regular grammars?
(a) S E
A aS|b
(b) A aB|a
B bA|b
(c) A Ba|Bab
(d) A abB|aB
92
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
93
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
14. The language generated by the above grammar is the set of all strings made uo of a, b,
c, such that
(a) The number of a‟s, b‟s, and c‟s will be equal
(b) a‟s always precede b‟s
(c) b‟s always precede c‟s
(d) The number of a‟s b‟s and c‟s are same and the a‟s precede b‟s. Which precede c‟s.
19. If two finite states machine M and N isomorphic, then M can be transformed to N by
relabeling
(a) The states alone
(b) The edges alone
(c) Both the states and edges
(d) None of the above.
94
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
22. For which of the following situations, inherited attribute is a natural choice?
(a) Evaluation of arithmetic expressions
(b) Keeping track of variable declaration
(c) Checking for the correct use of L-values and R-values
(d) All of the above.
23. The graph depicting the inter-dependencies of the attributes of different nodes in a
parse tree is called a
(a) Flow graph
(b) Dependency graph
(c) Karnaugh‟s graph
(d) Steffi graph
95
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
31. The best way to compare the different implementation of symbol table is to compare
the time required to
(a) Add a new name
(b) Make an inquiry
(c) Add a new name and make an inquiry
(d) None of the above.
32. Which of the following symbol table implementation is based on the property of locality
of reference?
(a) Linear list
(b) Search tree
(c) Hash table
(d) self-organization list
33. which of the following symbol table implementation is best suited if access time to be
minimum?
(a) Linear list
(b) Search tree
(c) Hash table
(d) self-organization list
34. which of the following symbol table implementation, makes efficient use of memory?
(a) List
(b) Search tree
(c) Hash table
(d) Self-organizing list.
96
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
58. Which of the following optimization techniques are typically applied on loops?
(a) Removal of invariant computation
(b) Elimination of induction variables
(c) Peephole optimization
(d) Invariant computation
97
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
59. The technique of replacing run time computation by compile time computations is
called
(a) Constant folding
(b) Code hoisting
(c) Peephole optimization
(d) Invariant computation
60. The graph that shows the basic blocks and their successor relationship is called.
(a) Control graph
(b) Flow graph
(c) DAG
(d) Hamiltonian graph
65. Which of the following concepts are not loop optimization tecniques?
(a) Jamming
(b) Unrolling
(c) Induction variable elimination
(d) None of the above
98
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
67. du-chaining
(a) Stands for use definition chaining
(b) Is useful for copy propagation removal
(c) Is useful for induction variable removal
(d) None of the above
68. Which of the following comments about peep-hole optimization are true?
(a) It is applied to a small part of the code.
(b) It can be used to optimize intermediate code
(c) To get the best out of this technique, it has to be applied repeatedly.
(d) It can be applied to a portion of the code that is not contiguous.
99
Compiler Design Concepts, Worked out Examples and MCQs for NET/SET
REFERENCES
[1] Alfred V. Aho and Jeffrey D. Ullman, “Principles of Compiler Design”, 1989.
[2] Y.N. Srikant and Priti Shankar, “The Compiler Design Handbook: Optimizations and
Machine Code Generation, Second Edition”, Dec 7, 2007.
[3] Adesh K.Pandey,” Concepts of compiler Design”, S.K.Kataria and ons publisher of
India Books, India
[4] Reinhard Wilhelm & Dieter Maurer,” Compiler Design”, Addion-Wesley, I edition
[5] Gajendra Sharma,” Compiler Design”, S.K.Kataria and ons publisher of India Books,
India
[6] K.Muneewaran,” Compiler Design”, Oxford University Press, 2012
[7] K.Krishnakumari,”Compiler Design”, ARS Publications,2013
[8] A.A.Puntambekar, “Compiler Design(Principles of Compiler Design), Technical
Publications, 2013
[9] Steven S.Muchnick,” Advanced Compiler Design Implementation”, Morgan Kaufman
Publisher,2012
[10] Alexander Meduna,”Elements of Compiler Design”, Auerbach Publication, 2009.
[11] G.Sudha Sadasivam,”Compiler Design”, Scitech Publication, 2009.
[12] P.Kalaiselvi, AAR Senthilkumaar,”Principles of Compiler Deign”,Charulatha
Publications,2013.
[13] https://www.tutorialspoint.com/compiler_design/
[14] http://www.dreamincode.net/forums/topic/260592-an-introduction-to-compiler-
design-part-i-lexical-analysis/
100