Unit 1
Structure of compiler – Functions and Roles of lexical phase – Input buffering – Representation of
tokens using regular expression – Properties of regular expression – Finite Automata – Regular
Expression to Finite Automata – NFA to Minimized DFA.
1. STRUCTURE OF COMPILER:
A compiler is a translator program that reads a program written in one language (the source
language) and translates it into an equivalent program in another language (the target language). As
an important part of this translation process, the compiler reports to its user the presence of errors in
the source program.
[Figure: compiler block diagram; surviving label: error messages]
Lexical errors:
Few errors are discernible at the lexical level alone, because a lexical analyzer (LA) has a very
localized view of the source program. An LA may be unable to proceed because none of the patterns
for tokens matches a prefix of the remaining input.
Error-recovery actions are:
i. Deleting an extraneous character.
ii. Inserting a missing character.
iii. Replacing an incorrect character by a correct character.
iv. Transposing two adjacent characters.
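The four repair actions above can be sketched as a search over single edits. This is only an illustration: the keyword set KEYWORDS, the lowercase alphabet and the function names are assumptions made for the example, not part of any particular lexical analyzer.

```python
import string

# Hypothetical keyword set, used only for this illustration.
KEYWORDS = {"while", "if", "else", "return"}

def single_edit_repairs(lexeme, alphabet=string.ascii_lowercase):
    """Return every string obtainable from lexeme by one repair action."""
    repairs = set()
    for i in range(len(lexeme) + 1):
        head, tail = lexeme[:i], lexeme[i:]
        if tail:
            repairs.add(head + tail[1:])                      # i.   delete a character
        for ch in alphabet:
            repairs.add(head + ch + tail)                     # ii.  insert a character
            if tail:
                repairs.add(head + ch + tail[1:])             # iii. replace a character
        if len(tail) >= 2:
            repairs.add(head + tail[1] + tail[0] + tail[2:])  # iv.  transpose neighbours
    repairs.discard(lexeme)  # a "repair" that changes nothing is not a repair
    return repairs

def suggest(lexeme):
    """Keywords reachable from lexeme with exactly one repair action."""
    return sorted(single_edit_repairs(lexeme) & KEYWORDS)

print(suggest("whlie"))  # fixed by transposing two adjacent characters
print(suggest("retrn"))  # fixed by inserting a missing character
```

Trying every single edit is affordable only because lexemes are short; the sketch is meant to show the four actions, not to recommend a recovery strategy.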
3. INPUT BUFFERING
This section discusses a two-buffer input scheme that is useful when lookahead on the input is
necessary to identify tokens. It then covers other techniques for speeding up the LA, such as the use
of "sentinels" to mark the buffer end.
Buffer Pairs:
Because a large amount of time is spent scanning characters, specialised buffering techniques have
been developed to reduce the overhead required to process an input character. A buffer divided
into two N-character halves is shown in Fig. 1.7. Typically, N is the number of characters on one disk
block, e.g., 1024 or 4096.
Sentinels:
• With the previous algorithm, each time we move the forward pointer we must check that
we have not moved off one half of the buffer. If we have, we must reload the other half.
• This cost can be reduced if we extend each buffer half to hold a sentinel character (eof) at the end.
• The new arrangement and code are shown in Fig. 1.9 and 1.10. In the common case this code
performs only one test, to see whether forward points to an eof.
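The sentinel idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of Fig. 1.10: the class name TwoBufferInput, the tiny half size N = 8 and the choice of "\0" as the eof sentinel are all made up for the example, and the pointer that marks the start of the current lexeme is not modelled.

```python
import io

EOF = "\0"  # assumed sentinel; it must not occur in the source text
N = 8       # tiny half size for the demo; the text suggests one disk block

class TwoBufferInput:
    def __init__(self, stream, n=N):
        self.stream, self.n = stream, n
        # Layout: [half 0: n chars][EOF][half 1: n chars][EOF]
        self.buf = [EOF] * (2 * n + 2)
        self.forward = 0
        self._reload(0)

    def _reload(self, half):
        start = half * (self.n + 1)
        data = self.stream.read(self.n)
        for i, ch in enumerate(data):
            self.buf[start + i] = ch
        # A short read leaves the sentinel inside the half: real end of input.
        self.buf[start + len(data)] = EOF

    def next_char(self):
        """Return the next character, or None at the end of the input."""
        ch = self.buf[self.forward]
        if ch != EOF:                        # the only test in the common case
            self.forward += 1
            return ch
        if self.forward == self.n:           # sentinel closing the first half
            self._reload(1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:   # sentinel closing the second half
            self._reload(0)
            self.forward = 0
            return self.next_char()
        return None                          # sentinel inside a half: end of input

source = TwoBufferInput(io.StringIO("position = initial + rate * 60"))
print("".join(iter(source.next_char, None)))
```

The buffer-boundary tests run only after a sentinel has been seen, so ordinary characters cost a single comparison each.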
Regular Language
• A regular language over an alphabet is one that can be obtained from the basic languages
using the operations union, concatenation and Kleene star (*).
• A language is said to be a regular language if there exists a Deterministic Finite Automaton (DFA)
that accepts it.
• The language accepted by a DFA is a regular language.
• A regular language written in set notation can be converted into a regular expression by leaving
out the braces {} or replacing them with parentheses (), and by replacing ∪ with +.
Regular Expressions
A regular expression is a formula that describes a possible set of strings.
Components of a regular expression:
x        the character x
.        any character, usually except a newline
[xyz]    any of the characters x, y, z, …
R?       an R or nothing (i.e., an optional R)
R*       zero or more occurrences of R
R+       one or more occurrences of R
R1R2     an R1 followed by an R2
R1|R2    either an R1 or an R2
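Each component in the list above can be tried out with a regular-expression library. The sketch below assumes Python's re module, whose syntax happens to use the same operators; the sample patterns and strings are made up for the illustration.

```python
import re

# (pattern, text, whether the whole text should match)
checks = [
    (r"x",     "x",  True),    # the character x
    (r".",     "a",  True),    # any character ...
    (r".",     "\n", False),   # ... except a newline (default behaviour)
    (r"[xyz]", "y",  True),    # any one of x, y, z
    (r"ab?",   "a",  True),    # optional b
    (r"a*",    "",   True),    # zero or more a
    (r"a+",    "",   False),   # one or more a
    (r"ab",    "ab", True),    # a followed by b
    (r"a|b",   "b",  True),    # either a or b
]

for pattern, text, expected in checks:
    matched = re.fullmatch(pattern, text) is not None
    print(f"{pattern!r:10} {text!r:6} {matched}")
    assert matched == expected
```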
A token is either a single string or one of a collection of strings of a certain type. If we view the set of
strings in each token class as a language, we can use the regular-expression notation to describe
tokens.
Consider an identifier, which is defined to be a letter followed by zero or more letters or digits. In
regular expression notation we would write:
identifier = letter (letter | digit)*
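The identifier definition above (a letter followed by zero or more letters or digits) can be checked with a short sketch. It assumes Python's re module and ASCII letters and digits; the name is_identifier is hypothetical.

```python
import re

# letter (letter | digit)* written with character classes
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(s):
    """True if the whole string s matches the identifier pattern."""
    return IDENTIFIER.fullmatch(s) is not None

for s in ["rate", "x1", "1x", ""]:
    print(repr(s), is_identifier(s))
```

Using fullmatch rather than match matters here: match would accept "x1+" because a prefix of it is an identifier.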
Here are the rules that define the regular expressions over an alphabet ∑.
Regular expressions are used to describe the structure of tokens. The formal definition is:
1. ε is a regular expression denoting {ε}, the language containing only the empty string.
2. Φ is a regular expression denoting the empty language.
3. Each symbol a in ∑ is a regular expression denoting {a}.
4. Suppose r1 and r2 are regular expressions denoting the languages L(r1) and L(r2). Then,
a) (r1)|(r2) is a regular expression denoting L(r1) ∪ L(r2).
b) (r1).(r2) is a regular expression denoting L(r1).L(r2).
c) (r1)* is a regular expression denoting (L(r1))*.
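The three operations in rule 4 can be tried on small finite languages represented as sets of strings. The sketch below is illustrative only: the function names are made up, and star is cut off at max_len because L* is infinite in general.

```python
def union(l1, l2):
    """L1 ∪ L2"""
    return l1 | l2

def concat(l1, l2):
    """L1.L2: every string of L1 followed by every string of L2"""
    return {x + y for x in l1 for y in l2}

def star(lang, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}        # zero repetitions give the empty string
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, lang) if len(w) <= max_len}
        frontier = longer - result   # keep only strings not seen before
        result |= frontier
    return result

a, b = {"a"}, {"b"}
a_or_b = union(a, b)                              # L(a|b)
print(sorted(star(a_or_b, 2)))                    # L((a|b)*) up to length 2
print(sorted(concat(star(a_or_b, 1), {"abb"})))   # a few strings of L((a|b)*abb)
```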
Unnecessary parentheses can be avoided in regular expressions if we adopt the conventions that:
1. The unary operator * has the highest precedence and is left associative.
2. Concatenation has the second highest precedence and is left associative.
3. The alternation operator | has the lowest precedence and is left associative.
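The effect of these conventions can be observed by comparing an unparenthesised expression with its possible readings. The sketch assumes Python's re module, which follows the same precedence order; the sample strings and the helper agree are made up, and agreement on a handful of samples is an illustration, not a proof of equivalence.

```python
import re

SAMPLES = ["", "a", "b", "c", "ac", "bc", "bcc", "bcbc"]

def agree(p, q):
    """True if p and q accept exactly the same strings from SAMPLES."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in SAMPLES)

# The reading given by the conventions: * first, then concatenation, then |
print(agree(r"a|bc*", r"a|(b(c*))"))
# The reading we would get if | bound tighter than concatenation
print(agree(r"a|bc*", r"(a|b)c*"))
# The reading we would get if concatenation bound tighter than *
print(agree(r"a|bc*", r"a|(bc)*"))
```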
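1. [Figure: automaton with states q0 and q1]
2. R = r1
[Figure: q0 to q1 on r1]
3. R = r1r2
[Figure: q0 to q1 on r1, then q1 to q2 on r2]
4. R = r1 | r2
[Figure: two parallel branches between q0 and q5, one through q1 to q2 on r1 and one through q3 to q4 on r2]
5. R = r*
[Figure: states q0, q1 and q2, with an edge labelled r]
6. R = r+
[Figure: states q0, q1 and q2, with an edge labelled r]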
Problems
1. Construct NFA for: (0+1)*10
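[Figure: NFA for (0+1)*10 over states q0 to q6, with edges labelled 0 (q2 to q3) and 1 (q4 to q5) for the two alternatives, followed by the final 1 and 0 leading to q6]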
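The construction cases and Problem 1 above can be sketched in code. This is a minimal illustration under stated assumptions: the class NFA, its method names and the use of None as the ε label are hypothetical, and concatenation joins the two fragments with an ε-edge instead of merging states as in the figure, which accepts the same language.

```python
from itertools import count

EPS = None  # label used for ε-moves in this sketch

class NFA:
    def __init__(self):
        self.edges = {}        # (state, label) -> set of next states
        self._ids = count()

    def new_state(self):
        return next(self._ids)

    def add(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)

    # Each builder returns a (start, accept) pair describing one fragment.
    def symbol(self, a):
        s, f = self.new_state(), self.new_state()
        self.add(s, a, f)
        return s, f

    def concat(self, x, y):
        self.add(x[1], EPS, y[0])          # accept of x feeds start of y
        return x[0], y[1]

    def union(self, x, y):
        s, f = self.new_state(), self.new_state()
        for frag in (x, y):
            self.add(s, EPS, frag[0])
            self.add(frag[1], EPS, f)
        return s, f

    def star(self, x):
        s, f = self.new_state(), self.new_state()
        self.add(s, EPS, x[0])
        self.add(s, EPS, f)                # zero occurrences
        self.add(x[1], EPS, x[0])          # repeat
        self.add(x[1], EPS, f)
        return s, f

    def plus(self, x):
        s, f = self.new_state(), self.new_state()
        self.add(s, EPS, x[0])             # no bypass: at least one occurrence
        self.add(x[1], EPS, x[0])
        self.add(x[1], EPS, f)
        return s, f

def accepts(nfa, start, accept, text):
    """Simulate the NFA on text by tracking the set of possible states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in nfa.edges.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= nfa.edges.get((s, ch), set())
        current = closure(moved)
    return accept in current

# Problem 1: (0+1)*10
n = NFA()
zero_or_one = n.union(n.symbol("0"), n.symbol("1"))
start, accept = n.concat(n.concat(n.star(zero_or_one), n.symbol("1")), n.symbol("0"))
for text in ["10", "0110", "01", ""]:
    print(repr(text), accepts(n, start, accept, text))
```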
The algorithm for constructing a DFA that recognizes the same language as a given NFA is called
subset construction, because each state of the DFA corresponds to a set of states of the NFA. Each
DFA state records all the states to which the NFA could make a transition on the given input symbol.
In other words, after processing a sequence of input symbols the DFA is in a state that corresponds
to the set of NFA states reachable from the start state on the same inputs.
Operation        Definition
ε-closure(s)     set of NFA states reachable from state s on ε-transitions
ε-closure(T)     set of NFA states reachable from some s in T on ε-transitions
move(T, a)       set of NFA states to which there is a transition on input a from some state s in the set T
The starting state of the automaton is assumed to be s0. The ε-closure(s) operation computes exactly
the states reachable from a particular state without consuming any input symbol. With these operations
defined, the states to which our automaton can make a transition from set T on input a can be specified simply as:
ε-closure(move(T, a))
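The two operations can be sketched directly from their definitions. The example uses the 11-state NFA for (a|b)*abb from the worked example that follows; its ε-edges are an assumption here (the usual Thompson layout, which is consistent with the closures computed in that example), and the dictionary encoding and function names are made up for the illustration.

```python
EPS = ""  # label used for ε-moves in this sketch

# (state, label) -> set of next states; ε-edges assumed as described above
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3},    (4, "b"): {5},
    (3, EPS): {6},    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},    (8, "b"): {9}, (9, "b"): {10},
}

def eps_closure(states):
    """States reachable from `states` by ε-moves alone (the states themselves included)."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, a):
    """States reachable from some s in `states` by one transition on symbol a."""
    return set().union(*(NFA.get((s, a), set()) for s in states))

A = eps_closure({0})
print(sorted(A))
print(sorted(move(A, "a")))
print(sorted(eps_closure(move(A, "a"))))
```

Note that ε-closure(T) always contains T itself, since every state reaches itself without reading any input.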
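Step 1: Convert the regular expression (a|b)*abb into an NFA using Thompson's construction.
[Figure: NFA for (a|b)*abb with start state 0 and states 0 to 10; edges labelled a (2 to 3) and b (4 to 5) for the two alternatives, followed by a, b, b along 7, 8, 9, 10]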
Step 2: Convert the NFA into a DFA using subset construction.
The derivation of the states and transitions of the DFA, including the computation of the ε-closure and
move functions, is as follows:
ε-closure({0}) = {0,1,2,4,7} = A
move(A,a) = {3,8}
ε-closure({3,8}) = {1,2,3,4,6,7,8} = B
move(A,b) = {5}
ε-closure({5}) = {1,2,4,5,6,7} = C
ε-closure(move(B,a)) = ε-closure({3,8}) = B
ε-closure(move(B,b)) = ε-closure({5,9}) = {1,2,4,5,6,7,9} = D
ε-closure(move(C,a)) = ε-closure({3,8}) = B
ε-closure(move(C,b)) = ε-closure({5}) = C
ε-closure(move(D,a)) = ε-closure({3,8}) = B
ε-closure(move(D,b)) = ε-closure({5,10}) = {1,2,4,5,6,7,10} = E
ε-closure(move(E,a)) = ε-closure({3,8}) = B
ε-closure(move(E,b)) = ε-closure({5}) = C
STATE a b
A B C
B B D
C B C
D B E
E B C
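The derivation and the transition table above can be reproduced with a short sketch of subset construction. As before, the ε-edges of the NFA are an assumption (the usual Thompson layout for (a|b)*abb), and the encoding, function names and the scheme for naming DFA states A, B, C, ... in order of discovery are choices made for this illustration.

```python
from collections import deque

EPS = ""  # label used for ε-moves in this sketch

# (state, label) -> set of next states; state 10 is the accepting NFA state
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3},    (4, "b"): {5},
    (3, EPS): {6},    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},    (8, "b"): {9}, (9, "b"): {10},
}
NFA_ACCEPT = 10

def eps_closure(states):
    stack, closure = list(states), set(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, a):
    return set().union(*(NFA.get((s, a), set()) for s in states))

def subset_construction(start, alphabet):
    """Return (transition table, accepting DFA states) for the NFA above."""
    start_set = frozenset(eps_closure({start}))
    names = {start_set: "A"}               # set of NFA states -> DFA state name
    table, queue = {}, deque([start_set])
    while queue:
        T = queue.popleft()
        for a in alphabet:
            U = frozenset(eps_closure(move(T, a)))
            if not U:
                continue                   # no transition on a (dead)
            if U not in names:             # a DFA state not seen before
                names[U] = chr(ord("A") + len(names))
                queue.append(U)
            table[names[T], a] = names[U]
    accepting = {name for S, name in names.items() if NFA_ACCEPT in S}
    return table, accepting

table, accepting = subset_construction(0, "ab")
print("STATE a b")
for state in sorted({s for s, _ in table}):
    print(state, table[state, "a"], table[state, "b"])
print("accepting:", sorted(accepting))
```

Only E contains NFA state 10, so E is the single accepting state of the DFA.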
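[Figure: transition diagram of the DFA, with start state A and states B, C, D and E connected by the a and b edges listed in the table above]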
Minimized DFA.
Step 3: Convert the above DFA into a minimized DFA by applying the following algorithm.
Minimized DFA algorithm:
Input: A DFA with s states.
Output: A minimized DFA with a reduced number of states.
Steps:
1. Partition the set of states into two groups, the accepting states and the non-accepting states. Call this
initial partition π.
2. Construct π new by applying the following to each group G of π. Then set π = π new and repeat until π new = π.
begin
divide G into subgroups such that two states s and t are in the same subgroup if and only if,
for every input symbol a, s and t have transitions on a to states in the same group of π.
Place the newly formed subgroups in π new.
end
3. Choose a representative state for each group.
4. Remove any dead state from the group.
After applying the minimized DFA algorithm to the DFA for the regular
expression (a|b)*abb, the transition table for the minimized DFA becomes
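Transition table for the minimized DFA (states A and C fall in the same group, with A as the representative):
STATE a b
A B A
B B D
D B E
E B A
Minimized DFA:
[Figure: transition diagram of the minimized DFA, with the edges listed in the table above]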
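The partitioning steps can be sketched as follows, applied to the five-state DFA for (a|b)*abb obtained in Step 2. This is a minimal illustration under stated assumptions: the function name minimize and the dictionary encoding are made up, the DFA is assumed to have a transition for every state and symbol, the representative of a group is simply its alphabetically smallest state, and dead-state removal (step 4) is not modelled.

```python
# DFA from the subset construction: state -> {symbol: next state}
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}

def minimize(dfa, accepting, alphabet):
    # Step 1: accepting and non-accepting states form the initial partition.
    partition = [g for g in (set(dfa) - accepting, set(accepting)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for g in partition:
            # Step 2: states stay together only if, on every symbol,
            # they move to states in the same group of the current partition.
            buckets = {}
            for s in sorted(g):
                key = tuple(group_of[dfa[s][a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing was split
            break
        partition = new_partition
    # Step 3: pick a representative for each group and rewrite the transitions.
    rep = {s: min(g) for g in partition for s in g}
    table = {r: {a: rep[dfa[r][a]] for a in alphabet} for r in sorted(set(rep.values()))}
    return table, {rep[s] for s in accepting}

table, accepting = minimize(DFA, ACCEPTING, "ab")
print("STATE a b")
for state, row in table.items():
    print(state, row["a"], row["b"])
print("accepting:", sorted(accepting))
```

Since refinement only ever splits groups, an unchanged group count means the partition itself is unchanged, which is the stopping condition of step 2.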
Exercises:
Convert each of the following regular expressions into a minimized DFA:
1. (a|b)*
2. (b|a)*abb(b|a)*
3. ((a|c)*)ac(ba)*