FLT Course Manual
UNIVERSITY
Formal Language Theory
Haftu Hagos
Chapter One
The Theory of Computation
Introduction
Computer science is a practical discipline. Those who have worked in it often have a marked
preference for useful and tangible problems over theoretical speculation. This is certainly true of
computer science students who are interested mainly in working on difficult applications from
the real world.
Theoretical questions are interesting to them only if they help in finding good solutions. This
attitude is appropriate, since without applications there would be little interest in computers. But
given this practical orientation, one might well ask why study theory.
The first answer is that theory provides concepts and principles that help us to understand the
general nature of the discipline. The field of computer science includes a wide range of special
topics from machine design to programming. The use of computers in the real world involves
a wealth of specific detail that must be learned for a successful application. This makes computer
science a very diverse and broad discipline. But in spite of this diversity, there are some common
underlying principles. To study these basic principles, we construct abstract models of computers
and computation. These models embody the important features that are common to both
hardware and software and that are essential to many of the special and complex constructs we
encounter while working with computers.
A second, and perhaps not so obvious, answer is that the ideas we will discuss have some
immediate and important applications. The fields of digital design, programming languages
and compiler design are the most obvious examples, but there are many others. The concepts we
study here run like a thread through computer science, from operating systems to pattern
recognition.
The third answer is one of which we hope to convince the reader: the subject matter is
intellectually stimulating and fun. It provides many challenging, puzzle-like problems that can
lead to some sleepless nights.
Therefore, in this course we will look at models that represent features at the core of all computers
and their applications. To model the hardware of a computer, we introduce the notion of an
automaton (plural, automata). An automaton is a construct that possesses all the indispensable
features of a digital computer. It accepts input, produces output, may have some temporary
storage, and can make decisions in transforming the input into the output. A formal language is an
abstraction of the general characteristics of programming languages.
A formal language consists of a set of symbols and some rules of formation by which these
symbols can be combined into entities called sentences. A formal language is the set of all
strings permitted by the rules of formation. Though some of the formal languages we study here are
simpler than programming languages, they have many of the same essential features. We can
learn a great deal about programming languages from formal languages. Finally, we will
formalize the concept of mechanical computation by giving a precise definition of the term
algorithm and we study the kinds of problems that are (and are not) suitable for solution by such
mechanical means.
A set is specified by enclosing some description of its elements in curly braces; for example, the set
of integers 0, 1, 2 is shown as
$$S = \{0, 1, 2\}.$$
Ellipses are used whenever the meaning is clear. Thus, {a, b, c, ..., z} stands for all the lowercase
letters of the English alphabet, while {2, 4, 6, ...} denotes the set of all positive even integers.
When the need arises, we use more explicit notation, in which we write
$$S = \{i : i > 0,\ i \text{ is even}\}$$
We read this as "S is the set of all i, such that i is greater than zero and i is even," implying, of course,
that i is an integer.
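The usual set operations are union ($\cup$), intersection ($\cap$), difference ($-$) and
complementation, defined as
$$S_1 \cup S_2 = \{x : x \in S_1 \text{ or } x \in S_2\}$$
$$S_1 \cap S_2 = \{x : x \in S_1 \text{ and } x \in S_2\}$$
$$S_1 - S_2 = \{x : x \in S_1 \text{ and } x \notin S_2\}$$
$$\overline{S} = \{x : x \in U,\ x \notin S\}$$
where U is the universal set of all elements under consideration.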
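These operations map directly onto Python's built-in `set` type. The sketch below is only an illustration; the universal set U = {0, ..., 9} and the sets S1, S2 are arbitrary choices, since complementation is meaningful only relative to some chosen U.

```python
# Illustrative universal set and sample sets (arbitrary choices).
U = set(range(10))
S1 = {1, 2, 3, 4}
S2 = {3, 4, 5, 6}

union = S1 | S2          # {x : x in S1 or x in S2}
intersection = S1 & S2   # {x : x in S1 and x in S2}
difference = S1 - S2     # {x : x in S1 and x not in S2}

def complement(s: set, universe: set) -> set:
    """Complement of s relative to the given universal set."""
    return universe - s

print(sorted(union))              # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))       # [3, 4]
print(sorted(difference))         # [1, 2]
print(sorted(complement(S1, U)))  # [0, 5, 6, 7, 8, 9]
```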
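The set with no elements, called the empty set or the null set, is denoted by $\emptyset$. From the
definition of a set, it is obvious that
$$S \cup \emptyset = S - \emptyset = S$$
$$S \cap \emptyset = \emptyset$$
$$\overline{\emptyset} = U$$
$$\overline{\overline{S}} = S$$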
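The following useful identities are known as DeMorgan's laws:
$$\overline{S_1 \cup S_2} = \overline{S_1} \cap \overline{S_2}$$
$$\overline{S_1 \cap S_2} = \overline{S_1} \cup \overline{S_2}$$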
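On a small universe, the empty-set identities and DeMorgan's laws can be checked exhaustively over every pair of subsets. This is a sanity check under an arbitrary choice of U = {1, 2, 3}, not a proof; `frozenset` is used because Python sets of sets need hashable elements.

```python
from itertools import chain, combinations

# A small universal set, chosen only for illustration.
U = frozenset({1, 2, 3})

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def comp(s):
    """Complement of s relative to U."""
    return U - s

def identities_hold() -> bool:
    """Check the empty-set identities and both DeMorgan's laws for all S1, S2 in 2^U."""
    empty = frozenset()
    if comp(empty) != U:
        return False
    for s1 in subsets(U):
        if (s1 | empty) != s1 or (s1 - empty) != s1 or (s1 & empty) != empty:
            return False
        if comp(comp(s1)) != s1:
            return False
        for s2 in subsets(U):
            if comp(s1 | s2) != comp(s1) & comp(s2):
                return False
            if comp(s1 & s2) != comp(s1) | comp(s2):
                return False
    return True

print(identities_hold())  # True
```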
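A set $S_1$ is said to be a subset of $S$, written $S_1 \subseteq S$, if every element of $S_1$ is also an element of $S$.
Two sets A and B are equal exactly when
$$A \subseteq B \text{ and } B \subseteq A.$$
If $S_1 \subseteq S$, but $S$ contains an element not in $S_1$, we say that $S_1$ is a proper subset of $S$; we write this
as
$$S_1 \subset S$$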
If $S_1$ and $S_2$ have no common elements, that is, $S_1 \cap S_2 = \emptyset$, then the sets are said to be
disjoint.
Theorem: If A and B are both finite sets, then
$$n(A \cup B) = n(A) + n(B) - n(A \cap B)$$
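A quick numerical check of this counting rule, with `len` playing the role of n(·). The sets A and B are arbitrary examples chosen so that they overlap; subtracting n(A ∩ B) corrects for the shared elements being counted twice.

```python
def inclusion_exclusion_holds(a: set, b: set) -> bool:
    """True if n(A u B) == n(A) + n(B) - n(A n B) for the finite sets a and b."""
    return len(a | b) == len(a) + len(b) - len(a & b)

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
print(len(A | B), len(A) + len(B) - len(A & B))  # 7 7
print(inclusion_exclusion_holds(A, B))           # True
```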
A set is said to be finite if it contains a finite number of elements; otherwise it is said to be infinite.
The size of a finite set is the number of elements in it and is denoted by |S|.
A given set normally has many subsets. The set of all subsets of a set S is called the powerset
of S and is denoted by $2^S$. Observe that $2^S$ is a set of sets.
Example: If S is the set {a, b, c}, then its powerset is
$$2^S = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}$$
which has $|2^S| = 2^3 = 8$ elements.
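The powerset can be built by collecting every combination of r elements for r = 0, 1, ..., |S|. The sketch below uses `itertools.combinations` and stores each subset as a `frozenset`, so that the result can itself be a set of sets.

```python
from itertools import combinations

def powerset(s):
    """Return the powerset of s as a set of frozensets."""
    items = sorted(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

P = powerset({"a", "b", "c"})
print(len(P))  # 8
```

In general the result has $2^{|S|}$ elements, which is where the notation $2^S$ comes from.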
A set whose elements are ordered sequences of elements from other sets is said to be a Cartesian
product of those sets. For the Cartesian product of two sets, which itself is a set of ordered pairs, we write
$$S_1 \times S_2 = \{(x, y) : x \in S_1,\ y \in S_2\}$$
Example: Let $S_1 = \{2, 4\}$ and $S_2 = \{2, 3, 5, 6\}$. Then
$$S_1 \times S_2 = \{(2,2), (2,3), (2,5), (2,6), (4,2), (4,3), (4,5), (4,6)\}$$
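The example above can be reproduced with `itertools.product`, which yields every ordered pair (x, y) with x from the first argument and y from the second.

```python
from itertools import product

S1 = {2, 4}
S2 = {2, 3, 5, 6}
pairs = sorted(product(sorted(S1), sorted(S2)))
print(pairs)
# [(2, 2), (2, 3), (2, 5), (2, 6), (4, 2), (4, 3), (4, 5), (4, 6)]
print(len(pairs))  # 8, which equals |S1| * |S2|
```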
A function is a rule that assigns to elements of one set a unique element of another set. We write
$$f : S_1 \to S_2$$
to indicate that the domain of f is a subset of $S_1$ and that the range of f is a subset of $S_2$. If the
domain of f is all of $S_1$, we say that f is a total function on $S_1$. Otherwise, f is said to be a
partial function.
Relations are more general than functions: in a function each element of the domain has exactly
one associated element in the range; in a relation there may be several such elements in the range.
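One concrete way to see these distinctions is to store a function as a Python `dict` and a relation as a set of pairs. The sets, the dict `f` and the relation `R` below are hypothetical examples, and the helper names `is_total` and `is_function` are our own.

```python
# Hypothetical example: f is defined on 1 and 2 but not on 3.
S1 = {1, 2, 3}
f = {1: "x", 2: "y"}

def is_total(func: dict, domain: set) -> bool:
    """A function is total on `domain` if it is defined for every element of it."""
    return set(func) == domain

def is_function(relation: set) -> bool:
    """A relation (set of pairs) is a function if no first component repeats."""
    firsts = [x for x, _ in relation]
    return len(firsts) == len(set(firsts))

R = {(1, "x"), (1, "y"), (2, "y")}  # 1 is related to two elements

print(is_total(f, S1))  # False: f is a partial function on S1
print(is_function(R))   # False: R is a relation but not a function
```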
Figure 1.1
Graphs are conveniently visualized by diagrams in which the vertices are represented as circles
and the edges as lines with arrows connecting the vertices. The graph with vertices {v1, v2, v3}
and edges {(v1, v3), (v2, v3), (v3, v1), (v3, v3)} is depicted in Figure 1.1.
Trees are a particular type of graph. A tree is a digraph that has no cycles and that has one distinct
vertex, called the root, such that there is exactly one path from the root to every other vertex.
This definition implies that the root has no incoming edges and that there are some vertices
without outgoing edges. These are called the leaves of the tree.
The level of a vertex is the number of edges in the path from the root to that vertex, and the
height of the tree is the largest level number of any vertex.
Figure 1.2
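Levels, height and leaves can be computed with a breadth-first traversal from the root. The tree below is a hypothetical example given as a map from each vertex to its children; it is not the tree of Figure 1.2.

```python
from collections import deque

# Hypothetical tree: each vertex maps to the list of its children.
children = {
    "r": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": ["e"],
    "d": [],
    "e": [],
}

def levels(tree: dict, root: str) -> dict:
    """Level of each vertex = number of edges on the unique path from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for child in tree.get(v, []):
            level[child] = level[v] + 1
            queue.append(child)
    return level

lv = levels(children, "r")
leaves = sorted(v for v, ch in children.items() if not ch)
print(max(lv.values()))  # height: 3
print(leaves)            # ['b', 'd', 'e']
```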
Theorem 1.3: If $x \ge 4$, then $2^x \ge x^2$.
First, notice that the hypothesis H is "$x \ge 4$". This hypothesis has a parameter x, and thus is neither
true nor false. Rather, its truth depends on the value of the parameter x; H is true for x = 6 and
false for x = 2.
Likewise, the conclusion C is "$2^x \ge x^2$". This statement also uses the parameter x and is true for
certain values of x and not others. For example, C is false for x = 3, since $2^3 = 8$, which is not as
large as $3^2 = 9$. On the other hand, C is true for x = 4, since $2^4 = 4^2 = 16$. For x = 5, the
statement is also true, since $2^5 = 32$ is at least as large as $5^2 = 25$. This suggests that the
conclusion $2^x \ge x^2$ is true whenever $x \ge 4$.
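The observations above can be checked numerically. A finite check like this is not a proof of Theorem 1.3, but it confirms that x = 3 is the only small counterexample to C and that C holds for every tested $x \ge 4$.

```python
def conclusion(x: int) -> bool:
    """C(x): 2**x >= x**2."""
    return 2 ** x >= x ** 2

print([x for x in range(0, 8) if not conclusion(x)])  # [3]
print(all(conclusion(x) for x in range(4, 1000)))     # True
```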
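Theorem 1.4: If x is the sum of the squares of four positive integers, then $2^x \ge x^2$.
Proof:
Step 1: $x = a^2 + b^2 + c^2 + d^2$. Here we have repeated one of the given statements of the theorem: that x is the sum of the
squares of four integers. It often helps in proofs if we name quantities that are referred to but not
named, and we have done so here, giving the four integers the names a, b, c and d.
Since a, b, c and d are positive integers, each is at least 1, so each of their squares is at least 1 and
therefore $x \ge 4$. By Theorem 1.3, $2^x \ge x^2$.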
Theorem: Let S be a finite subset of some infinite set U, and let T be the complement of S with
respect to U. Then T is infinite.
Proof: Intuitively, this theorem says that if you have an infinite supply of something (U), and you
take a finite amount away (S), then you still have an infinite amount left. We argue by contradiction:
suppose T is finite, and let S have n elements and T have m elements. Let us begin by restating the
facts of the theorem:
$$S \cup T = U$$
and
$$S \cap T = \emptyset$$
Thus the elements of U are exactly the elements of S and T, so there must be n + m elements of U. Since n + m is
an integer and we have shown $|U| = n + m$, it follows that U is finite. More precisely, we
showed that the number of elements in U is some integer, which is the definition of finite. But
the statement that U is finite contradicts the given statement that U is infinite. We have thus
used the contradiction of our conclusion to prove the contradiction of one of the given statements of
the hypothesis, and by the principle of proof by contradiction we may conclude that the theorem is true.
Learning a language, natural or formal, involves three steps:
1) Learning its alphabet - the symbols that are used in the language.
2) Its words - various sequences of symbols of its alphabet.
3) Formation of sentences - sequences of words that follow certain rules of the language.
In this learning, step 3 is the most difficult part. Let us postpone discussing the construction of
sentences and concentrate on steps 1 and 2. For the time being, instead of ignoring sentences
completely, we may look at the common feature of a word and a sentence and agree that both are
just sequences of symbols of the underlying alphabet. For example, the English
sentence
"The English articles - a, an and the are categorized into two types:
indefinite and definite."
may be treated as a sequence of symbols from the Roman alphabet along with enough
punctuation marks, such as comma, full stop and colon, and one more special symbol, namely the
blank space, which is used to separate two words. Thus, abstractly, a sentence or a word may be
used interchangeably for a sequence of symbols from an alphabet. With this discussion we start
with the basic definitions of alphabets and strings, and then we introduce the notion of a language
formally.
Further, in this chapter, we introduce some of the operations on languages and discuss algebraic
properties of languages with respect to those operations. We end the chapter with an introduction
to finite representation of languages via regular expressions.
1.5.1 Alphabets
Definition: An alphabet is a finite set of objects called symbols.
Notation: $\Sigma = \{a, b, \ldots, z\}$
1.5.2 String
1.5.3 Language
Definition: A language over an alphabet $\Sigma$ is a set of strings over $\Sigma$.
Notation: L, M, N, ... for languages; |L| for the size (number of strings) of L.
Notation: $\Sigma^*$ will denote the set of all strings over $\Sigma$. Then, a language L
over $\Sigma$ is just a subset of $\Sigma^*$, that is, $L \subseteq \Sigma^*$.
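Since $\Sigma^*$ is infinite, a program can only list a finite slice of it, for instance all strings up to some length n. The sketch below does this with `itertools.product`; the example language (strings with an even number of a's) is a hypothetical choice used only to show that a language is just a subset of $\Sigma^*$.

```python
from itertools import product

def strings_up_to(sigma, n):
    """All strings over alphabet `sigma` of length 0..n (a finite slice of Sigma*)."""
    result = []
    for k in range(n + 1):
        result.extend("".join(p) for p in product(sorted(sigma), repeat=k))
    return result

sigma = {"a", "b"}
print(strings_up_to(sigma, 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# Hypothetical language over sigma: strings with an even number of a's (lengths <= 3 only).
L = {w for w in strings_up_to(sigma, 3) if w.count("a") % 2 == 0}
print(len(L))  # 8
```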
Review Questions
1) Suppose Walter's online music store conducts a customer survey to determine the preferences
of its customers. Customers are asked what type of music they like. They may choose from
the following categories: Pop (P), Jazz (J), Classical (C), and none of the above (N). Of 100
customers some of the results are as follows:
44 like Classical
27 like all three
15 like only Pop
10 like Jazz and Classical, but not Pop
How many like Classical but not Jazz? We can fill in the Venn diagram below to keep track of
the numbers.
There are $n(C) = 44$ total that like Classical, and $n(C \cap J) = 27 + 10 = 37$ that like both Jazz and
Classical, so $44 - 37 = 7$ like Classical but not Jazz.
b)
c)
d)
e)
f)
A B
A C
(A C)
(A B) C
(A B) A
3) Determine if the following statements are true or false. Here A represents any set.
a) A
b) A A
c) (A) = A
4) Let U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and A = {1, 3, 5, 7, 9} and B = {1, 4, 5, 9}.
a) Find A B
b) Find A B
c) Use a Venn diagram to represent these sets.
5) One hundred students were surveyed and asked if they are currently taking math (M),
English (E) and/or History (H). The survey findings are summarized here:
Survey Results
n(M) = 45 n(M E) = 15
n(E) = 41 n(M H) = 18
n(H) = 40 n(M E H) = 7
n[(M E) (M H) (E H)] = 36
Example: The Kleene closure of the language {01} is
$$\{01\}^* = \{\lambda, 01, 0101, 010101, \ldots\}$$
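Because $L^*$ is usually infinite, a program can only produce the strings of $L^*$ up to some maximum length. The sketch below starts from the empty string λ (written `""` in Python) and repeatedly appends words of L until no new string within the length bound appears; `max_len` is our own parameter, not part of the definition.

```python
def kleene_closure(lang, max_len):
    """Strings of L* with length <= max_len, built by repeatedly appending words of L."""
    closure = {""}   # lambda (the empty string) is always in L*
    frontier = {""}
    while frontier:
        new = set()
        for u in frontier:
            for w in lang:
                s = u + w
                if len(s) <= max_len and s not in closure:
                    new.add(s)
        closure |= new
        frontier = new
    return closure

print(sorted(kleene_closure({"01"}, 6), key=len))
# ['', '01', '0101', '010101']
```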
a) If L = {0, 10}, what is L*?
********************THE END**********************
Chapter-2
Finite Automata
Introduction
We will be making use of mathematical models of physical systems called finite automata, or
finite state machines to recognize whether or not a string is in a particular language. This section
introduces this idea and gives the precise definition of what constitutes a finite automaton. We
look at several variations on the definition (to do with the concept of determinism) and see that
they are equivalent for the purpose of recognizing whether or not a string is in a given language.
Finite automata are a useful model for many important kinds of hardware and software. Let us
just list some of the most important kinds.
1) Software for designing and checking the behavior of digital circuits.
2) The lexical analyzer of a typical compiler, that is, the compiler component that breaks
the input into logical units such as identifiers, keywords and punctuation.
3) Software for scanning large bodies of text, such as collections of web pages, to find
occurrences of words, phrases or other patterns.
4) Software for verifying systems of all types that have a finite number of distinct states, such as
communication protocols or protocols for secure exchange of information.
Finite automata can be classified into two types: Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NDFA).
In DFA, for each input symbol, one can determine the state to which the machine
will move. Hence, it is called Deterministic Automaton. As it has a finite number
of states, the machine is called Deterministic Finite Machine or Deterministic
Finite Automaton.
In NDFA, for a particular input symbol, the machine can move to any combination of
the states in the machine. In other words, the exact state to which the machine
moves cannot be determined. Hence, it is called Non-deterministic Automaton.
As it has finite number of states, the machine is called Non-deterministic Finite
Machine or Non-deterministic Finite Automaton.
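Formally, a deterministic finite accepter (dfa) is defined by the quintuple $M = (Q, \Sigma, \delta, q_0, F)$, where
Q is a finite set of internal states, $\Sigma$ is a finite set of symbols called the input alphabet,
$\delta : Q \times \Sigma \to Q$ is a total function called the transition function, $q_0 \in Q$ is the initial state,
and $F \subseteq Q$ is a set of final states.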
A deterministic finite accepter operates in the following manner. At the initial time, it is assumed
to be in the initial state $q_0$, with its input mechanism on the leftmost symbol of the input string.
During each move of the automaton, the input mechanism advances one position to the right, so
each move consumes one input symbol. When the end of the string is reached, the string is
accepted if the automaton is in one of its final states. Otherwise the string is rejected.
The input mechanism can move only from left to right and reads exactly one symbol on each step.
The transitions from one internal state to another are governed by the transition function $\delta$.
For example, if
$$\delta(q_0, a) = q_1$$
then, if the dfa is in state $q_0$ and the current input symbol is a, the dfa will go into state $q_1$.
To visualize and represent finite automata, we use transition graphs, in which the vertices
represent states and the edges represent transitions. The labels on the vertices are the names of the
states, while the labels on the edges are the current values of the input symbol.
For example, if $q_0$ and $q_1$ are internal states of some dfa M, then the graph associated with M
will have one vertex labeled $q_0$ and another labeled $q_1$. An edge $(q_0, q_1)$ labeled a represents the
transition $\delta(q_0, a) = q_1$. The initial state is identified by an incoming unlabeled arrow
not originating at any vertex. Final states are drawn with a double circle.
For every transition rule $\delta(q_i, a) = q_j$, the graph has an edge $(q_i, q_j)$ labeled a.
where $\delta$ is given as
The dfa accepts the string 01. Starting in state $q_0$, the symbol 0 is read first. Looking at the edges of
the graph, we see that the automaton remains in state $q_0$. Next, 1 is read and the automaton
goes into state $q_1$. We are now at the end of the string and, at the same time, in a final state $q_1$.
Therefore, the string 01 is accepted. The dfa does not accept the string 00, since after reading two
consecutive 0's, it will be in state $q_0$. By similar reasoning, we see that the automaton will accept the
strings 101, 0111 and 11001, but not 1100 or 100.
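A dfa is easy to simulate: keep the current state, look up $\delta$ once per input symbol, and test membership in F at the end. The transition table below is an assumption, since the figure is not reproduced here. It is the table of Linz's Example 2.1, which this section appears to follow, and it agrees with every accept/reject claim in the paragraph above.

```python
# Assumed transition table (Linz Example 2.1); states q0, q1, q2 over {0, 1}.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
START, FINAL = "q0", {"q1"}

def accepts(w: str) -> bool:
    """Run the dfa on w one symbol at a time; accept iff it ends in a final state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL

for w in ["01", "00", "101", "0111", "11001", "1100", "100"]:
    print(w, accepts(w))
```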
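It is convenient to introduce the extended transition function $\delta^* : Q \times \Sigma^* \to Q$. The second argument
of $\delta^*$ is a string, rather than a single symbol, and its value gives the state the automaton will be in
after reading that string. Formally, $\delta^*$ is defined recursively by
$$\delta^*(q, \lambda) = q \qquad (2.1)$$
$$\delta^*(q, wa) = \delta(\delta^*(q, w), a) \qquad (2.2)$$
for all $q \in Q$, $w \in \Sigma^*$, $a \in \Sigma$.
For example, if $\delta(q_0, a) = q_1$ and $\delta(q_1, b) = q_2$, then applying (2.2) gives
$$\delta^*(q_0, ab) = \delta(\delta^*(q_0, a), b)$$
But
$$\delta^*(q_0, a) = \delta(\delta^*(q_0, \lambda), a) = \delta(q_0, a) = q_1$$
so $\delta^*(q_0, ab) = \delta(q_1, b) = q_2$.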
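Definitions (2.1) and (2.2) translate almost word for word into a recursive function: the base case handles the empty string, and the recursive case splits w into a prefix and its last symbol. The two-entry table below is a hypothetical partial $\delta$ covering only the example's transitions; a real dfa's $\delta$ is total, and here an undefined lookup would simply raise `KeyError`.

```python
# Hypothetical partial transition table for the example: delta(q0, a) = q1, delta(q1, b) = q2.
delta = {("q0", "a"): "q1", ("q1", "b"): "q2"}

def delta_star(q: str, w: str) -> str:
    """Extended transition function, following (2.1) and (2.2) literally."""
    if w == "":                              # (2.1): delta*(q, lambda) = q
        return q
    prefix, a = w[:-1], w[-1]                # w = (prefix)(last symbol a)
    return delta[(delta_star(q, prefix), a)] # (2.2)

print(delta_star("q0", "ab"))  # q2
```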
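Since $\delta$ is a total function, at each step a unique move is defined, so that we are justified in calling
such an automaton deterministic. A dfa will process every string and either accept it or reject it.
The language accepted by a dfa M is
$$L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \in F\}$$
Non-acceptance means that the dfa stops in a non-final state, so that
$$\overline{L(M)} = \{w \in \Sigma^* : \delta^*(q_0, w) \notin F\}$$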
The automaton in Linz Figure 2.2 accepts all strings consisting of an arbitrary number of a's
followed by a single b.
In set notation, the language accepted by the automaton is $L = \{a^n b : n \ge 0\}$.
Note that q2 has two self-loop edges, each with a different label. We write this compactly with
multiple labels.
A trap state is a state from which the automaton can never escape.
Note that q2 is a trap state in the dfa transition graph shown in Linz Figure 2.2.
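The sketch below is one plausible reading of Linz Figure 2.2, assumed because the figure is not reproduced here: $q_0$ loops on a and moves to the final state $q_1$ on b, and any further input sends the dfa into the non-final trap state $q_2$. The brute-force comparison against the set-notation description is only a check on short strings.

```python
from itertools import product

# Assumed transitions: q0 initial, q1 final, q2 a non-final trap state.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",  # once in q2, the dfa can never leave
}

def accepts(w: str) -> bool:
    state = "q0"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == "q1"

def in_language(w: str) -> bool:
    """Direct membership test for {a^n b : n >= 0}."""
    return w.endswith("b") and "b" not in w[:-1]

# Compare the dfa with the set-notation description on all strings up to length 6.
agree = all(accepts("".join(p)) == in_language("".join(p))
            for k in range(7) for p in product("ab", repeat=k))
print(agree)  # True
```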
Transition graphs are quite convenient for understanding finite automata.
Find a deterministic finite accepter that recognizes the set of all strings on $\Sigma = \{a, b\}$ starting with
the prefix ab. Linz Figure 2.4 shows a transition graph for a dfa for this example.
The dfa must accept ab and then continue until the string ends.
This dfa has a final trap state q2 (accepts) and a non-final trap state q3 (rejects).
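A four-state dfa matching this description can be sketched as follows. The exact edges are inferred from the text rather than copied from Linz Figure 2.4: reading a then b reaches the final trap state $q_2$, and any other first or second symbol leads to the non-final trap state $q_3$.

```python
from itertools import product

# Inferred transitions: q2 = final trap, q3 = non-final trap.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q3",
    ("q1", "a"): "q3", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}
FINAL = {"q2"}

def accepts(w):
    state = "q0"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL

# Check against the specification "starts with ab" on all strings up to length 6.
agree = all(accepts(s) == s.startswith("ab")
            for k in range(7)
            for s in ("".join(p) for p in product("ab", repeat=k)))
print(agree)  # True
```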
Find a dfa that accepts all strings on {0, 1}, except those containing the substring 001.
The dfa needs to remember whether the last two inputs were 00, and it uses state changes as its
memory.
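Using state changes as memory can be made concrete by naming each state after the suffix it remembers. In the sketch below (state names are an illustrative choice), `""` means no useful suffix, `"0"` and `"00"` record trailing zeros, and `"001"` is a non-final trap entered once the forbidden substring appears.

```python
from itertools import product

# States are named by the relevant suffix seen so far; "001" is a non-final trap.
DELTA = {
    ("", "0"): "0",       ("", "1"): "",
    ("0", "0"): "00",     ("0", "1"): "",
    ("00", "0"): "00",    ("00", "1"): "001",
    ("001", "0"): "001",  ("001", "1"): "001",
}
FINAL = {"", "0", "00"}

def accepts(w):
    state = ""
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL

# Check against the specification "does not contain 001" on all strings up to length 8.
agree = all(accepts(s) == ("001" not in s)
            for k in range(9)
            for s in ("".join(p) for p in product("01", repeat=k)))
print(agree)  # True
```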
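Show that the language $L = \{awa : w \in \{a, b\}^*\}$ is regular.
Construct a dfa.
Check whether the string begins and ends with a.
The dfa is in a final state whenever the last symbol read is an a that follows the initial a.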
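One possible dfa for $L = \{awa : w \in \{a, b\}^*\}$ is sketched below; the state names are our own and it is not a transcription of the textbook figure. A leading b sends the dfa to a non-final trap, and after a leading a the dfa alternates between a non-final state and a final state according to whether the last symbol was a.

```python
from itertools import product

# Illustrative states: "start" (nothing read), "dead" (began with b, non-final trap),
# "mid" (began with a; last symbol b, or only the first a read), "end" (final).
DELTA = {
    ("start", "a"): "mid", ("start", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
    ("mid", "a"): "end",   ("mid", "b"): "mid",
    ("end", "a"): "end",   ("end", "b"): "mid",
}

def accepts(w):
    state = "start"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == "end"

def spec(s):
    """s = awa: at least two symbols, first and last are a."""
    return len(s) >= 2 and s[0] == "a" and s[-1] == "a"

agree = all(accepts(s) == spec(s)
            for k in range(8)
            for s in ("".join(p) for p in product("ab", repeat=k)))
print(agree)  # True
```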
Construct a dfa for $L^2$, where L is the language of the previous example.
Use the Example 2.5 dfa as a starting point.
Accept two consecutive strings of the form awa.
Note that any two consecutive a's could start a second string.
The last example suggests the conjecture that if a language L is regular, then so are $L^2$, $L^3$, and so on.
We will come back to this issue in Chapter 4.
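A nondeterministic finite accepter (nfa) is defined by the quintuple $M = (Q, \Sigma, \delta, q_0, F)$, where
$Q$, $\Sigma$, $q_0$ and $F$ are defined as for deterministic finite accepters, but
$$\delta : Q \times (\Sigma \cup \{\lambda\}) \to 2^Q$$
Remember that for dfas, $\delta : Q \times \Sigma \to Q$.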
Consider the transition graph shown in Linz Figure 2.8. Note the nondeterminism in state q0
with two possible transitions for input a. Also state q3 has no transition for any input.
Consider the transition graph for an nfa shown in Linz Figure 2.9.
Note the nondeterminism and the $\lambda$-transition.
Note: Here $\lambda$ means that the move takes place without consuming any input symbol.
This is different from accepting an empty string.
Transitions: what are $\delta(q_0, 0)$, $\delta(q_1, 0)$, $\delta(q_2, 0)$ and $\delta(q_2, 1)$?
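Requirement: $\delta^*(q_i, w) = Q_j$, where $Q_j$ is the set of all possible states the automaton may be in,
having started in state $q_i$ and having read w. For an nfa, the extended transition
function is defined so that $\delta^*(q_i, w)$ contains $q_j$ if and only if there is a walk in the transition graph from
$q_i$ to $q_j$ labeled w. This holds for all $q_i, q_j \in Q$ and $w \in \Sigma^*$.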
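For example, consider an nfa in which some transitions, such as $\delta(q_2, a)$, are undefined, and suppose we
want to find $\delta^*(q_1, a)$ and $\delta^*(q_2, \lambda)$. There is a walk labeled a from $q_1$ to itself. By using some of the
$\lambda$-edges twice, we see that there are also walks involving $\lambda$-transitions to $q_0$ and $q_2$.
Thus
$$\delta^*(q_1, a) = \{q_0, q_1, q_2\}$$
Any state can be reached from itself without consuming input, so $q_2 \in \delta^*(q_2, \lambda)$, and $q_0$ is reachable
from $q_2$ by $\lambda$-transitions alone.
Therefore,
$$\delta^*(q_2, \lambda) = \{q_0, q_2\}$$
Using as many $\lambda$-transitions as needed, you can also check that
$$\delta^*(q_2, aa) = \{q_0, q_1, q_2\}$$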
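For an nfa, $\delta^*$ can be computed by alternating two steps: take the λ-closure of the current set of states, then follow every edge labeled with the next symbol. The edges below are an assumption, since the figure is not reproduced here: $q_0 \xrightarrow{a} q_1$, $q_1 \xrightarrow{\lambda} q_2$ and $q_2 \xrightarrow{\lambda} q_0$. With these edges the sketch reproduces all three values of $\delta^*$ stated above.

```python
# Assumed edges (hypothetical reconstruction): q0 --a--> q1, q1 --λ--> q2, q2 --λ--> q0.
LAMBDA = ""  # the empty string stands for λ on an edge label
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", LAMBDA): {"q2"},
    ("q2", LAMBDA): {"q0"},
}

def lambda_closure(states):
    """All states reachable from `states` using λ-edges only, including the states themselves."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, LAMBDA), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def delta_star(q, w):
    """Set of states reachable from q by a walk labeled w (λ-edges may be used freely)."""
    current = lambda_closure({q})
    for a in w:
        step = set()
        for p in current:
            step |= DELTA.get((p, a), set())  # undefined transitions give the empty set
        current = lambda_closure(step)
    return current

print(sorted(delta_star("q1", "a")))   # ['q0', 'q1', 'q2']
print(sorted(delta_star("q2", "")))    # ['q0', 'q2']
print(sorted(delta_star("q2", "aa")))  # ['q0', 'q1', 'q2']
```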
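The language accepted by an nfa $M = (Q, \Sigma, \delta, q_0, F)$ is
$$L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \cap F \neq \emptyset\}$$
That is, L(M) is the set of all strings w for which there is a walk labeled w from the initial vertex
of the transition graph to some final vertex.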
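Let's again examine the automaton given in Linz Figure 2.9 (Example 2.8); call this nfa M.
Any accepting walk must end in $q_0$. Tracing the graph, this is possible only when the input is the
empty string or a repetition of 10, so
$$L(M) = \{(10)^n : n \ge 0\}$$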
Why Nondeterminism
Chapter-3
Grammars
Introduction
In this chapter we introduce the notion of a grammar, called the context-free grammar (CFG), as a
language generator. The notion of derivation is instrumental in understanding how the strings
are generated in a grammar.
In the context of natural languages, the grammar of a language is the set of rules that are
used to construct/validate sentences of the language. We look into the general features of the
grammars (of natural languages) to formalize the notion in the present context, which facilitates
a better understanding of formal languages.
Consider the English language
It is convenient to write