Introduction to Theory of Computation
Automata theory, also known as the Theory of Computation, is a field within computer science and mathematics that studies abstract machines in order to understand the capabilities and limitations of computation. It does this by analyzing mathematical models of how machines perform calculations.
Why Do We Study the Theory of Computation?
Automata theory is a fundamental area of computer science, with real-world applications in various systems and domains.
1. Regular Expressions (RE) in Systems
Regular expressions are powerful tools for pattern matching and text processing used extensively in many systems.
Examples:
- UNIX: In UNIX, regular expressions like a.*b are used to match text patterns within files, making it easier to search for specific content across vast datasets.
- XML and DTDs: Document Type Definitions (DTDs) describe the structure of XML documents using regular-expression-like notation. For example, a declaration like person (name, addr, child*) ensures that a person tag must include a name, an addr, and zero or more child tags.
- Programming Languages: Almost all modern programming languages have regular-expression libraries that allow us to do text processing, as illustrated in the sketch below.
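As a concrete illustration, here is a minimal Python sketch that applies the a.*b pattern from the UNIX example using the standard re module; the sample lines are invented purely for demonstration.

```
import re

# The pattern from the UNIX example above: an 'a', followed by
# any characters, followed by a 'b'.
pattern = re.compile(r"a.*b")

# Hypothetical sample lines, purely for demonstration.
for line in ["cab", "abc", "acb", "bca"]:
    result = "matches" if pattern.search(line) else "no match"
    print(f"{line!r}: {result}")
```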
2. Finite Automata in Modeling Systems
- Modeling Protocols and Circuits: Finite automata (FA) are used to model protocols, like those in network communication, and to design electronic circuits that operate based on a set of predefined rules or states (a minimal DFA sketch follows this list).
- Model-Checking: FA theory is also applied in model-checking, which is used to verify whether a system behaves as expected under all possible conditions.
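To make this concrete, below is a minimal Python sketch of a finite automaton; the machine and its transition table are invented for illustration, and it accepts binary strings containing an even number of 0s.

```
# Hypothetical DFA accepting binary strings with an even number
# of 0s; states and transitions are illustrative only.
transitions = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}

def dfa_accepts(w: str) -> bool:
    """Run the DFA on input w and report acceptance."""
    state = "even"              # start state
    for symbol in w:
        state = transitions[(state, symbol)]
    return state == "even"      # accepting state

print(dfa_accepts("1001"))  # True:  two 0s (even)
print(dfa_accepts("10"))    # False: one 0 (odd)
```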
3. Context-Free Grammars (CFG)
- Syntax of Programming Languages: Context-free grammars are essential in describing the syntax of most programming languages. They define the rules that specify how programs should be written and structured (a small recognizer sketch follows this list).
- Natural Language Processing: CFGs also play a vital role in computational linguistics, helping to describe the structure of natural languages like English.
- XML and DTDs as CFGs: DTDs (Document Type Definitions) can be thought of as a specific application of context-free grammars, as they define the structure of XML documents.
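As a small illustration, the classic context-free language { a^n b^n | n ≥ 0 }, generated by the grammar S → aSb | ε, can be recognized with the following Python sketch (the function name is our own).

```
def in_anbn(w: str) -> bool:
    """Membership test for { a^n b^n | n >= 0 },
    the language generated by the CFG S -> a S b | ε."""
    half, rem = divmod(len(w), 2)
    return rem == 0 and w == "a" * half + "b" * half

print(in_anbn("aabb"))  # True:  n = 2
print(in_anbn("abab"))  # False: not of the form a^n b^n
```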
Core Areas of the Theory of Computation
The field of computation theory can be broadly divided into four major areas:
1. Automata Theory
Automata theory studies abstract computational models and their applications. It forms the basis for understanding how machines process inputs and produce outputs. Key components include:
- Finite Automata: Used to model simple systems like lexical analyzers in compilers.
- Pushdown Automata: A more powerful model capable of recognizing context-free languages, essential for parsing programming languages (see the stack sketch after this list).
- Turing Machines: The most powerful automata, used as a standard for defining what is computable.
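The stack that gives pushdown automata their extra power can be sketched in a few lines of Python; the balanced-parentheses checker below is a standard illustration, and the names are our own.

```
def balanced(w: str) -> bool:
    """Simulate a pushdown automaton's stack: push on '(',
    pop on ')', accept when the stack ends up empty."""
    stack = []
    for symbol in w:
        if symbol == "(":
            stack.append(symbol)
        elif symbol == ")":
            if not stack:       # pop from empty stack: reject
                return False
            stack.pop()
    return not stack            # accept iff stack is empty

print(balanced("(()())"))  # True
print(balanced("())("))    # False
```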
2. Formal Languages and Grammars
This area examines the syntax and structure of languages used in computation. It involves:
- Regular Languages: Described by regular expressions and finite automata, representing simple patterns.
- Context-Free Languages: Defined by context-free grammars, crucial for designing compilers.
- Chomsky Hierarchy: A classification of languages into regular, context-free, context-sensitive, and recursively enumerable languages.
3. Computability and Decidability
Computability theory addresses the question: What problems can a computer solve? It studies concepts like:
- Decidable Problems: Problems with an algorithmic solution.
- Undecidable Problems: Problems, such as the Halting Problem, for which no algorithm can determine the answer for all inputs (see the sketch below).
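The standard diagonalization argument behind the Halting Problem can be sketched in Python-flavored pseudocode; halts below is a hypothetical decider that, as the argument shows, cannot actually exist.

```
# Hypothetical: suppose halts(prog, inp) could always decide
# whether prog halts on input inp. (No such function can exist.)
def halts(prog, inp) -> bool:
    raise NotImplementedError("assumed decider, for the argument only")

def paradox(prog):
    # Loop forever exactly when prog is claimed to halt on itself.
    if halts(prog, prog):
        while True:
            pass

# Now ask: does paradox(paradox) halt?  If halts says yes, it
# loops forever; if halts says no, it halts.  Either answer is
# wrong, so no total halts decider can exist.
```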
4. Complexity Theory
Complexity theory focuses on the efficiency of algorithms by analyzing the time and space resources they require. It categorizes problems into classes such as:
- P (Polynomial Time): Problems solvable in polynomial time.
- NP (Nondeterministic Polynomial Time): Problems whose solutions can be verified in polynomial time.
- NP-Complete and NP-Hard: NP-complete problems are the hardest problems within NP, while NP-hard problems are at least as hard as these (and need not lie in NP); both arise in cryptography, optimization, and artificial intelligence. The verifier sketch below illustrates polynomial-time checking.
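To illustrate "verifiable in polynomial time", here is a minimal Python verifier for Subset Sum, a well-known NP-complete problem; finding a certificate may take exponential time, but checking one is fast. The function and its inputs are our own illustration.

```
def verify_subset_sum(nums, target, indices):
    """Polynomial-time verifier: check that the certificate
    (a list of indices into nums) selects numbers summing
    to target."""
    return sum(nums[i] for i in indices) == target

# The certificate [2, 4] picks nums[2] + nums[4] = 4 + 5 = 9.
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [2, 4]))  # True
```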
Basic Terminologies of Theory of Computation
Now, let’s understand the basic terminologies, which are important and frequently used in the Theory of Computation.
1. Symbol
A symbol (often also called a character) is the smallest building block; it can be any letter, digit, or other character, or even a picture.

2. Alphabets (Σ)
A finite, non-empty set of symbols used to construct strings and languages. For example, Σ = {a, b}.

3. String
A string is a finite sequence of symbols from some alphabet. A string is generally denoted by w, and its length is denoted by |w|. The empty string is the string containing zero symbols, represented as ε.
Number of strings of length 2 that can be generated over the alphabet {a, b}:
aa, ab, ba, bb
Length of string: |w| = 2
Number of strings: 4
Conclusion:
For the alphabet {a, b}, the number of strings of length n that can be generated is 2^n.
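The 2^n count can be checked directly; the short Python sketch below enumerates all strings of a given length over {a, b} using the standard itertools module.

```
from itertools import product

alphabet = ["a", "b"]
n = 2

# Every string of length n is one choice of symbol per position.
strings = ["".join(p) for p in product(alphabet, repeat=n)]
print(strings)       # ['aa', 'ab', 'ba', 'bb']
print(len(strings))  # 4 == 2**n
```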
Closure Representation in TOC
1. L+: The Positive Closure of L, which represents the set of all strings over L except the null or ε-string.
2. L*: The “Kleene Closure” of L, which represents the strings formed by using the given alphabet's symbols zero or more (up to infinitely many) times; the ε-string is included.
From the above two statements, it can be concluded that:
L* = L+ ∪ { ε }
Example:
(a) Regular expression for the language accepting all combinations of g’s (including the empty string) over Σ = {g}:
R = g*
R={ε,g,gg,ggg,gggg,ggggg,…}
(b) Regular expression for the language accepting all non-empty combinations of g’s over Σ = {g}:
R = g+
R={g,gg,ggg,gggg,ggggg,gggggg,…}
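Since g* and g+ are infinite sets, code can only enumerate them up to a bound; the sketch below lists their strings up to a chosen maximum length (the function names are our own).

```
def kleene_star(symbol: str, max_len: int):
    """Strings of symbol* up to length max_len, including ε ('')."""
    return [symbol * i for i in range(max_len + 1)]

def positive_closure(symbol: str, max_len: int):
    """Strings of symbol+ up to length max_len (ε excluded)."""
    return [symbol * i for i in range(1, max_len + 1)]

print(kleene_star("g", 3))       # ['', 'g', 'gg', 'ggg']
print(positive_closure("g", 3))  # ['g', 'gg', 'ggg']
```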
Note: Σ* is the set of all possible strings (of every finite length, including ε) over the alphabet Σ; this implies that every language is a subset of Σ*. This is also called the “Kleene Star”.
The Kleene Star is also called the “Kleene Operator” or “Kleene Closure”. Engineers and IT professionals use it to obtain the set of all strings that can be formed from a given set of characters or symbols. It is a kind of unary operator: the given symbols may be repeated and combined to any extent, so arbitrarily long strings can be built from the alphabet.
Example:
Input string: “GFG”, over the symbols Σ = {G, F}.
Σ* = { ε, “GFG”, “GGFG”, “GFGGGGGGGG”, “GGGGGGGGFFFFFFFFFGGGGGGGG”, … }
(The Kleene star is an infinite set, but if we impose grammar rules it can act as a finite set.
Note that the ε string is also included in the Kleene star representation.)
Language
- A language is a set of strings formed using the symbols of a given alphabet Σ.
- Formally, a language is a subset of Σ*, where Σ* is the set of all possible strings (including ε) over the alphabet Σ.
Examples of Languages:
Finite Language:
L1 = { all strings of length 2 over {x, y} }
L1 = { xy, yx, xx, yy }
Infinite Language:
L2 = { all strings that start with ‘b’ }
L2 = { babb, baa, ba, bbb, baab, … }
Types of Languages in TOC
Languages are classified based on the computational model or grammar generating them:
- Regular Languages: Defined using regular expressions or finite automata. Example: all strings over {a, b} that end in b, described by (a + b)*b.
- Context-Free Languages: Defined using context-free grammars or pushdown automata. Example: { a^n b^n | n ≥ 0 }.
- Context-Sensitive Languages: Defined using context-sensitive grammars or linear-bounded automata.
- Recursive and Recursively Enumerable Languages: Defined using Turing machines.
Introduction to Theory of Computation – FAQs
What is the relevance of automata theory in computer science?
Automata theory provides models for computational problems, improving the understanding and design of systems such as compilers and interpreters.
What is the purpose of using the Kleene Star in the study of formal languages?
The Kleene Star builds, from the symbols of a given alphabet, the set of all finite strings over that alphabet, including the null string ε.
Is it possible to implement automata theory into real life?
Yes. Automata theory is applied in areas such as compiler design, artificial intelligence, network security, and natural language processing.
What is the difference between computability and complexity?
Computability focuses on whether a problem can be solved algorithmically, while complexity deals with the resources needed to solve the problem.