
Chapter 4

The limits of computation

1. The Church–Turing Thesis


[See Sipser, §3.3]
The Church–Turing Thesis states:

Any algorithmic procedure can be carried out on a Turing machine.

Here’s how Turing put it, in 1948: “[Turing machines] can do anything that
could be described as ‘rule of thumb’ or ‘purely mechanical’ . . . This is sufficiently
well established that it is now agreed amongst logicians that ‘calculable by means
of [a Turing machine]’ is the correct accurate rendering of such phrases.”
It is generally believed that the Church–Turing thesis is true; that is, any “al-
gorithm” can be turned into a set of instructions for a machine that has a finite
processing unit, input, output, and memory.

Reasons to agree with Church and Turing

1. All known algorithmic procedures can be carried out on a Turing machine.

2. All other attempts to give “algorithm” a precise meaning (and there have been
many) have been shown to be equivalent to the standard Turing machine.

3. We will see shortly that there exists a single Turing machine U that can
simulate any other Turing machine. This was invented by Turing and is
called the universal Turing machine.

However, the Church–Turing Thesis is only a useful assumption; it is not a mathematically provable fact. We will, however, assume it to be true for the rest of this course, and will define an algorithm to be something that can be carried out on a Turing machine. As a consequence, in this chapter we will only give higher-level descriptions of our Turing machine algorithms.

Definition 1.1. A programming language is said to be Turing-complete if we can simulate every Turing machine in it.

All Turing-complete languages are of equivalent power: C, Java, Python, Perl, Haskell, Prolog, and so on. All other languages are of lesser power.
By the Church–Turing thesis, anything that could be input to any computer in any programming language can be input to a suitable Turing machine. We write ⟨X⟩ to mean a Turing-machine-readable encoding of the object X, which can be a graph, a polynomial, or any other object that can be described on a computer. We do not in general care about the details of the encoding, just that it is a (finite) word that could be an input word of a suitable Turing machine.

Example 1.2. Let M = (Q, Σ, Γ, δ, qs, qa, qr) be a Turing machine. We can number the states of M so that qs is state 0, qa is state |Q| − 2, and qr is state |Q| − 1. If the size of Σ is n, we can assume without loss of generality that Σ = {0, . . . , n − 1}. Similarly, if |Γ| = m then we can assume that Γ = Σ ∪ {n, . . . , m − 1}, and assume that the blank ⊔ is symbol m − 1. Given these relabellings, we encode M over {0, 1, . . . , 9, #, ∗, L, R} by writing down:
1. |Q|, followed by #.
2. n, followed by #.
3. m − n, followed by #.
4. For each pair of numbers i ∈ {0, . . . , |Q| − 3} and a ∈ {0, . . . , m − 1}, if δ specifies that when M is in state i and sees letter a, the machine M goes to state j, replaces a with b, and moves in direction D ∈ {L, R}, we write down the 5-tuple

   i ∗ a ∗ j ∗ b ∗ D#.

So, for example, to encode the Turing machine M1 from Chapter 3, Example 1.2, we would first number the states as q0, q1, q2 = qa, q3 = qr. We would encode the alphabet Σ = {a, b} with {0, 1}, and the alphabet Γ = {a, b, ⊔} with {0, 1, 2}. The beginning of the word encoding ⟨M1⟩ would then be

4#2#1#0 ∗ 0 ∗ 0 ∗ 0 ∗ R#0 ∗ 1 ∗ 1 ∗ 1 ∗ R#

See if you can work out the rest of the word.

We emphasise that the above is just one example of a way of encoding a Turing
machine: there are many others.
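
As an aside, the encoding scheme is easy to mechanise. The following Python sketch is our own illustration (the function name and the dictionary representation of δ are assumptions, not part of the definition above); it produces the word just described, using the ASCII character * in place of ∗.

    def encode_tm(num_states, n, m, delta):
        """Encode a TM as in Example 1.2.  States are assumed already renumbered:
        qs = 0, qa = num_states - 2, qr = num_states - 1; the input alphabet is
        {0, ..., n-1}, the tape alphabet is {0, ..., m-1}, and the blank is m - 1.
        delta maps (state, symbol) -> (new_state, new_symbol, 'L' or 'R')."""
        parts = [f"{num_states}#", f"{n}#", f"{m - n}#"]
        for i in range(num_states - 2):        # no transitions leave qa or qr
            for a in range(m):
                if (i, a) in delta:
                    j, b, d = delta[(i, a)]
                    parts.append(f"{i}*{a}*{j}*{b}*{d}#")
        return "".join(parts)

    # The two transitions of M1 shown above (deliberately incomplete):
    delta_m1 = {(0, 0): (0, 0, "R"), (0, 1): (1, 1, "R")}
    print(encode_tm(4, 2, 3, delta_m1))        # prints 4#2#1#0*0*0*0*R#0*1*1*1*R#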
Let’s show that Turing machines are themselves Turing-complete.

Theorem 1.3 (Turing 1936) There exists a Turing machine U, called the Universal Turing Machine (UTM), which on input ⟨M, w⟩, an encoding of a Turing machine M and an input word w for M, simulates the action of M on w.

Proof. It’s easiest to describe U as a 2-tape machine: by Chapter 3, Theorem 2.3, there is an equivalent single-tape machine.
The input ⟨M, w⟩ is initially all on Tape 1, but U copies w to Tape 2 and then deletes w from Tape 1. The machine U stores the current position of M’s read/write head (initially the first cell of Tape 2) by putting a ∗ to indicate the corresponding cell of w, just as in the proof of Chapter 3, Theorem 2.3. The machine U stores the number of the current state of M (initially the start state) after the description of M on Tape 1.
The machine U simulates the action of M on w by repeatedly checking the current letter a of M’s tape that is indicated by the ∗ on Tape 2, and the current state qi of M stored on Tape 1. The machine U then finds on Tape 1 the part of M’s transition function δ that corresponds to (qi, a), and copies the output of δ(qi, a) to update the state of M on Tape 1 and the current letter of M on Tape 2. The machine U then moves the ∗ on Tape 2 to indicate the new location of M’s tape head.
If M accepts then U accepts, and if M rejects then U rejects. □

The Universal Turing Machine is often described as the first fully general con-
cept of a programmable computer: on input hM, wi, the Turing machine M is the
programme, and the word w is the input to the programme. It is not too difficult
to give a formal description of a Universal Turing Machine: for example, Marvin
Minsky in 1962 managed to design one with only 7 states, and a tape alphabet of
size 4.
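
The simulation loop in the proof is easier to absorb when written in a modern language. The sketch below is our own illustration, not a Turing machine: it plays the role of U, taking a transition table (a Python dictionary rather than the string encoding of Example 1.2, an assumption made for readability) together with an input word, and stepping through M's computation. Like U itself, it runs forever if M runs forever on w.

    def simulate(delta, q_start, q_accept, q_reject, blank, w):
        """delta maps (state, symbol) -> (new_state, new_symbol, 'L' or 'R')."""
        tape = dict(enumerate(w))          # sparse tape; unwritten cells are blank
        head, state = 0, q_start
        while state not in (q_accept, q_reject):
            a = tape.get(head, blank)      # read the current cell
            state, b, d = delta[(state, a)]
            tape[head] = b                 # write the new symbol
            head += 1 if d == "R" else -1  # move the head (tape is two-way
                                           # infinite here, a harmless simplification)
        return state == q_accept           # True = accept, False = reject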

2. Decidability
[See Sipser §4.1]
The study of decidability is the study of what can be computed in finite time.

Definition 2.1. A decision problem is any well-specified problem with a yes/no answer.

Decision questions are very varied, for example:

• Is this integer prime?

• Is this graph connected?

• Is this list sorted?

• Does this program perform to its specification?

• Given a finite state automaton and an input, is the input accepted?

• Is the language recognised by this Turing machine regular?

We want to know whether there exist algorithms to solve these problems: by the Church–Turing Thesis this is equivalent to asking whether one can solve these problems using a Turing machine.
If a Turing machine M is given input w then one of three things can happen.
Firstly, M can accept w, which means that M reaches qa and halts. Secondly, M
can reject w, which means that M reaches qr and halts. The third possibility is
that M never halts, but instead runs forever. It may not go round in an obvious
cycle, so an observer may never know whether the machine is about to stop.
Even with easy mathematical programmes, it is not always clear whether or not they will terminate on all input. Consider the following recursively-defined function, for n ∈ N:

f(n) = 1           if n = 1,
f(n) = f(3n + 1)   if n is odd and at least 3,
f(n) = f(n/2)      if n is even.

Then, for example

f(17) = f(3 × 17 + 1) = f(52) = f(52/2) = f(26) = f(13) = f(40)
      = f(20) = f(10) = f(5) = f(16) = f(8) = f(4) = f(2) = f(1) = 1.

Despite much study, it is still not known whether f (n) returns 1 for all n 2 N,
or if there are values of n for which the recursion never terminates.
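
Written as a program (our own transcription, not part of the notes), the definition looks entirely innocent; whether the call below terminates for every starting value is precisely the open question.

    def f(n):
        # Direct transcription of the recursive definition above.
        if n == 1:
            return 1
        if n % 2 == 1:             # n is odd and at least 3
            return f(3 * n + 1)
        return f(n // 2)           # n is even

    print(f(17))                   # returns 1, via 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1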

Definition 2.2. A Turing machine is a decider if it halts on all input, so it is guaranteed never to run forever. A Turing machine M that is a decider is said to decide L(M). A language L is Turing decidable if there exists a decider M with L = L(M).

Lemma 2.3. If a language L is Turing decidable then L is Turing recognisable.

Proof. By assumption, there is a Turing machine M such that L = L(M) and M is a decider. So, in particular, M is a Turing machine that recognises L. □

Example 2.4. The Turing machine M1 from Chapter 3, Example 1.2 is a decider. To see this, notice that M1 always moves right unless it is going to a halt state. So each letter of the input word is read at most once, and the first blank cell is read at most once and is the only blank cell that is read. So M1 always halts in finite time, and the language from this example is decidable.
The Turing machine M2 from Chapter 3, Example 1.5 is also a decider. To see this, notice that on each pass through the word, M2 crosses out half of the remaining 0s, and looks at at most one blank cell. Either M2 stops early (if it rejects), or eventually all but one of the 0s will be crossed off, and M2 will accept. In either case, M2 is guaranteed to halt in finite time, so L2 = {0^(2^n) : n ≥ 0} is decidable.

Example 2.5. Let L5 = {⟨p(x)⟩ : p(x) = an x^n + · · · + a1 x + a0 is a polynomial with ai ∈ Z for all i, and is such that there exists n ∈ Z with p(n) = 0}.
Here is a Turing machine M5 to recognise L5.
M5 = “on input an encoding ⟨p(x)⟩:
1. Evaluate p(x) with x set to 0, 1, −1, 2, −2, 3, . . .
2. If at any point p(x) = 0, accept.”
If p(x) has no integral roots then M5 will never discover it, so M5 is not a decider. However, that doesn’t show that L5 is undecidable: maybe there is another Turing machine that decides L5.
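
In Python, M5 might be sketched as follows (our own illustration; the coefficient-list representation is an assumption). It returns True on members of L5 and loops forever otherwise, which is exactly the behaviour of a recogniser that is not a decider.

    def has_integral_root(coeffs):
        """coeffs = [a0, a1, ..., an] represents a0 + a1*x + ... + an*x**n."""
        def p(x):
            return sum(a * x**k for k, a in enumerate(coeffs))
        x = 0
        while True:                          # try x = 0, 1, -1, 2, -2, 3, ...
            if p(x) == 0 or p(-x) == 0:
                return True
            x += 1

    # has_integral_root([-4, 0, 1]) returns True: x**2 - 4 has the root 2.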
The corresponding problem for polynomials in several variables (given a polynomial with integer coefficients in any number of variables, decide whether it has an integral root) was the 10th of the 23 problems posed by David Hilbert at the International Congress of Mathematicians in 1900 (this is the conference where the Fields Medals are awarded). It was shown by Yuri Matiyasevich in 1970, building on important work by Martin Davis, Hilary Putnam and Julia Robinson, that this several-variable version is undecidable. The single-variable language L5 above is, in fact, decidable: any integral root of p(x) must divide the constant term a0 (and 0 is a root whenever a0 = 0), so only finitely many candidates need to be checked.

The proof of the following theorem is omitted, as it uses Chomsky Normal Form
for CFGs, which we haven’t covered. If you’d like to see the proof anyway, it’s
Sipser Theorem 4.8 (you’ll also need to read Sipser Theorem 4.6).

Theorem 2.6. Let L be a context-free language. Then L is Turing decidable.

In particular, since every regular language is context-free, Theorem 2.6 shows that all regular languages are decidable.

Definition 2.7. Let AD = {⟨B, w⟩ : B is a DFA that accepts w}.

Theorem 2.8. The language AD is decidable.

Proof. We construct a Turing machine, M, which takes as input ⟨B, w⟩ and simulates B = (A, S, F, s0, T) on input w = w1 . . . wn. We can assume that each ⟨B, w⟩ is encoded similarly to Example 1.2: see Tutorial Sheet 2.
The machine M first checks that the tape contains a valid encoding; if not, M rejects. The machine M then simulates B by writing the current state of B and the current letter of w on the tape (after ⟨B, w⟩): initially these are s0 and w1. The machine M then checks the transition function F of B to update the current state of B, and reads w to find the next letter of w. After reading wn, M accepts if and only if the current state of B is in T. Since B halts on all input, so does M, and so M is a decider. □
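
The same decider is a few lines of Python (our own sketch; the DFA is represented by a transition dictionary, a start state and a set of accepting states). The point of the proof survives the translation: the loop runs exactly |w| steps, so it always halts.

    def dfa_accepts(transitions, start, accepting, w):
        """transitions maps (state, letter) -> state; w is any finite word."""
        state = start
        for letter in w:                       # exactly len(w) steps, then stop
            state = transitions[(state, letter)]
        return state in accepting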

3. The halting problem
[See Sipser §4.2]
We have seen a number of examples of decidable languages, which correspond
to yes/no questions that have algorithmic solutions. We now examine the two most
famous undecidable languages, one which describes whether a given Turing machine
M accepts an input string w, and one which describes whether M halts on input w.

Problem 3.1. [The Acceptance Problem] Find a decider for

AT = {⟨M, w⟩ : M is a TM which accepts the word w}.

That is, design a Turing machine that will decide whether or not a given Turing
machine accepts a given word.

Lemma 3.2. The language AT is Turing recognisable.

Proof. The Universal Turing Machine U from Theorem 1.3 recognises AT: on input ⟨M, w⟩ the machine U simulates M on input w, and accepts if and only if M accepts. So U accepts ⟨M, w⟩ if and only if ⟨M, w⟩ ∈ AT, and so L(U) = AT. □

Notice that the Universal Turing Machine is not a decider: if the machine M runs forever on input w, then U runs forever on input ⟨M, w⟩.

Theorem 3.3 (Alan Turing, 1936) The language AT is not decidable, so the
acceptance problem cannot be solved.

Proof. We will prove this theorem by contradiction. Suppose, by way of contradiction, that there exists a decider MT for AT: a machine which accepts all words in AT, and rejects (in finite time) all words not in AT.
Let

L1 = {⟨M⟩ : the Turing machine M accepts the word ⟨M⟩}.

Let M1 = “on input ⟨M⟩, an encoding of a Turing machine:

1. Modify the tape, so it contains ⟨M, ⟨M⟩⟩.

2. Simulate MT on input ⟨M, ⟨M⟩⟩.

3. Accept if MT accepts; reject if MT rejects.”



The machine M1 accepts ⟨M⟩ if and only if the machine M accepts ⟨M⟩, so L(M1) = L1. Step 1 only takes finite time, since it’s basically just copying the input. The machine MT always halts, so M1 always halts. So M1 is a decider for L1.
Now let

L2 = {⟨M⟩ : the Turing machine M does not accept the word ⟨M⟩}
   = {⟨M⟩ : M rejects or runs forever on input ⟨M⟩}.

Let M2 = “on input ⟨M⟩, an encoding of a Turing machine:

1. Simulate M1 on input ⟨M⟩.

2. Accept if M1 rejects; reject if M1 accepts.”

The language of the machine M2 is L2. Since M1 always terminates, so does M2. So M2 is a decider for L(M2) = L2.
Now we consider the question: is ⟨M2⟩ ∈ L2?

• If ⟨M2⟩ ∈ L2, then from L(M2) = L2 we see that M2 accepts ⟨M2⟩. But the definition of L2 is that it contains only machines that do not accept their own descriptions, so ⟨M2⟩ ∉ L2. This is a contradiction.

• If ⟨M2⟩ ∉ L2, then from L(M2) = L2 we see that M2 rejects ⟨M2⟩. But the definition of L2 is that it contains all machines that do not accept their own description, so in particular ⟨M2⟩ ∈ L2. This is also a contradiction.

Either way there is a contradiction, and so our initial assumption must have been wrong: there does not exist a decider MT for AT. □
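
The self-referential trick at the heart of this proof can be replayed in any Turing-complete language. Suppose, hypothetically, that someone supplied a total Python function accepts(source, word) deciding AT for Python programs; no such function exists, which is exactly what the theorem says, and the sketch below (our own illustration, playing the role of M2) shows why.

    def accepts(source, word):
        """Hypothetical decider for AT; Theorem 3.3 says it cannot exist."""
        raise NotImplementedError

    def m2(source):
        # Do the opposite of whatever the decider predicts the program
        # `source` does when run on its own source code.
        if accepts(source, source):
            return "reject"
        return "accept"

    # Running m2 on its own source code would now be paradoxical: m2 accepts
    # its own source exactly when accepts(...) reports that it does not.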

Problem 3.4. [The Halting Problem] Find a decider for the language

H = {⟨M, w⟩ : M is a TM that halts on input w}.

Theorem 3.5. The language H is undecidable, so the Halting problem cannot be solved.

Proof. Assume, by way of contradiction, that there exists a Turing machine MH which decides H. We use MH to design a Turing machine S, as follows:
S = “On input ⟨M, w⟩, an encoding of a Turing machine M and an input word w for M:

1. Run MH on input ⟨M, w⟩.

2. If MH rejects, reject.

3. If MH accepts, simulate M on w until it halts.

4. Accept if M accepts; reject if M rejects.”

If M accepts w then S accepts ⟨M, w⟩. If M rejects w then S rejects ⟨M, w⟩. If M runs forever on w then S rejects ⟨M, w⟩. Hence L(S) = AT, the language from the Acceptance Problem 3.1. Furthermore, S is a decider. But we showed in Theorem 3.3 that AT is undecidable. This is a contradiction, so no such Turing machine MH can exist, and H is undecidable. □
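
The reduction can likewise be phrased in Python (our own sketch; halts is hypothetical, and simulate stands for running one machine on one input, for instance via the simulator sketched after Theorem 1.3). If halts were available as a total function, s below would decide AT, contradicting Theorem 3.3.

    def halts(M, w):
        """Hypothetical decider for H; this theorem says it cannot exist."""
        raise NotImplementedError

    def s(M, w):
        # simulate(M, w) denotes running M on w, as in the earlier sketch.
        if not halts(M, w):          # M runs forever on w, so it does not accept w
            return "reject"
        # Safe to simulate now: M is guaranteed to halt on w.
        return "accept" if simulate(M, w) else "reject"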

At first it might seem as if the halting problem is of purely academic interest, but undecidability is contagious.

Definition 3.6. A property P of Turing machines is just a statement that is true or false about any given Turing machine. The property P is non-trivial if there is at least one Turing machine that satisfies P, and at least one that does not. The property P is about the languages recognised by Turing machines if whenever L(M) = L(N) then P is true of M if and only if P is true of N: it’s a property of L(M) rather than just of M.
Example 3.7. Some examples of nontrivial properties of the languages recognised by Turing machines:
1. The Turing machine accepts no words. It is easy to design a Turing machine to do this, and we have seen examples of Turing machines that don’t, so this property is nontrivial. If L(M) = ∅ and L(N) = L(M), then L(N) = ∅, so this is a property of the language, not the machine.
2. The Turing machine M recognises a context-free language. We saw in Chapter 3, Example 1.2 a Turing machine that accepts a context-free language, and in Chapter 3, Example 1.5 a Turing machine that accepts a non-context-free language, so the property is nontrivial. If L(M) is context-free, and L(M) = L(N), then L(N) is also context-free, so it is a property of the language.
3. The Turing machine M accepts only Python programmes that do not crash.
4. The Turing machine M accepts only C programmes that print some output.
Theorem 3.8 (Rice’s Theorem) Let P be a nontrivial property of the languages recognised by Turing machines. Then the following language is undecidable:

LP = {⟨M⟩ : M satisfies P}.

That is, there is no algorithm to decide whether a Turing machine satisfies P.

Proof. If a Turing machine M with L(M) = ∅ satisfies P, then without loss of generality we can replace P by its negation (“not P”), since a decider for one is a decider for the other. Hence we can assume that any machine M with L(M) = ∅ does not satisfy P.
The property P is nontrivial, so there exists at least one Turing machine MP that satisfies P. Let the input alphabet of MP be Σ.
Let M be any Turing machine, and let w be an input word for M. We design a Turing machine Mw as follows.
Mw = “on input x ∈ Σ*, an input word for MP:
1. Simulate M on w.
2. If M rejects w, then reject.
3. If M accepts w, then simulate MP on input x. Accept if MP accepts x, reject if MP rejects x.”
If M accepts w, then L(Mw) = L(MP). So if M accepts w then our new Turing machine Mw satisfies P (since P is a property of L(MP)). If M rejects w then Mw rejects all x ∈ Σ*. If M runs forever on input w, then Mw does not accept any x ∈ Σ*. Hence if M does not accept w, then L(Mw) = ∅. We ensured that no Turing machine whose language is empty satisfies P. Thus if M does not accept w, then Mw does not satisfy P.
Recall the language AT from the Acceptance Problem 3.1. We have proved that Mw satisfies P if and only if M accepts w. That is, we have proved that ⟨Mw⟩ ∈ LP if and only if ⟨M, w⟩ ∈ AT. It follows that if LP is decidable then AT is decidable. By Theorem 3.3, the language AT is not decidable. So the language LP is not decidable. □
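
The machine Mw built in this proof is again natural to express as a program. In the sketch below (our own illustration), simulate stands for running one machine on one input, as in the simulator sketched after Theorem 1.3, and M_P is some fixed machine satisfying P. Note that make_Mw itself is trivially computable: all the difficulty is hidden inside what the returned function does.

    def make_Mw(M, w, M_P):
        """Return (a program for) the machine M_w of the proof."""
        def Mw(x):
            # Steps 1-2: run M on w; loop forever if M does, reject if M rejects.
            if not simulate(M, w):
                return "reject"
            # Step 3: M accepted w, so behave exactly like M_P on x.
            return "accept" if simulate(M_P, x) else "reject"
        return Mw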

4. Turing recognisability
[See Sipser §4.2]
We finish this part of the course by showing that in fact almost all languages are
not even Turing recognisable. Hence almost all decision problems cannot be solved
on a computer.
Recall that if K ⊆ Σ* is a language then the complement of K, denoted K̄, is the language of all words in Σ* that are not in K.

Lemma 4.1. Let K ⊆ Σ* be a language. If K and K̄ are both Turing recognisable, then K is decidable.

Proof. Let TK be a Turing machine which recognises K and let TK̄ be a Turing machine which recognises K̄. Let W be the following Turing machine:
W = “on input w ∈ Σ*:

• Run TK and TK̄ in parallel.

• If TK accepts, accept. If TK̄ accepts, reject.”

The word w lies in K or in K̄, so at least one of TK and TK̄ accepts in finite time, and hence W is a decider for K. □
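
The phrase “run in parallel” deserves a word: on a single machine it means interleaving the two simulations. One convenient way to realise this in software (our own sketch; run_for(M, w, k), an assumed helper, simulates M on w for at most k steps and reports "accept", "reject" or "running") is to restart both simulations with ever-larger step budgets.

    def decide_K(T_K, T_Kbar, w):
        # run_for is an assumed step-bounded simulator, not defined here.
        k = 1
        while True:
            if run_for(T_K, w, k) == "accept":
                return "accept"            # w is in K
            if run_for(T_Kbar, w, k) == "accept":
                return "reject"            # w is in the complement of K
            k += 1                         # allow both machines one more step

Since one of the two recognisers accepts w after some finite number of steps, the loop is guaranteed to terminate.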

Corollary 4.2. Let AT be the language from the Acceptance Problem 3.1. Then the complement ĀT is not Turing recognisable.

Proof. Assume, by way of contradiction, that ĀT is Turing recognisable. By Lemma 3.2, the language AT is Turing recognisable. Hence both AT and ĀT are Turing recognisable. Then, by Lemma 4.1, the language AT is decidable. But we showed in Theorem 3.3 that AT is not decidable. This is a contradiction, so the language ĀT is not Turing recognisable. □

We now show that almost all languages are not recognisable. The proof is via
a mathematical trick called diagonalisation, which can be used to show that one
infinite set is bigger than another, and is due to Georg Cantor in 1873. Cantor
observed that two finite sets have the same size if the elements of the first set can
be paired up with the elements of the other set: this is a way of thinking about size
that doesn’t rely on counting both sets, and so lets us measure the sizes of infinite
sets.

Definition 4.3. A set A is countable if it is finite or if its elements can be put in an infinite list. Otherwise it is uncountable.

Some countable sets:

• The set of all natural numbers: 1, 2, 3, . . .
• The set of all integers: 0, 1, −1, 2, −2, . . .
• The set of all primes: 2, 3, 5, 7, 11, 13, . . .
• The set Σ* of all words over a finite alphabet Σ: list them in len-lex order.
See Tutorial Sheet 3 for the following:
Lemma 4.4. 1. The union of a finite number of countable sets is countable.
2. The union of countably many countable sets is countable.

Theorem 4.5. The set B of all infinitely-long words over {0, 1} is uncountable.

Proof. We prove this by a method, due to Cantor, called diagonalisation. The method is a proof by contradiction, so assume by way of contradiction that B is countable. Then we can list the elements of B as b1, b2, b3, . . .. For each such infinite word bi, write bi as bi1 bi2 bi3 . . ., so that bij ∈ {0, 1} for all i and j.
Define an infinite word x = x1 x2 x3 . . . by

xi = 1 if bii = 0,   and   xi = 0 if bii = 1.

Then xi ≠ bii for all i, so x ≠ bi for all i, and so x does not appear in the list. Since the list was assumed to contain every element of B, this means x ∉ B. However, x is an infinitely long word over {0, 1}, so x ∈ B.
This is a contradiction, so our assumption that B is countable must be wrong. Hence B is uncountable. □
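
The diagonal construction itself is only a couple of lines of code. In the sketch below (our own illustration), a purported enumeration of B is modelled as a function listing(i) returning the i-th infinite word, itself represented as a function from positions to bits; the diagonal word then differs from every listed word.

    def diagonal(listing):
        """Given listing(i) = the i-th infinite word (a function from positions
        to bits), return a word that differs from listing(i) in position i."""
        def x(i):
            return 1 - listing(i)(i)     # flip the i-th bit of the i-th word
        return x

    # Against any concrete listing one writes down, diagonal(listing) is an
    # infinite word over {0, 1} that the listing provably misses.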

Theorem 4.6. 1. There are only countably many languages that are Turing
recognisable.

2. There are uncountably many languages that are not Turing recognisable.

Proof. 1. The set of all Turing machines is countable, because we showed in Example 1.2 that we can encode each Turing machine M as a word over the alphabet Σ = {0, . . . , 9, #, ∗, L, R}, and the set of all words over a finite alphabet is countable.
Each Turing recognisable language L must be the language of some Turing machine
M , so the number of Turing recognisable languages is at most the number of Turing
machines. Hence the set of Turing recognisable languages is countable.
2. We first show that the set L of all languages over Σ = {a, b} is uncountable. Consider the words in Σ* in len-lex order.
Each infinite word w = w1 w2 . . . in the set B from Theorem 4.5 can be used to define a language Aw ⊆ Σ*. If wi = 1, then put the ith word from Σ* into Aw. If wi = 0, then don’t put it in.
Since every infinite word w ∈ B defines a different language, L is at least as big as B. By Theorem 4.5 the set L is therefore uncountable.
To finish the proof, assume, by way of contradiction, that there are only countably many languages over Σ that are not Turing recognisable. By Part 1, there are also countably many languages that are Turing recognisable. The union of the set of Turing recognisable languages and the set of non-Turing recognisable languages is all of L. By Lemma 4.4, the union of two countable sets is countable. Hence L is countable. We just showed that L is uncountable, so this is a contradiction. Hence there are uncountably many languages that are not Turing recognisable. □

Thus we have the following hierarchy of languages:

Corollary 4.7. Regular languages ⊂ Context-free languages ⊂ Decidable languages ⊂ Recognisable languages ⊂ All languages. Each of these containments is proper. Furthermore, only countably many languages are recognisable, so almost all languages are unrecognisable.

Proof. We proved the first containment in Chapter 2, Theorem 1.4, and asserted the second in Theorem 2.6. The fact that every decidable language is recognisable was Lemma 2.3. Now we know that almost all languages, including ĀT in particular, are not recognisable. □
