This two-part blog post has three main goals.
The first goal is to demonstrate that the proofs of Gödel’s incompleteness theorems (both the first and the second one) can be naturally expressed using the language of the theory of computation (henceforth ToC). This is because one of the defining features of mathematical reasoning is that it is a computation.
The second goal is to show that the incompleteness theorems can be understood without the knowledge of any particular formalization of mathematics (like first-order logic and the axiomatic systems PA or ZFC). This is similar to being able to understand the undecidability of the halting problem without the knowledge of any particular formalization of computation (like Turing machines or lambda calculus).
The third goal is to highlight that the use of the language of ToC leads to relatively intuitive proofs of the incompleteness theorems. In fact, we hope that you will feel you could have come up with the results yourself.
Gödel’s incompleteness theorems are, mathematically and philosophically, among the most important intellectual discoveries, as they highlight the inherent limits of mathematical reasoning. As we will see, the limits of mathematical reasoning can be viewed as a corollary of the limits of computation. And at its root, the limits of computation (and therefore the limits of mathematical reasoning) are about the difference between the finite and the infinite. The foundational work of Georg Cantor in understanding infinity will be our main hammer to prove the limits of computation and mathematics.
This first blog post is about providing the right background for the reader. After talking about formalization of computation, we introduce Cantor’s work and explain how it directly implies limits of computation. In the next post, we will talk about formalization of mathematical reasoning and then prove the incompleteness theorems.
In its most general form, computation is manipulation of information/data. Usually there is some input data, which is manipulated/processed in some way to produce output data.
Given this general description, we can see computation pretty much everywhere. That being said, Alan Turing, in coming up with his computing model, was not really thinking about how information is manipulated, say, by black holes. He had a much narrower focus. He wanted to model “computers”, which, before the invention of modern computers, referred to people trained in carrying out calculations. These computers would have a set of instructions (i.e. an algorithm) at hand. They would then carry out the given instructions on various input data to produce the output data.
Even though we did not have a formal way of representing algorithms until the 20th century, people have been discovering and using algorithms for millennia (e.g. the grade-school multiplication algorithm, Euclid’s algorithm for computing the greatest common divisor of two numbers, and many many others). We are going to call these informal representations of algorithms “Good Old Regular Algorithms”, or GORA for short. A computational model is a mathematical formalization of GORA such that for every algorithm, there is a precise representation of the algorithm in the computational model.
An important component of a computational model is a precise representation of data. In ToC, we represent any kind/type of data using just a single type: strings. Let Σ be some non-empty finite set that we think of as a set of symbols. We call Σ an alphabet. Then Σ* denotes the set of all finite-length strings over Σ. We think of the input data as well as the output data for an algorithm as elements of Σ*. (The finiteness restriction is a reasonable one because we don’t know a way to store or process an infinite amount of data.)
Given a set A, we say A is encodable if there is an injective function enc : A → Σ* for some alphabet Σ. We use the notation ⟨a⟩ to denote the encoding of a ∈ A. If we want our algorithm to take as input some object a that is not a string (say a number), we would first encode it and view ⟨a⟩ as the input.
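To make encodability concrete, here is one way to injectively encode a pair of natural numbers as a binary string. This is our own illustrative choice, not a construction from the post: write each number in binary, expand every bit b to "b0", and join the two halves with the separator "11", which can then never be confused with the body of a number.

```python
# Illustrative injective encoding <a, b> of pairs of natural numbers
# into binary strings (one of many possible choices).
def encode_number(n: int) -> str:
    # expand each bit b of n to "b0", so the substring "11" never occurs
    return "".join(bit + "0" for bit in bin(n)[2:])

def encode_pair(a: int, b: int) -> str:
    # "11" acts as an unambiguous separator between the two halves
    return encode_number(a) + "11" + encode_number(b)

def decode_pair(s: str) -> tuple:
    i = s.index("11")  # the first "11" is necessarily the separator
    a_bits, b_bits = s[:i:2], s[i + 2 :: 2]  # recover the original bits
    return int(a_bits, 2), int(b_bits, 2)
```

Since the encoding can be decoded, it is injective: distinct pairs always map to distinct strings.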
We often think of an algorithm as solving some problem. A function problem is a function f : Σ* → Σ*. For example, in the multiplication problem, if the input is the encoding of a pair of numbers, the output is the encoding of the product of those numbers. A decision problem is a function of the form f : Σ* → {0, 1}, representing a problem with a binary output. We can equivalently think of the output as True or False. In ToC, we often restrict our attention to decision problems since more general computational problems often have an “equivalent” decision version. One of the most basic questions in ToC is whether all decision problems are computable/decidable or not.
For the formal representation of algorithms we’ll use Turing machines. Every Turing machine (TM) represents an algorithm. And every algorithm has a corresponding Turing machine. This correspondence is known as the Church-Turing Thesis.
The Turing machine computational model is a mathematical formalization of GORA. Any algorithm can be expressed using a TM. Or in other words, every algorithm “compiles down” to a TM.
When we are proving properties of algorithms/computation, we do not directly work with TMs, but rather appeal to the Church-Turing thesis: we work with high-level descriptions of algorithms with the understanding that they can be translated to TMs. Thanks to the Church-Turing thesis, we won’t bother defining the TM model formally. We don’t need it. We’ll describe all our TMs using algorithms with high-level instructions.
It is not clear whether Turing (and others at the time) really appreciated the full reach of the Turing machine model. Even though Turing didn’t build his mathematical model of computation with the fundamental laws of physics in mind, today our understanding of computation, as well as of physics, leads us to make a stronger claim.
Any computation that can be realized in the universe (only constrained by the laws of physics) can be expressed using a TM.
Our focus will be on TMs computing/deciding decision problems f : Σ* → {0, 1}. So given a TM M and an input string x, we’ll write M(x) = 1 if M returns True (equivalently, “accepts”) when the input is x. We’ll write M(x) = 0 if M returns False (i.e. “rejects”). And we’ll write M(x) = ∞ if M never stops running and gets stuck in an infinite loop. A TM that halts (returns True or False) on all inputs is called a decider TM. If the input/output behavior of a decider TM M exactly matches a decision problem f (i.e. for all x ∈ Σ*, M(x) = f(x)), then we say M computes/decides f. And we call f a decidable decision problem.
One defining feature of a TM (and a feature that will play a crucial role in our arguments) is that it is a finite object. This means every TM has a finite-length string representation. Or in other words, the set of all TMs is an encodable set. As with the finiteness of strings, the finiteness of TMs/algorithms is a reasonable restriction since we can’t physically realize an infinite object. Given a TM M, ⟨M⟩ denotes its encoding.
The fact that we can encode a Turing machine means that an input to a TM can be the encoding of another TM (in fact, a TM can take the encoding of itself as the input). One important implication of this (that Turing pointed out in his paper) is the existence of a universal Turing machine U, which, given as input the encoding of any Turing machine M and some string x, can simulate M when it is given the input x.
The existence of such a machine shouldn’t be surprising (once you have the right perspective). If we consider the human computers that Turing was modeling, and think about the role of the human in the process, we see that the human is the universal TM. We just have to change our viewpoint slightly and think of the instructions as input.
The high-level description of a universal TM U is as follows: given the input ⟨M, x⟩, simulate M, step by step, on the input x, and return whatever M returns. Note that if M(x) loops forever, then U(⟨M, x⟩) loops forever as well.
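The spirit of a universal machine can be sketched in ordinary code. Below is a toy interpreter of our own devising (not the actual TM construction) for a tiny counter-machine language: the “program” is just data handed to U, exactly as ⟨M⟩ is handed to a universal TM, and if the program loops forever, so does U.

```python
# Toy universal machine: an interpreter for a small counter-machine
# language. A program is a list of instructions among
# ("inc", r), ("dec", r), ("jz", r, target), ("halt",).
def U(program, registers):
    regs = list(registers)
    pc = 0  # program counter
    while True:
        instr = program[pc]
        if instr[0] == "halt":
            return regs
        if instr[0] == "inc":
            regs[instr[1]] += 1; pc += 1
        elif instr[0] == "dec":
            regs[instr[1]] = max(0, regs[instr[1]] - 1); pc += 1
        elif instr[0] == "jz":  # jump to target if register is zero
            pc = instr[2] if regs[instr[1]] == 0 else pc + 1

# An addition program, itself just data: drain register 1 into register 0
# (register 2 stays zero, so the final "jz" is an unconditional jump back).
add = [("jz", 1, 4), ("dec", 1), ("inc", 0), ("jz", 2, 0), ("halt",)]
```

For example, `U(add, [3, 4, 0])` simulates the addition program and returns `[7, 0, 0]`.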
Above, when denoting the encoding of a tuple (M, x) where M is a Turing machine and x is a string, we used the notation ⟨M, x⟩ to make clear what the types of M and x are. When we give a high-level description of a TM, we often assume that the input given is of the correct form/type. But technically, the input is allowed to be any finite-length string. Even though this is not explicitly written, we will implicitly assume that the first thing our machine does is check whether the input is a valid encoding of objects with the expected types. If it is not, the machine rejects (returns False). If it is, then it will carry on with the specified instructions.
From now on, in the context of TMs and decision problems, we will fix our alphabet to be Σ = {0, 1}. So a decision problem is of the form f : {0,1}* → {0,1}.
An important part of mathematics is about understanding the relationships among sets. Is a set A equal to another set B? Are they equal in cardinality? Can we find an element in B that is not in A (or vice versa)? Many fundamental questions like these involve infinite sets. For instance, an interesting open question in the 18th century and early 19th century was whether the set of real algebraic numbers (i.e. real numbers that can be expressed as a root of a non-zero polynomial in one variable with integer coefficients) is equal to the set of all real numbers. In the previous section, we have come across another question of this flavor. Is the set of all decidable decision problems equal to the set of all decision problems? In fact, ToC is full of these types of questions. For example, one of the major goals is to classify problems according to the resources needed to solve them, i.e. to define complexity classes, and then try to understand the relationships among the complexity classes.
We need some general techniques to compare infinite sets, and for this, we go to Georg Cantor, who had two major insights.
The first major insight is that bijections, injections and surjections can be used to compare infinite sets. In fact, the most meaningful way to define = and ≤ for the sizes of infinite sets is to use bijections and injections. In particular, we’ll write |A| = |B| if there is a bijection between A and B. We’ll write |A| ≤ |B| (or |B| ≥ |A|) if there is an injection from A to B. And we’ll write |A| > |B| if there is no injection from A to B.
Many familiar infinite sets have a bijection between them, e.g. |ℕ| = |ℤ| = |ℚ| = |Σ*|, where Σ is any finite, non-empty alphabet. It is a nice exercise to show that if a set A is infinite, then |A| ≥ |ℕ|. So there are three categories of sets:
finite sets,
sets A such that |A| = |S|, where S is any of the sets listed above (like ℕ),
all other sets (i.e. sets A with |A| > |S|, where S is any of the sets listed above).
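As a concrete instance of the bijections mentioned above, here is one standard way (our choice of construction, not one singled out by the post) to pair ℕ with {0,1}*: send n to the binary expansion of n + 1 with the leading 1 chopped off.

```python
# A bijection from the natural numbers to the set of all binary strings:
# n maps to bin(n + 1) without the "0b" prefix and the leading 1.
def nth_string(n: int) -> str:
    return bin(n + 1)[3:]

# 0, 1, 2, 3, 4, ... map to "", "0", "1", "00", "01", ...
```

Every binary string appears exactly once in this enumeration, ordered by length and then lexicographically.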
It makes sense to give names to these different categories. The first two categories combined are known as countable sets. The second category is known as countably infinite sets. And the third category is known as uncountable sets.
We can define a countable set as any set A with |A| ≤ |S|, where S = ℕ. But the choice of ℕ here is somewhat arbitrary. We can also choose, for instance, S = Σ*, in which case we see that countability is equivalent to encodability. So:
A set is countable if its elements have a finite string representation/description.
Encodability captures the essence of countable sets really well. For instance, it is easy to see that for all the sets listed above, the elements have finite string representations. And if you consider another set like {0,1}^∞, which is defined to be the set of all infinite-length binary strings, it is almost by definition that the set is not countable/encodable (though this still requires a proof). Furthermore, encodability highlights that, just like the main difference between the first two categories (finite sets and countably infinite sets) is about the finite vs the infinite, the main difference between countably infinite sets and uncountable sets is also about the finite vs the infinite! In the former, we look at the sets themselves, and in the latter, we look at the elements of the sets.
How can we prove that a set like {0,1}^∞ is in fact uncountable? This brings us to the second major insight of Cantor, diagonalization, which is one of the most important proof techniques in mathematics. In fact, diagonalization is the root of all the results that we will present. Perhaps its power and reach are surprising given that the idea is actually pretty straightforward, as we describe now.
The basic question that diagonalization is trying to answer is the following.
Given a set of objects A that is part of a larger universe U (so A ⊆ U), is A = U? And if A ≠ U, can we construct an element u ∈ U \ A?
Below are some examples of important questions in this flavor. Diagonalization can be used to answer all of them.
Let A be the set of rational numbers and U be the set of reals. Is there an irrational number, and if there is, can we construct one?
It is not hard to give a direct proof that √2 is irrational, so the above question may not be terribly interesting. Here is a much more non-trivial question. Let A be the set of real algebraic numbers, and U be the set of reals. Is there a real non-algebraic number (i.e. is there a real transcendental number)? And if there is, can we construct one?
Let A be the set of all decidable decision problems, and let U be the set of all decision problems. Is there an undecidable decision problem, and if there is, can we construct one?
Let A be the set of all decision problems that can be solved in at most f(n) time, and let U be the set of all decision problems that can be solved in at most g(n) time, where g grows sufficiently faster than f (think f(n) = n and g(n) = n²). Is there a decision problem in U that is not in A, and if there is, can we construct one?
Let A be the set of all provable statements and let U be the set of all true statements. Is there a true statement that is not provable?
Even though all these questions have the same basic structure, they are quite different since they involve different types of objects. In order to identify a general technique that can be applied in different settings, we’ll express diagonalization as a statement involving functions, because many mathematical objects can be conveniently viewed as functions.
Sets. A set S (within some universe U) can be viewed as a function f_S : U → {0,1}, where f_S(x) = 1 if and only if x ∈ S. This is called the characteristic function of the set.
Finite sequences. A sequence of length k with elements from a set S can be viewed as a function f : {0, 1, …, k−1} → S, where f(i) is the i’th element of the sequence (counting from 0). In other words, given f, the corresponding sequence is f(0), f(1), …, f(k−1).
Infinite sequences. Similarly, an infinite-length sequence with elements from S can be viewed as a function f : ℕ → S.
Numbers. Numbers can be viewed as functions since the binary representation of a number is a sequence of bits (possibly infinite-length).
We now rephrase the question above using functions.
Given a set F of functions, each of the form f : A → B, can we construct another function f* : A → B not in F?
Let’s develop a general technique to answer this question. We’ll start the discussion with a finite set of functions and then observe that things naturally generalize to infinite sets.
Suppose you are given a set of functions F = {f_1, f_2, …, f_n}, each of the form f : A → {0,1}, and asked to construct another function f* that is not in F. We know that two functions f and g are different if there is some input a such that f(a) ≠ g(a). So one natural idea is to go through the functions in F, one by one, and make f* differ from f_i by making it different for some input a_i, i.e. make f*(a_i) ≠ f_i(a_i). For this strategy to work, all we need is that for each f_i, we can pick a different a_i so that f* can be constructed in a consistent way. That is, all we need is that n ≤ |A|.
We can visualize this strategy with a table where the rows are labeled f_1, …, f_n, the columns are labeled a_1, …, a_n, and the entry corresponding to row f_i and column a_j is f_i(a_j). Then the construction of f* involves going down the diagonal of the table and switching its values. For example, if n = 4, we might have the following table (the entries are arbitrary illustrative values):

       a_1  a_2  a_3  a_4
f_1:    0    1    1    0
f_2:    1    1    0    0
f_3:    0    0    0    1
f_4:    1    0    1    1

Flipping the diagonal entries (0, 1, 0, 1) gives f*(a_1) = 1, f*(a_2) = 0, f*(a_3) = 1, f*(a_4) = 0, so f* differs from every f_i.
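The table-based strategy is easy to run. The following sketch stores the values f_i(a_j) of four arbitrary illustrative functions in a matrix and flips the diagonal to build f*.

```python
# Finite diagonalization: table[i][j] = f_i(a_j); the returned list holds
# the values f*(a_1), ..., f*(a_n), built by flipping the diagonal.
def diagonalize(table):
    n = len(table)
    return [1 - table[i][i] for i in range(n)]

table = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
f_star = diagonalize(table)
# f* differs from every f_i at input a_i, so f* is none of the f_i.
assert all(f_star[i] != table[i][i] for i in range(len(table)))
```

By construction, row i of the table can never equal `f_star`, since the two disagree in position i.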
This strategy naturally generalizes to infinite sets, leading to the following lemma.
Let A be any set and let F be a set of functions f : A → B, where |B| ≥ 2. If |F| ≤ |A|, we can construct a function f* : A → B such that f* ∉ F.
The main idea is the following. For each f ∈ F, pick a unique input a_f ∈ A and define f*(a_f) in a way such that it is different from f(a_f). Here, it is important that we pick a unique a_f for each f so that f* can be defined consistently. The ability to pick a unique a_f for each f is equivalent to the condition |F| ≤ |A|.
A bit more formally, since |F| ≤ |A|, there is an injection φ : F → A. Let a_f = φ(f). So f ≠ g implies a_f ≠ a_g. Define f* such that f*(a_f) ≠ f(a_f) for all f ∈ F (this is where the assumption |B| ≥ 2 is used). This ensures that f* is different from every f ∈ F, and therefore f* ∉ F. (Note that our description of f* leaves it under-specified, but see the comment below.)
When we apply the above lemma to construct an explicit f*, we call this diagonalizing against the set F. And we call f* a diagonal element. Typically, there are several choices for the definition of f*:
Different injections φ can lead to different diagonal elements.
If |B| > 2, we have more than one choice of what value to assign to f*(a_f) to make f*(a_f) ≠ f(a_f) (here a_f denotes φ(f)).
If there are elements of A not in the range of φ, then we can define f* on those elements any way we like.
One of the key features of the diagonalization technique is that it is a constructive argument: it does not just establish the existence of a certain object, but it gives us a way to construct it.
A direct corollary of the Diagonalization Lemma is the following. When F is the set of all functions f : A → B, we know we cannot diagonalize against F and construct f* ∉ F. So then we must have |F| > |A|. And if A is an infinite set, this implies F is uncountable. For example, the set of all functions f : ℕ → {0,1} is uncountable.
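For a runnable taste of the infinite case with A = ℕ and B = {0,1}, suppose someone hands us an enumeration of functions ℕ → {0,1}. The particular `seq` below is a hypothetical stand-in for any such enumeration; whatever it is, the diagonal function escapes it.

```python
# A claimed enumeration of functions N -> {0,1}: seq(n) is the n-th function.
# This particular choice (i-th bit of n) is just a stand-in for illustration.
def seq(n):
    return lambda i: (n >> i) & 1

# The diagonal function flips seq(i) at input i, so it differs from every
# enumerated function and cannot appear anywhere in the enumeration.
def diagonal(i):
    return 1 - seq(i)(i)
```

No matter how `seq` is defined, `diagonal(n) != seq(n)(n)` holds for every n, which is exactly the corollary's point: no enumeration of functions ℕ → {0,1} can be complete.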
We state the corollary below for the special case of B = {0,1}, which is known as Cantor’s theorem.
Let {0,1}^A denote the set of all functions f : A → {0,1}. Then for any set A, |{0,1}^A| > |A|.
Let A be an infinite set. Then the set of all functions f : A → {0,1} is uncountable.
Are all decision problems decidable? We can view this question as a comparison of two sets. Let D denote the set of all decidable decision problems. And let U be the set of all decision problems. Obviously D ⊆ U, but is the inclusion strict? And if it is, can we find an explicit undecidable decision problem?
To answer the first question, note that the corollary to Cantor’s theorem implies that the set of all decision problems f : {0,1}* → {0,1} is uncountable. So most decision problems cannot be finitely represented/encoded. On the other hand, the set of all decidable decision problems is countable, because every decidable decision problem can be finitely described using a TM that solves/decides it. Since the former set is uncountable and the latter is countable, we can conclude that almost all decision problems are undecidable.
Can we identify an explicit undecidable decision problem? Given our discussion in the previous section, obviously we should try to diagonalize against the set of all decidable decision problems. The condition to apply diagonalization is that this set has cardinality at most |{0,1}*|, and indeed this is true since the set is encodable. So we can construct an explicit undecidable decision problem. Let’s see which explicit undecidable decision problem diagonalization spits out for us.
Instead of diagonalizing against the set of all decidable decision problems, it will be a bit more convenient to diagonalize against the set consisting of all TMs. We can do this because:
We can view a TM M as a mapping from {0,1}* to {0,1,∞} (where M(x) = ∞ means M loops forever on input x).
Since the set of all TMs is encodable, the set of all TMs has cardinality at most |{0,1}*|, and therefore the condition to apply diagonalization is satisfied.
To explicitly define the diagonal function f* : {0,1}* → {0,1}, we pick an injection from the set of all TMs to {0,1}*. The most obvious one is M ↦ ⟨M⟩. Then, we make f* differ from a TM M by making sure f*(⟨M⟩) ≠ M(⟨M⟩).
To be more explicit, define f*(x) = 1 if x = ⟨M⟩ for some TM M with M(⟨M⟩) ≠ 1, and f*(x) = 0 otherwise (in particular, when x is not the encoding of any TM).
Once again, by construction, any TM M differs from f* on input ⟨M⟩, i.e. f*(⟨M⟩) ≠ M(⟨M⟩). Therefore, f* is not decidable by any TM.
The decision problem f* is known as Not-Self-Accepts, and from now on we’ll denote it by NSA. The inputs x such that NSA(x) = 1 are the encodings ⟨M⟩ of TMs M such that M(⟨M⟩) does not return True (i.e. M does not self-accept), so M(⟨M⟩) either returns False or loops forever.
The decision problem NSA is undecidable.
Often the diagonalization technique is referred to as a self-referencing technique. However, this probably suggests more meaning than is deserved. For instance, even though in the proof of the above theorem we chose the injection M ↦ ⟨M⟩, we certainly didn’t have to. The injection can be quite arbitrary, and the string that a TM M maps to can be completely unrelated to ⟨M⟩. We therefore prefer to think of self-reference as one instantiation of a more general technique.
Now that we have our first undecidable problem, we can use reductions to show that other problems are undecidable. To illustrate the concept, let’s consider the obvious attempt at writing an algorithm deciding NSA: on input ⟨M⟩, run U(⟨M, ⟨M⟩⟩) and return the opposite of what it returns. (Recall that U denotes a universal TM.) The problem with this algorithm is that if M(⟨M⟩) = ∞, then U(⟨M, ⟨M⟩⟩) loops forever, and so our algorithm loops forever and fails to return the correct output (which is True). On the other hand, if we knew for sure that M(⟨M⟩) halts (so M either accepts or rejects ⟨M⟩), then we could safely run the simulation and return the correct answer.
What we have implicitly argued above is that deciding NSA reduces to deciding the halting decision problem. The halting problem HALTS is defined such that HALTS(x) = 1 if and only if x = ⟨M, w⟩ where the TM M halts on input w. We argued above that if HALTS is decidable (so there is a TM deciding HALTS), then we can decide NSA as well, as follows: on input ⟨M⟩, first use the decider for HALTS to check whether M(⟨M⟩) halts; if it doesn’t, return True; if it does, run U(⟨M, ⟨M⟩⟩) and return the opposite of its answer. Since we know that NSA is undecidable, this shows that HALTS must be undecidable as well.
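The reduction can be sketched in code. Everything below is illustrative: “machines” are modeled as Python functions keyed by a name, and `oracle` is a stub standing in for the (impossible) HALTS decider, restricted to a toy universe so the sketch actually runs.

```python
# Sketch of the reduction: IF we had a decider for HALTS, we could build a
# decider for NSA from it, using the HALTS decider as a black-box subroutine.
def run(machine, x):
    return machine(x)  # only safe to call when we know the machine halts

def make_nsa_decider(decides_halts):
    def decides_nsa(m):
        # m plays the role of <M>; run M on its own encoding
        if decides_halts(m, m):
            return not run(MACHINES[m], m)  # halts: flip its answer
        return True  # loops forever: it certainly does not self-accept
    return decides_nsa

# Toy universe of machines: two that halt and one that would loop forever
# (marked None; it is never actually run below).
MACHINES = {
    "accept_all": lambda x: True,
    "reject_all": lambda x: False,
    "loop": None,
}
oracle = lambda m, x: MACHINES[m] is not None  # stub halting "decider"
decides_nsa = make_nsa_decider(oracle)
```

The point of the sketch is the contrapositive: since `make_nsa_decider` turns any genuine HALTS decider into an NSA decider, and NSA is undecidable, no genuine HALTS decider can exist.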
In general, the idea of a reduction is as follows. Let f and g be two decision problems. We say that deciding f reduces to deciding g (or simply, f reduces to g) if we are able to do the following: assume g is decidable (for the sake of argument), and then show that f is decidable by using the decider for g as a black-box subroutine (i.e. a helper function). Here the problems f and g may or may not be decidable to begin with. But if f reduces to g and g is decidable, then f is also decidable. Equivalently, if f reduces to g and f is undecidable, then g is also undecidable.
We can use reductions to expand the landscape of undecidable problems. Above, we reduced NSA to HALTS to show that HALTS is undecidable. We could next show that HALTS reduces to another problem to prove that that problem is undecidable. And indeed, HALTS is a very popular starting point for undecidability proofs like this.
In the next post, we will prove Gödel’s incompleteness theorems. The main ingredient in the proof of the first incompleteness theorem will be the diagonalization argument that we used to show NSA is undecidable. In particular, we’ll construct an explicit true statement that is unprovable. The second incompleteness theorem will be established via the concept of a reduction for provability. This is similar in flavor to showing HALTS is undecidable using a reduction.